'2008/05'에 해당되는 글 9건
- 2008/05/27
- 2008/05/19
- 2008/05/19
- 2008/05/19
- 2008/05/18
- 2008/05/17
- 2008/05/17
- 2008/05/17
- 2008/05/05
This is written in response to a post by Anne Zelenka.
I've been talking about the coming "semantic graph" for quite some time now, and it seems the meme has suddenly caught on thanks to a recent article by Tim Berners-Lee in which he speaks of an emerging "Giant Global Graph" or "GGG." But if the GGG emerges it may or may not be semantic. For example social networks are NOT semantic today, even though they contain various kinds of links between people and other things.
So what makes a graph "semantic?" How is the semantic graph different from social networks like Facebook for example?
Many people think that the difference between a social graph and a semantic graph is that a semantic graph contains more types of nodes and links. That's potentially true, but not always the case. In fact, you can make a semantic social graph or a non-semantic social graph. The concept of whether a graph is semantic is orthogonal to whether it is social.
A graph is "semantic" if the meaning of the graph is defined and exposed in an open and machine-understandable fashion. In other words, a graph is semantic if the semantics of the graph are part of the graph or at least connected from the graph. This can be accomplished by representing a social graph using RDF and OWL, the languages of the Semantic Web.
Today most social networks are non-semantic, but it is relatively easy to transform them into semantic graphs. A simple way to make any non-semantic social graph into a semantic social graph is to use the FOAF ontology to define the entities and links in the graph.
FOAF stands for "friend of a friend" and is a simple ontology of people and social relationships. If a social network links its data to the FOAF ontology, and exposes these linkages to other applications on the Web, then other applications can understand the meaning of the data in the network in an unambiguous manner. In other words it is now a semantic social graph because its semantics are visible to other applications.
As illustrated by the FOAF example above, one way to make a graph semantic is to use the W3C open standards for the Semantic Web (RDF and OWL) to represent, and define the meaning of, the nodes and links in the graph. By using the Semantic Web, the graph becomes machine-understandable and thus more easily navigated, imported by, searched, and integrated by other applications.
For example, let's say that social network Application A comes along and wants to use the dataset of social network Application B. App A sees the graph of nodes and links in B, and it sees something called a "has team" link connecting various nodes in the graph together. What does that mean? What kinds of things can or cannot be connected with this link? What can be inferred if things are connected this way?
The meaning of "has team" is ambiguous to App A because it's not defined anywhere that the software can see. The only way App A can use App B's data correctly is if the programmer of App A speaks to the programmer of App B (or reads something they wrote such as documentation of some sort) that defines what they meant by the "has team" link.
Only by knowing what was intended by the programmer of App B, can App A treat App B's data appropriately, without any misinterpretation that might lead to mistakes or inconsistencies. This is important because, for example, if a user searches for "Yankees Players" should people who are linked by the "has team" link to sports teams called "Yankees" be returned, or does "has team" mean "a connection from a person to a sports team they support," or does it mean "a connection from a person to a sports team they play on," or does it mean "a connection from a person to a workgroup they participate in?" In short, App A has no idea what to do with data that is linked by App B's "has team" link unless it is explicitly programmed to make use of it.
The OWL language (Web Ontology Language) provides a way for the programmers of App A and App B to define what the links in their graphs mean in an unambiguous and machine-understandable way. So App A just has to look up this definition and it can instantly start to use App B's data correctly, without any new programming or difficult integration.
How is this accomplished? The programmer of App B simply uses OWL to define an ontology of social relationships for their service: for example they define the "has team" link to be a link that connects a person to a sports team they play on. They also define what they mean by a "sports team" (for example, "a group of two or more people that play a sport" and a sport is one of "baseball, basketball, football, soccer, hockey, tennis" and they link these terms to another ontology of sports somewhere else on the Web.) The ontology file that defines App B's data is added to the Website of App B, and linked from it's data, so that other applications can see it.
Now when another application such as App A comes along and looks at App B's data it can reference App B's ontology to see for itself what was intended by the "has team" link -- it can see exactly what that link implies and what can be inferred by it. It understands how to use App B's data set, and how to correctly make new links using that data set which are consistent with the meaning of the links it contains.
This is the real point of the Semantic Web open standards -- RDF enables data to be represented in a database independent manner, and OWL enables the semantic of that data to be defined in an open machine-understandable way so that other applications can use that data without having to first be programmed to do so. As long as they speak RDF/OWL, applications can use any data they find and lookup the meaning of any data they need to use so they can use the data appropriately.
For example, suppose another application, App C, that is OWL-aware application but has never seen App B's data-set before and was not programmed specifically to use it, pulls some data out from App B's API. App C can immediately begin to use this data correctly and consistently with how App B uses it, because all that is necessary for understanding how to use B's data is encoded in the OWL ontology that App B's data refers to.
The point is here that using Semantic Web open standards such as RDF and OWL to encode what data means is a giant leap beyond just putting raw data onto the Web in an open format. It doesn't just put the data itself on the Web, it also puts the definition of what the data means and how to use it, on the Web in an open format. A semantic graph is far more reusable than a non-semantic graph -- it's a graph that carries its own meaning.
The semantic graph is not merely a graph with links to more kinds of things than the social graph. It's a graph of interconnected things that is machine-understandable -- it's meaning or "semantics" is explicitly represented on the Web, just like its data. This is the real way to make social networks open. Merely opening up their API's is just the first step.
Only when the semantics of data is defined and shared in an open way can any graph truly be said to be semantic. Once data around the Web is defined in a machine-understandable way, a whole new world of easy, instant mashups becomes possible. Applications can start to freely and instantly mix and match each other's data, including new data they were not programmed in advance to understand. This opens up the door to the Web truly becoming a giant database and eventually an integrated operating system in which all applications are able to more easily interoperate and share data.
The Giant Global Graph may or may not be a semantic graph. That depends on whether it is implemented with, or at least connected to, W3C standards for the Semantic Web.
I believe that because the Semantic Web makes data-integration easier, it will ultimately be widely adopted. Simply put, applications that wish to access or integrate data in the Age of the Web can more easily do so using RDF and OWL. That alone is reason enough to use these standards.
Of course there are many other benefits as well, such as the ability to do more sophisticated reasoning across the data, but that is less important. Simply making data more accessible, connectable, and reusable across applications would be a huge benefit.
Tim Berners-Lee just posted his thoughts about the importance of Linked Data on the Semantic Web. Linked data support is built-into Twine. All the data in Twine is accessible as open-standard RDF and OWL today and will be accessible to other applications via several API's including SPARQL. You can learn more about Twine's support for Linked Data and see some examples here.
Tim says:
In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.
So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?
Twine provides RDF and supports SPARQL (although while we are in beta we have not opened our SPARQL API yet, but we will...). At the same time Twine also protects privacy by only providing its data according to permissions. Apps can only get Twine data they permission to see such as their own data or their owner's or users's data, data that has been shared with them, or public data in Twine.
Twine is also designed to consume external Linked Data via it's APIs. Twine will be able to consume external RDF and OWL ontologies, as a means to enable other applications and users to extend its functionality and add new data to it.
Earlier this month I had the opportunity to visit, and speak at, the Digital Enterprise Research Institute (DERI), located in Galway, Ireland. My hosts were Stefan Decker, the director of the lab, and John Breslin who is heading the SIOC project.
DERI has become the world's premier research institute for the Semantic Web. Everyone working in the field should know about them, and if you can, you should visit the lab to see what's happening there.
Part of the National University of Ireland, Galway. With over 100 researchers focused solely on the Semantic Web, and very significant financial backing, DERI has, to my knowledge, the highest concentration of Semantic Web expertise on the planet today. Needless to say, I was very impressed with what I saw there. Here is a brief synopsis of some of the projects that I was introduced to:
In summary, my visit to DERI was really eye-opening and impressive. I recommend that major organizations that want to really see the potential of the Semantic Web, and get involved on a research and development level, should consider a relationship with DERI -- they are clearly the leader in the space.
Associative Search and the Semantic Web: The Next Step Beyond Natural Language Search
Our present day search engines are a poor match for the way that our brains actually think and search for answers. Our brains search associatively along networks of relationships. We search for things that are related to things we know, and things that are related to those things. Our brains not only search along these networks, they sense when networks intersect, and that is how we find things. I call this associative search, because we search along networks of associations between things.
Human memory -- in other words, human search -- is associative. It works by "homing in" on what we are looking for, rather than finding exact matches. Compare this to the the keyword search that is so popular on the Web today and there are obvious differences. Keyword searching provides a very weak form of "homing in" -- by choosing our keywords carefully we can limit the set of things which match. But the problem is we can only find things which contain those literal keywords.
There is no actual use of associations in keyword search, it is just literal matching to keywords. Our brains on the other hand use a much more sophisticated form of "homing in" on answers. Instead of literal matches, our brains look for things things which are associatively connected to things we remember, in order to find what we are ultimately looking for.
For example, consider the case where you cannot remember someone's name. How do you remember it? Usually we start by trying to remember various facts about that person. By doing this our brains then start networking from those facts to other facts and finally to other memories that they intersect. Ultimately through this process of "free association" or "associative memory" we home in on things which eventually trigger a memory of the person's name.
Both forms of search make use of the intersections of sets, but the associative search model is exponentially more powerful because for every additional search term in your query, an entire network of concepts, and relationships between them, is implied. One additional term can result in an entire network of related queries, and when you begin to intersect the different networks that result from multiple terms in the query, you quickly home in on only those results that make sense. In keyword search on the other hand, each additional search term only provides a linear benefit -- there is no exponential amplification using networks.
Keyword search is a very weak approximation of associative search because there really is no concept of a relationship at all. By entering keywords into a search engine like Google we are simulating an associative search, but without the real power of actual relationships between things to help us. Google does not know how various concepts are related and it doesn't take that into account when helping us find things. Instead, Google just looks for documents that contain exact matches to the terms we are looking for and weights them statistically. It makes some use of relationships between Web pages to rank the results, but it does not actually search along relationships to find new results.
Basically the problem today is that Google does not work the way our brains think. This difference creates an inefficiency for searchers: We have to do the work of translating our associative way of thinking into "keywordese" that is likely to return results we want. Often this requires a bit of trial and error and reiteration of our searches before we get result sets that match our needs.
A recently proposed solution to the problem of "keywordese" is natural language search (or NLP search), such as what is being proposed by companies like Powerset and Hakia. Natural language search engines are slightly closer to the way we actually think because they at least attempt to understand ordinary language instead of requiring keywords. You can ask a question and get answers to that question that make sense.
Natural language search engines are able to understand the language of a query and the language in the result documents in order to make a better match between the question and potential answers. But this is still not true associative search. Although these systems bear a closer resemblance to the way we think, they still do not actually leverage the power of networks -- they are still not as powerful as associative search.
A natural language search can understand the meaning of a query like "books about Harry Potter" and it knows this is not the same as "Books by Harry Potter." But ultimately what is happening is that a linguistic expression is being converted into a more sophisticated keyword search. The language in the query is being mapped to documents that contain text that answers a question, or to data objects that match the thing being asked for. This is certainly better than keyword search but it is still ultimately just a smarter form of literal matching. It is not really making use of associative search along networks of semantic relationships in the data (other than linguistic relationships between words in the query) or any sort of sophisticated reasoning.
By comparison, associative search doesn't merely understand the meaning of the query, it understands and can reason about relationships in the data. This is an important distinction.
An associative search returns documents that represent things that are related, via various forms of associations (semantic links), to the things in the query. An associative search looks through a network of associations for the things that are most connected to the items in the query. By specifying more specific starting points, the set of things which are connected to all those starting points is narrowed. Thus an associative search is an intersection of multiple networks. The items that are most strongly intersected are the results that are most likely to matter.
Associative search is a very different approach to search from keyword search (which merely looks for things with the keywords in them) and natural language search (which merely looks for things that contain content that matches the meaning of the question). It also happens to be more similar to how our brains actually think.
On its own, associative search represents an important advance in the way we search. But by adding some simple reasoning to an associative search it becomes even more powerful. Reasoning adds the ability to generalize or get more specific, and to weight various paths through the network of relationships in more sophisticated ways, such as based on logical relationships or inferences through the network.
A simple example of reasoning is transitivity -- for example, if A is a part of B and B is a part of C, then A is a part of C. If we know that the "part of" relationship is transitive, then whenever we see chains of "part of" links between things we can make transitive inferences. In an associative search these inferences are quite useful. For example, we can search for all the parts of a 747 jet. Using transitive reasoning along networks of relationships we can find all the parts, even those things that are "parts of parts." Similarly we could find "all products of Sony" including products of subsidiaries and business units of Sony. Transitive inferences across transitive links is just one type of reasoning; there are many other variations that are possible, which when combined together become even more useful.
Our current search tools -- whether they are keyword based or natural language based do not support true associative search, let alone reasoning. But we do see associative search starting to appear in a very different breed of application: social networks. A search in LinkedIn for example, is an associative search. Will social networks do an end-run around traditional search engines to provide the next-generation of search? It's quite possible. Facebook and LinkedIn are far better positioned than Google today for associative search. In fact, I would venture that this is how Facebook could give Google some serious competition. But they have to hurry if they are going to do this -- Google has clearly realized the power of "social search" and is rapidly moving to leverage it in their own search results.
Ultimately associative search is more than just social search however. To be really effective, associative search engines need to understand and leverage the full spectrum of relationships between things, not just social relationships. They need to see and understand more types of relationships between more types of things. In order to accomplish this, associative search engines need the Semantic Web.
The Semantic Web provides exactly what is needed to enable associative search, with reasoning, on the Web-at-large. Using RDF and OWL, content can be marked up with metadata that specifies not only its intended meaning and structure, but also the various kinds of semantic relationships it has to other content and to other concepts. In other words, these standards provide a way to add a new network of semantically defined associations to the data on the Web. For example a document about Microsoft can be linked to the concepts "Software Company," "Software Manufacturer," and "Redmond." It can also be linked to a data record that represents "Microsoft" and the properties that define it as a company. The "Microsoft" object can then link to companies that are "suppliers" and "customers" and "competitors" as well as to things which are connected as "products" or "services."
This rich network of relationships between things goes far beyond documents. It contains relationships to people, places, other organizations, products, events, services, etc. It's similar to a social network, but instead of just containing people and social relationships, it contains more types of things and relationships between them. This is really what the Semantic Web enables. One can imagine that as this new semantic data becomes visible on the Web (which is rapidly happening in fact), the power of search will be dramatically improved. Associative search is coming soon to a Web near you!
With that in mind, here is an example of how Semantic Web enabled associative search will work in the future.
PROBLEM: I am trying to remember name of the organizer of a conference I once attended.
WHAT I ALREADY KNOW:
l I know this person and have corresponded with them in the past.
l The conference was related to government and the Internet.
l It took place in a town near
l The organizer of the conference once introduced me to a male celebrity, but I can't remember the celebrity's name.
l I gave a talk at the Conference about Web 3.0.
l My friend, Sue Smith, also spoke at the conference.
l The conference I attended took place in the Spring, but I am not sure if it was last year or two years ago.
In the above example, I cannot remember the specific keywords that will help me generate a query to find the answer. Instead, I remember a number of relationships and generalizations about the answer. Present day search engines cannot see these relationships, and they have no ability to understand a generalization and look at things it contains.
The ability to intersect the sets formed by relationships and generalizations is a fundamental feature of human memory and search. But our present day tools don't have these capabilities. Thus we have to spend time translating our questions into keywordese, rather than just asking our questions in the actual language of human thought.
There are two ways to approach solving this.
The first way is to create artificial intelligence which, given a question in natural language English, can understand it and reason about the question as well as understand and reason about the information in the set of documents being searched, in order to intelligently arrive at candidate answers. This is computationally intensive, and very hard to program. This is why AI hasn't quite happened yet on this scale.
A perhaps easier approach is to use the Semantic Web. In the Semantic Web approach, metadata is embedded into content that describes the meaning of the content, it's various important properties, and its relationships to other concepts. On the basis of this metadata, the problem becomes much simpler to solve. Instead of doing high-level AI it becomes essentially a statistical search.
Now let's look at how using the Semantic Web could help us solve the above problem via an associative search:
Items are connected to more general or specific concepts by virtue of semantic linkages between concepts. For example, the conference I am looking for is related to the concepts "Government" and "Technology." If I can at least remember that then I can find conferences related to government and technology. Furthermore, since the concept "Policy" is a subset of government it may be related to that topic as well.
Likewise, things are connected to things that are "near" them via geographic links. Because the conference was near
The organizer of the conference introduced me to a male celebrity. There are several celebrities in my social network. If the fact that I met certain people via introductions from other people was stored using semantic links, then this too would be searchable. For example, "find all celebrities I was introduced to by my connections" would be a solvable query. Similarly, "find people who introduced me to celebrities" would also be solvable.
The fact that I gave a talk at the conference could also be semantically represented on a data record describing the conference, as well as on my own profile. Thus there could exist a link such as "speaker at" which links me to various conferences I have spoken at. I could then get a list of all the conference I have spoken at. I could also look for all the conferences where both myself and Sue Smith were speakers.
Or, better yet, there could be a link called "Gave talk about" which links me to an instance describing each talk I have given. From such an instance there could then be "Gave talk at" links to all the events where I have given that talk. So I could look up my "Web 3.0" talk and then see all the conferences where I gave that talk.
Temporal relations can also be generalized and semantically represented. For example, the conference I am looking for took place in the spring. Therefore only look for conferences that took place in or near months that are considered to be in the spring season.
By intersecting the results of the above searches we narrow down very precisely to a set of people I might be looking for, or just to a single qualifying person.
For example the answer I was seeking for was that the organizer was named Robert Jones, and the conference was about Government and Technology Policy in Carmel-by-the-Sea last spring.
This result should be easily findable via associative search starting from the above set of things I remember. But if for some reason the answer is still not there, there is another capability which the brain uses that we need to add to our search engines:
1단계 : Perturbation, or what could be called "prospecting."
Perturbation = 섭동 : 수학에서 해가 알려진 다른 비슷한 문제와 비교하여 문제를 푸는 방법. 보통 이 방법으로 구한 해는 근사값일 뿐이다.
The query I entered is comprised of a question and a set of facts related to the answer I am seeking. But there is a possiblity that I asked the question incorrectly, or some of the facts I added were incorrect, or insufficient. Perturbation can correct for this by introducing variations into the question and the facts in order to explore the space of answers that are "near" them as well.
There are many ways to go about adding perturbation to the system -- for example, we can search more than one hop out from every link, or we can search for other types of relationships that are highly correlated with relationships we are asking for explicitly, or we can include results for things which are strongly connected to things that are found.
From a user-interface standpoint perturbation can be controlled with a simple "sliding lever" in the user interface for "Precision." If the user sets very high Precision as a requirement then there is no perturbation -- the results are exact matches to the query and facts. If there is low Precision as a requirement then there can be more perturbation, thus the results are fuzzy and may include things that are near what I asked for but not exactly what I specified, enabling me to discover things via relevant relationships that I could not even remember to mention as facts.
2단계 : Reasoner
Finally, using a reasoner, the results found by the above search can be analyzed such that those results which are most likely to be what I am looking for, given the facts I have included as constraints, are presented first. Reasoning becomes the ranking algorithm in the system, rather than something like Pagerank. The answers that actually make the most sense in the context of my question are delivered first.
The above illustration describes how searches that are powered by the Semantic Web will work, once this technology is widely adopted. This is how the brain works, and how our search engines should work as well.
This is not a pipedream -- in fact it is already happening in research settings and in the government. Within 15 years, if not a lot sooner, we will see these capabilities emerge in consumer-grade search interfaces.
If you're new to developing social applications, it can be difficult to immediately grasp how good applications facilitate fun and meaningful social experiences. To accelerate your learning, we've come up with a list of a few light-hearted recommendations around building good social applications. Not all of these "best practices" are necessary in every case, but they might spark thoughts about finding new users, keeping old ones, and leveraging the social graph for fresh content and viral spread.
Across containers, there's a common tendency for a user to take a chance on an unknown application, and shortly thereafter remove it if no immediate value is found. The lesson to be learned from this interaction is that first impressions really do matter, and it's necessary to engage the user quickly before attention is lost. To this end, we suggest you focus on the 30-second experience; before distracting the user with expert features or sending invites, slow down and give the user a simpler taste of what your application is about. Try the following:
Across OpenSocial containers there can be a lot of variation in the look and feel of pages and profiles. When designing your application, it can help to attempt consistency with the container UI by using similar fonts, tabs and buttons.
In cases where applications strive for stronger identity, it can be good to create a UI look and feel which is slightly distinct but still aesthetically strong to play on a user's tastes and need for self expression.
The profile page in a container is often a representation of a user's identity, interests and tastes. From the perspective of the owner, it's a means for self expression and a starting point for exploring the social graph. From the perspective of viewers, it's a place to learn, communicate, and find shared interests. Applications best take advantage of the profile by enabling self expression through common interests around entertainment, brands and groups. Self expression is also enabled through specific forms of communication like gestures and gifts or conversations around special topics.
Good social applications aren't only static badges of self expression; they dynamically change to provide an interesting experience across sessions. Change can be derived from the social graph as friends interact with the application to change its state. Change can also occur as the application internally generates new content. In both cases, the day-to-day changes can help to keep an application interesting and desired over time.
A particularly easy way to make an application dynamic and social is to record and present the activities of friends who are using the application. This could be thought of as an application-specific activity stream in which the news and updates of friends are always presented in the context of the application itself. From these activities, users become more aware of how others are using the application, driving increased use and change.
Exposing the activities of friends is one method among many for passively browsing the social graph. Users are often interested in low-effort interactions like viewing a friend's most recent activity, comparing content and choices, and indirectly interacting through their own activity. In supporting this style of interactions, it's essential to make it easy to browse what friends are doing. This is often achieved by linking names to a user's container profile or even creating application-specific user profiles which provide an overview of a user's activity and content.
Browsing the graph can also certainly extend beyond just friends. In some circumstances, it can be interesting to see and interact friends-of-friends, especially when drawn together by shared interests. Creating ways for a user to grow his/her social circle adds value to an application from the user's perspective by unearthing opportunities for new friends and content.
Browsing friends' activities and content often flows well into conversation, creating an opportunity to develop deeper social interaction. In places where communication can happen, it's good practice to make the option explicitly available. This can be done in a more persistent, public manner through a comment system or sharing wall. It can also be done in private by linking into a container's messaging, email or instant messaging systems, or even through an internal communication layer like pokes or other simple gestures and messages.
A container's entire social graph is often huge, and even a user's immediate social circle might be too large for a user to easily track. By growing smaller communities and making them accessible, an application can provide rich and interesting functionality that enhances the overall social experience. There are three categories of communities which applications commonly build and utilize:
Self expression and communication are often fun and entertaining alone, but OpenSocial is also a platform that can be leveraged to solve real world tasks where the social graph assists us in making decisions. For example, while some might be prone to grab a book at random off the shelf, there are many who appreciate a good recommendation from a friend. With a variety of possibilities in entertainment and interests, it can be useful to facilitate meetings, purchases, recommendations, information management and learning to create a richer, more lasting experience across your application.
|
Semantic W |