3 posts for '2008/05/19'
This is written in response to a post by Anne Zelenka.
I've been talking about the coming "semantic graph" for quite some time now, and it seems the meme has suddenly caught on thanks to a recent article by Tim Berners-Lee in which he speaks of an emerging "Giant Global Graph" or "GGG." But if the GGG emerges, it may or may not be semantic. For example, social networks are NOT semantic today, even though they contain various kinds of links between people and other things.
So what makes a graph "semantic?" How is the semantic graph different from social networks like Facebook for example?
Many people think that the difference between a social graph and a semantic graph is that a semantic graph contains more types of nodes and links. That's potentially true, but not always the case. In fact, you can make a semantic social graph or a non-semantic social graph. The concept of whether a graph is semantic is orthogonal to whether it is social.
A graph is "semantic" if the meaning of the graph is defined and exposed in an open and machine-understandable fashion. In other words, a graph is semantic if the semantics of the graph are part of the graph or at least connected from the graph. This can be accomplished by representing a social graph using RDF and OWL, the languages of the Semantic Web.
Today most social networks are non-semantic, but it is relatively easy to transform them into semantic graphs. A simple way to make any non-semantic social graph into a semantic social graph is to use the FOAF ontology to define the entities and links in the graph.
FOAF stands for "friend of a friend" and is a simple ontology of people and social relationships. If a social network links its data to the FOAF ontology, and exposes these linkages to other applications on the Web, then other applications can understand the meaning of the data in the network in an unambiguous manner. In other words it is now a semantic social graph because its semantics are visible to other applications.
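As a rough illustration of what "linking data to the FOAF ontology" could look like, here is a minimal sketch in plain Python. The FOAF namespace and the terms `Person`, `name`, and `knows` are real FOAF vocabulary; the `users` and `friendships` records, the `user_uri` helper, and the `example.org` URI scheme are hypothetical stand-ins for a network's private data.

```python
# Real published namespaces for FOAF and RDF.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical internal records from a non-semantic social network.
users = {1: "Alice", 2: "Bob"}
friendships = [(1, 2)]


def user_uri(user_id):
    # Hypothetical URI scheme; a real network would use its own domain.
    return f"http://example.org/people/{user_id}"


def to_foaf_triples(users, friendships):
    """Re-express private records as (subject, predicate, object) triples
    whose predicates and classes come from the shared FOAF vocabulary."""
    triples = []
    for uid, name in users.items():
        triples.append((user_uri(uid), RDF_TYPE, FOAF + "Person"))
        triples.append((user_uri(uid), FOAF + "name", name))
    for a, b in friendships:
        triples.append((user_uri(a), FOAF + "knows", user_uri(b)))
    return triples


triples = to_foaf_triples(users, friendships)
for triple in triples:
    print(triple)
```

Because every predicate is a URI defined by FOAF rather than a private column name, any application that knows FOAF can interpret these triples without consulting the network's developers.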
As illustrated by the FOAF example above, one way to make a graph semantic is to use the W3C open standards for the Semantic Web (RDF and OWL) to represent, and define the meaning of, the nodes and links in the graph. By using the Semantic Web, the graph becomes machine-understandable and thus more easily navigated, searched, imported, and integrated by other applications.
For example, let's say that social network Application A comes along and wants to use the dataset of social network Application B. App A sees the graph of nodes and links in B, and it sees something called a "has team" link connecting various nodes in the graph together. What does that mean? What kinds of things can or cannot be connected with this link? What can be inferred if things are connected this way?
The meaning of "has team" is ambiguous to App A because it's not defined anywhere that the software can see. The only way App A can use App B's data correctly is for the programmer of App A to speak to the programmer of App B, or to read something they wrote (such as documentation), to learn what they meant by the "has team" link.
Only by knowing what the programmer of App B intended can App A treat App B's data appropriately, without misinterpretations that might lead to mistakes or inconsistencies. This matters because, for example, if a user searches for "Yankees Players," should people who are linked by the "has team" link to sports teams called "Yankees" be returned? Does "has team" mean "a connection from a person to a sports team they support," "a connection from a person to a sports team they play on," or "a connection from a person to a workgroup they participate in"? In short, App A has no idea what to do with data that is linked by App B's "has team" link unless it is explicitly programmed to make use of it.
The OWL language (Web Ontology Language) provides a way for the programmers of App A and App B to define what the links in their graphs mean in an unambiguous and machine-understandable way. So App A just has to look up this definition and it can instantly start to use App B's data correctly, without any new programming or difficult integration.
How is this accomplished? The programmer of App B simply uses OWL to define an ontology of social relationships for their service. For example, they define the "has team" link to be a link that connects a person to a sports team they play on. They also define what they mean by a "sports team" (for example, "a group of two or more people that play a sport," where a sport is one of "baseball, basketball, football, soccer, hockey, tennis"), and they link these terms to another ontology of sports somewhere else on the Web. The ontology file that defines App B's data is added to the Website of App B, and linked from its data, so that other applications can see it.
Now when another application such as App A comes along and looks at App B's data it can reference App B's ontology to see for itself what was intended by the "has team" link -- it can see exactly what that link implies and what can be inferred by it. It understands how to use App B's data set, and how to correctly make new links using that data set which are consistent with the meaning of the links it contains.
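To make the lookup concrete, here is a minimal sketch, assuming App B's ontology states the domain and range of its "has team" link. `rdfs:domain` and `rdfs:range` are real RDF Schema terms that OWL ontologies build on; the `b:` names, the sample data, and the `infer_types` helper are hypothetical, and prefixed strings stand in for full URIs.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Hypothetical ontology published by App B: "has team" connects
# a person to a sports team.
ontology_b = [
    ("b:hasTeam", RDFS_DOMAIN, "b:Person"),
    ("b:hasTeam", RDFS_RANGE, "b:SportsTeam"),
]

# Hypothetical data pulled from App B.
data_b = [
    ("b:player42", "b:hasTeam", "b:yankees"),
]


def infer_types(data, ontology):
    """Apply the standard domain/range rules: if a property has domain D,
    every subject using it is a D; if it has range R, every object is an R."""
    domains = {s: o for s, p, o in ontology if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in ontology if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


inferred = infer_types(data_b, ontology_b)
for triple in sorted(inferred):
    print(triple)
```

App A never had to be told that `b:yankees` is a sports team. It derived that from the ontology alone, which is the kind of inference the "has team" definition makes possible.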
This is the real point of the Semantic Web open standards -- RDF enables data to be represented in a database-independent manner, and OWL enables the semantics of that data to be defined in an open, machine-understandable way so that other applications can use that data without having to first be programmed to do so. As long as they speak RDF/OWL, applications can use any data they find and look up the meaning of any data they need to use so they can use the data appropriately.
For example, suppose another application, App C, that is an OWL-aware application but has never seen App B's data set before and was not programmed specifically to use it, pulls some data out from App B's API. App C can immediately begin to use this data correctly and consistently with how App B uses it, because all that is necessary for understanding how to use B's data is encoded in the OWL ontology that App B's data refers to.
The point here is that using Semantic Web open standards such as RDF and OWL to encode what data means is a giant leap beyond just putting raw data onto the Web in an open format. It doesn't just put the data itself on the Web, it also puts the definition of what the data means and how to use it on the Web in an open format. A semantic graph is far more reusable than a non-semantic graph -- it's a graph that carries its own meaning.
The semantic graph is not merely a graph with links to more kinds of things than the social graph. It's a graph of interconnected things that is machine-understandable -- its meaning or "semantics" is explicitly represented on the Web, just like its data. This is the real way to make social networks open. Merely opening up their APIs is just the first step.
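Here is a sketch of how App C might do this, assuming App B's ontology links its "has team" property to a property in a shared sports ontology that App C already understands. `rdfs:subPropertyOf` is a real RDF Schema term; `sports:playsFor`, the `b:` names, and both helper functions are hypothetical.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Hypothetical link from App B's ontology to a shared sports ontology.
ontology_b = [("b:hasTeam", SUBPROPERTY_OF, "sports:playsFor")]

# Hypothetical data pulled from App B's API.
data_b = [
    ("b:player42", "b:hasTeam", "b:yankees"),
    ("b:player7", "b:hasTeam", "b:redsox"),
    ("b:fan1", "b:follows", "b:yankees"),
]


def expand_subproperties(data, ontology):
    """For each (s, p, o) where p is declared a sub-property of q,
    also assert (s, q, o). One level only; chains are not followed."""
    supers = {}
    for s, p, o in ontology:
        if p == SUBPROPERTY_OF:
            supers.setdefault(s, set()).add(o)
    expanded = set(data)
    for s, p, o in data:
        for super_property in supers.get(p, ()):
            expanded.add((s, super_property, o))
    return expanded


def players_of(team, triples):
    # App C only knows the shared term "sports:playsFor".
    return sorted(s for s, p, o in triples
                  if p == "sports:playsFor" and o == team)


yankees_players = players_of("b:yankees",
                             expand_subproperties(data_b, ontology_b))
print(yankees_players)
```

App C contains no code that mentions `b:hasTeam`. The ontology tells it that the link means "plays for," so a search for Yankees players returns the player and not the fan.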
Only when the semantics of data is defined and shared in an open way can any graph truly be said to be semantic. Once data around the Web is defined in a machine-understandable way, a whole new world of easy, instant mashups becomes possible. Applications can start to freely and instantly mix and match each other's data, including new data they were not programmed in advance to understand. This opens up the door to the Web truly becoming a giant database and eventually an integrated operating system in which all applications are able to more easily interoperate and share data.
The Giant Global Graph may or may not be a semantic graph. That depends on whether it is implemented with, or at least connected to, W3C standards for the Semantic Web.
I believe that because the Semantic Web makes data-integration easier, it will ultimately be widely adopted. Simply put, applications that wish to access or integrate data in the Age of the Web can more easily do so using RDF and OWL. That alone is reason enough to use these standards.
Of course there are many other benefits as well, such as the ability to do more sophisticated reasoning across the data, but that is less important. Simply making data more accessible, connectable, and reusable across applications would be a huge benefit.
Tim Berners-Lee just posted his thoughts about the importance of Linked Data on the Semantic Web. Linked Data support is built into Twine. All the data in Twine is accessible as open-standard RDF and OWL today and will be accessible to other applications via several APIs, including SPARQL. You can learn more about Twine's support for Linked Data and see some examples here.
Tim says:
In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.
So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?
Twine provides RDF and supports SPARQL (although we have not opened our SPARQL API yet while we are in beta, we will...). At the same time, Twine also protects privacy by only providing its data according to permissions. Apps can only get Twine data they have permission to see, such as their own data or their owner's or users' data, data that has been shared with them, or public data in Twine.
Twine is also designed to consume external Linked Data via its APIs. Twine will be able to consume external RDF and OWL ontologies, as a means to enable other applications and users to extend its functionality and add new data to it.
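The idea of answering queries only over data an application is allowed to see can be sketched in a few lines. This is not Twine's API, which is not public. The `store` layout, the reader labels, and both functions are hypothetical, and `match` handles only a single triple pattern, far less than a real SPARQL engine.

```python
# Hypothetical store: each triple is paired with the set of readers
# allowed to see it ("public" means everyone).
store = [
    (("ex:alice", "foaf:knows", "ex:bob"), {"public"}),
    (("ex:alice", "ex:email", "alice@example.org"), {"app-1"}),
]


def visible_triples(store, app_id):
    """Return only the triples this application has permission to see."""
    return [triple for triple, readers in store
            if "public" in readers or app_id in readers]


def match(pattern, triples):
    """Match one triple pattern; terms starting with '?' are variables.
    Returns one dict of variable bindings per matching triple."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


pattern = ("ex:alice", "?p", "?o")
app1_results = match(pattern, visible_triples(store, "app-1"))
app2_results = match(pattern, visible_triples(store, "app-2"))
print(app1_results)
print(app2_results)
```

Filtering before matching means the same query yields different answers for different callers, and a caller cannot reach data it lacks permission to see however it phrases the pattern.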
Earlier this month I had the opportunity to visit, and speak at, the Digital Enterprise Research Institute (DERI), located in Galway, Ireland. My hosts were Stefan Decker, the director of the lab, and John Breslin who is heading the SIOC project.
DERI has become the world's premier research institute for the Semantic Web. Everyone working in the field should know about them, and if you can, you should visit the lab to see what's happening there.
DERI is part of the National University of Ireland, Galway. With over 100 researchers focused solely on the Semantic Web, and very significant financial backing, DERI has, to my knowledge, the highest concentration of Semantic Web expertise on the planet today. Needless to say, I was very impressed with what I saw there. Here is a brief synopsis of some of the projects that I was introduced to:
In summary, my visit to DERI was really eye-opening and impressive. I recommend that major organizations that want to really see the potential of the Semantic Web, and get involved on a research and development level, should consider a relationship with DERI -- they are clearly the leader in the space.