24 posts tagged 'Semantic Web'
- 2008/07/05
- 2008/06/17
- 2008/06/05
- 2008/06/02
- 2008/06/02
- 2008/05/27
- 2008/05/19
- 2008/05/19
- 2008/05/19
- 2008/05/18

What can the plan be with Microsoft's purchase of hot startup Powerset? The 3-year old company, founded by Dr Barney Pell, recently launched a semantic search experience for Wikipedia.
It is doubtful that Microsoft bought the company just to enhance Live Search. Possibly the plan is to replicate the Wikipedia solution, then incorporate Powerset into Internet Explorer. In this post we look at what the thinking behind the acquisition might be.
Most initial reviews found the Powerset product release underwhelming. Critics appreciated the innovative semantic UI and recognized its potential, but believed it didn't vastly improve Wikipedia. So in view of the lukewarm reviews, the acquisition by Microsoft was unexpected. The $100M price tag is around 5x the $20M invested in the company ($12M Series A plus $8M). Microsoft execs must believe Powerset can be a weapon in its battle with Google.
Given a set of unstructured information, Powerset applies Natural Language Processing techniques to extract the key semantic concepts from the text. It then builds a semantic index (similar to Google's) as well as a conceptual graph of relationships between entities. This graph is typically expressed in RDF triples.
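As a rough illustration of what a graph of entity relationships expressed as triples looks like, here is a minimal Python sketch. It is not Powerset's implementation: the predicate names and the `objects_of` helper are made up for this example, the triples are written by hand rather than extracted with NLP, and real RDF would use full URIs instead of short strings.

```python
from collections import defaultdict

# Hand-written (subject, predicate, object) triples standing in for facts
# an NLP pipeline might extract from text. Predicate names are illustrative.
triples = [
    ("Powerset", "foundedBy", "Barney Pell"),
    ("Powerset", "indexes", "Wikipedia"),
    ("Microsoft", "acquired", "Powerset"),
]

# Index the triples by subject so relationships can be followed like a graph.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def objects_of(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


print(objects_of("Microsoft", "acquired"))
print(objects_of("Powerset", "foundedBy"))
```

Storing facts as uniform triples is what lets the same lookup answer questions about any entity without a schema designed in advance.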

One of the Powerset innovations is surfacing of semantics to the user interface. The contextual gadget is overlaid to help navigate the unstructured information.
Many thought Powerset to be a generic semantic search engine, but its first product is limited to Wikipedia. It is not trivial to scale the technology to the entire web.
When semantic technologies emerged a few years ago, people started talking about how semantic web and/or semantic search might be a Google killer. The talk was supported by logic that semantic search can deliver more relevant results because it "knows" the content.
The industry has since realized that this isn't the case. Semantic search has no huge advantage over the statistical approach used by Google. We discussed this in the post Semantic Search - Myth and Reality.
What is powerful about Powerset? Precisely that it doesn't try to search the web as a whole. Right now, the solution works on Wikipedia, but the infrastructure is generic, so any other site could also be enhanced. The contextual outline developed can be used to navigate any content.
Instead of dealing with the whole web, the idea may be firstly to build solutions for specific sites.

Powerset as it is today is no Google killer. At this point only something with huge traction and momentum would stand a chance.
In the search market, Google has a strong hold - potentially stronger if the Yahoo deal goes through. People are conditioned to Google: it's simple and, yes, imperfect, but it's good enough and the results are still better than Live Search.
If Microsoft bought Powerset with the goal to incorporate it into Live Search, then it's likely to be another acquisition to make little impact on the bottom line. In fact, the announcement on the Live Search blog states just that. The number one reason is acquiring talent; the second is the belief that NLP and semantic algorithms will be able to patch holes in today's search.
Today Powerset brings only interesting technology; it doesn't bring traction. So what were they thinking up in Redmond? There may be a more subtle play, leveraging the fact that Powerset works well on knowledge sets like Wikipedia.
Possibly Microsoft plans to deploy Powerset across its own sites, then perhaps incorporate Powerset into Internet Explorer.
Imagine going to Wikipedia and having a semantic overlay on each page. Now imagine scaling this experience across major information sources around the web.
Providing contextual, semantic experience allows Microsoft to retain eyes longer, shaving off the time people spend searching Google.
This is an important point because Google doesn't make money on search - it makes money on advertising.
The real problem Microsoft is seeking to solve is advertising. Until now the web has figured out two fundamentals for advertising - portals and search.
Portals show ads on each page; the more people browse the content, the more ads are shown and the more money is made. The search model emerged as an alternative, now more successful, path to advertising dollars.
With Powerset and other semantic technologies, there's another model: contextual information exploration overlaid on existing content.
If Microsoft can figure out how to keep eyes off Google's home page, the game will shift dramatically. The browser is one of Microsoft's most powerful tools - and the default search box is Live Search.
If Microsoft wants to win over advertisers, it might just do more with the browser. Incorporating aspects of Powerset's semantic navigator into the browser by default could be a game changer. This is not a straightforward play. A large company with bureaucracy and execution problems is unlikely to be able to merge semantics into the browser quickly and elegantly.
The Powerset acquisition is an interesting move by Microsoft. This hot semantic startup was on everyone's radar.
What can the plan be? It is doubtful that Microsoft bought the company just to enhance Live Search. Possibly the plan is to replicate the Wikipedia solution, then incorporate Powerset into Internet Explorer.
That is a bold play requiring exact execution - not the kind Redmond has shown lately.
What do you think Microsoft is going to do with Powerset? What are the other applications of this technology that you can think of?
The Rensselaer interactive debate on the future of the Web is just starting now and is being webcast live here (note: Silverlight required). You can watch the debate and submit questions - including while it is happening - by clicking here. Update: The debate is now over, see our comprehensive notes below...
Here's a screenshot from Tim Berners-Lee's opening presentation:

The first question is about the Semantic Web -- is it still just a dream? Nova Spivack of Twine says simply: yes! He says that the problem gets harder as more information comes onto the Web. You need to "disambiguate data". The AI approach puts the burden on the software. The Semantic Web approach puts the burden on the data itself, so it's not about making smarter software - but smarter data.
He says there are technical and social challenges. Nova asks, rhetorically: is there an alternative to the Semantic Web, as the Web grows and scales? He says the HAL-9000 AI approach is an alternative, but it is not progressing much. Another solution is to "use the crowd", and he says while this approach shows promise, it doesn't scale to solve the problem. So both machine and human approaches won't scale -- hence the Semantic Web is the solution. He says "it's a huge cultural project" and is a long term goal.

Deborah McGuinness from Rensselaer introduces the debate, including a mention of ReadWriteWeb as part of the debate's "social media twist".
Nigel Shadbolt is a Professor of Artificial Intelligence and Deputy Head of the School of Electronics and Computer Science at the University of Southampton, UK. He kind of defends AI, although he says we can expect "a very different form of AI" to what has gotten most of the press thus far. He says that we can expect "augmented collective intelligence". He says "fragments of micro-intelligence" will evolve into an ecosystem, so AI will contribute to the future of the Web in that way.
The next question is about a multi-lingual Web. Wendy Hall is a Professor of Computer Science in the School of Electronics and Computer Science at the University of Southampton, UK. She talks about the Chinese Web, which has 5 billion pages that Google doesn't index. She says that soon the dominant language on the Web will be Chinese, and much of that data will come via mobile phones. She says that a lot of "key sites" in the English-language Web are not accessible in China, so the role of government will in a big way determine what will happen. She says that educating governments all over the world will be important. The Web is already fragmented, but the solution isn't to teach everyone English. So the Web Science Lab has been established in China and progress is being made there.
Nigel Shadbolt points out that different cultures frame information quite differently. So he says there must be multiple ways to represent content on the Web. Nova says that each culture has to map their content to a conceptual representation, and then globally we need to map different culture representations together.
Tim Berners-Lee says that sometimes that will work and sometimes it won't. He says "the diversity of culture is really important" and he lists a kind of gradation of content that we can understand -- e.g. 10% of Web content everybody can understand, x% we can't understand, etc.
A question from the audience about democracy. Wendy replies that the Web does have the potential to change the way we select politicians to represent us. So she says it's been interesting to watch the US presidential election - Obama has been using new technology and could bring young people into the process. So the Web brings a wider representation of people to be involved in democracy. She says it has the potential to "dramatically change" the way we select governments, mentioning the self-organizing of Wikipedia. "You can well imagine something which can shift what we do" in democracy.
Nigel chimes in that the Web can mobilize democracy, but that there is also a danger of "cyber vigilantism". He mentions the dangers of a "rampant blogosphere".
Nova also mentions the "threat to democracy" of the Web - he says "freedom is actually at risk". He notes data logging and privacy issues. So he says it's too early to tell how the Web will affect democracy - he says the Web was built on trust, but that nowadays the Web is being mis-used in some ways. So we need to be worried about that - encryption, privacy, etc are things to look out for.
There is a question about ontologies, but (ironically) the answer got rather complicated and so I lost the thread :-)
Next an audience member asks: what's the difference between Web Science and Computer Science? Wendy replies that Web Science is inter-disciplinary and it's designed to get more people than just computer geeks into studying this domain.
The next, rather rambling, question from the audience is addressed to Nova and is about data. The question (when it eventually comes) is: what is the Semantic Web when it comes to different kinds of data (multi-modal, subtleties, nuances). Nova's first response to this involved question: whoa! He then says that we have to start with the simple cases. He talks about time and calendars, semantic representation of events, etc. But he says the Semantic Web won't capture the nuances of human interaction any time soon. He says "we're not trying to replace human intelligence", but "free up" human intelligence.
Nigel, coming from an AI angle, says that behavior is a key part of the puzzle. He says understanding behavior, on a Web scale, is happening. He talks about sensors - where sensors report on data in our environment. 'Ambiently intelligent environments' is his term for this.
Tim's response: "an ontology does not represent the same thing as a haiku does." The audience laughs appreciatively. I interpret that as meaning there are different languages for different things.
The next question talks about "an innovation in logic". For example "visual logics" will need innovation if the Web is to understand multi-modal information. Tim's response is that logic has given us formalized reasoning, but it doesn't describe how people think. So he doesn't believe we need to formalize some kind of data as logic (e.g. the way people dance).
Nova says that we don't have an equivalent of a functional MRI for the Web, in other words there aren't enough ways to measure things on the Web. He says you need sensors and math to do measurements, rather than logic.
The final question is about data: how do you validate it, and if it's to be shared, how do you manage privacy etc. Nigel points to Wikipedia as a way to validate information, self-correcting by people. Nova says reasoning is where the Semantic Web can help validate data, as well as the human self-correcting (e.g. Wikipedia).
And that ends the debate. Feel free to contribute comments below!
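Reference page: http://nanet.empas.com/search/nanet_detail.html?vt=A&i=621070846&sn=KDMT1200716684&q=&q2=
Provided by: National Assembly Library (국회도서관)
Thesis title / author: Construction of an Inference-Based Web Ontology for an Intelligent Image Retrieval System (지능형 이미지 검색 시스템을 위한 추론 기반의 웹 온톨로지 구축) / 김수경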
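Title page
Table of contents
Korean abstract
I. Introduction
  1.1 Research need and purpose
  1.2 Research content and methods
  1.3 Thesis organization
II. Related work and technologies
  2.1 Ontologies and the Semantic Web
    2.1.1 Definition of ontology
    2.1.2 Functions of ontologies
    2.1.3 Types of ontologies
    2.1.4 The Semantic Web and web ontologies
    2.1.5 Ontology development tools
      (1) Protégé
      (2) Composer
  2.2 Ontology representation languages
    2.2.1 Semantic Web ontology representation languages
      (1) RDF
      (2) OWL
      (3) SWRL
    2.2.2 Description logic
  2.3 Ontology construction methods
    2.3.1 Ontology construction methods from outside Korea
      (1) Cyc
      (2) KATUS
      (3) Gruninger & Fox methodology
      (4) METONTOLOGY
      (5) Ontology Development 101
      (6) OTKM
      (7) DOLCE
      (8) Lifecycle of a Casual Web Ontology Development
    2.3.2 Ontology construction methods from Korea
III. A web ontology construction method suited to the Semantic Web
  3.1 Analysis of ontology construction methods
    3.1.1 Comparative analysis of existing ontology construction methods
    3.1.2 Reference elements for ontology construction
  3.2 Proposal of a web ontology construction method suited to the Semantic Web
    3.2.1 Reference elements for web ontology construction
    3.2.2 Proposed web ontology construction method
      (1) Step 1: Set the purpose of the ontology
      (2) Step 2: Set the overall structure of the ontology
      (3) Step 3: Acquire and analyze ontology information
      (4) Step 4: Design the internal structure of the ontology
      (5) Step 5: Create and edit the ontology
      (6) Step 6: Maintain the ontology
    3.2.3 Comparison of ontology construction methods by web ontology characteristics
IV. Building a web ontology using the proposed method
  4.1 Ontology design
    4.1.1 Setting the purpose of the ontology
      (1) Selecting the target of the ontology: topic selection
      (2) Determining the scope of the ontology
    4.1.2 Setting the overall structure of the ontology
    4.1.3 Acquiring and analyzing ontology information
      (1) Animal classification ontology
      (2) Sheep ontology
      (3) Terminology ontology
      (4) Image information frame ontology
    4.1.4 Designing the internal structure of the ontology
      (1) Internal structure design of the animal classification ontology
      (2) Internal structure design of the sheep ontology
      (3) Internal structure design of the terminology ontology
      (4) Internal structure design of the image information frame ontology
  4.2 Domain ontology creation and verification
    4.2.1 Creating and editing the individual domain ontologies
      (1) Ontology development tools and physical environment
      (2) Animal classification ontology
      (3) Creating the sheep ontology
      (4) Creating the terminology ontology
      (5) Creating the image information frame ontology
    4.2.2 Verifying the individual domain ontologies
      (1) Verification of the animal classification ontology
      (2) Verification of the sheep ontology
      (3) Verification of the terminology ontology
      (4) Verification of the image information frame ontology
      (5) Verification of the ontologies through consistency checking
    4.2.3 Verification of SWRL-based arbitrary rule inference
      (1) Defining the CQ list
      (2) Defining the human-readable syntax
      (3) Defining the rule language
V. Experiments and performance evaluation
  5.1 Purpose and structure of the experimental system
    5.1.1 Experimental system scenario
    5.1.2 Experimental system structure
    5.1.3 Linking image annotations to image files
  5.2 Experimental system environment and implementation
    5.2.1 Detailed structure of the intelligent image retrieval system
    5.2.2 Registering image annotation information and image files
    5.2.3 Image content search
    5.2.4 Registering terminology ontology instances
    5.2.5 Querying the image knowledge base ontology
  5.3 Comparative performance evaluation of the experimental system
    5.3.1 Measurement factors
    5.3.2 Overview of the systems compared
    5.3.3 Comparative search experiments and analysis
VI. Conclusion
VII. References
Abstract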
This is written in response to a post by Anne Zelenka.
I've been talking about the coming "semantic graph" for quite some time now, and it seems the meme has suddenly caught on thanks to a recent article by Tim Berners-Lee in which he speaks of an emerging "Giant Global Graph" or "GGG." But if the GGG emerges it may or may not be semantic. For example social networks are NOT semantic today, even though they contain various kinds of links between people and other things.
So what makes a graph "semantic?" How is the semantic graph different from social networks like Facebook for example?
Many people think that the difference between a social graph and a semantic graph is that a semantic graph contains more types of nodes and links. That's potentially true, but not always the case. In fact, you can make a semantic social graph or a non-semantic social graph. The concept of whether a graph is semantic is orthogonal to whether it is social.
A graph is "semantic" if the meaning of the graph is defined and exposed in an open and machine-understandable fashion. In other words, a graph is semantic if the semantics of the graph are part of the graph or at least connected from the graph. This can be accomplished by representing a social graph using RDF and OWL, the languages of the Semantic Web.
Today most social networks are non-semantic, but it is relatively easy to transform them into semantic graphs. A simple way to make any non-semantic social graph into a semantic social graph is to use the FOAF ontology to define the entities and links in the graph.
FOAF stands for "friend of a friend" and is a simple ontology of people and social relationships. If a social network links its data to the FOAF ontology, and exposes these linkages to other applications on the Web, then other applications can understand the meaning of the data in the network in an unambiguous manner. In other words it is now a semantic social graph because its semantics are visible to other applications.
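To make this concrete, here is a small Python sketch of how one friendship link might be described with FOAF terms and written out as N-Triples. The `foaf:Person`, `foaf:name` and `foaf:knows` terms are part of the FOAF vocabulary; the people, the `example.org` URIs and the two helper functions are hypothetical, and the serializer is deliberately naive (it does not escape special characters in literals).

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def friend_triples(person_uri, name, friend_uri):
    """Describe one person and one friendship link using FOAF terms."""
    return [
        (person_uri, RDF_TYPE, FOAF + "Person"),
        (person_uri, FOAF + "name", name),
        (person_uri, FOAF + "knows", friend_uri),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (naive: no literal escaping)."""
    lines = []
    for s, p, o in triples:
        # Treat anything that looks like a URI as a resource, else a literal.
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical users of a hypothetical social network.
alice = "http://example.org/people/alice"
bob = "http://example.org/people/bob"
print(to_ntriples(friend_triples(alice, "Alice", bob)))
```

Because the link is named with a shared URI rather than a private column name, any application that knows FOAF can recognize it as a "knows" relationship between two people.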
As illustrated by the FOAF example above, one way to make a graph semantic is to use the W3C open standards for the Semantic Web (RDF and OWL) to represent, and define the meaning of, the nodes and links in the graph. By using the Semantic Web, the graph becomes machine-understandable and thus more easily navigated, imported by, searched, and integrated by other applications.
For example, let's say that social network Application A comes along and wants to use the dataset of social network Application B. App A sees the graph of nodes and links in B, and it sees something called a "has team" link connecting various nodes in the graph together. What does that mean? What kinds of things can or cannot be connected with this link? What can be inferred if things are connected this way?
The meaning of "has team" is ambiguous to App A because it's not defined anywhere that the software can see. The only way App A can use App B's data correctly is if the programmer of App A speaks to the programmer of App B (or reads something they wrote such as documentation of some sort) that defines what they meant by the "has team" link.
Only by knowing what was intended by the programmer of App B, can App A treat App B's data appropriately, without any misinterpretation that might lead to mistakes or inconsistencies. This is important because, for example, if a user searches for "Yankees Players" should people who are linked by the "has team" link to sports teams called "Yankees" be returned, or does "has team" mean "a connection from a person to a sports team they support," or does it mean "a connection from a person to a sports team they play on," or does it mean "a connection from a person to a workgroup they participate in?" In short, App A has no idea what to do with data that is linked by App B's "has team" link unless it is explicitly programmed to make use of it.
The OWL language (Web Ontology Language) provides a way for the programmers of App A and App B to define what the links in their graphs mean in an unambiguous and machine-understandable way. So App A just has to look up this definition and it can instantly start to use App B's data correctly, without any new programming or difficult integration.
How is this accomplished? The programmer of App B simply uses OWL to define an ontology of social relationships for their service: for example they define the "has team" link to be a link that connects a person to a sports team they play on. They also define what they mean by a "sports team" (for example, "a group of two or more people that play a sport", where a sport is one of "baseball, basketball, football, soccer, hockey, tennis"), and they link these terms to another ontology of sports somewhere else on the Web. The ontology file that defines App B's data is added to the Website of App B, and linked from its data, so that other applications can see it.
Now when another application such as App A comes along and looks at App B's data it can reference App B's ontology to see for itself what was intended by the "has team" link -- it can see exactly what that link implies and what can be inferred by it. It understands how to use App B's data set, and how to correctly make new links using that data set which are consistent with the meaning of the links it contains.
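The lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the ontology is a plain dictionary rather than an OWL file, the class names (`Person`, `SportsTeam`), the people and both helper functions are hypothetical, and "domain" and "range" stand in for the much richer definitions OWL allows.

```python
# A hypothetical, much-simplified stand-in for the ontology App B publishes.
# A real ontology would be written in OWL and use URIs for every term.
APP_B_ONTOLOGY = {
    "hasTeam": {
        "label": "plays on sports team",
        "domain": "Person",      # what may appear on the left of the link
        "range": "SportsTeam",   # what may appear on the right of the link
    },
}

# A few of App B's links, as (subject, link, object) triples.
APP_B_DATA = [
    ("alice", "hasTeam", "Yankees"),
    ("bob", "hasTeam", "Red Sox"),
]


def infer_types(data, ontology):
    """Use each link's declared domain and range to infer node types."""
    types = {}
    for subject, link, obj in data:
        definition = ontology.get(link)
        if definition is None:
            continue  # meaning unknown: do not guess
        types[subject] = definition["domain"]
        types[obj] = definition["range"]
    return types


def players_of(team, data, ontology):
    """Answer a "<team> players" query only when the ontology says
    hasTeam points at a SportsTeam; otherwise return nothing."""
    definition = ontology.get("hasTeam")
    if definition is None or definition["range"] != "SportsTeam":
        return []
    return [s for s, link, o in data if link == "hasTeam" and o == team]


print(infer_types(APP_B_DATA, APP_B_ONTOLOGY))
print(players_of("Yankees", APP_B_DATA, APP_B_ONTOLOGY))
```

The point of the design is that App A's behavior is driven by the published definition: with no definition available, the sketch declines to answer rather than guessing what "has team" means.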
This is the real point of the Semantic Web open standards -- RDF enables data to be represented in a database-independent manner, and OWL enables the semantics of that data to be defined in an open machine-understandable way so that other applications can use that data without having to first be programmed to do so. As long as they speak RDF/OWL, applications can use any data they find and look up the meaning of any data they need to use so they can use the data appropriately.
For example, suppose another application, App C, is OWL-aware but has never seen App B's data set before and was not programmed specifically to use it. It pulls some data out from App B's API. App C can immediately begin to use this data correctly and consistently with how App B uses it, because all that is necessary for understanding how to use B's data is encoded in the OWL ontology that App B's data refers to.
The point here is that using Semantic Web open standards such as RDF and OWL to encode what data means is a giant leap beyond just putting raw data onto the Web in an open format. It doesn't just put the data itself on the Web, it also puts the definition of what the data means and how to use it on the Web in an open format. A semantic graph is far more reusable than a non-semantic graph -- it's a graph that carries its own meaning.
The semantic graph is not merely a graph with links to more kinds of things than the social graph. It's a graph of interconnected things that is machine-understandable -- its meaning or "semantics" is explicitly represented on the Web, just like its data. This is the real way to make social networks open. Merely opening up their APIs is just the first step.
Only when the semantics of data is defined and shared in an open way can any graph truly be said to be semantic. Once data around the Web is defined in a machine-understandable way, a whole new world of easy, instant mashups becomes possible. Applications can start to freely and instantly mix and match each other's data, including new data they were not programmed in advance to understand. This opens up the door to the Web truly becoming a giant database and eventually an integrated operating system in which all applications are able to more easily interoperate and share data.
The Giant Global Graph may or may not be a semantic graph. That depends on whether it is implemented with, or at least connected to, W3C standards for the Semantic Web.
I believe that because the Semantic Web makes data-integration easier, it will ultimately be widely adopted. Simply put, applications that wish to access or integrate data in the Age of the Web can more easily do so using RDF and OWL. That alone is reason enough to use these standards.
Of course there are many other benefits as well, such as the ability to do more sophisticated reasoning across the data, but that is less important. Simply making data more accessible, connectable, and reusable across applications would be a huge benefit.
Tim Berners-Lee just posted his thoughts about the importance of Linked Data on the Semantic Web. Linked Data support is built into Twine. All the data in Twine is accessible as open-standard RDF and OWL today and will be accessible to other applications via several APIs including SPARQL. You can learn more about Twine's support for Linked Data and see some examples here.
Tim says:
In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.
So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?
Twine provides RDF and supports SPARQL (although while we are in beta we have not opened our SPARQL API yet, but we will...). At the same time Twine also protects privacy by only providing its data according to permissions. Apps can only get Twine data they have permission to see, such as their own data or their owner's or users' data, data that has been shared with them, or public data in Twine.
Twine is also designed to consume external Linked Data via its APIs. Twine will be able to consume external RDF and OWL ontologies, as a means to enable other applications and users to extend its functionality and add new data to it.
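The permission rule just described (own data, the data of an app's owner or users, shared data, public data) can be sketched as a simple filter. This is not Twine's actual API or data model: the `Item` class, its fields and the `visible_items` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of data with a hypothetical, simplified permission model."""
    owner: str
    public: bool = False
    shared_with: set = field(default_factory=set)


def visible_items(app_id, user_ids, items):
    """Return items the app may see: its own, its users', shared, or public."""
    allowed_owners = {app_id} | set(user_ids)
    return [
        item for item in items
        if item.public
        or item.owner in allowed_owners
        or app_id in item.shared_with
    ]


items = [
    Item(owner="app1"),                           # the app's own data
    Item(owner="u1"),                             # data of one of its users
    Item(owner="other", public=True),             # public data
    Item(owner="other", shared_with={"app1"}),    # shared with the app
    Item(owner="other"),                          # private to someone else
]
print(len(visible_items("app1", ["u1"], items)))
```

Applying the filter before any data leaves the service is what lets open access formats coexist with privacy: the format is open, but each caller only receives the subset it is allowed to see.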
Earlier this month I had the opportunity to visit, and speak at, the Digital Enterprise Research Institute (DERI), located in Galway, Ireland. My hosts were Stefan Decker, the director of the lab, and John Breslin who is heading the