Community:Proliferation of URIs, Managing Coreference

From Odp


Revision as of 00:48, 12 April 2010


Title: Proliferation of URIs, Managing Coreference

Description: General Issue: There are some negative consequences to the current proliferation of new URIs being minted for the same things. The issue is how to avoid or manage this.

Diagram (this article has no graphical representation)

About

Users: MichaelUschold
Domains: General
Competency Questions:
Scenarios:
Proposed Solutions (OWL files):
Related patterns:


Additional information

Issue: There are some negative consequences to the current proliferation of new URIs being minted for the same things. The issue is how to avoid or manage this.


This issue is related to the modelling issue "Overloading OWL sameAs".


This discussion is taken from a thread called Managing Co-reference (Was: A Semantic Elephant?) (http://lists.w3.org/Archives/Public/semantic-web/2008May/0078.html) on the W3C Semantic Web Discussion List (http://lists.w3.org/Archives/Public/semantic-web/). A vigorous discussion took place initially off the list, and was then moved to the list for the record. Here is a summary of the first part.


In Uschold's original post to selected individuals, he noted that different URIs were proliferating for the same resource, and that this was causing two specific problems:

  1. It is hard to find out when two URIs should be treated as denoting the same thing.
  2. Even if you can find the links, prolific use of owl:sameAs will create computational problems.

Below is a summary of the responses.

Chris Bizer:

Problem 1 is not really so bad, because there is a lot of matching technology out there that can be used, although there will be some limits on precision. Problem 2 is not a problem either, because no one is going to load everything into a single store.


Frank van Harmelen:

Problem 1 is very real, but it has only recently become a problem, with the surge of semantic web data coming online. Frank disagrees with Chris Bizer's optimism. Also, matching at the schema/class level is handled differently from matching at the instance level. Frank refers to some good work that addresses these issues not by matching after the fact, but by eliminating the proliferation at source:

  1. http://sindice.com/
  2. http://www.sindice.com/pdf/sindice-ijmso2008.pdf
  3. http://www.okkam.org/
  4. http://www.okkam.org/IRSW2008

Chris Bizer:

My optimism was more about instance level identity links than at the class level. Within the LOD effort we repeatedly run into situations where it is really easy to generate owl:sameAs links based on some simple domain-dependent rules.
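As an illustration of the kind of simple domain-dependent rule mentioned above, here is a minimal sketch in Python. It assumes two hypothetical book datasets that both carry an ISBN, and it links records whose normalised ISBNs match. The record layout, the example URIs and the function names are assumptions made for this example; they are not taken from the LOD effort or from the thread.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise_isbn(raw):
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return raw.replace("-", "").replace(" ", "").upper()


def same_as_links(source_a, source_b, key="isbn"):
    """Yield (uri_a, owl:sameAs, uri_b) for records whose key values match."""
    # Index the second dataset by normalised key for constant-time lookup.
    index = {}
    for record in source_b:
        index.setdefault(normalise_isbn(record[key]), []).append(record["uri"])
    for record in source_a:
        for uri_b in index.get(normalise_isbn(record[key]), []):
            yield (record["uri"], OWL_SAME_AS, uri_b)


# Hypothetical example data: the URIs and ISBN values are invented.
books_a = [
    {"uri": "http://example.org/a/book/1", "isbn": "978-0-262-01210-2"},
    {"uri": "http://example.org/a/book/2", "isbn": "978-0-13-110362-7"},
]
books_b = [
    {"uri": "http://example.com/b/item/17", "isbn": "9780262012102"},
    {"uri": "http://example.com/b/item/42", "isbn": "9781234567897"},
]

links = list(same_as_links(books_a, books_b))
for s, p, o in links:
    # Print each link in an N-Triples-like form.
    print(f"<{s}> <{p}> <{o}> .")
```

A rule like this is only as reliable as the key it matches on, which is one reading of the "limits on precision" mentioned earlier in the discussion.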


Kingsley Idehen:

The URI problems are being addressed, e.g. in the UMBEL project. Wikipedia, OpenCyc, WordNet and YAGO identifiers are being rationalized. See: http://www.umbel.org/announcement.xhtml

Fred Giasson:

There are edge cases where it is not immediately clear, even to a human, what deserves a unique URI.


Jim Hendler:

"So what you are really saying is scaling is a technology/research challenge now that there's much more out there. We need to go beyond just triple stores and get some fast inferencing at Web scales. Makes sense to me."


Michael Uschold:

The computational issue of owl:sameAs proliferation is a major problem, even if no one is going to load all the semantic web data into a single store. For today's triple stores that do limited inference, owl:sameAs "has a significant run time" according to the developers of OpenLink's Virtuoso triplestore. It can easily double query times.

Chris Bizer's remark that there is no need to worry because no one is going to load all the data misses two important facts. First, companies that build and deliver software products using public data will have to bring the data they are using in house to control it. Do you really think that, for example, Powerset relies on the data sitting on the DBpedia servers? Second, you don't have to load all the data before computational issues arise. Proliferation of URIs on a large scale will cause performance issues and should be avoided where possible.
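To make the computational concern more concrete, here is a minimal sketch in Python of one way to contain it: resolve owl:sameAs links into equivalence classes once, at load time, and rewrite every triple to a single canonical URI so that queries do not have to expand identifiers at run time. This is an illustrative technique based on a union-find structure; it is not a description of how Virtuoso or any other store mentioned in the thread handles owl:sameAs. The class name, the choice of the lexicographically smallest URI as canonical, and the example data are all assumptions.

```python
class SameAsIndex:
    """Union-find over URIs linked by owl:sameAs (illustrative sketch)."""

    def __init__(self):
        self._parent = {}

    def find(self, uri):
        """Return the canonical URI of the equivalence class containing uri."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def add_same_as(self, a, b):
        """Record that a and b denote the same thing."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Assumption: keep the lexicographically smaller URI as canonical
            # so that the result does not depend on insertion order.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep


def canonicalise(triples, index):
    """Rewrite subjects and objects to canonical URIs and drop duplicates."""
    return {(index.find(s), p, index.find(o)) for s, p, o in triples}


# Hypothetical example: three URIs minted for the same city.
index = SameAsIndex()
index.add_same_as("http://example.org/a/Berlin", "http://example.com/b/Berlin")
index.add_same_as("http://example.com/b/Berlin", "http://example.net/c/Berlin")

triples = [
    ("http://example.net/c/Berlin", "ex:label", "Berlin"),
    ("http://example.org/a/Berlin", "ex:label", "Berlin"),
    ("http://example.org/a/Berlin", "ex:country", "http://example.org/a/Germany"),
]
canonical = canonicalise(triples, index)
```

The trade-off is that the original identifiers are no longer visible in the rewritten data, so a real system would need to keep the mapping if it has to answer queries phrased with any of the alternative URIs.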


Sören Auer:

Even with such proliferation, people will be able to build useful applications. Once certain information sources are established (and PageRank-inspired "data rank" algorithms could be developed for that), people will automatically tend to reuse established identifiers, and this will counteract the proliferation.


Tim Berners-Lee:

So multiple URIs for the same thing is life, a constant tradeoff, but life is, on balance, good.
