Community:Overloading OWL sameAs


Revision as of 23:15, 12 April 2010 by MichaelUschold (Talk | contribs)

Overloading OWL sameAs

Title: General Issue: Overloading of owl:sameAs

Description: General Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.


About

Users MichaelUschold
Domains General
Competency Questions
Scenarios
Proposed Solutions (OWL files)
Related patterns


Additional information

Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.

Source:

Related Discussions:

Related Modeling Issues:

Examples:

Conclusions:

Correct Usages: A common case of a correct use is to link one resource that denotes a particular real world object to another resource that denotes the same object. Typically this will be done in two different collections of resources from two different namespaces. Examples:

  • the DBpedia resource referring to a music album should have a sameAs link to a MusicBrainz resource that denotes that same album
  • the DBpedia resource referring to a company should have a sameAs link to a DailyMed resource referring to that same company
  • an instance of foaf:Person denoting Tim Berners-Lee should have a sameAs link to the resource in DBLP corresponding to Tim Berners-Lee.

Possibly correct usage:

  • the DBpedia resource referring to a place may have a sameAs link to a GeoNames resource corresponding to that same place
    • <http://dbpedia.org/resource/Bangalore> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/1277333/> .
    • Note that the GeoNames reference is to a web page, not necessarily to an RDF resource.

Probably incorrect usages:

  1. relating a foaf:Person instance to the person's home page. This should be done using the relationship: foaf:homepage, not owl:sameAs.
  2. relating a resource denoting a book to a resource that is the Amazon page for the book.
  3. relating a geographical region with a political entity. For example, the physical area that a city occupies with the city itself.

The first two are similar in that there is a confusion between a resource denoting a real world entity and a web page that contains information about that real world entity.


The books example is interesting because, for most intents and purposes, people just want to say: yes, the books we are talking about in each case are the same. So owl:sameAs seems highly appropriate, and this will likely work much of the time. It could be a problem, however, for someone who wants to analyze an Amazon web page for, say, its structure and layout. In that case the book that the web page features is an attribute of the web page, not the web page itself.
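
The risk in the books example can be made concrete with a minimal sketch. It assumes a toy triple store held as Python tuples, and the names ex:OnTheRoad, ex:AmazonPage, ex:author and ex:layout are hypothetical, chosen only for illustration. Because owl:sameAs means the two resources are identical, a consumer is entitled to merge their descriptions, so statements about the web page become statements about the book.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def smush(triples):
    """Merge the descriptions of all subjects linked by owl:sameAs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pass 1: put every pair linked by owl:sameAs into one equivalence class.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    # Pass 2: file every other statement under its class representative.
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(s)].add((p, o))
    return find, merged


# Hypothetical data: a book, a web page about it, and a loose sameAs link.
triples = [
    ("ex:OnTheRoad", "ex:author", "Jack Kerouac"),
    ("ex:AmazonPage", "ex:layout", "two-column"),
    ("ex:OnTheRoad", SAME_AS, "ex:AmazonPage"),
]
find, merged = smush(triples)
book_facts = merged[find("ex:OnTheRoad")]
page_facts = merged[find("ex:AmazonPage")]
```

After merging, book_facts and page_facts are the same set, so the book now appears to have a page layout. That is harmless if only the author is of interest, and wrong for anyone analyzing the page itself.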


Issue: even if the usage is strictly incorrect, in some cases it may not matter. It depends on the intended use. Using owl:sameAs in a variety of ways with somewhat vague semantics can actually be an advantage.


We can have the best of both worlds if we have a more generic relation that is used the way owl:sameAs is used now (vaguely), together with sub-relations that have more specific meanings.
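
As a rough illustration of the generic-relation idea, the sketch below applies the usual sub-property rule: a statement made with a specific relation also holds for every relation above it. All of the ex: property names are hypothetical; none is defined by OWL or by any existing vocabulary.

```python
# Hypothetical hierarchy: each specific relation points to its generic parent.
SUB_PROPERTY_OF = {
    "ex:identicalTo": "ex:similarTo",       # strict identity
    "ex:describedByPage": "ex:similarTo",   # thing vs. web page about it
}


def with_generic_links(triples):
    """Add the statements implied by the sub-property hierarchy."""
    result = set(triples)
    for s, p, o in triples:
        # Walk up the hierarchy, asserting each more generic relation.
        while p in SUB_PROPERTY_OF:
            p = SUB_PROPERTY_OF[p]
            result.add((s, p, o))
    return result


# Hypothetical data using the two specific relations.
data = [
    ("ex:Album1", "ex:identicalTo", "ex:Album2"),
    ("ex:Person1", "ex:describedByPage", "ex:HomePage1"),
]
expanded = with_generic_links(data)
```

A consumer that only cares about loose similarity can query ex:similarTo and see both links, while a consumer that needs strict identity queries ex:identicalTo and sees only the first.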


Issue: If we move to a new vocabulary of similarity relationships that includes the current sameAs (as strictly defined) as well as a looser relationship (as sameAs is often used in practice), then there is a communication and migration challenge getting people to use them correctly.

Maybe what is now called sameAs should be renamed to literalSameAs and then the strictly correct uses of sameAs could be changed to literalSameAs, and the loose ones would remain the same.


Ideas / Proposals

More vocabulary is needed to cover the different cases. The vocabulary should provide clear and formal semantics, in the way that sameAs does, although the semantics will be weaker.

Essentially the same thing: The main thing being talked about by one resource is for all non-technical intents and purposes the same thing that is being talked about by the other resource. Examples

  • DBpedia country related to the corresponding GeoNames URI.
  • a person or company related to their home page. There is already a foaf:page relationship for this, so it could be declared a sub-property of EssentiallyTheSameThing
  • a book like On the Road and the various ISBNs for the various manifestations of that book.

Use existing properties like rdfs:seeAlso and skos:related to denote similarity, which is what owl:sameAs is often used for.

This may have some merit, but often these have too weak a semantics.


Distinguish a set of core assertions about a resource from ancillary assertions. See: http://dbooth.org/2007/uri-decl/


Blog Summary: http://blog.hubjects.com/2007/07/using-owlsameas-in-linked-data.html

It's been a very long and interesting thread on the Linking Open Data forum and elsewhere, about the use and semantics of owl:sameAs. I just suggested the following best practices:

  1. Assertions such as "a:foo owl:sameAs b:bar" should be grounded on some form of agreement of the owners of a:foo and b:bar, on whichever basis they both decide to agree.
  2. For outsiders (owning neither the a: nor the b: domain), such agreement could be shown by the presence of the assertion in a symmetrical way in both domains, each domain using its own URI/resource on the subject side and the other's on the object side. That is: (a) asserts "a:foo owl:sameAs b:bar" and (b) asserts "b:bar owl:sameAs a:foo".
  3. If one side (a) pushes the assertion first, the other side (b) should at least be made aware of it by (a), and is entitled to say whether she agrees or not: (a) says that "a:foo owl:sameAs b:bar", but as the owner of (b), I do not necessarily agree. Such lack of agreement could be inferred from the absence of the reciprocal assertion on the (b) side.

Granted, from a purely logical viewpoint, those assertions are strictly equivalent since owl:sameAs is a symmetric property, but from a social/trust viewpoint, having each side declare it in a specific direction could be interpreted as a formal proof of agreement. This is what has been done, for example, between DBpedia and GeoNames. The thread shows once again by its sheer length, if proof were needed, that there is no universal way to ground such agreement, which belongs to the realm of language and social communication.
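
The reciprocity convention in practices 2 and 3 could be checked mechanically. The following is a minimal sketch, assuming each domain's published statements are available as a list of (subject, predicate, object) tuples; the function name check_agreement and the sample URIs other than a:foo and b:bar are hypothetical.

```python
SAME_AS = "owl:sameAs"


def check_agreement(asserted_by_a, asserted_by_b):
    """Split a's sameAs links into those b reciprocates and those it does not."""
    # Links as a states them: (a-resource, b-resource).
    forward = {(s, o) for s, p, o in asserted_by_a if p == SAME_AS}
    # Links as b states them, flipped into the same (a, b) orientation.
    backward = {(o, s) for s, p, o in asserted_by_b if p == SAME_AS}
    agreed = forward & backward
    unreciprocated = forward - backward
    return agreed, unreciprocated


# Hypothetical data: b confirms one of a's two links.
a_triples = [
    ("a:foo", SAME_AS, "b:bar"),
    ("a:baz", SAME_AS, "b:qux"),
]
b_triples = [
    ("b:bar", SAME_AS, "a:foo"),
]
agreed, unreciprocated = check_agreement(a_triples, b_triples)
```

Note that this treats the direction of each published statement as social evidence only. A reasoner applying the symmetry of owl:sameAs would regard both of a's links as holding in either direction regardless.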


Good points raised:

  • Most problems of co-referencing and identity seem to arise from the collapsing of the distinction between entities and information that denotes those entities. Thus the metalevel we need to address is the semiotic one. [Aldo's] proposal is that such a metalevel is not necessarily "outside formal semantics", and some work from my group and elsewhere is proceeding in the direction of a reconciliation between a semiotic, social meaning and the formal encoding of meaning.
  • A URN and a non-URN URI can be linked by owl:sameAs. Example: urn:ietf:rfc:3187 (URN) owl:sameAs http://tools.ietf.org/html/rfc3187.html (URL).
  • If too many things are sameAs lots of other things, then the number of intelligent conclusions you can draw from the LOD cloud will be limited. (Martin Hepp)
