Community:Overloading OWL sameAs

Overloading OWL sameAs

Title: General Issue: Overloading of owl:sameAs

Description: General Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.

About

Users MichaelUschold
Domains General


Additional information

Concise Summary

Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.

Source: Numerous; this issue has been discussed repeatedly on various lists. The summary so far is mainly based on a discussion that was originally about the proliferation of URIs and managing co-reference, and that evolved into a discussion about owl:sameAs per se.

Related Discussions:

Related Modeling Issues:

Examples:

  • relating a foaf:Person instance to the person's home page.
  • relating a geographical region with a political entity. For example, the physical area that a city occupies with the city itself.
  • relating the DBpedia resource referring to a place to a GeoNames resource corresponding to that same place

Conclusions:

There is a lot of confusion about how owl:sameAs should be used in the linked open data community. It is being used in ways that are semantically incorrect and can give incorrect inferences. A number of points and suggestions came up.

  1. There is a frequent tendency to use sameAs to link resources that provide information about something to resources that represent the thing itself, e.g. relating a resource denoting a book to a resource that is the Amazon page for the book.
  2. There is a tradeoff between formal accuracy on the one hand and pragmatic usefulness on the other hand. It often arises that treating things as the same has the desired behavior. Rather than being harmful, the vagueness can be an advantage.
  3. It was proposed that a weaker similarity relationship be created to be used instead of sameAs when there is not true identity between the two resources. Some argued that there already are alternatives, e.g. skos:related and rdfs:seeAlso
  4. Arguments were given pro and con as to whether the new relationship should have a formal semantics. One proposal creates a mechanism that removes it from the logic entirely. See: Managing URI Synonymity to Enable Consistent Reference on the Semantic Web. If the formal semantics is important, should the similarity relation
    1. be a relation in the logical vocabulary of OWL, as sameAs is? -or-
    2. be just a relation in an ontology?
  5. Having too many ways to specify similarity might be confusing and hinder uptake of the technology.
  6. A suggestion was made to have owl:sameAs links made in separate files so that they can easily be excluded.
  7. A suggestion was made that there be specific guidelines and practices for how owners of data reach agreement on what should be linked. See Bernard Vatant's suggested good practice of mutual linking, quoted under "July 2007 Thread".
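
Point 6 can be illustrated with a minimal sketch that partitions a dataset into owl:sameAs links and everything else, so a consumer can load the data without the links. It assumes the data is serialized as N-Triples, one triple per line. The function name split_same_as and the rdfs:label triple are illustrative assumptions; the Bangalore link is the one cited later on this page.

```python
OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"

def split_same_as(ntriples_text):
    """Partition N-Triples lines into (sameAs links, all other triples)."""
    links, data = [], []
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # In N-Triples the subject and predicate never contain whitespace,
        # so the second whitespace-separated token is the predicate.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == OWL_SAME_AS:
            links.append(line)
        else:
            data.append(line)
    return links, data

# The second triple is made up for illustration.
sample = """
<http://dbpedia.org/resource/Bangalore> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/1277333/> .
<http://dbpedia.org/resource/Bangalore> <http://www.w3.org/2000/01/rdf-schema#label> "Bangalore" .
"""
links, data = split_same_as(sample)
```

Writing links and data to two separate files would then let a consumer who distrusts the links simply skip the first file.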


Background

In May 2008, Michael Uschold kicked off a discussion with the subject "A Semantic Elephant" describing the unnecessary and costly proliferation of URIs and owl:sameAs links. This discussion evolved to be mostly about managing co-reference. Initially private, this discussion was moved to the W3C Semantic Web Discussion List at Tim Berners-Lee's request. See: Managing Co-reference (Was: A Semantic Elephant?).

The original discussion evolved into a discussion about owl:sameAs per se, which has been a recurring topic on various lists over the years. Aldo Gangemi provided a concise summary on May 16, 2008.

In October 2008, Uschold started a closely related discussion focused more on challenges with Versioning and URIs. It was under the subject: URIs and Unique IDs.

From all this, Uschold teased out three distinct but closely related modeling issues that are in this ODP Wiki:

  1. Overloading OWL sameAs: sameAs is being used in the linked data community in a way that is inconsistent with its semantics
  2. Proliferation of URIs, Managing Coreference: How to avoid or manage two negative consequences of the current proliferation of new URIs being minted for the same things. Specifically:
    1. it is hard to find when two things should be the same
    2. even if you can find the links, prolific use of owl:sameAs will create computational problems.
  3. Versioning and URIs: When and whether to make new URIs for different versions of things.


July 2007 Thread

In July 2007, Bernard Vatant suggested some good practice of mutual linking quoted below:

It's been a very long and interesting thread on Linking Open Data forum and elsewhere, about the use and semantics of owl:sameAs. I just suggested the following best practices :

  1. Assertions such as "a:foo owl:sameAs b:bar" should be grounded on some form of agreement of the owners of a:foo and b:bar, on whichever basis they both decide to agree.
  2. For outsiders (owning neither a: or b: domains), such agreement could be shown by the presence of the assertion in symmetrical way in both domains, each domain using its own URI/resource on subject side, and the other's on object side, that is: (a) asserts "a:foo owl:sameAs b:bar" and (b) asserts "b:bar owl:sameAs a:foo".
  3. If one side (a) pushes the assertion first, the other side (b) should be at least made aware of it by (a), and is entitled to say she agrees or not : (a) says that "a:foo owl:sameAs b:bar", but as the owner of (b), I do not necessarily agree. Such lack of agreement could be implicitly entailed from the absence of the reciprocal assertion on (b) side.

Granted, from a pure logical viewpoint, those assertions are strictly equivalent since owl:sameAs is a symmetrical property, but from a social/trust viewpoint, having each side declaring it in a specific direction could be interpreted as a formal proof of agreement. It's what have been done e.g. between DBpedia and GeoNames. The title thread shows once again by its sheer length, and if necessary, that there is no universal way to ground such agreement, which belongs to the realm of language and social communication.
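
The mutual-linking practice quoted above can be checked mechanically. The following is a minimal sketch, assuming each domain's owl:sameAs assertions have already been extracted as (subject, object) pairs. The function name agreed_links and the identifiers a:baz and b:qux are hypothetical; a:foo and b:bar come from the quoted example.

```python
def agreed_links(a_asserts, b_asserts):
    """Return the sameAs pairs from domain a that domain b reciprocates.

    Each argument is a set of (subject, object) pairs, one per
    "subject owl:sameAs object" assertion published by that domain.
    """
    return {(s, o) for (s, o) in a_asserts if (o, s) in b_asserts}

# Domain a publishes two links; domain b reciprocates only one of them.
a_asserts = {("a:foo", "b:bar"), ("a:baz", "b:qux")}
b_asserts = {("b:bar", "a:foo")}

agreed = agreed_links(a_asserts, b_asserts)
unreciprocated = a_asserts - agreed
```

Under this reading, a link in unreciprocated is one where agreement cannot be inferred from the data alone, although it does not prove that the owner of b disagrees.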


May 2008 Thread

After a vigorous discussion, Aldo Gangemi summarized the issues on May 16, 2008 as follows:

  • Issue 1: managing to suggest the rationale of owl:sameAs appropriately, i.e. in a harmless way for future usages (Aldo, Michael)
  • Issue 2: distinguishing "data provision" vs. "representational" usages of owl:sameAs (Yves)
  • Issue 3: need for another operator, e.g. representing equality under a closed set of properties (Geoff, Harry), or some relaxed rdfs:sameAs (Jim)
  • Issue 3a: using another existing relation, such as skos:related or rdfs:seeAlso, but these are either too weak (rdfs:seeAlso), or constrained (skos:related)
  • Issue 4: need for a semiotic grasp over co-reference, maybe outside formal semantics (Bernard, Peter)

On issue (1), it seems that either most people agree, or they tend to prefer a discussion on issue (2), i.e. that when data provision is the intended usage, the referential vagueness introduced by owl:sameAs in many cases is not harmful, but an advantage. As Hugh puts it: "we consider coreference as more knowledge about things, which can be represented in the SW, and can be used by applications if and when they see fit. And as someone said, there is no truth, only opinions. So we need an infrastructure for opinions, but that is the SW."


But at this point, others switch to issue (3), and say (including me) that, if this is the case, it would be better to choose/define a different operator that ensures a safe semantics, instead of relying on an actual identity operator like owl:sameAs. Finally, some nasty :) semioticians subtly suggest that we need some way out of formal semantics, in order to represent a kind of "similarity semantics". As Peter puts it: "Everyone talks about meaning without saying what it is they are trying to achieve by agreeing on a formal meaning".


My position, which I had when proposing the thread, is that we need to use things as efficiently as possible, without creating areas for useless wrong inferences. Another operator would be perfect, but (as Michael, Geoff, Harry require) it should provide the user with some serious features, comparable to what owl:sameAs does. Therefore, we need to talk about similarity, equality, etc. at the metalevel, just as owl:sameAs does, since it is a relation in the logical vocabulary of OWL, not a relation from a specific ontology. Most problems of co-referencing and identity seem to arise from the collapsing of the distinction between entities and information that denotes those entities (as noticed by Bernard, Harry, Aldo), be it dependent on some "meaning" or not. The metalevel we need to address is therefore the semiotic one, as correctly pointed in this discussion.


My proposal is that such metalevel is not necessarily "outside formal semantics", and some work from my group and elsewhere is proceeding in the direction of a reconciliation between a semiotic, social meaning, and the formal encoding of meaning.
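
Issue 3 above mentions an operator representing equality under a closed set of properties. One possible reading, offered only as a sketch and not as the operator proposed in the thread, is that two resources count as equal relative to a chosen property set when their values agree on every property in that set. The function name equal_under, the ex: property names, and all values below are made up for illustration.

```python
def equal_under(properties, desc_x, desc_y):
    """Return True if both descriptions carry the same value set for every
    property in `properties`.

    A property missing on either side counts as a mismatch; that is a
    design choice of this sketch, not something taken from the discussion.
    """
    return all(
        p in desc_x and p in desc_y and desc_x[p] == desc_y[p]
        for p in properties
    )

# Hypothetical descriptions of one place from two datasets (made-up values).
place_a = {"ex:name": {"Bangalore"}, "ex:country": {"India"}, "ex:population": {"100"}}
place_b = {"ex:name": {"Bangalore"}, "ex:country": {"India"}, "ex:population": {"200"}}

same_for_naming = equal_under({"ex:name", "ex:country"}, place_a, place_b)
same_for_census = equal_under({"ex:name", "ex:population"}, place_a, place_b)
```

Unlike owl:sameAs, such a relation is always relative to the property set, so the two descriptions can be treated as equal for one purpose and distinct for another.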


Summary and Synthesis

Correct Usages: A common case of a correct use is to link one resource that denotes a particular real world object to another resource that denotes the same object. Typically this will be done in two different collections of resources from two different namespaces. Examples:

  • the DBpedia resource referring to a music album should have a sameAs link to a MusicBrainz resource that denotes that same album
  • the DBpedia resource referring to a company should have a sameAs link to a DailyMed resource referring to that same company
  • an instance of a foaf:Person denoting Tim Berners-Lee should have a sameAs link to the resource in DBLP corresponding to Tim Berners-Lee.

Possibly correct usage:

  • the DBpedia resource referring to a place may have a sameAs link to a GeoNames resource corresponding to that same place
    • <http://dbpedia.org/resource/Bangalore> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/1277333/> .
    • Note that the GeoNames reference is to a web page, not necessarily to an RDF resource.

Probably incorrect usages:

  1. relating a foaf:Person instance to the person's home page. This should be done using the relationship foaf:homepage, not owl:sameAs.
  2. relating a resource denoting a book to a resource that is the Amazon page for the book.
  3. relating a geographical region with a political entity. For example, the physical area that a city occupies with the city itself.

The first two are similar in that there is a confusion between a resource denoting a real world entity and a web page that contains information about that real world entity.

The books example is interesting because for most intents and purposes, people just want to say: yes, the books we are talking about in each case are the same. So owl:sameAs seems highly appropriate. This likely works much of the time. It could be a problem for someone wanting to analyze an Amazon web page for, say, its structure and layout. Then the book that the web page is featuring is an attribute of the web page, not the web page itself.
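
The unwanted inference in the books example can be made concrete. Because owl:sameAs asserts identity, a reasoner may treat both URIs as one node, so every statement about the page also becomes a statement about the book, and vice versa. The following is a minimal sketch of that merging using a small union-find structure; the function name smush and all ex: identifiers are hypothetical.

```python
def smush(triples, same_as_pairs):
    """Rewrite triples so that all URIs linked by sameAs share one representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Pick the lexicographically smaller name as representative,
            # only to make the result deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    return {(find(s), p, find(o)) for s, p, o in triples}

# One statement about the book, one about the web page (both made up).
triples = {
    ("ex:book", "ex:author", "ex:Kerouac"),
    ("ex:amazonPage", "ex:format", "text/html"),
}
merged = smush(triples, [("ex:book", "ex:amazonPage")])
subjects = {s for s, _, _ in merged}
predicates = {p for _, p, _ in merged}
```

After merging, a single node carries both the author and the format, which is harmless when only asking who wrote the book but misleading when analyzing the page itself.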


Issue: even if the usage is strictly incorrect, in some cases it may not matter. It depends on the intended use. Using owl:sameAs in a variety of ways with somewhat vague semantics can actually be an advantage.

We can have the best of both worlds if we have a more generic relation that is used the way owl:sameAs is used now (vaguely), together with sub-relations that have more specific meanings.
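
A minimal sketch of this idea follows, assuming a sub-relation hierarchy in the style of rdfs:subPropertyOf. The relation names ex:similarTo, ex:strictlySameAs and ex:describedByPage, the resource names, and the function holds are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
# Each specific relation points to the more generic relation it specializes.
SUB_RELATION_OF = {
    "ex:strictlySameAs": "ex:similarTo",
    "ex:describedByPage": "ex:similarTo",
}

def holds(triples, s, p, o):
    """True if (s, q, o) is asserted for q equal to p or any sub-relation of p."""
    for ts, tq, to in triples:
        if ts != s or to != o:
            continue
        q = tq
        while q is not None:  # walk up the hierarchy from the asserted relation
            if q == p:
                return True
            q = SUB_RELATION_OF.get(q)
    return False

triples = {
    ("ex:TimBL_foaf", "ex:strictlySameAs", "ex:TimBL_dblp"),
    ("ex:book", "ex:describedByPage", "ex:amazonPage"),
}

loosely_similar = holds(triples, "ex:book", "ex:similarTo", "ex:amazonPage")
strictly_same = holds(triples, "ex:book", "ex:strictlySameAs", "ex:amazonPage")
```

A consumer who only needs vague similarity queries the generic relation and sees every link, while a consumer who needs true identity queries the strict sub-relation and sees only the links asserted as such.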


Issue: If we move to a new vocabulary of similarity relationships that includes the current sameAs (as strictly defined) as well as a looser relationship (as sameAs is often used in practice), then there is a communication and migration challenge in getting people to use them correctly.

Maybe what is now called “sameAs” should be renamed to “literalSameAs”; the strictly correct uses of sameAs could then be changed to literalSameAs, and the loose ones would remain the same.


Ideas / Proposals

More vocabulary is needed to cover different cases. The vocabulary should provide clear and formal semantics, as sameAs does, although the semantics will be weaker.

Essentially the same thing: The main thing being talked about by one resource is for all non-technical intents and purposes the same thing that is being talked about by the other resource. Examples:

    • DBpedia country related to the corresponding GeoNames URI.
    • a person or company related to their home page. There already is a foaf:page relationship for this, so this could be set as a sub-property of EssentiallyTheSameThing
    • A book such as On the Road and the various ISBNs for the various manifestations of that book.

Use existing properties like rdfs:seeAlso and skos:related to denote similarity, which is what owl:sameAs is often used for.

This may have some merit, but often these have too weak a semantics.


Distinguish a set of core assertions about a resource from ancillary assertions. See: http://dbooth.org/2007/uri-decl/


Good points raised:

  • Most problems of co-referencing and identity seem to arise from the collapsing of the distinction between entities and information that denotes those entities. Thus the metalevel we need to address is the semiotic one. [Aldo's] proposal is that such metalevel is not necessarily "outside formal semantics", and some work from his group and elsewhere is proceeding in the direction of a reconciliation between a semiotic, social meaning, and the formal encoding of meaning.
  • A URN and a non-URN URI can be linked by owl:sameAs. Example: urn:ietf:rfc:3187 (URN) owl:sameAs http://tools.ietf.org/html/rfc3187.html (URL).
  • If too many things are sameAs lots of other things, then the number of intelligent conclusions you can draw from the LOD cloud will be limited. (Martin Hepp)
