Title: General Issue: Overloading of owl:sameAs
Diagram (this article has no graphical representation)
Users | MichaelUschold |
---|---|
Domains | General |
Competency Questions | |
Scenarios | |
Proposed Solutions (OWL files) | |
Related patterns |
Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.
Source: Numerous; this issue has been discussed repeatedly on various lists. The summary so far is mainly based on a discussion that was originally about the proliferation of URIs and managing co-reference, and that evolved into a discussion about owl:sameAs per se.
Related Discussions:
Related Modeling Issues:
Examples:
Conclusions:
There is a lot of confusion in the linked open data community about how owl:sameAs should be used. It is being used in ways that are semantically incorrect and can lead to incorrect inferences. A number of points and suggestions came up.
In July 2007, Bernard Vatant suggested some good practices for mutual linking, quoted below:
It's been a very long and interesting thread on Linking Open Data forum and elsewhere, about the use and semantics of owl:sameAs. I just suggested the following best practices :
Granted, from a pure logical viewpoint, those assertions are strictly equivalent since owl:sameAs is a symmetrical property, but from a social/trust viewpoint, having each side declaring it in a specific direction could be interpreted as a formal proof of agreement. It's what have been done e.g. between DBpedia and GeoNames. The title thread shows once again by its sheer length, and if necessary, that there is no universal way to ground such agreement, which belongs to the realm of language and social communication.
On May 16, 2008, after a vigorous discussion, Aldo Gangemi summarized the issues as follows:
On issue (1), it seems that either most people agree, or they tend to prefer a discussion on issue (2), i.e. that when data provision is the intended usage, the referential vagueness introduced by owl:sameAs in many cases is not harmful, but an advantage. As Hugh puts it: "we consider coreference as more knowledge about things, which can be represented in the SW, and can be used by applications if and when they see fit. And as someone said, there is no truth, only opinions. So we need an infrastructure for opinions, but that is the SW."
But at this point, others switch to issue (3), and say (including me) that, if this is the case, it would be better to choose/define a different operator that ensures a safe semantics, instead of relying on an actual identity operator like owl:sameAs. Finally, some nasty :) semioticians subtly suggest that we need some way out of formal semantics, in order to represent a kind of "similarity semantics". As Peter puts it: "Everyone talks about meaning without saying what it is they are trying to achieve by agreeing on a formal meaning".
My position, which I had when proposing the thread, is that we need to use things as efficiently as possible, without creating areas for useless wrong inferences. Another operator would be perfect, but (as Michael, Geoff, Harry require) it should provide the user with some serious features, comparable to what owl:sameAs does. Therefore, we need to talk about similarity, equality, etc. at the metalevel, just as owl:sameAs does, since it is a relation in the logical vocabulary of OWL, not a relation from a specific ontology. Most problems of co-referencing and identity seem to arise from the collapsing of the distinction between entities and information that denotes those entities (as noticed by Bernard, Harry, Aldo), be it dependent on some "meaning" or not. The metalevel we need to address is therefore the semiotic one, as correctly pointed in this discussion.
My proposal is that such metalevel is not necessarily "outside formal semantics", and some work from my group and elsewhere is proceeding in the direction of a reconciliation between a semiotic, social meaning, and the formal encoding of meaning.
Correct Usages: A common case of a correct use is to link one resource that denotes a particular real world object to another resource that denotes the same object. Typically this will be done in two different collections of resources from two different namespaces. Examples:
Possibly correct usage:
Probably incorrect usages:
The first two are similar in that there is a confusion between a resource denoting a real world entity and a web page that contains information about that real world entity.
The books example is interesting because, for most intents and purposes, people just want to say: yes, the books we are talking about in each case are the same. So owl:sameAs seems highly appropriate, and this likely works much of the time. It could be a problem, however, for someone wanting to analyze an Amazon web page for, say, its structure and layout. In that case the book that the web page is featuring is an attribute of the web page, not the web page itself.
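The web page concern can be made concrete with a minimal sketch in plain Python (no RDF library). The identifiers `ex:book` and `amazon:page` and the properties `ex:author` and `ex:layout` are hypothetical, chosen only for illustration. The function applies what owl:sameAs licenses as an identity relation: it is symmetric, and every statement about one resource also holds for the other.

```python
SAME_AS = "owl:sameAs"

# Hypothetical triples: all names below are illustrative assumptions.
triples = {
    ("ex:book", "ex:author", "ex:JaneDoe"),      # a fact about the book
    ("amazon:page", "ex:layout", "two-column"),  # a fact about the web page
    ("ex:book", SAME_AS, "amazon:page"),         # the questionable link
}


def same_as_closure(triples):
    """Return the triples entailed by treating owl:sameAs as identity."""
    inferred = set(triples)
    while True:
        new = set()
        for a, p, b in inferred:
            if p != SAME_AS:
                continue
            new.add((b, SAME_AS, a))  # symmetry
            for s, q, o in inferred:
                if s == a:
                    new.add((b, q, o))  # substitute in subject position
                if o == a:
                    new.add((s, q, b))  # substitute in object position
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new


closure = same_as_closure(triples)
# The book now "has" a page layout, and the page "has" an author.
print(("ex:book", "ex:layout", "two-column") in closure)
print(("amazon:page", "ex:author", "ex:JaneDoe") in closure)
```

Both checks print `True`. For an application that only asks who wrote the book, the merge is harmless; for one that analyzes page layout, the statement that the book has a two-column layout is a wrong inference.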
Issue: even if the usage is strictly incorrect, in some cases it may not matter. It depends on the intended use. Using owl:sameAs in a variety of ways with somewhat vague semantics can actually be an advantage.
We can have the best of both worlds with a more generic relation that is used the way owl:sameAs is used now (vaguely), together with sub-relations that have more specific meanings.
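One way to picture a generic relation with more specific sub-relations is the following sketch in plain Python. The property names `ex:looselySameAs`, `ex:strictlySameAs` and `ex:isAboutSameThingAs` are hypothetical and not taken from any existing vocabulary. The only rule applied is the usual sub-property rule: a statement made with a sub-relation also holds for its parent relation.

```python
# Hypothetical hierarchy: each sub-relation maps to its more generic parent.
# The sketch assumes the hierarchy is acyclic.
SUB_PROPERTY_OF = {
    "ex:strictlySameAs": "ex:looselySameAs",
    "ex:isAboutSameThingAs": "ex:looselySameAs",
}


def with_super_properties(triples):
    """Add, for every triple, the same statement under each ancestor relation."""
    inferred = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        while parent is not None:
            inferred.add((s, parent, o))
            parent = SUB_PROPERTY_OF.get(parent)
    return inferred


# Hypothetical data: two precise links using the specific sub-relations.
data = {
    ("dbpedia:Paris", "ex:strictlySameAs", "geonames:Paris"),
    ("ex:book", "ex:isAboutSameThingAs", "amazon:page"),
}
expanded = with_super_properties(data)
loose_links = {(s, o) for s, p, o in expanded if p == "ex:looselySameAs"}
```

In this sketch, a consumer that is content with vague co-reference can query the generic relation and see both links in `loose_links`, while a consumer that needs strict identity queries only `ex:strictlySameAs`.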
Issue: If we move to a new vocabulary of similarity relationships that includes the current sameAs (as strictly defined) as well as a looser relationship (as sameAs is often used in practice), then there is a communication and migration challenge in getting people to use them correctly.
Maybe what is now called "sameAs" should be renamed to "literalSameAs"; then the strictly correct uses of sameAs could be changed to literalSameAs, and the loose ones would remain the same.
Ideas / Proposals
More vocabulary is needed to cover different cases. The vocabulary should provide clear and formal semantics, as sameAs does, although the semantics will be weaker.
Essentially the same thing: The main thing being talked about by one resource is, for all non-technical intents and purposes, the same thing that is being talked about by the other resource. Examples:
Use existing properties like rdfs:seeAlso and skos:related to denote similarity, which is what owl:sameAs is often used for.
This may have some merit, but often these have too weak a semantics.
Distinguish a set of core assertions about a resource from ancillary assertions. See: http://dbooth.org/2007/uri-decl/
Good points raised: