Community:Versioning and URIs

From Odp

Revision as of 21:33, 25 February 2010 by MichaelUschold (Talk | contribs)
Jump to: navigation, search

Versioning and URIs

Title: General Issue: Versioning and URIs

Description: General Issue: When and whether to make new URIs for different versions of things.


About

Users MichaelUschold
Domains General
Competency Questions
Scenarios
Proposed Solutions (OWL files)
Related patterns


Issue: When and whether to make new URIs for different versions of things.

Example 1: WordNet

The WordNet maintainers published a new version in OWL format and used a new namespace which contained the new version number in it. This means that every single resource in Wordnet now has a different URI, even though many are exactly the same as before.

Example 2: SKOS

In 2008, the SKOS working group is updated the SKOS vocabulary/ontology. They have decided that two terms have the wrong semantics, and the new version will reflect the correct semantics. The majority of the terms are unchanged. They chose not to mint all new URIs, but rather to use the same URI for terms with different semantics.

Example 3: Open Biometical Ontologies

There is an explicit policy of not changing URIs as new versions of the ontology are released - for one thing that would be impractical – some of them are updated daily. Rather there is a policy on deprecation - terms that are deprecated are marked as such and kept in the ontology so as not to leave dangling pointers. (Alan Ruttenberg)


Continuous vs. Discrete Ontology Evolution

Examples 1 and 2 relate to the case where an ontology is undergoing discrete evolution - i.e. a specific effort is undertaken to update the ontology resulting in a new version. Example 3 relates to a the case where an ontology is undergoing (more or less) continuous evolution - i.e. it is changing all the time, and no attempt is made to create and name a new version of the ontology, per se. These cases may be sufficiently different to warrant different approaches.


Approaches:

  1. Mint new URIs for all terms, even when the semantics does not change (Example 1:Wordnet)
  2. Keep the URIs the same but change the semantics (Example 2: SKOS)
  3. Do not change URIs as new versions of the ontology are released. Depracate old terms and leave them in the ontology marked as such. (Example 3: Open Biomedical Ontologies)

There is another alternative in principle:

  1. Mint new URIs for all and only the terms whose semantics changed. In the case of SKOS, this would mean new URIs for the small number of terms that changed. The ontology would presumably also get a new URI because it changed. (No examples).


Tradeoffs:

Option 1 has the apparent advantage of starting with a clean slate, and would be attractive to anyone using the ontology for the first time. However it causes a proliferation of URIs, with consequent maintenance and performance problems. Maintenance becomes problematic because the developer of the ontology-based applications needs to carefully ensure things are in order. If they are, then owl:sameAs can be used to link the old and new URIs, which is a performance problem.

Option 2 is also attractive to anyone who is using SKOS for the first time. However it breaks a trust in the use of a URI as a unique resource. It creates a new beast with a new semantics which should have a new name. This runs the risk of breaking applications that rely on that trust.

Option 3: is attractive because it retains backwards compatibility, but does so at the cost of extra infrastructural support.

Option 4 is attractive to anyone who wants to do things that are 'semantically proper'. This was verified by an informal poll at ISWC-2008. I asked Aldo Gangemi, Sean Bechofer, Ian Horrocks, Jeff Pan, Peter Mika, Ora Lassila, Terri Payne, Martin Hepp, Fausto G. One of these polled nevertheless preferred option 2, believing that some practical concerns trumped the semantic ones. This option is unattractive because it seems messy to have one ontology with two namespaces. It could lead in the future to arbitrarily many namespaces for the same ontology, as it evolves over time. This could all be solved with proper change management infrastructure, as yet to be invented.


Root Causes

Basically, people often know what the right thing to do is, but it is often too easy and expedient to do the wrong thing. This is exacerbated by the lack of necessary infrastructure to do the right thing. There are some key root causes.

  1. lack of a good technological infrastructure for ontology maintenance
  2. lack of a good technological infrastructure for change management
  3. overloading of URIs


Maintenance: Currently, URIs are not sacred, they can change at the whims of their authors/maintainers. The only way to be sure is to mint and manage your own URIs and link them to other ones as a courtesy. This leads to a proliferation of URIs.

Change management: Initially, the W3C consciously (and probably sensibly) decided not to address the change management and versioning issue. A stop gap solution using annotation properties was provided instead. This is not going to be solved in the short term.

URI overloading: Take a simple example from Wordnet. The following URL is being use for the many different things: http://wordnet.princeton.edu/wordnet/v2.2/pickle. This is being used to denote the authoring organization, the name of the ontology, the version of both the ontology and the ontology term, the URL, the name of the term as well as being a unique identifier. Most of these things should be able to change, leaving the URI the same. This being a common way to do things makes it too tempting to do things the 'wrong way'.

Alan Ruttenberg points out that this likely contributed to the SKOS problem. The terms that changed were broader and its inverse. They are no longer transitive. If the URIs were UIDs only, then you could have simply changed the label on the old broader to broader transitive and moved on. This is a key reason for a movement to use opaque numeric UIDs for concepts in the Open Biomedical Ontologies.

Personal tools
Quality Committee
Content OP publishers