Tutorial | Training:Advanced Ontology Engineering at FOI - 2011 |
---|---|
Title | Ontology Testing |
The task of the exercise is presented at the bottom of this page. First, read through the instructions below.
Solutions:
Instructions for the exercise:
Below you will find general descriptions of the three testing methods that you will apply during this exercise, the same ones presented in the lecture. Read through the descriptions of the methods, then proceed with solving the tasks described at the bottom of this page. Make sure you leave enough time to try out all three methods: if you don't have time to complete the testing with one method, move on anyway so that you get to try all three. Don't forget to document what you are doing while you solve the task!
Parts of the ontology requirements are usually expressed as competency questions (CQs), i.e. natural language questions that the ontology should be able to provide answers to. One way to practically allow the ontology to answer such questions is to reformulate them as SPARQL queries, in order to retrieve the appropriate facts from the knowledge base. This also introduces a way of testing your ontology module!
Assume that your ontology module is intended to answer two CQs, CQ1 and CQ2. Then each of them should be expressible as a SPARQL query, Q1 and Q2, over the ontology module, such that the retrieved instances and/or literal values constitute the answers to the CQs. Q1 and Q2 can be considered unit tests for the module, since they check that the requirements CQ1 and CQ2 are actually met.
This method of testing contains the following steps:
Example
I have built an ontology for the theater domain. In the ontology there is a module modeling plays and theater productions that realizes the CQ "What play is this production based on?", i.e. it models the distinction between plays (as abstract literary works) and theater productions (where an ensemble has staged a certain play). The module specializes the Information Realization ODP and contains the classes "Play" (as a subclass of InformationObject) and "Production" (as a subclass of InformationRealization), as well as the properties "producesPlay"/"playProducedIn" as subproperties of the pattern properties. To test this using the method described above, I formulate the CQ as the following SPARQL query:
SELECT ?production ?play WHERE { ?production :producesPlay ?play . }
Next, I create a new empty ontology, where I import the module about plays and productions. I add some concrete instances of plays and productions and relate them through the available properties, for instance with triples like merchantOfVeniceProduction producesPlay merchantOfVenice. I then list the "correct answers" that the SPARQL query should return; for the triple above, these are the variable bindings ?production = merchantOfVeniceProduction and ?play = merchantOfVenice. Then I run the query and check that this is actually what I get. Most often you will discover the mistakes already while writing the query, i.e. things that make it impossible, or at least difficult, to pose that query. However, it is still important to add instances and actually run the query, since this gives you empirical evidence that the requirement is actually met.
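The same test procedure can be sketched in Python, purely as an illustration of its structure. This is a minimal sketch, assuming a toy in-memory set of triples and a single-triple-pattern matcher in place of a real RDF store and SPARQL engine; the helper names (`match_pattern`, `as_rows`) and the extra instances `hamletProduction`/`hamlet` are hypothetical. It shows the three ingredients of the test: test instances, the listed correct answers, and a comparison against the actual query result.

```python
# Sketch of a CQ unit test. Assumption: a toy in-memory triple set and a
# single-pattern matcher stand in for a real RDF store and SPARQL engine.
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(triples: Iterable[Triple], pattern: Triple) -> List[Binding]:
    """Return one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        matched = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to one value.
                if binding.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.append(binding)
    return results


def as_rows(bindings: List[Binding], variables: List[str]) -> Set[Tuple[str, ...]]:
    """Project bindings onto the SELECTed variables as a set of result rows."""
    return {tuple(b[v] for v in variables) for b in bindings}


# Test instances added in the test ontology (hamlet* is hypothetical).
test_triples: Set[Triple] = {
    ("merchantOfVeniceProduction", "producesPlay", "merchantOfVenice"),
    ("hamletProduction", "producesPlay", "hamlet"),
}

# SELECT ?production ?play WHERE { ?production :producesPlay ?play . }
query_pattern: Triple = ("?production", "producesPlay", "?play")
selected = ["?production", "?play"]

# The "correct answers" listed before running the query.
expected = {
    ("merchantOfVeniceProduction", "merchantOfVenice"),
    ("hamletProduction", "hamlet"),
}

actual = as_rows(match_pattern(test_triples, query_pattern), selected)
print("CQ test passed" if actual == expected else f"CQ test FAILED: {actual}")
```

Comparing sets of rows rather than lists mirrors the fact that a SPARQL SELECT without ORDER BY does not guarantee any particular result order.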
Although CQs express what information the ontology should be able to provide, they usually do not say exactly how this information is produced, i.e. whether it is entered into the ontology explicitly as assertions or derived from other facts through some inference mechanism. Therefore an ontology can have additional requirements that complement the CQs, or explain complex CQs, by describing desired inferences that the ontology should support. For instance, in an ontology about people we may define the class "parent" and say that any person who has at least one child is a parent. If we do not expect the information about being a parent to be entered explicitly into the ontology's knowledge base, we may instead expect it to be derived from the presence of "hasChild" relations. This is a reasoning requirement: the ontology must include the appropriate axioms to make this inference.
To test the desired inferences of the ontology module, follow these steps:
Example
I have built an ontology for the theater domain. In the ontology there is a module modeling persons, including their roles as actors and authors of plays. The requirements include a reasoning requirement saying that I want to be able to classify persons as authors or actors, depending on whether they have written a play or have participated in some production, respectively. The module contains the classes "Person", with "Author" and "Actor" as subclasses, "Play" and "Production". The property "wrote" relates a person to a play, and the property "playsIn" relates a person to a production. Additionally, I have defined Author as equivalent to the instances that are persons and wrote some play, and similarly Actor as equivalent to the instances that are persons and playsIn some production.
The input needed for this reasoning requirement consists of triples relating persons to plays or productions through the "wrote" and "playsIn" properties, respectively. The expected output consists of triples of the form <person-instance> rdf:type :Actor or <person-instance> rdf:type :Author. I proceed to create a new test case module, i.e. an empty ontology, where I import the module described above. Then I add a number of triples using the "wrote" and "playsIn" properties, e.g. Anne playsIn merchantOfVeniceProduction. Note that, given these definitions, the inference only follows if Anne is also known to be a Person and merchantOfVeniceProduction a Production, whether asserted directly or derived, e.g. from domain and range axioms. I list the expected inferences; for the example triple, it would be the new triple Anne rdf:type Actor. I run the reasoner and check that this triple is actually inferred, and that no other unexpected inferences appear.
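The inference test can likewise be sketched in Python. This is not an OWL reasoner: as a simplifying assumption, two hand-written rules approximate only the "if" direction of the Author and Actor definitions, and the individual `Shakespeare` is a hypothetical addition to the test data. The point is the comparison at the end: the `missing` set reveals expected inferences that were not produced, and the `unexpected` set reveals "strange things" among the inferences.

```python
# Sketch of an inference test. Assumption: instead of an OWL reasoner, two
# hand-written rules approximate the "if" direction of the definitions
#   Author = Person and (wrote some Play)
#   Actor  = Person and (playsIn some Production)
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"

# (property, required filler class, class to infer)
RULES: List[Tuple[str, str, str]] = [
    ("wrote", "Play", "Author"),
    ("playsIn", "Production", "Actor"),
]


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Return the rdf:type triples the rules derive that are not already asserted.

    One pass is enough here because no rule uses Author or Actor as a premise.
    """
    types = {(s, o) for (s, p, o) in triples if p == TYPE}
    derived: Set[Triple] = set()
    for (s, p, o) in triples:
        for prop, filler, target in RULES:
            if p == prop and (s, "Person") in types and (o, filler) in types:
                derived.add((s, TYPE, target))
    return derived - triples


# Test case module: type assertions plus test instances.
test_triples: Set[Triple] = {
    ("Anne", TYPE, "Person"),
    ("Shakespeare", TYPE, "Person"),  # hypothetical test individual
    ("merchantOfVenice", TYPE, "Play"),
    ("merchantOfVeniceProduction", TYPE, "Production"),
    ("Anne", "playsIn", "merchantOfVeniceProduction"),
    ("Shakespeare", "wrote", "merchantOfVenice"),
}

# Expected inferences, listed before "running the reasoner".
expected = {("Anne", TYPE, "Actor"), ("Shakespeare", TYPE, "Author")}

actual = infer(test_triples)
missing = expected - actual      # required inferences that were not produced
unexpected = actual - expected   # "strange things" among the inferences
print("missing:", missing, "unexpected:", unexpected)
```

Removing the rdf:type triples for Anne or for the production makes the Actor inference disappear in this sketch, which illustrates why the typing of the related individuals matters for the test.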
Although the ability to perform correct inferences, and to provide the resulting information as answers to queries, allows us to see that the ontology actually realizes its requirements, another important characteristic of an ontology is that it allows as few erroneous facts and/or inferences as possible. A high-quality ontology allows exactly the desired inferences and queries, while avoiding irrelevant or erroneous side effects. It may also be desirable to be able to check input data against some constraints and business rules.
This category of testing can be compared to software testing, where a system is usually fed random data, data known to be incorrect, or "boundary values" of the input data, e.g. the extremes of value ranges, in order to check its robustness and its capability to handle unexpected input. When dealing with ontologies, we are not evaluating how good the error messages and recovery strategies are, as for software, but rather whether erroneous facts and data are detected in the first place, or at least do not produce any unexpected side effects. One way to detect such problems is to use the consistency checking facilities of a reasoning engine. A high-quality ontology enables the reasoner to detect an inconsistency in most cases when inappropriate or erroneous facts are entered.
For instance, if an ontology models users with the classes "female user" and "male user", where this information is going to be collected from a web form with a radio button, the classes should most likely be disjoint: since no user can select both alternatives in the form, any given user can only be an instance of one of the classes. After detecting such implicit requirements, or common-sense constraints, and expressing them as contextual statements, they can be tested by entering obviously inconsistent facts and checking that the reasoner is able to detect the inconsistency.
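A toy version of such a test for the radio-button example might look as follows. It is a minimal sketch, assuming hypothetical class names `FemaleUser` and `MaleUser` and a hand-written check in place of a reasoner's consistency check; a real reasoner would also take subclass and other axioms into account when looking for the clash.

```python
# Sketch of an error-provocation test for a disjointness axiom.
# Assumption: a hand-written check stands in for a reasoner's consistency
# check, and the class names FemaleUser/MaleUser are hypothetical.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"

# Pairs of classes declared disjoint in the (hypothetical) user module.
DISJOINT: List[Tuple[str, str]] = [("FemaleUser", "MaleUser")]


def disjointness_violations(triples: Set[Triple]) -> Set[str]:
    """Return individuals asserted to be members of two disjoint classes."""
    types = {(s, o) for (s, p, o) in triples if p == TYPE}
    return {
        s
        for (s, c) in types
        for (a, b) in DISJOINT
        if c == a and (s, b) in types
    }


# Deliberately inconsistent input: one user typed with both classes.
bad_data: Set[Triple] = {("user42", TYPE, "FemaleUser"), ("user42", TYPE, "MaleUser")}
# Acceptable input for comparison.
good_data: Set[Triple] = {("user7", TYPE, "FemaleUser")}

print(disjointness_violations(bad_data))   # -> {'user42'}
print(disjointness_violations(good_data))  # -> set()
```

If the disjointness axiom were missing from the module, the bad input would pass unnoticed, which is exactly the kind of modeling gap this test is meant to expose.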
To test this ability of the ontology, the following steps can be used:
Example
I have built an ontology for the theater domain. In the ontology there is a module modeling persons acting in theater productions. There is also a contextual statement saying that each production has exactly one lead actor, i.e. the person that plays the main character of that drama. The module contains the classes "Person" and "Production", and the property "hasLeadActor" (which might be inferred based on a role playing situation, but that doesn't matter in this example). There is also an axiom stating that a production is related to exactly one person through the hasLeadActor property.
For this contextual statement, and its realization as an axiom, I can see that the "acceptable values" for the number of facts using the hasLeadActor property for one production are 0 and 1 (0 is accepted since we have an open world assumption, so what is not entered might still be true). From this I can see that 2 would be an unacceptable number of hasLeadActor facts for the same production, i.e. this value is outside the range of acceptable input. Since OWL does not make the unique name assumption, two such facts alone would simply lead the reasoner to infer that the two persons are the same individual; only if I explicitly state that the two person instances are different (e.g. using owl:differentFrom or owl:AllDifferent) should the reasoner discover an inconsistency. I create a new ontology where I import the module described above, and add two facts about the same production, e.g. the triples merchantOfVeniceProduction hasLeadActor Peter and merchantOfVeniceProduction hasLeadActor Steve. Then I assert that Peter and Steve are different individuals. Now I run the reasoner on this test ontology and check that it actually detects the inconsistency.
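The lead-actor test can be mimicked with a small Python sketch. It is only an approximation, assuming a hand-written check in place of the reasoner and treating owl:differentFrom as a plain string; it hard-codes the "at most one" part of the "exactly one" axiom and, like OWL without the unique name assumption, reports a clash only when the two lead actors are explicitly asserted to be different.

```python
# Sketch of the lead-actor error-provocation test. Assumption: a hand-written
# check stands in for the reasoner. Like OWL, which has no unique name
# assumption, it only reports a clash for actors explicitly asserted different.
from itertools import combinations
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
DIFFERENT = "owl:differentFrom"


def lead_actor_violations(triples: Set[Triple]) -> Set[str]:
    """Return productions violating "at most one lead actor"."""
    leads: Dict[str, Set[str]] = {}
    for (s, p, o) in triples:
        if p == "hasLeadActor":
            leads.setdefault(s, set()).add(o)
    different = {(s, o) for (s, p, o) in triples if p == DIFFERENT}
    different |= {(o, s) for (s, o) in different}  # differentFrom is symmetric
    violations: Set[str] = set()
    for production, actors in leads.items():
        # Two lead actors known to be distinct contradict the cardinality axiom.
        if any(pair in different for pair in combinations(sorted(actors), 2)):
            violations.add(production)
    return violations


base: Set[Triple] = {
    ("merchantOfVeniceProduction", "hasLeadActor", "Peter"),
    ("merchantOfVeniceProduction", "hasLeadActor", "Steve"),
}
# Without the different-individuals assertion, a reasoner would instead
# conclude that Peter and Steve are the same person, so no clash is reported.
print(lead_actor_violations(base))      # -> set()

provoked = base | {("Peter", DIFFERENT, "Steve")}
print(lead_actor_violations(provoked))  # -> {'merchantOfVeniceProduction'}
```

Running both cases side by side makes it visible that the provoked input, not the mere presence of two facts, is what should trigger the inconsistency.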
Assume that the ontology was constructed with the following context in mind, and use the three methods described above to test the ontology.
Context: An online music database wishes to semantically represent their data about musicians, albums, and performances, in order to be able to provide better search functions to their users, i.e. by querying the knowledge base instead of using keyword-based queries. Below is an example of what they typically would like to store, and at the bottom you find the competency questions developed as requirements for the ontology.
Additionally, assume that the ontology requirements were based on the following user story:
Story - music and bands: The current configuration of the “Red Hot Chili Peppers” is: Anthony Kiedis (vocals), Flea (bass, trumpet, keyboards, and vocals), Josh Klinghoffer (guitar), and Chad Smith (drums). The line-up has changed a few times over the years: Frusciante replaced Hillel Slovak in 1988, while Jack Irons was replaced by Chad Smith. In addition to playing guitar for the Red Hot Chili Peppers, Frusciante also contributed to the band “The Mars Volta” as a vocalist for some time. In 2004, the Red Hot Chili Peppers started recording the album “Stadium Arcadium”. The album contains 28 tracks and was released in 2006. It includes a track of the song “Hump de Bump”, a funk-rock song composed in 2004.
For testing, you need the ontology requirements that were the basis of the ontology development; they can be found below. Keep in mind that it is against these requirements that you are performing the testing; the scenario and context above only give you some background information. Also note that you do not have to modify the ontology during this exercise: your goal is to identify modeling mistakes and document them.
Requirements:
Competency questions (CQs) and contextual statements of music and bands
Contextual statement:
Reasoning requirement: