Introduction to Semantic Web
“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.”
– Sherlock Holmes, The Adventure of the Copper Beeches
Have you ever wondered what Sherlock’s work would look like if his adventures took place today? No doubt he would be pleased to see how much data can be found on the Internet and in social media, which would probably become his most efficient sources of information.
But Sherlock would need to store all the important data. There might be many different kinds of facts to put together and a lot of information to process. In addition, Watson might need to use the data in his own software as well. Although both of them are known for their cleverness, they would quickly get confused and overwhelmed by primary keys, tables and all that relational stuff, especially if they had to combine the personal details of a suspect with information about the murder scene and the characteristics of different kinds of weapons. And what if Sherlock wanted his computer to do part of the deductive reasoning for him?
Semantic Web
The Semantic Web is an extension of the existing Web. Its main added value is a standardized way of expressing relationships, which allows computers to understand information and process it in a way that reflects its meaning. As a result, information can be exchanged regardless of application boundaries, provided that proper technical standards are in place. The primary standards of the Semantic Web are RDF (Resource Description Framework), OWL (Web Ontology Language) and SPARQL (SPARQL Protocol and RDF Query Language). Sitting alone with his laptop, Sherlock would no longer be the only one concerned with meaning rather than data structure.
What are the meanings of being?
The question is not asked only by adolescents searching for the purpose of their existence. Since ancient times the greatest brainboxes have been looking for the answer. The word ontology comes from the Greek word on, meaning "being" or "that which is". Ontology can therefore be defined as the study of everything that exists and of the phenomenon of being. It is also concerned with the kinds of things that can exist and with all the relations between them. In information technology the word has been adopted to mean "a formal specification of a shared conceptualization" [3] (Tom Gruber’s definition), that is, a conceptual schema consisting of concepts and the relationships between them. Such an ontology describes a particular field of knowledge. Many ontologies and vocabularies are already defined and described on semanticweb.org; the most popular ones are Dublin Core, FOAF and TrackBack. The first and the third are described in RDF, whereas the second was created using OWL DL ("DL" stands for description logic). The Dublin Core ontology includes metadata elements that help describe any type of document or resource, among others title, subject, description, language and identifier [4, 10]. For example, Sherlock could create a report and add the following triples to the database to describe it:
<http://sherlock.example/reports/13_04_2015> | rdf:type | <http://sherlock.example/ontologies/Report> |
<http://sherlock.example/reports/13_04_2015> | dc:language | "English" |
<http://sherlock.example/reports/13_04_2015> | dc:description | "This is a report regarding the curious case of murder of John Ontologison. Extremely confidential!" |
These are examples of triples: statements created according to the Resource Description Framework (RDF), a model for describing resources identified by URIs. Each statement consists of a subject, a predicate and an object. The predicate specifies a property of the subject, and the object is the value of that property [2]. For example, the second of the above triples can be read as:
The report (subject) is written in a language (predicate) that is English (object).
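A minimal Python sketch can make this structure concrete by holding the three report statements as plain tuples. It simplifies on purpose: IRIs, prefixed names and literals are all ordinary strings here (RDF itself distinguishes them), and `Triple` and `objects` are hypothetical names chosen for illustration, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; every term is kept as a plain string (a simplification)."""
    subject: str
    predicate: str
    obj: str


REPORT = "http://sherlock.example/reports/13_04_2015"

# The three statements about Sherlock's report, as listed in the table above.
graph = {
    Triple(REPORT, "rdf:type", "http://sherlock.example/ontologies/Report"),
    Triple(REPORT, "dc:language", "English"),
    Triple(REPORT, "dc:description",
           "This is a report regarding the curious case of murder of "
           "John Ontologison. Extremely confidential!"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


# "The report (subject) is written in a language (predicate) that is English (object)."
print(objects(graph, REPORT, "dc:language"))  # {'English'}
```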
Triples can be written in various formats (serializations), such as Turtle, N3 and RDF/XML. We will use Turtle because it is human-friendly. To add all three triples to the database, Sherlock would need to create a .ttl file with the following content:
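@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://sherlock.example/reports/13_04_2015>
    rdf:type <http://sherlock.example/ontologies/Report> ;
    dc:language "English" ;
    dc:description "This is a report regarding the curious case of murder of John Ontologison. Extremely confidential!" .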
Two prefixes are defined at the beginning of the file; they let us avoid repeating long parts of URIs. Thanks to the rdf prefix we can write "rdf:type" instead of "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>". Next come the triples, in the following format:
subject predicate object.
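Prefix expansion is a simple string substitution, sketched below in Python. The rdf namespace is the one quoted above; the dc namespace is assumed to be the Dublin Core Elements 1.1 namespace, and `expand` is a hypothetical helper that ignores the finer points of Turtle's prefixed-name grammar.

```python
# Namespace IRIs for the two prefixes of the .ttl file. The rdf IRI is the one quoted
# in the text; the dc IRI is assumed to be the Dublin Core Elements 1.1 namespace.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'rdf:type' into the full IRI it abbreviates."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no prefix declared for {name!r}")
    return prefixes[prefix] + local


print(expand("rdf:type"))     # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(expand("dc:language"))  # http://purl.org/dc/elements/1.1/language
```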
Each triple ends with a dot, yet the .ttl example uses semicolons as well. A semicolon also ends a single triple, but with one important difference: what follows it is not a brand-new triple but another predicate and object for the same subject, so the subject does not have to be repeated:
subject_1
predicate_1 object_1;
predicate_2 object_2;
predicate_3 object_3.
If we did not want to use semicolons for some reason, the code would look as follows:
subject_1 predicate_1 object_1.
subject_1 predicate_2 object_2.
subject_1 predicate_3 object_3.
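Because both forms describe exactly the same triples, one list of triples can be written out either way. The Python sketch below uses hypothetical function names and copies the terms verbatim, so the caller must already supply valid Turtle terms (IRIs in angle brackets, prefixed names or quoted literals); it also puts a space before each terminating dot, which Turtle permits.

```python
from collections import defaultdict


def to_turtle_expanded(triples):
    """One complete 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def to_turtle_abbreviated(triples):
    """Write each subject once; its predicate-object pairs are separated by ';'
    and the whole group is closed by a single '.'."""
    by_subject = defaultdict(list)  # dicts keep subjects in first-seen order
    for s, p, o in triples:
        by_subject[s].append((p, o))
    groups = []
    for s, pairs in by_subject.items():
        body = " ;\n".join(f"    {p} {o}" for p, o in pairs)
        groups.append(f"{s}\n{body} .")
    return "\n".join(groups)


triples = [
    ("subject_1", "predicate_1", "object_1"),
    ("subject_1", "predicate_2", "object_2"),
    ("subject_1", "predicate_3", "object_3"),
]
print(to_turtle_abbreviated(triples))
print(to_turtle_expanded(triples))
```

Grouping by subject is the only thing the abbreviated form adds; a parser reading either output would rebuild the same three triples.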
After finding a witness Sherlock might be told the following story:
I found a man lying on the grass in Semantic Park! He was dead and there was an incised wound in his back! I used to know him. It’s John Ontologison from Triple Street, he lives further down my street.
That is quite a lot of data, but Sherlock would be able to store all of it in one database. Let’s assume he already had both Triple Street and Semantic Park stored in his database (with all the triples describing them, such as the city where they are located). Some kinds of damage have already been defined as well. Sherlock used his own prefixes to avoid writing long URIs:
@prefix rdf: <http:
@prefix sh: <http:
@prefix sha: <http:
@prefix shp: <http:
@prefix shr: <http:
@prefix sho: <http:
<http:
rdf:type sho:Case;
sho:reportedDate "2015-04-13"^^<http://www.w3.org/2001/XMLSchema#date>;
shr:describedIn <http:
sho:status sho:Unresolved.
<http:
rdf:type sha:Person;
sha:givenName "John";
sha:familyName "Ontologison";
shr:livingIn shp:TripleStreet;
shr:foundIn shp:SemanticPark.
<http:
rdf:type sha:Person;
sha:givenName "Hilary";
sha:familyName "Tripler";
shr:livingIn shp:TripleStreet
.
<http:
rdf:type sho:Crime;
rdf:type sho:Murder;
sho:victim <http:
sho:witness <http:
sho:damage sho:Incision;
sho:damageDetail "An incised wound in the back";
sho:weapon sho:Unknown;
sho:crimeScene shp:SemanticPark;
sho:inCase <http:
It is worth noting that there is no need to store a triple saying that the victim was the witness’s neighbour; that can be concluded from the fact that they both lived in Triple Street. If there were triples about different kinds of weapons in the database, Sherlock would even be able to find a list of tools that might have been used to injure Mr Ontologison. For that, however, he would need a proper query language: SPARQL.
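The conclusion described here can be produced by a rule instead of being stored. The following Python sketch applies one such rule under stated assumptions: living in the same street counts as being neighbours (as the text assumes), `shr:neighbourOf` is a hypothetical predicate, and `ex:john` and `ex:hilary` are hypothetical stand-ins for the person IRIs, which are not shown in full in the listing. In practice this job would fall to a reasoner or a SPARQL CONSTRUCT query rather than hand-written code.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical identifiers standing in for the (not fully shown) person IRIs.
graph = {
    ("ex:john", "rdf:type", "sha:Person"),
    ("ex:john", "shr:livingIn", "shp:TripleStreet"),
    ("ex:hilary", "rdf:type", "sha:Person"),
    ("ex:hilary", "shr:livingIn", "shp:TripleStreet"),
}


def infer_neighbours(graph):
    """Rule: two distinct people living in the same place are neighbours.
    Only the derived triples are returned; they are never stored in the graph."""
    people = {s for s, p, o in graph if p == "rdf:type" and o == "sha:Person"}
    residents = defaultdict(set)
    for s, p, o in graph:
        if p == "shr:livingIn" and s in people:
            residents[o].add(s)
    derived = set()
    for group in residents.values():
        for x, y in permutations(group, 2):  # ordered pairs with x != y
            derived.add((x, "shr:neighbourOf", y))  # hypothetical predicate
    return derived


for triple in sorted(infer_neighbours(graph)):
    print(triple)
```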
Elementary!
SPARQL is a query language for RDF. The simplest query looks like the following one:
SELECT ?a WHERE
{
<http://sherlock.example/reports/13_04_2015> ?a "English" .
}
Such a query finds all predicates that link a given subject to a given object. A variable is a string starting with "?" that stands for the unknown part of the triples we want to find. Definitions of all the prefixes used should come before the SELECT clause; they are omitted in the examples. Between SELECT and WHERE we list the variables whose values we want returned; a query may use more variables, but those we do not want to see in the result are simply left out of that first line. For example, Sherlock might want to find all the people living in Triple Street who have already been added to his database. A first attempt could list every person:
SELECT ?a ?name ?surname WHERE
{
?a rdf:type sha:Person.
?a sha:givenName ?name.
?a sha:familyName ?surname.
}
That would return a table with three columns: ?a, which refers to the ID of a person, ?name, which refers to the person’s given name, and ?surname, which refers to the person’s family name. People are identified by the type sha:Person. However, this is not exactly what Sherlock was looking for: he wanted to filter the list so that it shows only the names and surnames of the people living in Triple Street, without their IDs:
SELECT ?name ?surname WHERE
{
?a rdf:type sha:Person.
?a sha:givenName ?name.
?a sha:familyName ?surname.
?a shr:livingIn shp:TripleStreet.
}
Note that the ?a variable no longer appears between SELECT and WHERE. If the witness and the victim were the only people in his database living in Triple Street (with other people living elsewhere), the query would return:
?name | ?surname |
John | Ontologison |
Hilary | Tripler |
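How such queries are answered can be sketched in a few lines of Python: each triple pattern is matched against the graph, the variable bindings of successive patterns are joined, and the SELECT line projects the result onto the requested variables. This is a hypothetical, naive evaluator written for illustration (no prefixes, FILTERs or indexes), not how any particular triple store works; the `ex:` person identifiers and the third resident are invented stand-ins, since the person IRIs in the listing are not shown in full.

```python
def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return result


def select(graph, variables, patterns):
    """Join all triple patterns (a basic graph pattern), then keep only the
    listed variables, like the line between SELECT and WHERE."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


# Hypothetical data: two Triple Street residents and one person living elsewhere.
graph = [
    ("ex:john", "rdf:type", "sha:Person"),
    ("ex:john", "sha:givenName", "John"),
    ("ex:john", "sha:familyName", "Ontologison"),
    ("ex:john", "shr:livingIn", "shp:TripleStreet"),
    ("ex:hilary", "rdf:type", "sha:Person"),
    ("ex:hilary", "sha:givenName", "Hilary"),
    ("ex:hilary", "sha:familyName", "Tripler"),
    ("ex:hilary", "shr:livingIn", "shp:TripleStreet"),
    ("ex:jane", "rdf:type", "sha:Person"),
    ("ex:jane", "sha:givenName", "Jane"),
    ("ex:jane", "sha:familyName", "Doe"),
    ("ex:jane", "shr:livingIn", "shp:OtherStreet"),
]

rows = select(graph, ["?name", "?surname"], [
    ("?a", "rdf:type", "sha:Person"),
    ("?a", "sha:givenName", "?name"),
    ("?a", "sha:familyName", "?surname"),
    ("?a", "shr:livingIn", "shp:TripleStreet"),
])
for name, surname in rows:
    print(name, surname)
```

Dropping the last pattern and adding "?a" to the variable list gives the first, three-column query instead.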
The language offers many more possibilities. The main constraint is the amount of data collected and stored. And who knows, maybe after constructing the proper query over a massive data store, Sherlock would get just one result: the name of the person most likely to be the perpetrator?
The vision is an entire Web built of interoperable data. Although it has not been achieved yet, a growing number of major companies and organizations are adopting Semantic Web standards. According to Ramanathan V. Guha, who heads the schema.org initiative, in 2013 about 4 million web domains were using schema.org markup, which supports the interoperability of structured data [8].
There is also the idea of automatic reasoning systems, but they face problems such as vastness, uncertainty and inconsistency. Logical contradictions are common in enormous ontologies and in ontologies created by combining two others [6]. This is one of the main reasons why Sherlock would still be needed as a detective rather than as the operator of a Sherlock Reasoning System. At the moment, many reasoners provide various implementations of reasoning algorithms, each useful in different kinds of situations. These systems support classification, querying and other reasoning tasks, and many of them are based on tableau algorithms [11]. Examples include FaCT++, ELK and HermiT, the last of which uses a hypertableau algorithm that is less nondeterministic than the standard tableau approach, as well as Oroboro, which works in a rule-based way. There is also AllegroGraph, which is designed for geotemporal reasoning and social network analysis [4].
References
[1] F. S. Parreiras, Semantic Web and model-driven engineering
[2] J. Bąk, C. Jędrzejek, Semantic Web – technologie, zastosowania, rozwój [Semantic Web: technologies, applications, development]
[3] What is an Ontology? – http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
[4] SemanticWeb.org – http://semanticweb.org
[5] Introduction to Semantic Web – http://www.cambridgesemantics.com/semantic-university/introduction-semantic-web
[6] Wikipedia – http://en.wikipedia.org/wiki/Semantic_Web
[7] SPARQL – http://www.w3.org/TR/rdf-sparql-query/
[8] The 12th International Semantic Web Conference – http://iswc2013.semanticweb.org/
[9] Linked Data Tools – http://www.linkeddatatools.com/index.php
[10] Dublin Core – http://dublincore.org
[11] Comparison of Reasoners for large Ontologies in the OWL 2 EL Profile – http://www.semantic-web-journal.net/sites/default/files/swj120_2.pdf
[12] Sherlock Holmes Quotes – http://sherlockholmesquotes.com