Introduction
My five-year-old son enjoys using my Android phone to search by voice and listen to the answers narrated by a lovely semi-human, semi-machine voice. For example, he asks “What is the capital of France?” and the phone replies “Paris”; he asks “What is the largest country in Africa?” and it replies “Algeria”. However, one day he was very frustrated when he asked “Which is bigger, the United States or China?” and didn’t get an answer!
This situation shows the limitation of the current search mechanism, which uses keywords and tags to find the webpages related to your search. The accuracy and quality of the search depend mainly on the popularity of the question you ask. If someone else has already answered your question (which is likely with such a large pool of people), the search engine will retrieve that page and you have your answer; however, if you ask a relatively sophisticated question, you are out of luck. In another scenario, if you have a compound question (e.g. list all states in the USA and how many presidents were born in each state), you might have to go through multiple webpages to get the answer (again, if no one has answered the question before).
Semantic web
This limitation of search relates to the way the web is structured. It is simply a collection of text documents that are readable by humans but not by machines. A machine cannot collect data from different webpages to form an answer. Consequently, a new structure has been proposed, known as the Semantic Web, which is readable by humans and, more importantly, by machines.
There are many wonderful articles that explain the Semantic Web in detail, but here I will focus on searching and querying, which is one of its most powerful features. In this demo, I will use two websites that use the Semantic Web (dbpedia.org and wikidata.org). These two websites convert the giant encyclopedia Wikipedia to RDF/OWL format, which allows machines to merge data from different webpages to answer compound questions. There are some technical differences between these two websites, but they are outside the scope of this article.
To search the Semantic Web, we use a query language called SPARQL Protocol and RDF Query Language (SPARQL); oddly, the acronym is recursive, since its S stands for SPARQL itself. SPARQL is very similar to SQL. To write a query, either use a SPARQL endpoint, which is a simple webpage where you write a query and see the results (think of the Google homepage), or use a Semantic Web library to write a custom application. There are many good Semantic Web libraries, such as Jena (Java) and dotNetRDF (C#). In this article, I will demonstrate both endpoints and the Jena framework.
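To make the idea of "merging data" more concrete, here is a minimal sketch in plain Java (no Semantic Web library involved). It stores a few facts as subject/predicate/object triples and answers a small compound question by joining two patterns on the subject. The class name, the predicate names and the tiny hand-made data set are my own assumptions for illustration; real RDF stores use full IRIs and far more data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TripleJoinSketch {

    /** One statement: subject, predicate, object. */
    static final class Triple {
        final String subject, predicate, object;
        Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
    }

    /** Hand-made, simplified sample facts (illustrative only). */
    static List<Triple> sampleFacts() {
        List<Triple> facts = new ArrayList<>();
        facts.add(new Triple("Abraham_Lincoln", "birthPlace", "Kentucky"));
        facts.add(new Triple("Zachary_Taylor", "birthPlace", "Virginia"));
        facts.add(new Triple("George_Washington", "birthPlace", "Virginia"));
        facts.add(new Triple("Abraham_Lincoln", "position", "US_President"));
        facts.add(new Triple("Zachary_Taylor", "position", "US_President"));
        facts.add(new Triple("George_Washington", "position", "US_President"));
        facts.add(new Triple("Mark_Twain", "birthPlace", "Missouri"));
        return facts;
    }

    /** Counts presidents per birth state by joining two triple patterns on the subject. */
    static Map<String, Integer> presidentsPerState(List<Triple> facts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Triple held : facts) {
            // Pattern 1: ?person position US_President
            if (!held.predicate.equals("position") || !held.object.equals("US_President")) {
                continue;
            }
            for (Triple born : facts) {
                // Pattern 2: the same ?person birthPlace ?state
                if (born.subject.equals(held.subject) && born.predicate.equals("birthPlace")) {
                    counts.merge(born.object, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {Kentucky=1, Virginia=2}; Mark_Twain is skipped because he is not a president.
        System.out.println(presidentsPerState(sampleFacts()));
    }
}
```

The point of the sketch is only that once facts have a uniform shape, a program can combine them without a human reading the pages; a query language does this joining for you.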
SPARQL endpoint
Each of these websites provides a SPARQL endpoint for writing queries. Let’s go to https://query.wikidata.org and write the following query to get all American presidents with their signatures.
SELECT ?president ?president_name ?signature
WHERE {
  ?president wdt:P39 wd:Q11696 .
  ?president wdt:P109 ?signature .
  OPTIONAL {
    ?president rdfs:label ?president_name .
    FILTER (lang(?president_name) = "en")
  }
}
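The OPTIONAL block in this query behaves roughly like a left join: every president with a signature is kept, and the English label is attached only when one exists. The following plain-Java sketch imitates that behaviour on made-up data. The item identifiers, labels and file names are hypothetical placeholders, not real Wikidata values, and the sketch simplifies things by keeping a single label per item, whereas SPARQL would return one row per matching label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OptionalLabelSketch {

    /** A label with a language tag, for example "Person A"@en. */
    static final class Label {
        final String text, lang;
        Label(String text, String lang) { this.text = text; this.lang = lang; }
    }

    /**
     * Emits one row per item that has a signature. The English label is
     * attached when present; otherwise the name stays unbound (null).
     */
    static List<String> rows(Map<String, String> signatures, Map<String, List<Label>> labels) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> entry : signatures.entrySet()) {
            String name = null; // stays null when the OPTIONAL part does not match
            List<Label> candidates = labels.getOrDefault(entry.getKey(), Collections.<Label>emptyList());
            for (Label label : candidates) {
                if (label.lang.equals("en")) { // mirrors FILTER (lang(?president_name) = "en")
                    name = label.text;
                }
            }
            out.add(entry.getKey() + " | " + name + " | " + entry.getValue());
        }
        return out;
    }

    /** Runs the sketch on hypothetical sample data. */
    static List<String> demo() {
        Map<String, String> signatures = new LinkedHashMap<>();
        signatures.put("item:A", "A_signature.svg");
        signatures.put("item:B", "B_signature.svg");

        Map<String, List<Label>> labels = new LinkedHashMap<>();
        labels.put("item:A", Arrays.asList(new Label("Person A", "en"), new Label("Personne A", "fr")));
        labels.put("item:B", Arrays.asList(new Label("Personne B", "fr"))); // no English label

        return rows(signatures, labels);
    }

    public static void main(String[] args) {
        for (String row : demo()) {
            System.out.println(row);
        }
    }
}
```

Without OPTIONAL, the second item would disappear from the result because it has no English label; with it, the row survives with an empty name.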
Now let’s go to the other endpoint, http://dbpedia.org/sparql, and use this query to retrieve all public Canadian universities along with their cities and the population of each city, sorted from the largest city to the smallest.
SELECT *
WHERE {
  ?university dbo:type dbr:Public_university .
  ?university dbp:country dbr:Canada .
  ?university dbp:city ?city .
  ?city dbo:populationTotal ?population
} ORDER BY DESC(?population)
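An endpoint is not only a webpage. Under the SPARQL protocol, a query can also be sent as an HTTP GET request with the query text URL-encoded in a parameter named query. As a minimal sketch (the class and method names are my own, and no request is actually sent), this is how such a request URL could be built with the standard Java library:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EndpointUrlSketch {

    /** Builds a GET URL that carries the query in the "query" parameter. */
    static String requestUrl(String endpoint, String sparql) {
        try {
            // URLEncoder uses form encoding: spaces become '+', reserved characters become %XX.
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // A shortened version of the universities query, limited to a few rows.
        String sparql =
            "SELECT * WHERE { "
          + "?university dbo:type dbr:Public_university . "
          + "?university dbp:country dbr:Canada . "
          + "} LIMIT 5";
        System.out.println(requestUrl("http://dbpedia.org/sparql", sparql));
    }
}
```

Pasting the printed URL into a browser should be roughly equivalent to typing the query into the endpoint page, although the result format returned may differ from one endpoint to another.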
Jena
Besides using SPARQL endpoints, you can also use a Semantic Web library to write a custom application. Here I will use Jena, which is one of the most powerful libraries that support Semantic Web technology.
Jena can be downloaded from its website, https://jena.apache.org/, or added to your project by putting the following dependency in your Maven pom.xml file.
<dependency>
  <groupId>org.apache.jena</groupId>
  <artifactId>apache-jena-libs</artifactId>
  <type>pom</type>
  <version>3.0.1</version>
</dependency>
The following code runs a query against the DBpedia endpoint; the attached sample demonstrates Wikidata and LinkedMDB as well.
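import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;

public class DbpediaQuery {
    public static void main(String[] args) {
        // Note: as written, this pattern matches every resource that has a birth place.
        String queryString =
            "PREFIX dbont: <http://dbpedia.org/ontology/> " +
            "PREFIX dbp: <http://dbpedia.org/property/> " +
            "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> " +
            "SELECT ?musician ?place " +
            "WHERE { " +
            "  ?musician dbont:birthPlace ?place . " +
            "}";
        // Parse the query locally, then send it to the remote DBpedia endpoint.
        Query query = QueryFactory.create(queryString);
        QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
        try {
            ResultSet results = qexec.execSelect();
            // Print the results as a text table.
            ResultSetFormatter.out(System.out, results, query);
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        } finally {
            // Always release the underlying connection.
            qexec.close();
        }
    }
}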
Limitations
Few would dispute the advantages of the Semantic Web over the current web model. Nonetheless, the Semantic Web has not become popular yet, nor has it reached the critical mass needed to gain momentum. This can be attributed to the radical change it requires to the current model, which many people are reluctant to accept. In addition, a query’s syntax must be correct for it to run, while a simple keyword search has no such requirement.
Conclusion
The Semantic Web provides a new approach to dealing with data that is “understandable” by machines. Although there are many difficulties associated with this new technology, it will evolve and definitely supersede the current web structure in the future.