Introduction
Resource Description Framework (RDF) is a method for expressing "triples" of knowledge (statements in
subject-predicate-object format) in a way that is easily serialized as XML. Different "terms" in RDF are defined by different vocabularies that people make available online in RDF
schema documents, and the system is specifically designed to let people build on each other's vocabulary definitions: you can assign different vocabularies
as namespaces in your RDF XML document and use the terms that they define. As a result, a typical RDF document includes elements from a large number of namespaces. Because of the way
that RDF information is represented in XML, it is common for every single element tag to be qualified by a namespace prefix and every parent element to have child elements from
multiple namespaces.
This presents a problem in PHP when you try to use the SimpleXML module to parse RDF/XML. The module provides a SimpleXMLElement class that is easy and fun to use, as long as you are not dealing with namespaces.
It can be adequate to use when the namespace handling is very simple: for example, when the child elements of an element all belong to the same namespace.
But there is no easy way to get the namespace prefix or namespace URI of a particular element, and it is difficult to handle elements with child elements
from multiple namespaces. This makes what should be a very simple piece of code (converting RDF/XML into a representation of the "triples" of subject, predicate,
and object that it encodes) mind-numbingly complex.
As a result, I present the SimpleRDFElement class: a class that extends the built-in PHP SimpleXMLElement class with a few extra methods designed
to make it easier to use when working with RDF/XML.
(Note: The rest of this article is written with the assumption that you are familiar with the basics of RDF, XML, and PHP, and know what terms like
"triple", "namespace", and "object method" mean and how they are represented.)
Background
As part of a project that I have been working on, I needed to be able to convert strings of RDF/XML text into objects representing each of the tags (or "nodes")
in the XML, and then to determine what RDF Triples were represented by those XML elements and their sub-trees. I wanted to make use of the built-in
functionality of the SimpleXML module in PHP, but when I tried, I encountered a number of problems. This is just a brief list of some of the
issues I encountered when trying to use the SimpleXMLElement class to represent RDF/XML:
- Because all children of the root element are qualified with namespace prefixes, they cannot be accessed as object properties using the -> operator
- Because of the way that the child nodes array is created, qualified elements also cannot be viewed as array elements using functions like print_r()
- Because the children() method, when called without arguments, only returns unqualified (i.e., no namespace prefix) elements, it returns nothing and so cannot be iterated over
- As a result, the object appears to be completely empty; it even evaluates as "false" if you try to use it as a Boolean (e.g. adding an "or die()" clause after the assignment)
- When you call the children() method with a namespace argument, it will only retrieve children (and their sub-children) that have that namespace prefix
- As a result, if you expect the children of an element to come from any of a number of namespaces, you have to iterate over every namespace
(If you are curious, I have a detailed blog post about some of my failed attempts and problems I was experiencing
here: http://talkingowlproject.blogspot.com/2011/06/simplexml-and-namespace-quirks.html.)
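The behaviour described in the list above can be reproduced with the stock SimpleXMLElement class alone. The following is a minimal sketch, not taken from the original source; the sample document and the variable names are assumptions made for illustration.

```php
<?php
// Sample RDF/XML document (an assumption for this sketch).
$rdf = '<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="#someperson">
    <rdfs:label>Bob</rdfs:label>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
  </rdf:Description>
</rdf:RDF>';

$root = simplexml_load_string($rdf);

// children() without arguments only sees elements that have no namespace,
// so the root looks empty even though it contains rdf:Description.
$plainCount = count($root->children());

// Passing the namespace URI makes the rdf: children visible.
$rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
$description = $root->children($rdfNs)->Description;

// To see every child of rdf:Description, each namespace used in the
// document has to be tried in turn. The children come back grouped by
// namespace, so document order may be lost.
$names = [];
foreach ($root->getNamespaces(true) as $prefix => $uri) {
    foreach ($description->children($uri) as $child) {
        $names[] = $prefix . ':' . $child->getName();
    }
}

echo $plainCount, "\n";
echo implode(', ', $names), "\n";
```

The nested loop is the "iterate over every namespace" workaround from the last bullet: it works, but it has to be repeated at every level of the tree.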
After extensive Google searching for a solution to this problem, I found nothing that fit my needs. Either I could download
extensive RDF "frameworks" that require installing a dozen or more PHP class files (but all I want to do is parse an RDF string into
triples! I don't need all that!), or I could follow the suggestions of some "hacks" that were simply unworkable. (For example, one person suggested I simply replace the
":" character in the RDF string with "_" in order to get rid of namespaces entirely. This doesn't work because the namespace prefixes in an XML document are arbitrary,
intended merely to be "shortcuts" to the longer URIs declared in the document (usually on its root element). Different people can use
different prefixes to represent the same namespace URI, and it should not make a difference.)
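To make the point about arbitrary prefixes concrete, here is a small sketch. It assumes the DOM extension is available alongside SimpleXML, and the helper names expandedName() and underscoreName() are hypothetical. Two documents use different prefixes for the same namespace URI: the underscore hack yields two different names, while expanding the prefix to its URI yields the same name for both.

```php
<?php
// Two fragments that describe the same term with different prefixes.
$a = simplexml_load_string(
    '<rdfs:Class xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"/>'
);
$b = simplexml_load_string(
    '<s:Class xmlns:s="http://www.w3.org/2000/01/rdf-schema#"/>'
);

// Hypothetical helper: namespace URI followed by the local tag name.
function expandedName(SimpleXMLElement $el): string
{
    $node = dom_import_simplexml($el);
    return $node->namespaceURI . $node->localName;
}

// Hypothetical helper mimicking the ":" to "_" replacement hack.
function underscoreName(SimpleXMLElement $el): string
{
    $node = dom_import_simplexml($el);
    return str_replace(':', '_', $node->nodeName);
}

var_dump(underscoreName($a) === underscoreName($b)); // the hack disagrees
var_dump(expandedName($a) === expandedName($b));     // the URIs agree
```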
So I decided to create my own solution as a "lightweight" alternative. It is literally one file with one main class (the SimpleRDFElement class) and
one helper class (the SimpleRDFTriple class). All it really does is add a few helper methods to the built-in SimpleXMLElement class in PHP. But these methods
make all the difference in the world when you are handling RDF/XML.
Because this solution is short and simple, there is a lot it doesn't do. That is on purpose: it is not supposed to do a lot. It is a simple solution to a
simple problem. It will let you parse an RDF document as an object and will let you access namespace information. It also gives you a method that will extract
triples from the top-level element represented by the object and its direct children. (This method is not recursive, so you will have to do any recursion yourself.)
I cannot guarantee that it will work for every valid RDF/XML document. However, I am open to making
(some) additions and improvements, and to fixing anything that you find broken. Please contact me with your comments, suggestions, and complaints.
Using the Code
This code is a single file that contains two PHP class definitions.
The first class is merely a helper class, SimpleRDFTriple, which literally is an object with no methods and three properties: tripleSubject, triplePredicate, and tripleObject. The only reason this class is here is so that the SimpleRDFElement class can have a method, getTriples(), that returns an array of objects of that type.
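A class of that shape could look like the sketch below. This is an illustration only: the class name ExampleTriple is a hypothetical stand-in chosen to avoid clashing with the real SimpleRDFTriple, whose actual definition in the source file may differ. Only the three property names come from the description above.

```php
<?php
// Hypothetical stand-in for the helper class described above: a plain
// value object with three public properties and no methods.
class ExampleTriple
{
    public $tripleSubject;
    public $triplePredicate;
    public $tripleObject;
}

// Filling one in by hand, the way a triple-extracting method might.
$triple = new ExampleTriple();
$triple->tripleSubject   = '#someperson';
$triple->triplePredicate = 'http://www.w3.org/2000/01/rdf-schema#label';
$triple->tripleObject    = 'Bob';

print_r($triple);
```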
The second class, SimpleRDFElement, extends the class SimpleXMLElement, which is built into PHP as part of the SimpleXML library.
Because the class extends SimpleXMLElement, you can create a new SimpleRDFElement from a string variable
that contains RDF/XML text using the built-in function simplexml_load_string():
$xmlobj = simplexml_load_string($xmltext,'SimpleRDFElement');
The first parameter is the variable containing the RDF/XML text that you want to parse, and the second parameter is a string: the name of our extended class,
SimpleRDFElement. This will return an object of type SimpleRDFElement, which means that it can be manipulated exactly like
a SimpleXMLElement object, but that you can also use the new methods provided by the extension class.
The new methods provided by SimpleRDFElement are:
$xmlobj->getPrefix()
Returns the namespace prefix of the root element of the object, based on the namespace definitions defined by the XML text.
$xmlobj->getNamespace()
Returns the full URI of the namespace of the root element of the object, based on the namespace definitions defined by the XML text.
$xmlobj->getFullName()
Returns the fully qualified name of the root element, using the prefix-colon-tagname format, e.g., rdfs:Class.
$xmlobj->getFullURI()
Returns the full URI of the root element, using the expanded URI of the namespace followed by the element tag name, e.g. http://www.w3.org/2000/01/rdf-schema#Class.
$xmlobj->getChildNodes()
Returns an array of all of the child elements (as SimpleRDFElement objects) of the current top-level element. Unlike the built-in children() method, this returns all child elements regardless of namespace.
$xmlobj->getAttributes()
Returns an array of all of the attributes (as individual SimpleRDFElement objects) of the current top-level element. Unlike the built-in attributes() method, this returns all attributes regardless of namespace.
$xmlobj->getTriples()
Returns an array of SimpleRDFTriple objects. This is a simple helper class that defines an object with three properties: tripleSubject, triplePredicate, tripleObject. This method parses the top level element and constructs triples based on that element, its attributes, and its immediate child elements. It is not recursive.
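For readers curious how namespace helpers like these can be layered on top of SimpleXMLElement, here is one possible approach. It is a sketch only and is not the code from the source file: the class SketchRDFElement and its sketch* method names are hypothetical, and it assumes the DOM extension is available so that dom_import_simplexml() and simplexml_import_dom() can be used.

```php
<?php
// Hypothetical illustration; the real SimpleRDFElement may work differently.
class SketchRDFElement extends SimpleXMLElement
{
    // Namespace URI of this element ('' if it has none).
    public function sketchNamespace(): string
    {
        return (string) dom_import_simplexml($this)->namespaceURI;
    }

    // Namespace prefix of this element ('' if it has none).
    public function sketchPrefix(): string
    {
        return (string) dom_import_simplexml($this)->prefix;
    }

    // prefix:tagname form, e.g. rdfs:Class.
    public function sketchFullName(): string
    {
        $prefix = $this->sketchPrefix();
        return ($prefix === '' ? '' : $prefix . ':') . $this->getName();
    }

    // Namespace URI followed by the local tag name.
    public function sketchFullURI(): string
    {
        return $this->sketchNamespace() . $this->getName();
    }

    // All child elements in document order, whatever their namespace.
    public function sketchChildNodes(): array
    {
        $children = [];
        foreach (dom_import_simplexml($this)->childNodes as $node) {
            if ($node->nodeType === XML_ELEMENT_NODE) {
                $children[] = simplexml_import_dom($node, static::class);
            }
        }
        return $children;
    }
}

$doc = simplexml_load_string(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
       <rdf:Description rdf:about="#someperson">
         <rdfs:label>Bob</rdfs:label>
         <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
       </rdf:Description>
     </rdf:RDF>',
    'SketchRDFElement'
);

$description = $doc->sketchChildNodes()[0];
$childNames = [];
foreach ($description->sketchChildNodes() as $child) {
    $childNames[] = $child->sketchFullName();
}
echo $doc->sketchFullURI(), "\n";
echo implode(', ', $childNames), "\n";
```

Going through the DOM node keeps the children in document order and avoids looping over every declared namespace, at the cost of depending on a second extension.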
Most of the methods are simple and their usage is self-evident if you are familiar with RDF, XML, and namespaces.
The only complex method is getTriples(), which returns an array of SimpleRDFTriple objects based on the root element represented by $xmlobj.
You should keep in mind that getTriples() is not recursive. It assumes that the root node represents an RDF element that identifies
the subject of the triples, and that its attributes and immediate child elements supply the predicate and object information about that subject. This means that if you initially created your
$xmlobj from a full RDF/XML document, so that the root element is the RDF element, you will have to iterate over its children to extract triples.
For example, the following code provides a very simple RDF/XML string and shows how to extract all of its triples:
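// A very small RDF/XML document: one resource with a label and a type.
$xmltext =
'<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:ID="#someperson">
    <rdfs:label>Bob</rdfs:label>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
  </rdf:Description>
</rdf:RDF>';

// Parse the text into a SimpleRDFElement instead of a plain SimpleXMLElement.
$xmlobj = simplexml_load_string($xmltext, 'SimpleRDFElement');

// The root is the rdf:RDF wrapper, so extract triples from each of its children.
foreach ($xmlobj->getChildNodes() as $child)
{
    foreach ($child->getTriples() as $trip)
    {
        print_r($trip);
    }
}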
This will produce the following output text:
SimpleRDFTriple Object
(
[tripleSubject] => #someperson
[triplePredicate] => http:
[tripleObject] => Bob
)
SimpleRDFTriple Object
(
[tripleSubject] => #someperson
[triplePredicate] => http:
[tripleObject] => http:
)
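Because getTriples() is not recursive, nested descriptions need a small walker of your own. The function below is one way this might be done, not part of the source file: collectTriples() is a hypothetical name, and it assumes the usual RDF/XML layering in which a node element (such as rdf:Description) contains property elements, which may in turn contain further node elements. It relies only on the getTriples() and getChildNodes() methods described earlier.

```php
<?php
// Hypothetical helper, not part of the original source file.
// $node is assumed to be a SimpleRDFElement for an RDF node element,
// exposing getTriples() and getChildNodes() as described in the article.
function collectTriples($node): array
{
    // Triples for this node, its attributes, and its immediate children.
    $triples = $node->getTriples();

    // Each child is assumed to be a property element; any elements nested
    // inside a property are treated as further node elements and visited
    // in the same way.
    foreach ($node->getChildNodes() as $property) {
        foreach ($property->getChildNodes() as $nested) {
            $triples = array_merge($triples, collectTriples($nested));
        }
    }

    return $triples;
}
```

In the example above, this would replace the inner loop: call collectTriples($child) for each child of the root element and print or store the array it returns.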
Points of Interest
The code in the source file is deliberately kept very simple, so that instead of simply using it like some kind of "black box", you can see exactly how
it is done and (if you would like) modify it.
If you come up with a particularly clever extension or additional method, let me know about it and I will add it (and your name, with credit) to the source code that is linked to above.
History
Updates on this class or anything related to it will appear on the blog: http://talkingowlproject.blogspot.com/.