Introduction
The objective is to provide one API that can be used to read any XML file format. The goal is to be able to access the data in the XML as you would any C# object model, where an XML element is an object and its attributes and child elements are properties of that object, e.g. Object.Property.Subproperty.Name. The API consists of just one class, DynamicXmlNode, and it’s based on the dynamic dictionary sample from Microsoft. I’ll be using the book store example XML format commonly used on W3Schools to describe how the API works. XML is parsed using the XDocument API.
Background
This is my first time publishing anything up here so go easy on me :). A lot of my work involves parsing XML documents and I nearly always end up throwing together a custom API to read these files. Recently I’ve been playing around with C# dynamics and had the idea of creating a dynamic XML API. I did, and here it is.
Before publishing this article here I had a Google around to see what already existed out there. I didn’t find anything quite like what I’ve put together (in particular the handling of arrays), so I’ve decided it’s worth sharing.
Handling Repeated Elements (Arrays)
The dynamic API can handle repeating elements by detecting sibling elements with the same name and grouping them into a collection, but it can’t tell whether an element belongs in a collection when only one of them is present in the XML instance document.
Solution
Let the user tell us at runtime. The user of the API knows the format of the file they’re trying to parse and which elements belong in collections, so by introducing a property naming convention for accessing arrays we can have the user tell us which elements belong in collections. The convention I’ve chosen is the __Array suffix (with a double underscore). Any time a requested property name ends with __Array, the API returns an array of the elements named by the part before __Array, even if no such element exists (in which case the array is empty). This is very convenient, since you don’t need to check whether the property is null before iterating over it.
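The __Array rule can be sketched as a small helper, assuming the parsed document is held as XElement objects (the article parses with the XDocument API). The class name ArrayConvention and the method TryResolveArray are hypothetical, not part of DynamicXmlNode, and for brevity the sketch returns raw XElement objects where the real API would return DynamicXmlNode wrappers.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper illustrating the __Array convention; not the article's actual code.
public static class ArrayConvention
{
    private const string Suffix = "__Array";

    // Returns the matching child elements as an array when memberName ends with
    // "__Array" (case-insensitive); returns null when the convention does not apply.
    public static XElement[] TryResolveArray(XElement parent, string memberName)
    {
        if (!memberName.EndsWith(Suffix, StringComparison.OrdinalIgnoreCase))
            return null;

        // "Book__Array" -> "Book"
        string elementName = memberName.Substring(0, memberName.Length - Suffix.Length);

        // Always an array: one item for a single element, empty when none exist,
        // so callers can iterate without a null check.
        return parent.Elements()
            .Where(e => string.Equals(e.Name.LocalName, elementName, StringComparison.OrdinalIgnoreCase))
            .ToArray();
    }
}
```

Because the empty case still yields an array, a foreach over Book__Array needs no null check, which is the convenience described above.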
Handling complex elements that also contain a value
There’s a complication when an element has a value (text) but also contains attributes or child elements. The dynamic API creates DynamicXmlNodes for these elements, so accessing the corresponding property returns a DynamicXmlNode rather than a string. Here’s an example of this situation:
<title lang="en">The Selfish Gene</title>
The element contains both an attribute and text. If you wanted to access the lang value you’d just go Book__Array[0].Title.Lang, but you can’t get at the text the same way, because Book__Array[0].Title returns a DynamicXmlNode.
Solution
I have three solutions for this, all of which return the value of the title element:
- Implicit string operator: DynamicXmlNode includes an implicit conversion operator to string, so assigning a DynamicXmlNode object to a string always yields the value of the element being wrapped, e.g.
string title = bookstore.Book__Array[0].Title;
- ToString(): Book__Array[0].Title.ToString() also returns the value of the element being wrapped.
- Single underscore: Book__Array[0].Title._ also returns the value of the element being wrapped.
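The three approaches can be illustrated with a minimal DynamicObject subclass. This is a sketch under assumptions, not the article's source: the class name ValueNodeSketch is hypothetical, and any member other than _ is treated as a case-insensitive attribute lookup to keep it short.

```csharp
using System;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

// Illustrative sketch only (not the article's DynamicXmlNode): a minimal dynamic
// wrapper showing three ways to reach the text of a complex element.
public class ValueNodeSketch : DynamicObject
{
    private readonly XElement _element;

    public ValueNodeSketch(XElement element)
    {
        _element = element ?? throw new ArgumentNullException(nameof(element));
    }

    // 1. Implicit conversion: assigning the node to a string yields the element's text.
    public static implicit operator string(ValueNodeSketch node) => node?._element.Value;

    // 2. ToString() returns the element's text (this is what Console.WriteLine prints).
    public override string ToString() => _element.Value;

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // 3. A single underscore returns the element's text.
        if (binder.Name == "_")
        {
            result = _element.Value;
            return true;
        }

        // Any other name is looked up as an attribute (case-insensitive);
        // a missing attribute yields null rather than an exception.
        result = _element.Attributes()
            .FirstOrDefault(a => string.Equals(a.Name.LocalName, binder.Name, StringComparison.OrdinalIgnoreCase))
            ?.Value;
        return true;
    }
}
```

With the title element from the example above, string s = title;, title.ToString() and title._ all give "The Selfish Gene", while title.Lang gives "en".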
Underscore Conventions
- __Array: If the caller wants an array, they get an array! Users just have to append __Array (case-insensitive, double underscore) to the property name, e.g. BookStore.Book__Array
- __PropertyName: Prefixing a property name with a double underscore returns the XML (attribute or element) that was used to generate that property, e.g. Book__Array[0].__Title
- __: A double underscore on its own returns the XML element being wrapped by the DynamicXmlNode, e.g. Book__Array[0].__
- _: A single underscore always returns the value of the element, e.g. Book__Array[0].Title._
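The two raw-XML conventions (__ and __PropertyName) might be resolved along these lines. RawXmlConventions and Resolve are hypothetical names, and checking attributes before child elements is an assumption; the article doesn't say which wins when both exist.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch of the "__" and "__PropertyName" conventions.
public static class RawXmlConventions
{
    // "__" returns the wrapped element itself; "__Name" returns the XAttribute or
    // XElement behind property "Name" (attributes checked first, an assumption).
    public static XObject Resolve(XElement element, string memberName)
    {
        if (memberName == "__")
            return element;

        if (!memberName.StartsWith("__", StringComparison.Ordinal))
            return null; // not a raw-XML request; the normal lookup would apply instead

        string name = memberName.Substring(2);
        return (XObject)element.Attributes()
                   .FirstOrDefault(a => a.Name.LocalName.Equals(name, StringComparison.OrdinalIgnoreCase))
               ?? element.Elements()
                   .FirstOrDefault(e => e.Name.LocalName.Equals(name, StringComparison.OrdinalIgnoreCase));
    }
}
```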
How it Works
All property names are case-insensitive.
All requested properties that don’t exist return null, except when the requested property name ends with __Array (as mentioned above).
Any element that contains child elements or attributes is wrapped in a DynamicXmlNode object. This means you can easily drill down into the file like this: BookStore.Book__Array[0].Title.Lang
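The lookup rules above can be sketched as a single function. MemberLookup is a hypothetical name; checking attributes before child elements is an assumption, and the raw XElement stands in for the DynamicXmlNode wrapper the real API would return for complex elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Illustrative sketch of the lookup rules described above (not the article's code).
public static class MemberLookup
{
    public static object Lookup(XElement element, string name)
    {
        // Attributes first (an assumption), then child elements; names compared case-insensitively.
        XAttribute attribute = element.Attributes()
            .FirstOrDefault(a => a.Name.LocalName.Equals(name, StringComparison.OrdinalIgnoreCase));
        if (attribute != null)
            return attribute.Value;

        XElement child = element.Elements()
            .FirstOrDefault(e => e.Name.LocalName.Equals(name, StringComparison.OrdinalIgnoreCase));
        if (child == null)
            return null; // missing members yield null

        // Elements with attributes or children stay "complex" (the real API wraps them);
        // plain elements collapse to their text value.
        return child.HasAttributes || child.HasElements ? child : (object)child.Value;
    }
}
```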
Using the code
I’ll use the book store example commonly used on W3Schools to demonstrate how to use the API. Here’s the XSD schema for the bookstore.
As you can see, there are three places in the schema where an array can occur: books, book authors, and CDs. Here’s a sample snippet from a book store instance document (XML file).
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="POPULAR SCIENCE">
<title lang="en">The Selfish Gene</title>
<author>Richard Dawkins</author>
<year>1976</year>
<price>15.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Let’s say we want to write the title of the first book to the console.
dynamic bookstore = DynamicXmlNode.Load(filePath); // filePath: path to your bookstore XML file
Console.WriteLine(bookstore.Book__Array[0].Title);
Console output: Everyday Italian
Note that Console.WriteLine calls ToString() on the Title property, so the issue of complex elements described above is masked here.
Now, let’s say we want to write the title of the first book by Richard Dawkins to the console.
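using System;
using System.Collections.Generic; // IEnumerable<dynamic>
using System.Linq;                // query syntax and First()

dynamic bookstore = DynamicXmlNode.Load(filePath); // filePath: path to your bookstore XML file
Console.WriteLine
(
    (from book in bookstore.Book__Array as IEnumerable<dynamic>
     where book.Author == "Richard Dawkins"
     select book).First().Title
);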
Console output: The Selfish Gene
Now, let’s say we want to find the first book with multiple authors and write its title to the console:
Console.WriteLine
(
    (from book in bookstore.Book__Array as IEnumerable<dynamic>
     where book.Author__Array.Count > 1
     select book).First().Title
);
Console output: XQuery Kick Start
Points of Interest
RuntimeBinderExceptions
When using dynamic, the DLR (Dynamic Language Runtime) first attempts to resolve member calls by looking for statically defined members on the dynamic object. When it doesn’t find any, it throws a RuntimeBinderException before calling the TryGetMember or TrySetMember method of the dynamic object. These are just first-chance exceptions and nothing to worry about, but they can make debugging a nightmare when the debugger is configured to break on exceptions. A simple solution to this problem is to add RuntimeBinderException to the list of exceptions to break on and then uncheck it.
Element or Attribute naming clashes with conventions used
The underscore convention I’m using could potentially conflict with element names, but this scenario is very unlikely. For example, for the __Array convention to cause a clash, the XML schema being followed would have to use sibling elements named x and x__Array. It’s much more likely that such elements would have a parent/child relationship, i.e. x__Array/x. Also, I’ve purposely chosen double underscores to reduce the chance of clashes.
I could have taken advantage of XML’s element naming restrictions (e.g. element names cannot begin with a digit or with "xml"), but that would look ugly in API usage.
Element or Attribute names containing dots
It is perfectly valid for XML element or attribute names to contain the dot character ('.'), but in C# the dot is the member-access operator, used to specify a member of a type or namespace. Since we’re using element/attribute names as C# property names, these dots need to be replaced with characters that are valid in C# property names. You’ll never guess which character I’ve chosen to replace them with. The underscore! Here’s an example scenario:
<element some.attribute="12">
Note how, in the code below, the dot in the attribute name has been replaced by an underscore:
Element.Some_Attribute
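One way to implement the dot-to-underscore mapping is to normalize each XML name before comparing it with the requested property name. DottedNames and its methods are hypothetical; this is a sketch of the idea, not the article's code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sketch (assumed approach): map XML names to C#-friendly property names by
// replacing '.' with '_', then compare case-insensitively.
public static class DottedNames
{
    // "some.attribute" -> "some_attribute"
    public static string ToPropertyName(string xmlName) => xmlName.Replace('.', '_');

    // Finds the attribute whose normalized name matches the requested property name.
    public static XAttribute FindAttribute(XElement element, string propertyName) =>
        element.Attributes().FirstOrDefault(a =>
            string.Equals(ToPropertyName(a.Name.LocalName), propertyName, StringComparison.OrdinalIgnoreCase));
}
```

A side effect of this approach is that some_attribute and some.attribute map to the same property name, the same kind of (unlikely) clash discussed for the double-underscore conventions.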
Handling Namespaces
The API works with each element's local name (i.e. without the namespace), so files containing namespaces are supported, but elements may be overridden if a namespace was being used to distinguish sibling elements that share the same local name.
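Matching on XName.LocalName is what makes namespaced files work, and also what causes the potential override. The helper below is a hypothetical sketch in which the first matching sibling wins; the article doesn't specify which one its implementation keeps.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sketch: matching on LocalName ignores namespaces, so siblings that differ
// only by namespace become indistinguishable (in this sketch, the first match wins).
public static class LocalNameLookup
{
    public static XElement FindChild(XElement parent, string name) =>
        parent.Elements().FirstOrDefault(e =>
            string.Equals(e.Name.LocalName, name, StringComparison.OrdinalIgnoreCase));
}
```

For example, in a root containing a:id and b:id from two different namespaces, FindChild(root, "id") can only ever reach the a:id element.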
Future Feature Ideas
Lazy Instantiation
Currently the complete XML file is loaded into memory. It may be desirable to have a lazy version of the API.
Write Capabilities
Dynamically reading is one thing, but dynamically writing is a much more complex problem. You’d need to handle issues such as knowing whether a setter should add an attribute or an element, handling namespaces, and satisfying element sequence constraints.