Introduction
Parsing XML can be a tedious, labor-intensive process. Encapsulating the XML in data objects can improve the process. However, depending on the structure of the XML and how often it changes, creating and maintaining those data objects can be just as cumbersome.
Background
In a project I was working on recently, a third party produced XML files for import into the application I was writing. Each XML file held a batch of transactions. Each transaction consisted of a form and supporting documentation. Each file could contain any number of forms, and the forms were of different types (each form type had different elements). Each form could have any number of supporting documents, and the documents were of different types (each document type had different elements). When parsing, it wasn’t until you were knee-deep in the XML that you knew enough about an element to create a corresponding data object for it. At that point, why bother creating a data object? I could just as easily have started processing the raw XML. There had to be a better way.
While searching for a better way, I found the .NET 4.0 System.Dynamic.ExpandoObject class. ExpandoObject instances can have members added and removed at runtime. With a little more research, I found two examples where an ExpandoObject was used to represent an XML document. That is, while parsing, an ExpandoObject instance is created and each XML element is added as a member of the ExpandoObject.
For example, this XML:
<car>
  <make>Ford</make>
  <model>Explorer</model>
  <color>Silver</color>
</car>
... essentially becomes an instance of the following dynamic “virtual” class:
public class Car
{
    public string make;
    public string model;
    public string color;
}
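To make the idea concrete, here is a minimal sketch of building the same car object by hand. It uses only ExpandoObject and the IDictionary<string, object> interface it implements; the names ExpandoCarSketch and BuildCar are made up for illustration and are not part of the sample project.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class ExpandoCarSketch
{
    // Builds the <car> example by hand (illustrative helper, not from the sample project).
    public static dynamic BuildCar()
    {
        dynamic car = new ExpandoObject();
        car.make = "Ford";       // members are created on first assignment
        car.model = "Explorer";

        // The same object viewed as a dictionary, which is handy when the
        // member name is only known at runtime (e.g. an XML element name).
        var members = (IDictionary<string, object>)car;
        members["color"] = "Silver";
        return car;
    }

    public static void Main()
    {
        dynamic car = BuildCar();
        Console.WriteLine("{0} {1} ({2})", car.make, car.model, car.color);

        // Members can also be removed at runtime through the dictionary view.
        var members = (IDictionary<string, object>)car;
        members.Remove("color");
        Console.WriteLine(members.ContainsKey("color")); // False
    }
}
```

The dictionary view is what a parser would use, since element names arrive as strings rather than as identifiers known at compile time.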
Both examples of building an ExpandoObject from XML were close, but neither was what I needed. Both made assumptions about the XML being parsed. One example assumed that attributes could be added to the XML tags to change how the ExpandoObject was created. I was a consumer of the XML; I didn’t have control over what the publisher provided. The other example assumed that list nodes always preceded non-list nodes. That was not the case in my source XML. In short, both potential solutions made assumptions about the XML being read that did not hold in my problem space. I needed a generic and more robust solution.
Start With What Works Best
Of the two examples I found for converting XML to an ExpandoObject instance, the one closest to what I needed was posted on ITDevSpace.com. I make no attempt to hide that my solution originated from that code. In fact, looking closely at both, there are only a few differences. I give 90% credit to them for what was, for me, a 90% solution.
This article describes my 10%.
Requirement 1
The original class did a really good job of handling lists of objects. However, it assumed that the list was the first child element in the XML. For example, according to the ITDevSpace.com solution, this XML is “good”:
<car>
  <owners>
    <owner>Bob Jones</owner>
    <owner>Betty Jones</owner>
  </owners>
  <make>Ford</make>
  <model>Explorer</model>
  <color>Silver</color>
</car>
This XML is “bad”:
<car>
  <make>Ford</make>
  <model>Explorer</model>
  <color>Silver</color>
  <owners>
    <owner>Bob Jones</owner>
    <owner>Betty Jones</owner>
  </owners>
</car>
This is unfortunate, as the two documents are equally valid XML and carry the same data. My XML looked like the “bad” XML, so that wasn’t going to work. I needed a way to generically handle lists, regardless of their placement in the XML.
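One position-independent way to spot a list is to count how many sibling elements share a name, wherever they sit among their siblings. The sketch below only illustrates that idea with LINQ to XML; ListDetectionSketch and RepeatedChildNames are hypothetical names, not members of the class presented later in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ListDetectionSketch
{
    // Returns the local names of child elements that occur more than once
    // under the given node, regardless of their position.
    public static List<string> RepeatedChildNames(XElement node)
    {
        return node.Elements()
                   .GroupBy(e => e.Name.LocalName)
                   .Where(g => g.Count() > 1)
                   .Select(g => g.Key)
                   .ToList();
    }

    public static void Main()
    {
        // The "bad" layout: the list comes after the simple elements.
        var car = XElement.Parse(
            "<car><make>Ford</make><model>Explorer</model><color>Silver</color>" +
            "<owners><owner>Bob Jones</owner><owner>Betty Jones</owner></owners></car>");

        Console.WriteLine(string.Join(", ", RepeatedChildNames(car.Element("owners")))); // owner
    }
}
```

Counting alone cannot recognize a list that happens to hold a single item, which is the gap the next requirement addresses.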
Requirement 2
Although the original class did a good job of handling lists, there was a problem if the list contained only one item. For example, if parsing the “good” XML above, the original would produce this class:
public class Car
{
    public string make;
    public string model;
    public string color;
    public dynamic owners;
}
However, if there were only one owner, this XML:
<car>
  <make>Ford</make>
  <model>Explorer</model>
  <color>Silver</color>
  <owners>
    <owner>Bob Jones</owner>
  </owners>
</car>
... would produce this class:
public class Car
{
    public string make;
    public string model;
    public string color;
    public dynamic owner;
}
Compared to the output of the “good” XML above, when using the Car.owners.owner member, I wouldn’t know if I was looking at a List<> or a single object without investigation through reflection. I wanted (i.e., not “needed”) a standard interface for items that I knew were going to be lists.
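Without a standard interface, every consumer of the parsed object ends up writing a check along these lines. This is a sketch of that workaround only, under the assumption that lists are represented as List<dynamic>; AsList is a hypothetical helper name and is not part of the original class or of mine.

```csharp
using System;
using System.Collections.Generic;

public static class ListNormalizerSketch
{
    // Wraps a lone value in a list so that callers can always iterate.
    public static List<dynamic> AsList(object value)
    {
        var list = value as List<dynamic>;
        if (list != null)
        {
            return list;                      // already a list: use it as-is
        }
        return new List<dynamic> { value };   // single item: wrap it
    }

    public static void Main()
    {
        object single = "Bob Jones";
        object many = new List<dynamic> { "Bob Jones", "Betty Jones" };

        Console.WriteLine(AsList(single).Count); // 1
        Console.WriteLine(AsList(many).Count);   // 2
    }
}
```

Having the parser always produce a list for known list nodes removes the need for this kind of check at every call site.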
Requirement 3
Attributes! What about attributes? The original class processed intermediate nodes without regard for their attributes. For example, this XML:
<car>
  <make>Ford</make>
  <model>Explorer</model>
  <color>Silver</color>
  <owners type="Current">
    <owner>Bob Jones</owner>
  </owners>
</car>
... would lose the data contained in the type attribute. I needed all attributes! I needed something like this result:
public class Car
{
    public string make;
    public string model;
    public string color;
    public dynamic owners;
}
public class owners
{
    public string type;
    public List<dynamic> owner;
}
The Code
Here is the most recent version. This class is also included in the sample project provided with this article for download.
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

public static class ExpandoObjectHelper
{
    // Element names the caller has declared to be lists.
    private static List<string> KnownLists;

    public static void Parse(dynamic parent, XElement node, List<string> knownLists = null)
    {
        if (knownLists != null)
        {
            KnownLists = knownLists;
        }

        // Order the children so the most frequently repeated element names come first.
        IEnumerable<XElement> sorted = from XElement elt in node.Elements()
                                       orderby node.Elements(elt.Name.LocalName).Count() descending
                                       select elt;

        if (node.HasElements)
        {
            int nodeCount = node.Elements(sorted.First().Name.LocalName).Count();
            bool foundNode = false;
            if (KnownLists != null && KnownLists.Count > 0)
            {
                foundNode = (from XElement el in node.Elements()
                             where KnownLists.Contains(el.Name.LocalName)
                             select el).Count() > 0;
            }

            if (nodeCount > 1 || foundNode)
            {
                // A child name repeats, or a child is a known list.
                var item = new ExpandoObject();
                List<dynamic> list = null;
                string elementName = string.Empty;
                foreach (var element in sorted)
                {
                    // Start a new list each time the element name changes.
                    if (element.Name.LocalName != elementName)
                    {
                        list = new List<dynamic>();
                        elementName = element.Name.LocalName;
                    }

                    if (element.HasElements ||
                        (KnownLists != null && KnownLists.Contains(element.Name.LocalName)))
                    {
                        Parse(list, element);
                        AddProperty(item, element.Name.LocalName, list);
                    }
                    else
                    {
                        Parse(item, element);
                    }
                }

                // Keep the attributes of the intermediate node.
                foreach (var attribute in node.Attributes())
                {
                    AddProperty(item, attribute.Name.ToString(), attribute.Value.Trim());
                }

                AddProperty(parent, node.Name.ToString(), item);
            }
            else
            {
                // No lists at this level: attributes first, then child elements.
                var item = new ExpandoObject();
                foreach (var attribute in node.Attributes())
                {
                    AddProperty(item, attribute.Name.ToString(), attribute.Value.Trim());
                }

                foreach (var element in sorted)
                {
                    Parse(item, element);
                }

                AddProperty(parent, node.Name.ToString(), item);
            }
        }
        else
        {
            // Leaf element: store its trimmed text value.
            AddProperty(parent, node.Name.ToString(), node.Value.Trim());
        }
    }

    // Appends to a list parent, or sets a named member on an ExpandoObject parent.
    private static void AddProperty(dynamic parent, string name, object value)
    {
        if (parent is List<dynamic>)
        {
            (parent as List<dynamic>).Add(value);
        }
        else
        {
            (parent as IDictionary<string, object>)[name] = value;
        }
    }
}
Usage
The following section of code is taken directly from the sample project provided with this article for download. The first line of code loads an XML file from disk. The next section creates a list of XML node names that are known to contain lists of items. Once the XML is loaded and the list of nodes is prepared, creation of the dynamic object can begin by calling ExpandoObjectHelper.Parse. Once the parsing is complete, the parsed data may then be used.
var xmlDocument = XDocument.Load("test.xml");
List<string> listNodes = new List<string>() {"owners"};
dynamic xmlContent = new ExpandoObject();
ExpandoObjectHelper.Parse(xmlContent, xmlDocument.Root, listNodes);
Console.WriteLine("Make: {0}", xmlContent.car.make);
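Because the shape of the result depends on the input file, it can help to print what was actually parsed before relying on a member name. The following is a minimal sketch of such a helper, not part of the sample project; ExpandoDumpSketch and Dump are hypothetical names, and the sketch assumes objects are ExpandoObject instances and lists are List<dynamic>, as in the class above. The demo object in Main is built by hand rather than parsed.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Text;

public static class ExpandoDumpSketch
{
    // Renders an ExpandoObject / List<dynamic> tree as indented text.
    public static string Dump(object value, int depth = 0)
    {
        var sb = new StringBuilder();
        Write(sb, value, depth);
        return sb.ToString();
    }

    private static void Write(StringBuilder sb, object value, int depth)
    {
        string pad = new string(' ', depth * 2);
        var members = value as IDictionary<string, object>;
        var items = value as List<dynamic>;

        if (members != null)
        {
            // Object: print each member name, then its value one level deeper.
            foreach (var member in members)
            {
                sb.AppendLine(pad + member.Key + ":");
                Write(sb, member.Value, depth + 1);
            }
        }
        else if (items != null)
        {
            // List: mark each entry with a dash.
            foreach (object entry in items)
            {
                sb.AppendLine(pad + "-");
                Write(sb, entry, depth + 1);
            }
        }
        else
        {
            // Leaf value.
            sb.AppendLine(pad + value);
        }
    }

    public static void Main()
    {
        // Hand-built stand-in for a parsed document.
        dynamic car = new ExpandoObject();
        car.make = "Ford";
        car.owners = new List<dynamic> { "Bob Jones", "Betty Jones" };
        Console.Write(Dump((object)car));
    }
}
```

Calling Dump((object)xmlContent) after ExpandoObjectHelper.Parse would show which members exist and which of them are lists.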
Known Issues
Everything has a drawback, right? I’ve noticed that, compared to my previous XML parsing method, using an ExpandoObject is slow. I haven’t run benchmarks to know how much slower, but it is noticeable. I know part of the difference is that my original parsing method used XPath. I was able to target exactly the nodes I wanted before I realized that the data files were all different. When creating an ExpandoObject, the code has to slog through the entire file: every element and every attribute. If the file is large, this can take some time. Since my project is a scheduled job that runs overnight without a human watching it, I don’t expect this to be a problem for me. Your mileage may vary.
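For comparison, a targeted XPath query asks only for the nodes of interest instead of turning every element into a member. The sketch below assumes the car document layout used throughout this article and uses the XPathSelectElements extension method from System.Xml.XPath; XPathTargetSketch and OwnerNames are hypothetical names, and this is not the code from my original parser.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathTargetSketch
{
    // Selects only the owner elements and returns their trimmed text.
    public static string[] OwnerNames(XDocument doc)
    {
        return doc.XPathSelectElements("/car/owners/owner")
                  .Select(e => e.Value.Trim())
                  .ToArray();
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<car><make>Ford</make><owners>" +
            "<owner>Bob Jones</owner><owner>Betty Jones</owner>" +
            "</owners></car>");

        foreach (string name in OwnerNames(doc))
        {
            Console.WriteLine(name);
        }
    }
}
```

This only works when the path to the data is known in advance, which is exactly what my varying data files ruled out.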
History
25 October 2012
- Bug fix. An issue with multiple lists of objects at the same node level was discovered, whereby the properties of one list would be mixed with the properties of the sibling list.
19 September 2012
- Reformatted mangled text, including fixing code broken during reformatting.
- Updated the usage code.
- Added sample project.