This article reviews methods for reading and writing XML in .NET. These methods are applicable to tree structures (see also SQLite data storage), as well as to pretty much any structure of a similarly complex nature. The review includes techniques that should be interesting for large projects, where hiding implementation details (using non-public data models) is key to stable software development [1]. The discussion includes the usage of XSDs with the DataContractSerializer class.
The XML format is widely used to store data, but I was not able to find a .NET sample code project for storing XML data from, or loading it into, a tree structure, let alone one that uses interfaces to hide the model details or that uses XSDs (Data Contracts) with the DataContractSerializer.
The .NET framework offers a very simple XmlSerializer feature for handling XML data. This feature requires just a data model with public get and set properties. The XmlSerializer considers each property for serialization and decides on a format based on either a given markup (e.g., XmlIgnore) or the data type and name of each property. This section shows how we can use this simple method for a first implementation that is usually good enough for a quick prototype.
The attached demo application in Easy_Xml_V1.zip shows a simple XML persistence method for writing (serializing) a tree structured object model to XML and reading (de-serializing) it back. The tree structure contains a root object, a collection of root items, and their children.
This simple XmlSerializer method requires the above object model structure and the workflow shown in the main method (everything else is just demo set-up).
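var rootModel = new ModelRoot("My Test Tree");
rootModel.RootItems.Add(makeTree());

// Write (serialize) the tree into a file. To produce a string instead,
// use 'new StringWriter()' here and call writer.ToString() after Serialize().
// (Calling ToString() on a StreamWriter only returns its type name.)
XmlSerializer serializer = new XmlSerializer(typeof(ModelRoot));
using (StreamWriter writer = new StreamWriter(@"data.xml"))
{
    serializer.Serialize(writer, rootModel);
}

// Read (de-serialize) the tree from the file. To read from a string instead,
// pass 'new StringReader(result)' to Deserialize(), where result is the XML text.
ModelRoot resultTree = null;
using (var reader = XmlReader.Create(@"data.xml"))
{
    XmlSerializer deserializer = new XmlSerializer(typeof(ModelRoot));
    resultTree = (ModelRoot) deserializer.Deserialize(reader);
}

Console.WriteLine("Debug Deserialized XML for {0}", resultTree);

// Visit each node of the loaded tree in level order and print it.
var items = TreeLib.BreadthFirst.Traverse.LevelOrder(resultTree.RootItems, i => i.Children);
foreach (var item in items)
{
    Console.WriteLine("Level: {0} Node: '{1}'", item.Level, item.Node);
}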
Dim rootModel = New ModelRoot("My Test Tree")
rootModel.RootItems.Add(makeTree())

' Write (serialize) the tree into a file. To produce a string instead,
' use 'New StringWriter()' here and call writer.ToString() after Serialize().
' (Calling ToString() on a StreamWriter only returns its type name.)
Dim serializer As XmlSerializer = New XmlSerializer(GetType(ModelRoot))
Using writer As StreamWriter = New StreamWriter("data.xml")
    serializer.Serialize(writer, rootModel)
End Using

' Read (de-serialize) the tree from the file. To read from a string instead,
' pass 'New StringReader(result)' to Deserialize(), where result is the XML text.
Dim resultTree As ModelRoot = Nothing
Using reader = XmlReader.Create("data.xml")
    Dim deserializer As XmlSerializer = New XmlSerializer(GetType(ModelRoot))
    resultTree = CType(deserializer.Deserialize(reader), ModelRoot)
End Using

Console.WriteLine("Deserialized XML for {0}", resultTree)

' Visit each node of the loaded tree in level order and print it.
Dim items = TreeLib.BreadthFirst.Traverse.LevelOrder(resultTree.RootItems, Function(i) i.Children)
For Each item In items
    Console.WriteLine("Level: {0} Node: '{1}'", item.Level, item.Node)
Next
The above code writes the XML into a file and reads it back from that file. The same XmlSerializer calls generate XML into a string, and read XML from a string, when StreamWriter(@"data.xml") and XmlReader.Create(@"data.xml") are replaced with StringWriter() and StringReader(result) (comment the respective lines in or out in the demo to see how this works).
The foreach loop at the bottom of the listing visits (traverses) each node in the tree and displays its content. We can see here that saving (serializing) to XML and loading (de-serializing) from XML is not that complicated, considering that we are looking at a tree structure.
It is important to note that this simple serialization method works only if we can provide:
- public classes with a public default constructor and
- public properties with public accessors for get and set
The naming of the XML elements is by default based on class names and property names, but this can be separated from the model if we decorate classes and properties with the corresponding attributes [6].
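As a minimal sketch of such a decoration, the attributes from System.Xml.Serialization can rename the root element, turn a property into an XML attribute, and control how a collection is written. The TreeModel and TreeItem classes, the NamingDemo helper, and all XML names below are made up for illustration and are not part of the attached downloads:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical model: class and XML names are chosen for illustration only.
[XmlRoot("tree")]                       // root element is <tree> instead of <TreeModel>
public class TreeModel
{
    [XmlAttribute("title")]             // written as attribute title="..."
    public string Name { get; set; }

    [XmlArray("items")]                 // wrapper element <items>
    [XmlArrayItem("item")]              // each entry becomes <item>
    public List<TreeItem> RootItems { get; set; }

    [XmlIgnore]                         // never written to or read from XML
    public string Comment { get; set; }

    public TreeModel() { RootItems = new List<TreeItem>(); }
}

public class TreeItem
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlElement("item")]                // children as repeated <item> elements, no wrapper
    public List<TreeItem> Children { get; set; }

    public TreeItem() { Children = new List<TreeItem>(); }
}

public static class NamingDemo
{
    // Serializes the model into an XML string using the decorated names.
    public static string ToXml(TreeModel model)
    {
        var serializer = new XmlSerializer(typeof(TreeModel));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, model);
            return writer.ToString();
        }
    }
}
```

Calling NamingDemo.ToXml() on such a model should yield a tree root element with a title attribute and nested item elements, while the Comment property does not appear in the output. The model classes keep their .NET names, so the XML vocabulary can change without renaming properties.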
A pitfall to avoid here is a direct reference to a parent item, because it results in a circular structure, which cannot be serialized in XML (this is a limitation of the XML format). But this does not mean that we cannot have a parent pointer in the model if we feel like we need one. We can simply apply the [XmlIgnore] [6] attribute to the Parent property and get away with that (because the Parent property is not needed for XML serialization), or we can resolve the parent relationship by using a ParentId as shown in the code below (assuming that Ids are unique across the complete tree structure).
public class Node
{
    private Node _Parent = null;

    public string Id { get; set; }

    public List<Node> Children { get; set; }

    // Not serialized: a parent reference would make the structure circular.
    [XmlIgnore]
    public Node Parent
    {
        get
        {
            return _Parent;
        }
        set
        {
            if (_Parent != value)
            {
                _Parent = value;

                // Keep the serializable ParentId in sync with the reference.
                if (_Parent != null)
                    ParentId = _Parent.Id;
                else
                    ParentId = string.Empty;
            }
        }
    }

    public string ParentId { get; set; }
}
Public Class Node
    Private _Parent As Node = Nothing

    Public Property Id As String

    Public Property Children As List(Of Node)

    ' Not serialized: a parent reference would make the structure circular.
    <XmlIgnore>
    Public Property Parent As Node
        Get
            Return _Parent
        End Get
        Set(ByVal value As Node)
            ' Reference comparison: VB requires Is/IsNot for class instances.
            If _Parent IsNot value Then
                _Parent = value
                If _Parent IsNot Nothing Then ParentId = _Parent.Id Else ParentId = String.Empty
            End If
        End Set
    End Property

    Public Property ParentId As String
End Class
The ParentId can be converted into a parent reference upon loading by using the foreach loop at the bottom of the second-to-last listing together with a dictionary, as previously explained for the Level-Order conversion.
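The conversion can be sketched as follows. This is only an illustration under the assumption that Ids are unique; the TreeNode class is a simplified stand-in for the Node class above, ParentResolver is a hypothetical helper, and a plain Queue replaces the TreeLib level-order traversal used in the demo:

```csharp
using System.Collections.Generic;

// Simplified stand-in for the article's Node class (hypothetical).
public class TreeNode
{
    public string Id { get; set; }
    public string ParentId { get; set; }
    public TreeNode Parent { get; set; }
    public List<TreeNode> Children { get; set; }

    public TreeNode() { Children = new List<TreeNode>(); }
}

public static class ParentResolver
{
    // Visits all nodes level by level and restores each Parent reference
    // from the ParentId that was loaded from XML.
    public static void RestoreParents(IEnumerable<TreeNode> rootItems)
    {
        var byId = new Dictionary<string, TreeNode>();
        var queue = new Queue<TreeNode>(rootItems);

        while (queue.Count > 0)
        {
            TreeNode node = queue.Dequeue();
            byId[node.Id] = node;

            // In level order a parent is always visited before its children,
            // so it is already in the dictionary when the child is processed.
            TreeNode parent;
            if (!string.IsNullOrEmpty(node.ParentId)
                && byId.TryGetValue(node.ParentId, out parent))
            {
                node.Parent = parent;
            }

            foreach (TreeNode child in node.Children)
                queue.Enqueue(child);
        }
    }
}
```

Root items keep a null Parent because their ParentId is empty. A ParentId that does not match any loaded Id is silently ignored in this sketch; a production version might want to report it instead.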
The above serialization and de-serialization code requires the class of the root to be known and can work with different storage types, such as a string or a file. This can be implemented more conveniently for different classes with generic methods (see the Easy_Xml_V2.zip solution for the complete listing):
public class XmlStorage
{
    public static string WriteXmlToString<T>(T rootModel)
    {
        using (var writer = new StringWriter())
        {
            XmlSerializer serializer = new XmlSerializer(typeof(T));
            serializer.Serialize(writer, rootModel);
            return writer.ToString();
        }
    }

    public static void WriteXmlToFile<T>(string filename, T rootModel)
    {
        using (StreamWriter writer = new StreamWriter(filename))
        {
            XmlSerializer serializer = new XmlSerializer(typeof(T));
            serializer.Serialize(writer, rootModel);
        }
    }

    public static T ReadXmlFromString<T>(string input)
    {
        using (var reader = new StringReader(input))
        {
            XmlSerializer deserializer = new XmlSerializer(typeof(T));
            return (T) deserializer.Deserialize(reader);
        }
    }

    public static T ReadXmlFromFile<T>(string filename)
    {
        using (var reader = XmlReader.Create(filename))
        {
            XmlSerializer deserializer = new XmlSerializer(typeof(T));
            return (T) deserializer.Deserialize(reader);
        }
    }
}
Public Class XmlStorage
Public Shared Function WriteXmlToString(Of T)(ByVal rootModel As T) As String
Using writer = New StringWriter()
Dim serializer As XmlSerializer = New XmlSerializer(GetType(T))
serializer.Serialize(writer, rootModel)
Return writer.ToString()
End Using
End Function
Public Shared Sub WriteXmlToFile(Of T)(ByVal filename As String, ByVal rootModel As T)
Using writer As StreamWriter = New StreamWriter(filename)
Dim serializer As XmlSerializer = New XmlSerializer(GetType(T))
serializer.Serialize(writer, rootModel)
End Using
End Sub
Public Shared Function ReadXmlFromString(Of T)(ByVal input As String) As T
Using reader = New StringReader(input)
Dim deserializer As XmlSerializer = New XmlSerializer(GetType(T))
Return CType(deserializer.Deserialize(reader), T)
End Using
End Function
Public Shared Function ReadXmlFromFile(Of T)(ByVal filename As String) As T
Using reader = XmlReader.Create(filename)
Dim deserializer As XmlSerializer = New XmlSerializer(GetType(T))
Return CType(deserializer.Deserialize(reader), T)
End Using
End Function
End Class
So, that's the easy solution and it should be applicable as long as the structure is rather small (a few thousand objects) and public access to the objects that make up the model is not a problem.
But what can we do if we want to hide the implementation details of the XML de/serialization, because this is part of a larger project and we need to minimize room for error? Or what about a project where we need more control over the de/serialization process? This is where the IXmlSerializable interface and the DataContractSerializer come to the rescue, as we will see in the next sections.
The IXmlSerializable interface [7] must be implemented by the classes that represent the data model. Service classes, such as the XmlSerializer or the DataContractSerializer, then use the defined interface methods to drive the serialization process for each object. The interface requires 3 methods to be implemented:
public interface IXmlSerializable
{
XmlSchema GetSchema ();
void ReadXml ( XmlReader reader );
void WriteXml ( XmlWriter writer );
}
Public Interface IXmlSerializable
Function GetSchema() As XmlSchema
Sub ReadXml(ByVal reader As XmlReader)
Sub WriteXml(ByVal writer As XmlWriter)
End Interface
The IXmlSerializable interface requires that each class of the model implements a ReadXml, WriteXml, and GetSchema method. The GetSchema method is required, but an implementation that simply returns null is sufficient. The other two methods require a real implementation and are not so straightforward, especially when reading XML, because the read process should be flexible enough to handle different situations (such as optional whitespace or empty collections) while still reading everything correctly.
The de/serialization process through the IXmlSerializable interface can be handled by more than one class. A well-known class that has been around since .NET 1.1 is the XmlSerializer class, while the DataContractSerializer was added more recently, in .NET 3.0.
The XmlSerializer has the following requirements for a model that implements IXmlSerializable:
- the modelling class must be a public class,
- its constructor can be internal,
- property getters/setters can be private (properties are not required).
This can be verified in the IXmlSerializable_V1.zip sample code, where a first prototype-like implementation is based on the IXmlSerializable interface with collections that are persisted using a Count attribute on the RootItems and Children collections. This Count attribute is used in each ReadXml() method to load each child entry in turn.
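The Count based approach can be sketched roughly as follows. The CountedNode class and the CountedNodeDemo helper are hypothetical and much simpler than the classes in the download (for example, the Count attribute sits directly on the node element here); the sketch only illustrates how ReadXml can use a stored Count to know how many child elements to read:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical model class for illustration (not the class from the download).
public class CountedNode : IXmlSerializable
{
    private readonly List<CountedNode> _children = new List<CountedNode>();

    public CountedNode() { }                      // required by the serializer
    public CountedNode(string id) { Id = id; }

    public string Id { get; private set; }
    public IList<CountedNode> Children { get { return _children; } }

    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // The caller has already written the opening tag of this node.
        writer.WriteAttributeString("Id", Id);
        writer.WriteAttributeString("Count",
            _children.Count.ToString(CultureInfo.InvariantCulture));

        foreach (CountedNode child in _children)
        {
            writer.WriteStartElement("CountedNode");
            child.WriteXml(writer);
            writer.WriteEndElement();
        }
    }

    public void ReadXml(XmlReader reader)
    {
        reader.MoveToContent();
        Id = reader.GetAttribute("Id");
        int count = int.Parse(reader.GetAttribute("Count"), CultureInfo.InvariantCulture);

        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();                // consume the opening tag
        if (isEmpty) return;                      // <CountedNode ... /> has no end tag

        // The stored Count tells us how many child elements follow.
        for (int i = 0; i < count; i++)
        {
            reader.MoveToContent();
            var child = new CountedNode();
            child.ReadXml(reader);
            _children.Add(child);
        }

        reader.MoveToContent();
        reader.ReadEndElement();                  // consume the closing tag
    }
}

public static class CountedNodeDemo
{
    public static string Save(CountedNode root)
    {
        var serializer = new XmlSerializer(typeof(CountedNode));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, root);
            return writer.ToString();
        }
    }

    public static CountedNode Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(CountedNode));
        using (var reader = new StringReader(xml))
        {
            return (CountedNode)serializer.Deserialize(reader);
        }
    }
}
```

The weakness of this design is that the XML becomes invalid as soon as the Count attribute and the actual number of child elements disagree, for example after a manual edit of the file.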
The simplified solution for handling collection data in XML, as shown in IXmlSerializable_V1.zip, is improved upon in the demo code from IXmlSerializable_V2.zip, where we read and write the items in each collection without using the Count attribute. This is possible because the XmlReader lets us see whether we are at the end of the XML collection or at the beginning of the next child element:
public void ReadXml(System.Xml.XmlReader reader)
{
Name = reader.GetAttribute("Name");
Version = int.Parse(reader.GetAttribute("Version"));
MinorVersion = int.Parse(reader.GetAttribute("MinorVersion"));
reader.ReadStartElement();
reader.MoveToContent();
while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
reader.Read();
if (reader.NodeType != System.Xml.XmlNodeType.EndElement)
{
reader.ReadStartElement("RootItems");
reader.MoveToContent();
while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
reader.Read();
if (reader.NodeType != System.Xml.XmlNodeType.EndElement)
{
var nodeSer = new XmlSerializer(typeof(Node));
while (reader.NodeType != System.Xml.XmlNodeType.EndElement)
{
var nextNode = (Node)nodeSer.Deserialize(reader);
_RootItems.Add(nextNode);
while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
reader.Read();
}
reader.ReadEndElement();
}
reader.ReadEndElement();
}
}
Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements IXmlSerializable.ReadXml
Name = reader.GetAttribute("Name")
Version = Integer.Parse(reader.GetAttribute("Version"))
MinorVersion = Integer.Parse(reader.GetAttribute("MinorVersion"))
reader.ReadStartElement()
reader.MoveToContent()
While reader.NodeType = System.Xml.XmlNodeType.Whitespace
reader.Read()
End While
If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
reader.ReadStartElement("RootItems")
reader.MoveToContent()
While reader.NodeType = System.Xml.XmlNodeType.Whitespace
reader.Read()
End While
If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
Dim nodeSer = New XmlSerializer(GetType(Node))
While reader.NodeType <> System.Xml.XmlNodeType.EndElement
Dim nextNode = CType(nodeSer.Deserialize(reader), Node)
_RootItems.Add(nextNode)
While reader.NodeType = System.Xml.XmlNodeType.Whitespace
reader.Read()
End While
End While
reader.ReadEndElement()
End If
reader.ReadEndElement()
End If
End Sub
The var nodeSer = new XmlSerializer(typeof(Node)); statement in the above listing is necessary, because the XmlSerializer should be used to write the opening and closing tag of each child node.
The solution in IXmlSerializable_V2.zip also moves the model data classes into a separate assembly and implements interfaces to test how far access to these classes can be restricted:
We can now have private property setters and internal class constructors for the Node and ModelRoot classes, which still have to be public to function correctly with the IXmlSerializable interface implementation and the XmlSerializer service class.
This means that clients of the TreeModelLib library can see the modelling classes, but they can no longer create these classes and cannot manipulate data items without going through the defined interfaces.
So, hiding implementation details with the XmlSerializer cannot be done completely (although we get pretty close). The DataContractSerializer [9], on the other hand, can also work with an internal class model and private property set accessors when implementing the IXmlSerializable interface (see sample download V3_DataContractSerializer.zip). That is, we can use the DataContractSerializer with the IXmlSerializable interface on the modelling class to completely hide the implementation details of saving and loading data via XML.
The two service classes, XmlSerializer and DataContractSerializer [9], do not implement exactly the same behavior. One difference I noticed was that the DataContractSerializer yields System.Xml.XmlNodeType.Whitespace more frequently, while the XmlSerializer is happy to move from one StartElement tag to the next, never mentioning the whitespace in between. These details are not difficult to handle once it has become clear where the difference in behaviour is, but it is important to carefully debug existing code when changing from one serializer to the other.
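One way to make reading code less dependent on whether whitespace nodes are reported is to rely on XmlReader.MoveToContent(), which skips whitespace, comments, and processing instructions. The following helper is a hypothetical sketch (WhitespaceDemo is not taken from the downloads) that assumes element-only content below the root:

```csharp
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    // Counts the direct child elements of the root element, no matter how
    // much insignificant whitespace sits between the tags.
    public static int CountChildElements(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();               // position on the root start tag
            if (reader.IsEmptyElement) return 0;  // <root /> has no children

            reader.ReadStartElement();            // step into the root element

            int count = 0;
            // MoveToContent() skips whitespace and stops on the next child
            // element or on the end tag of the root element.
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                count++;
                reader.Skip();                    // jump over the child and its subtree
            }
            return count;
        }
    }
}
```

Both an indented document and the same document written on a single line should give the same count, so the explicit loops that test for XmlNodeType.Whitespace are not needed in this style of reading.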
The remainder of this article reviews the details of the DataContractSerializer implementation, while the XmlSerializer implementation is covered by the attached downloads. The main reasons are that the XmlSerializer cannot work with a hidden class model and that the DataContractSerializer is a more recent class, which should be preferred over the XmlSerializer since, in combination with IXmlSerializable, it offers the same features plus more advanced functions.
The DataContractSerializer [9] implementation in DataContractSerializer_V1.zip is nearly identical to the previously discussed IXmlSerializable_V2.zip sample. The main difference is that we use the DataContractSerializer instead of the XmlSerializer. This requires a reference to the System.Runtime.Serialization assembly and a using statement with that namespace. We can replace the previously used Storage class with the DataContractStorage class to initiate each de/serialization process. We also have to replace the previously stated pattern for reading and writing child items with a pattern that is specific to the DataContractSerializer:
public void ReadXml(System.Xml.XmlReader reader)
{
...
reader.ReadStartElement("Children");
reader.MoveToContent();
while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
reader.Read();
if (reader.NodeType != System.Xml.XmlNodeType.EndElement)
{
while (reader.NodeType != System.Xml.XmlNodeType.EndElement)
{
var dataContractSerializer = new DataContractSerializer(typeof(Node));
var nextNode = (Node)dataContractSerializer.ReadObject(reader);
_ChildItems.Add(nextNode);
while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
reader.Read();
}
reader.ReadEndElement();
}
...
}
Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements IXmlSerializable.ReadXml
...
reader.MoveToContent()
While reader.NodeType = System.Xml.XmlNodeType.Whitespace
reader.Read()
End While
If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
reader.ReadStartElement("Children")
reader.MoveToContent()
While reader.NodeType = System.Xml.XmlNodeType.Whitespace
reader.Read()
End While
If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
While reader.NodeType <> System.Xml.XmlNodeType.EndElement
Dim dataContractSerializer = New DataContractSerializer(GetType(Node))
Dim nextNode = CType(dataContractSerializer.ReadObject(reader), Node)
_ChildItems.Add(nextNode)
While reader.NodeType = System.Xml.XmlNodeType.Whitespace
reader.Read()
End While
End While
reader.ReadEndElement()
End If
reader.ReadEndElement()
...
End Sub
public void WriteXml(System.Xml.XmlWriter writer)
{
...
writer.WriteStartElement("Children");
foreach (var item in Children)
{
var dataContractSerializer = new DataContractSerializer(typeof(Node));
dataContractSerializer.WriteObject(writer, item);
}
writer.WriteEndElement();
...
}
Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements IXmlSerializable.WriteXml
...
writer.WriteStartElement("Children")
For Each item In Children
Dim dataContractSerializer = New DataContractSerializer(GetType(Node))
dataContractSerializer.WriteObject(writer, item)
Next
writer.WriteEndElement()
...
End Sub
This is all that is required to use an XML serializer with a completely hidden data model implementation. All solutions discussed so far in this article are still naive with respect to the usage of an XML Schema Definition (XSD), which is typically used to ensure that all data items are consistent with an expected structure. The next section evaluates that point with the DataContractSerializer to make things more robust against failures that usually occur in production.
This section discusses how we can use an XML Schema Definition (XSD) [10] with a DataContractSerializer [9] to ensure that the consistency of transferred data items meets expectations in production. The XsdDataContract.zip solution contains 2 projects that were directly drawn from the referenced MSDN articles.
The XsdDataContractExporter project shows how the DataContractSerializer [9] can be used to create a Data Contract (exported as an XSD) based on a given data model. The exported XSD is not very detailed, but maybe it is useful to someone else.
The XmlSchemaSet_Sample project shows how an XSD file can be used to control the parsing process when reading XML with an XmlReader. The project shows that the XmlReaderSettings class can contain multiple schema definitions (schemas or XSDs for short), which in turn can be used to initialize an XmlReader. The XmlReader then uses a callback function to report any errors or warnings that occur.
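The mechanism can be sketched in a few lines. The schema below is a made-up, much smaller stand-in for a real XSD file, and XsdValidationDemo is a hypothetical helper, so treat this as an illustration of the XmlReaderSettings set-up rather than as the code from the sample project:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XsdValidationDemo
{
    // Hypothetical schema for illustration only.
    private const string Xsd =
        @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
            <xsd:element name='ModelRoot'>
              <xsd:complexType>
                <xsd:attribute name='Name' type='xsd:string' use='required' />
                <xsd:attribute name='Version' type='xsd:int' default='1' />
              </xsd:complexType>
            </xsd:element>
          </xsd:schema>";

    // Reads the XML text with schema validation switched on and returns
    // all messages that the validation callback received.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();

        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(Xsd)))
        {
            // null: take the target namespace from the schema itself.
            schemas.Add(null, xsdReader);
        }

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) =>
            messages.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }   // validation happens while reading
        }

        return messages;
    }
}
```

An XML text without the required Name attribute should then produce one error message, while a valid text produces none. Because a callback is registered, validation problems are collected instead of being thrown as exceptions, which leaves the decision on how to react to the caller.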
The lessons learned in the XmlSchemaSet_Sample project are applied in the DataContractSerializer_V2.zip sample. This sample has extended read XML method signatures to accommodate the additional schemas:
public static IModelRoot ReadXmlFromFile<T>(string filename
    , XmlSchemaSet schemas = null
    , ValidationEventHandler validationCallBack = null)

public static IModelRoot ReadXmlFromString<T>(string input
    , XmlSchemaSet schemas = null
    , ValidationEventHandler validationCallBack = null)
Public Shared Function ReadXmlFromFile(Of T)(ByVal filename As String,
ByVal Optional schemas As XmlSchemaSet = Nothing,
ByVal Optional validationCallBack As ValidationEventHandler = Nothing) As IModelRoot
Public Shared Function ReadXmlFromString(Of T)(ByVal input As String,
ByVal Optional schemas As XmlSchemaSet = Nothing,
ByVal Optional validationCallBack As ValidationEventHandler = Nothing) As IModelRoot
These schemas are handed off to the XmlReader to report back information whenever the XML does not meet the expectations specified in the TreeModel.xsd file. We can verify this if we open the TreeModel.xsd file and specify a required attribute that is currently not implemented (Test in the listing below):
<xsd:attributeGroup name ="ModelRootAttribs">
<xsd:attribute name="Version" default="1" type="xsd:int" />
<xsd:attribute name="MinorVersion" default="0" type="xsd:int" />
<xsd:attribute name="Name" type="xsd:string" use="required" />
<xsd:attribute name="Test" type="xsd:string" use="required" />
</xsd:attributeGroup>
...should produce the following output:
Validation Error: The required attribute 'Test' is missing.
Exception The required attribute 'Test' is missing. at line 2 position 2
The .NET framework also supports the ISerializable interface for serializing objects into a binary format. This form of serialization is not part of this article, since it is not interoperable and is very similar to the IXmlSerializable interface. I have not actually tried this, but I would expect that using the IXmlSerializable interface together with a zip container can yield performance and space requirements (especially when reading data) similar to what the ISerializable interface can provide. Please have a look at the 04_sqlite_tut.zip demo application attached to this article if you want a performance hint for reading interoperable XML from a zipped data container.
The support for XML in the .NET framework is huge, and the interfaces and techniques mentioned in this article are by no means complete. A tool that is often useful for generating XSDs, or model classes from XSDs, is for example the XML Schema Definition Tool (Xsd.exe). This tool is also useful when it comes to quickly generating an object model with complex constraints, but its application details are certainly the topic of another article. A similar tool for the DataContractSerializer is the ServiceModel Metadata Utility Tool (Svcutil.exe), which is also not covered in this article.
The .NET support for XML serialization is so wide that the question of whether something can be done with XML is quickly replaced by the question of how it can be done, which in turn leads to the question of a working code sample. I hope this article sheds some light on these questions and gives you a better overview of the serialization techniques with XML.
Any feedback on important items that were missed, or on improvements that might be applicable, is as always very welcome.