Reading and Writing XML in C#/VB.Net

Dirk Bahle

22 Dec 2017

Tips & Tricks on De/Serializing object graphs with XML

Index

Introduction
Background
The XmlSerializer
- Reading and Writing Tree Structured XML Data
- Implementing IXmlSerializable
The DataContractSerializer
Making Things Bullet Proof
Conclusions
References

Introduction

This article reviews methods for reading and writing XML in .Net. These methods are applicable to tree structures (see also SQLite data storage), as well as pretty much any structure of similar complexity. This review includes techniques that should be interesting for large projects where hiding implementation details (using non-public data models) is key to stable software development [1]. The discussion also covers the usage of XSDs with the DataContractSerializer class.

Background

The XML format is widely used to store data, but I was not able to find a .Net sample code project for storing/loading XML data from/into a tree structure, let alone one that uses interfaces to hide the model details or XSDs (Data Contracts) with the DataContractSerializer.

The XmlSerializer

The .Net framework offers a very simple XmlSerializer feature for handling XML data. This feature requires just a data model with public get and set properties. The XmlSerializer considers each property for serialization and decides on a format based on either a given attribute (e.g. XmlIgnore) or the data type and name of each property. This section shows how we can use this simple method for a first implementation that is usually good enough for a quick prototype.

Reading and Writing Tree Structured XML Data

The attached demo application in Easy_Xml_V1.zip shows a simple XML persistence method for reading (de-serializing) and writing (serializing) XML data to/from a tree structured object model. The tree structure contains a root object, a collection of root items, and their children.

This simple XmlSerializer method requires the above object model structure and the work-flow shown in the main method (everything else is just demo set-up).

    var rootModel = new ModelRoot("My Test Tree");
    rootModel.RootItems.Add(makeTree());                     // Create tree in-memory

    string result = null;
    XmlSerializer serializer = new XmlSerializer(typeof(ModelRoot));

    using (StreamWriter writer = new StreamWriter(@"data.xml"))  // Write Xml to file
////    using (var writer = new StringWriter())                  // Write Xml to string
    {
      serializer.Serialize(writer, rootModel);
      result = writer.ToString();   // Holds the XML only with the StringWriter variant
    }
    
    ModelRoot resultTree = null;                        // Re-create tree from XML
    using (var reader = XmlReader.Create(@"data.xml")) // Read Xml Data from file
////    using (var reader = new StringReader(result)) // Read Xml Data from string
    {
      XmlSerializer deserializer = new XmlSerializer(typeof(ModelRoot));
      resultTree = (ModelRoot) deserializer.Deserialize(reader);
    }
    
    Console.WriteLine("Debug Deserialized XML for {0}", resultTree);
    var items = TreeLib.BreadthFirst.Traverse.LevelOrder(resultTree.RootItems, i => i.Children);
    foreach (var item in items)
    {
      Console.WriteLine("Level: {0} Node: '{1}'", item.Level, item.Node);
    }

Dim rootModel = New ModelRoot("My Test Tree")
rootModel.RootItems.Add(makeTree())           ' Create tree in-memory

Dim result As String = Nothing
Dim serializer As XmlSerializer = New XmlSerializer(GetType(ModelRoot))

Using writer As StreamWriter = New StreamWriter("data.xml")   ' Write Xml to file
''''Using writer = New StringWriter()                         ' Write Xml to string
    serializer.Serialize(writer, rootModel)
    result = writer.ToString() ' Holds the XML only with the StringWriter variant
End Using

Dim resultTree As ModelRoot = Nothing        ' Re-create tree from XML
Using reader = XmlReader.Create("data.xml")  ' Read Xml Data from file
''''    using reader = New StringReader(result) ' Read Xml Data from string
    Dim deserializer As XmlSerializer = New XmlSerializer(GetType(ModelRoot))
    resultTree = CType(deserializer.Deserialize(reader), ModelRoot)
End Using

Console.WriteLine("Deserialized XML for {0}", resultTree)
Dim items = TreeLib.BreadthFirst.Traverse.LevelOrder(resultTree.RootItems, Function(i) i.Children)
For Each item In items
    Console.WriteLine("Level: {0} Node: '{1}'", item.Level, item.Node)
Next

The above code generates XML into a string or a file, and reads the XML from a string or a file, depending on whether we use the:

StreamWriter(@"data.xml") and XmlReader.Create(@"data.xml") or the
StringWriter() and StringReader(result)

approach shown above (comment in or out to see how this works).

The foreach loop at the bottom of the listing is used to visit (traverse) each node in the tree and display its content. We can see here that saving (serializing) and loading (de-serializing) from and to XML is not that complicated when we consider that we are looking at a tree structure.

It is important to note that this simple serialization method only works if we can provide:

public classes with a public default constructor and
public properties with public accessors for get and set
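As a minimal sketch of these two requirements, the hypothetical SimpleRoot and SimpleNode classes below (stand-ins made up for illustration, not the ModelRoot and Node classes from the attached downloads) can be round-tripped through a string:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in model; both classes are public and have
// a public default constructor plus public get/set properties.
public class SimpleNode
{
    public SimpleNode() { Children = new List<SimpleNode>(); }

    public string Id { get; set; }
    public List<SimpleNode> Children { get; set; }
}

public class SimpleRoot
{
    public SimpleRoot() { RootItems = new List<SimpleNode>(); }

    public string Name { get; set; }
    public List<SimpleNode> RootItems { get; set; }
}

public static class RoundTripDemo
{
    public static string ToXml(SimpleRoot root)
    {
        var serializer = new XmlSerializer(typeof(SimpleRoot));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, root);
            return writer.ToString();
        }
    }

    public static SimpleRoot FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(SimpleRoot));
        using (var reader = new StringReader(xml))
        {
            return (SimpleRoot)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var root = new SimpleRoot { Name = "My Test Tree" };
        var parent = new SimpleNode { Id = "1" };
        parent.Children.Add(new SimpleNode { Id = "1.1" });
        root.RootItems.Add(parent);

        SimpleRoot copy = FromXml(ToXml(root));
        Console.WriteLine("{0}: {1} root item(s), first child '{2}'",
            copy.Name, copy.RootItems.Count, copy.RootItems[0].Children[0].Id);
    }
}
```

Removing the public default constructor from one of the two classes should make the XmlSerializer constructor throw, which is a quick way to see why the requirement exists.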

The naming of the XML elements is by default based on class names and property names but this can be separated from the model, if we decorate items accordingly [6].
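The following sketch illustrates such a decoration with attributes from the System.Xml.Serialization namespace. The NamedRoot and NamedNode classes and the chosen XML names are assumptions made up for this example; they are not taken from the demo projects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Tree")]                       // root element name differs from the class name
public class NamedRoot
{
    public NamedRoot() { Items = new List<NamedNode>(); }

    [XmlAttribute("Title")]             // written as an XML attribute instead of a child element
    public string Name { get; set; }

    [XmlArray("Nodes")]                 // name of the element that wraps the collection
    [XmlArrayItem("Node")]              // name of each item element inside the collection
    public List<NamedNode> Items { get; set; }
}

public class NamedNode
{
    [XmlAttribute("Key")]
    public string Id { get; set; }
}

public static class NamingDemo
{
    public static string ToXml(NamedRoot root)
    {
        var serializer = new XmlSerializer(typeof(NamedRoot));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, root);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var root = new NamedRoot { Name = "My Test Tree" };
        root.Items.Add(new NamedNode { Id = "1" });

        // The output uses Tree, Title, Nodes, Node and Key
        // rather than the class and property names.
        Console.WriteLine(ToXml(root));
    }
}
```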

A pitfall to avoid here is a direct reference to the parent item, because it would result in a circular structure, which the XmlSerializer cannot serialize (XML documents are trees and have no built-in notion of back references). But this does not mean that we cannot have a parent pointer in the model, if we feel like we need one. We can simply apply the [XmlIgnore] [6] attribute to the Parent property (the Parent property is not needed for XML serialization), or we can resolve the parent relationship by using a ParentId as shown in the code below (assuming that Ids are unique over the complete tree structure).

public class Node
{
   private Node _Parent = null;

   public string Id      { get; set; }
   
   public List<Node> Children  { get; set; }

   [XmlIgnore]
   public Node Parent
   {
     get
     {
       return _Parent;
     }

     set
     {
       if (_Parent != value)
       {
         _Parent = value;

         if (_Parent != null)
           ParentId = _Parent.Id;
         else
           ParentId = string.Empty;
       }
     }
   }

   public string ParentId      { get; set; }
}

Public Class Node

    Private _Parent As Node = Nothing

    Public Property Id As String

    Public Property Children As List(Of Node)

    <XmlIgnore>
    Public Property Parent As Node
        Get
            Return _Parent
        End Get

        Set(ByVal value As Node)
            If _Parent IsNot value Then
                _Parent = value
                If _Parent IsNot Nothing Then ParentId = _Parent.Id Else ParentId = String.Empty
            End If
        End Set
    End Property

    Public Property ParentId As String
End Class

The ParentId can be converted into a parent reference upon loading by using the foreach loop at the bottom of the second-to-last listing together with a dictionary, as previously explained for the Level-Order conversion.
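A minimal sketch of that dictionary-based step could look as follows. The LinkedNode class is a simplified, hypothetical stand-in for the Node class above, and a plain queue replaces the TreeLib level-order traversal so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the Node class shown above (hypothetical).
public class LinkedNode
{
    public LinkedNode() { Children = new List<LinkedNode>(); }

    public string Id { get; set; }
    public string ParentId { get; set; }
    public List<LinkedNode> Children { get; set; }
    public LinkedNode Parent { get; set; }   // carries [XmlIgnore] in the serialized model
}

public static class ParentResolver
{
    // Visits all nodes level by level, indexes them by Id and then
    // turns each ParentId back into a Parent reference.
    public static void RestoreParents(IEnumerable<LinkedNode> rootItems)
    {
        var byId = new Dictionary<string, LinkedNode>();
        var all = new List<LinkedNode>();
        var queue = new Queue<LinkedNode>(rootItems);

        while (queue.Count > 0)
        {
            LinkedNode node = queue.Dequeue();
            byId[node.Id] = node;            // assumes Ids are unique over the whole tree
            all.Add(node);

            foreach (LinkedNode child in node.Children)
                queue.Enqueue(child);
        }

        foreach (LinkedNode node in all)
        {
            LinkedNode parent;
            if (!string.IsNullOrEmpty(node.ParentId) && byId.TryGetValue(node.ParentId, out parent))
                node.Parent = parent;
        }
    }

    public static void Main()
    {
        // Simulates a freshly deserialized tree: ParentId is set, Parent is not.
        var root = new LinkedNode { Id = "1", ParentId = string.Empty };
        var child = new LinkedNode { Id = "2", ParentId = "1" };
        root.Children.Add(child);

        RestoreParents(new[] { root });
        Console.WriteLine("Parent of '{0}' is '{1}'", child.Id, child.Parent.Id);
    }
}
```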

The serialization and de-serialization code shown so far requires the class of the root to be known and can work with different "storage types", such as a string or a file. This can be implemented more conveniently for different classes with generic methods (see the Easy_Xml_V2.zip solution for the complete listing):

public class XmlStorage
{
  public static string WriteXmlToString<T>(T rootModel)
  {
    using (var writer = new StringWriter())     // Write Xml to string
    {
      XmlSerializer serializer = new XmlSerializer(typeof(T));
      serializer.Serialize(writer, rootModel);
      return writer.ToString();                // Convert result to string to read below
    }
  }

  public static void WriteXmlToFile<T>(string filename, T rootModel)
  {
    using (StreamWriter writer = new StreamWriter(filename))  // Write Xml to file
    {
      XmlSerializer serializer = new XmlSerializer(typeof(T));
      serializer.Serialize(writer, rootModel);
    }
  }

  public static T ReadXmlFromString<T>(string input)
  {
    using (var reader = new StringReader(input))   // Read Xml Data from string
    {
      XmlSerializer deserializer = new XmlSerializer(typeof(T));
      return (T) deserializer.Deserialize(reader);
    }
  }

  public static T ReadXmlFromFile<T>(string filename)
  {
    using (var reader = XmlReader.Create(filename))    // Read Xml Data from file
    {
      XmlSerializer deserializer = new XmlSerializer(typeof(T));
      return (T)deserializer.Deserialize(reader);
    }
  }
}

Public Class XmlStorage
  ' Write Xml to String
  Public Shared Function WriteXmlToString(Of T)(ByVal rootModel As T) As String
    Using writer = New StringWriter()
      Dim serializer As XmlSerializer = New XmlSerializer(GetType(T))
      serializer.Serialize(writer, rootModel)
      Return writer.ToString()
    End Using
  End Function

  ' Write Xml to file
  Public Shared Sub WriteXmlToFile(Of T)(ByVal filename As String, ByVal rootModel As T)
    Using writer As StreamWriter = New StreamWriter(filename)
      Dim serializer As XmlSerializer = New XmlSerializer(GetType(T))
      serializer.Serialize(writer, rootModel)
    End Using
  End Sub

  ' Read Xml from String
  Public Shared Function ReadXmlFromString(Of T)(ByVal input As String) As T
    Using reader = New StringReader(input)
      Dim deserializer As XmlSerializer = New XmlSerializer(GetType(T))
      Return CType(deserializer.Deserialize(reader), T)
    End Using
  End Function

  ' Read Xml from file
  Public Shared Function ReadXmlFromFile(Of T)(ByVal filename As String) As T
    Using reader = XmlReader.Create(filename)
      Dim deserializer As XmlSerializer = New XmlSerializer(GetType(T))
      Return CType(deserializer.Deserialize(reader), T)
    End Using
  End Function
End Class

So, that's the easy solution and it should be applicable as long as the structure is rather small (a few thousand objects) and public access to the objects that make up the model is not a problem.

But what can we do, if we want to hide the implementation details of the XML de/serialization, because this is part of a larger project and we need to minimize room for error? Or what about a project where we need more control over the de/serialization process? This is where the IXmlSerializable interface and the DataContractSerializer come to the rescue as we will see in the developments of the next sections.

Implementing IXmlSerializable

The IXmlSerializable interface [7] must be implemented by the classes that represent the data model. Service classes, such as the XmlSerializer or the DataContractSerializer, then use the defined interface methods to drive the serialization process for each object. The interface requires three methods to be implemented:

public interface IXmlSerializable
{
  XmlSchema GetSchema ();
  void ReadXml ( XmlReader reader );
  void WriteXml ( XmlWriter writer );
}

Public Interface IXmlSerializable

  Function GetSchema() As XmlSchema
  Sub ReadXml(ByVal reader As XmlReader)
  Sub WriteXml(ByVal writer As XmlWriter)

End Interface

The IXmlSerializable interface requires that each class of the model implements a ReadXml, WriteXml, and GetSchema method. The GetSchema method is required, but an implementation that simply returns null is sufficient. The other two methods require a real implementation and are not so straightforward, especially when reading XML, because the read process should be flexible enough to handle different situations, but still strict enough to read everything correctly.
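To make the three methods more concrete, here is a minimal sketch for a class without child items. VersionInfo is a hypothetical class invented for this example; it stores its data in two XML attributes and is round-tripped with the XmlSerializer.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical sample class; not part of the attached downloads.
public class VersionInfo : IXmlSerializable
{
    public VersionInfo() { }                       // used by the serializer when reading

    public VersionInfo(string name, int version)
    {
        Name = name;
        Version = version;
    }

    public string Name { get; private set; }       // private setters are fine here
    public int Version { get; private set; }

    public XmlSchema GetSchema()
    {
        return null;                               // no schema is provided
    }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written the start tag of this object.
        writer.WriteAttributeString("Name", Name);
        writer.WriteAttributeString("Version", XmlConvert.ToString(Version));
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the start tag of this object.
        Name = reader.GetAttribute("Name");
        Version = XmlConvert.ToInt32(reader.GetAttribute("Version"));
        reader.Skip();                             // consume the element including its end tag
    }
}

public static class VersionInfoDemo
{
    public static VersionInfo RoundTrip(VersionInfo input)
    {
        var serializer = new XmlSerializer(typeof(VersionInfo));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, input);
            using (var reader = new StringReader(writer.ToString()))
            {
                return (VersionInfo)serializer.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        VersionInfo copy = RoundTrip(new VersionInfo("demo", 3));
        Console.WriteLine("{0} v{1}", copy.Name, copy.Version);
    }
}
```

The important detail is the end of ReadXml: the method is expected to leave the reader behind the end of its own element, otherwise the serializer continues reading at the wrong position.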

The de/serialization process through the IXmlSerializable interface can be handled by more than one class. A well known class that has been around since .Net 1.1 is the XmlSerializer class, while the DataContractSerializer was added later, together with WCF in .Net 3.0.

With this interface, the XmlSerializer has the following requirements:

the modelling class must be a public class,
its constructor may be internal,
and property getters/setters may be private (properties are not required).

This implementation can be verified in the IXmlSerializable_V1.zip sample code, where a first, prototype-like implementation is based on the IXmlSerializable interface with a collection that is persisted using a Count attribute on the RootItems and Children collections. This Count attribute is used upon reading XML in each ReadXml() method to load each child entry in turn.

The simplified solution for handling collection data in XML as shown in IXmlSerializable_V1.zip is improved upon in the demo code from IXmlSerializable_V2.zip, where we read and write the items in each collection without using the Count attribute. This is possible because the XmlReader lets us see whether we are at the end of the XML collection or at the beginning of the next child element:

public void ReadXml(System.Xml.XmlReader reader) // ModelRoot class
{
  Name = reader.GetAttribute("Name");

  Version = int.Parse(reader.GetAttribute("Version"));
  MinorVersion = int.Parse(reader.GetAttribute("MinorVersion"));

  reader.ReadStartElement();
  reader.MoveToContent();
  while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
      reader.Read();

  if (reader.NodeType != System.Xml.XmlNodeType.EndElement)
  {
    reader.ReadStartElement("RootItems");

    reader.MoveToContent();
    while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
        reader.Read();

    if (reader.NodeType != System.Xml.XmlNodeType.EndElement)
    {
      var nodeSer = new XmlSerializer(typeof(Node));
      while (reader.NodeType != System.Xml.XmlNodeType.EndElement)
      {
        var nextNode = (Node)nodeSer.Deserialize(reader);
        _RootItems.Add(nextNode);

        while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
          reader.Read();
      }
      reader.ReadEndElement();
    }
    reader.ReadEndElement();
  }
}

Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements IXmlSerializable.ReadXml

  Name = reader.GetAttribute("Name")
  Version = Integer.Parse(reader.GetAttribute("Version"))
  MinorVersion = Integer.Parse(reader.GetAttribute("MinorVersion"))

  reader.ReadStartElement()
  reader.MoveToContent()

  While reader.NodeType = System.Xml.XmlNodeType.Whitespace
      reader.Read()
  End While

  If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
    reader.ReadStartElement("RootItems")
    reader.MoveToContent()
    While reader.NodeType = System.Xml.XmlNodeType.Whitespace
        reader.Read()
    End While

    If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
      Dim nodeSer = New XmlSerializer(GetType(Node))

      While reader.NodeType <> System.Xml.XmlNodeType.EndElement
        Dim nextNode = CType(nodeSer.Deserialize(reader), Node)
        _RootItems.Add(nextNode)

        While reader.NodeType = System.Xml.XmlNodeType.Whitespace
            reader.Read()
        End While
      End While

      reader.ReadEndElement()
    End If

    reader.ReadEndElement()
  End If
End Sub

The var nodeSer = new XmlSerializer(typeof(Node)); statement in the above listing is necessary because the XmlSerializer is responsible for the opening and closing tag of each child node: it writes these tags on serialization, so it should also be the one to consume them on de-serialization.

The solution in IXmlSerializable_V2.zip also moves the model data classes into a separate assembly and implements interfaces to test how far access to these classes can be restricted.

We can now have private property setters and internal class constructors for the Node and ModelRoot classes, which still have to be public to function correctly with the IXmlSerializable interface implementation and the XmlSerializer service class.

This means that clients of the TreeModelLib library can see the modelling classes but they can no longer create these classes and cannot manipulate data items without having to go through the defined interfaces.

So, hiding implementation details with the XmlSerializer cannot be done completely (although we get pretty close). The DataContractSerializer [9], on the other hand, can also work with an internal class model and private property set accessors when implementing the IXmlSerializable interface (see sample download V3_DataContractSerializer.zip). That is, we can use the DataContractSerializer with the IXmlSerializable interface on the modelling class to completely hide implementation details of saving and loading data via XML.

Both service classes, XmlSerializer and DataContractSerializer [9], do not implement exactly the same behavior. One difference I noticed was that the DataContractSerializer yields System.Xml.XmlNodeType.Whitespace more frequently, while the XmlSerializer is happy to move from one StartElement tag to the next, never mentioning the white space in between. These details are not difficult to handle once it has become clear where the difference in behaviour is, but it is important to carefully debug existing code when changing from one serializer to the other.
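One possible way to make read code less sensitive to this difference is to rely on XmlReader.MoveToContent(), which skips white space, comments and processing instructions before it reports the next node. The sketch below is not taken from the downloads; the ChildReader helper and the Id attribute are assumptions for illustration. It reads the direct children of a Children element and yields the same result whether or not the reader reports white space.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ChildReader
{
    // Returns the Id attribute of every direct child element
    // inside the <Children> element at the reader's position.
    public static List<string> ReadChildIds(XmlReader reader)
    {
        var ids = new List<string>();

        reader.MoveToContent();
        bool isEmpty = reader.IsEmptyElement;      // <Children /> has no end tag to read
        reader.ReadStartElement("Children");
        if (isEmpty)
            return ids;

        // MoveToContent skips white space nodes, if the reader reports any.
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            ids.Add(reader.GetAttribute("Id"));
            reader.Skip();                         // consume the child and everything inside it
        }

        reader.ReadEndElement();                   // </Children>
        return ids;
    }

    public static void Main()
    {
        const string xml =
            "<Children>\n  <Node Id='a' />\n  <Node Id='b'><Node Id='c' /></Node>\n</Children>";

        foreach (bool ignore in new[] { true, false })
        {
            var settings = new XmlReaderSettings { IgnoreWhitespace = ignore };
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                Console.WriteLine("IgnoreWhitespace={0}: {1}",
                    ignore, string.Join(",", ReadChildIds(reader)));
            }
        }
    }
}
```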

The remainder of this article reviews the details of the DataContractSerializer implementation, while the XmlSerializer implementation is covered by the attached downloads. This is mainly because the XmlSerializer cannot work with a hidden class model, and the DataContractSerializer is the more recent class, which should be preferred over the XmlSerializer for the purposes of this article, since it offers the same features plus more advanced functions.

The DataContractSerializer

DataContractSerializer_V1.zip

VB_DataContractSerializer_V1.zip

The DataContractSerializer [9] implementation in DataContractSerializer_V1.zip is nearly identical to the previously discussed IXmlSerializable_V2.zip sample. The main difference is that we use the DataContractSerializer instead of the XmlSerializer. This requires a reference to the System.Runtime.Serialization assembly and a using statement with that namespace. We can replace the previously used Storage class with the DataContractStorage class to initiate each de/serialization process. We also have to replace the previously stated pattern for reading and writing child items with a pattern that is specific to the DataContractSerializer:

public void ReadXml(System.Xml.XmlReader reader)
{
...
  reader.ReadStartElement("Children");

  reader.MoveToContent();
  while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
    reader.Read();

  if (reader.NodeType != System.Xml.XmlNodeType.EndElement)
  {
    while (reader.NodeType != System.Xml.XmlNodeType.EndElement)
    {
      var dataContractSerializer = new DataContractSerializer(typeof(Node));
      var nextNode = (Node)dataContractSerializer.ReadObject(reader);
      _ChildItems.Add(nextNode);

      while (reader.NodeType == System.Xml.XmlNodeType.Whitespace)
          reader.Read();
    }
    reader.ReadEndElement();
  }
...
}

Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements IXmlSerializable.ReadXml
...
  reader.MoveToContent()

  While reader.NodeType = System.Xml.XmlNodeType.Whitespace
      reader.Read()
  End While

  If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then
    reader.ReadStartElement("Children")
    reader.MoveToContent()

    While reader.NodeType = System.Xml.XmlNodeType.Whitespace
        reader.Read()
    End While

    If reader.NodeType <> System.Xml.XmlNodeType.EndElement Then

      While reader.NodeType <> System.Xml.XmlNodeType.EndElement
        Dim dataContractSerializer = New DataContractSerializer(GetType(Node))
        Dim nextNode = CType(dataContractSerializer.ReadObject(reader), Node)

        _ChildItems.Add(nextNode)
        While reader.NodeType = System.Xml.XmlNodeType.Whitespace
            reader.Read()
        End While
      End While

      reader.ReadEndElement()
    End If

    reader.ReadEndElement()
...
End Sub

public void WriteXml(System.Xml.XmlWriter writer)
{
...
  writer.WriteStartElement("Children");
  foreach (var item in Children)
  {
    var dataContractSerializer = new DataContractSerializer(typeof(Node));
    dataContractSerializer.WriteObject(writer, item);
  }
  writer.WriteEndElement();
...
}

Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements IXmlSerializable.WriteXml
...
  writer.WriteStartElement("Children")

  For Each item In Children
    Dim dataContractSerializer = New DataContractSerializer(GetType(Node))
    dataContractSerializer.WriteObject(writer, item)
  Next

  writer.WriteEndElement()
...
End Sub

This is all that is required towards using an XML serializer with a completely hidden data model implementation. All solutions discussed so far in this article are still naive with respect to the usage of an XML Schema Definition (XSD), which is typically used to ensure that all data items match an expected consistency. The next section evaluates that point with the DataContractSerializer to make things even more robust towards failures that usually occur in production.
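The DataContractStorage class itself is not listed in this article. The following is only a sketch of how such a helper might look, and it differs from the downloads in one respect that should be stated plainly: for brevity, the hypothetical HiddenItem class uses [DataContract] and [DataMember] attributes instead of an IXmlSerializable implementation. It still shows that the DataContractSerializer can work with an internal class that has private setters and no public constructor.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical hidden model class: internal, no public constructor, private setters.
[DataContract(Name = "Item", Namespace = "")]
internal class HiddenItem
{
    internal HiddenItem(string name, int version)
    {
        Name = name;
        Version = version;
    }

    [DataMember(Order = 1)]
    public string Name { get; private set; }

    [DataMember(Order = 2)]
    public int Version { get; private set; }
}

// Sketch of a storage helper; the class and method names are assumptions.
public static class DataContractStorageSketch
{
    public static string WriteXmlToString<T>(T model)
    {
        var serializer = new DataContractSerializer(typeof(T));
        var builder = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(builder))
        {
            serializer.WriteObject(writer, model);
        }                                          // disposing the writer flushes it
        return builder.ToString();
    }

    public static T ReadXmlFromString<T>(string xml)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return (T)serializer.ReadObject(reader);
        }
    }

    // Public entry point so that callers never touch the internal class.
    public static string RoundTrip(string name, int version)
    {
        string xml = WriteXmlToString(new HiddenItem(name, version));
        HiddenItem copy = ReadXmlFromString<HiddenItem>(xml);
        return copy.Name + "/" + copy.Version;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("demo", 2));
    }
}
```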

Making Things Bullet Proof

This section discusses how we can use an XML Schema Definition (XSD) [10] with a DataContractSerializer [9] to ensure that the consistency of transferred data items meets expectations in production. The XsdDataContract.zip solution contains 2 projects that were directly drawn from the referenced MSDN articles.

The XsdDataContractExporter project shows how the XsdDataContractExporter class [9] can be used to export an XSD for a given data model. The exported XSD is not very detailed, but it may still be useful to someone.

The XmlSchemaSet_Sample shows how an XSD file can be used to control the parsing process when reading XML with an XmlReader. The project shows that the XmlReaderSettings class can hold multiple schema definitions (schemas or XSDs for short), and that these settings can in turn be used to initialize an XmlReader. The XmlReader can then use a callback function to report any errors or warnings that occur.
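The mechanism can be sketched in a few lines. The SchemaCheck helper and the tiny inline schema below are assumptions made up for this example (they are not the TreeModel.xsd from the download); the sketch only shows how XmlSchemaSet, XmlReaderSettings and the ValidationEventHandler fit together.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Reads the XML completely and returns all validation messages.
    public static List<string> Validate(string xml, string xsd)
    {
        var messages = new List<string>();

        var schemas = new XmlSchemaSet();
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, xsdReader);   // null: use the namespace declared in the schema
        }

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }       // validation happens while reading
        }

        return messages;
    }

    public static void Main()
    {
        const string xsd = @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='ModelRoot'>
    <xsd:complexType>
      <xsd:attribute name='Name' type='xsd:string' use='required' />
      <xsd:attribute name='Test' type='xsd:string' use='required' />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>";

        // The Test attribute is missing, so one validation message is expected.
        foreach (string message in Validate("<ModelRoot Name='My Test Tree' />", xsd))
            Console.WriteLine(message);
    }
}
```

Without a registered ValidationEventHandler, the reader throws an XmlSchemaException on the first error instead of collecting messages.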

The lessons learned in the XmlSchemaSet_Sample project are applied in the DataContractSerializer_V2.zip sample. This sample has extended read XML method signatures to accommodate for the additional schemas:

public static IModelRoot ReadXmlFromFile<T>(string filename
                                          , XmlSchemaSet schemas = null
                                          , ValidationEventHandler validationCallBack = null)

public static IModelRoot ReadXmlFromString<T>(string input
                                            , XmlSchemaSet schemas = null
                                            , ValidationEventHandler validationCallBack = null)

Public Shared Function ReadXmlFromFile(Of T)(ByVal filename As String,
     ByVal Optional schemas As XmlSchemaSet = Nothing,
     ByVal Optional validationCallBack As ValidationEventHandler = Nothing) As IModelRoot

Public Shared Function ReadXmlFromString(Of T)(ByVal input As String,
    ByVal Optional schemas As XmlSchemaSet = Nothing,
    ByVal Optional validationCallBack As ValidationEventHandler = Nothing) As IModelRoot

These schemas are handed off to the XmlReader, which reports back whenever the XML does not meet the expectations specified in the TreeModel.xsd file. We can verify this by opening the TreeModel.xsd file and adding a required attribute that is currently not implemented:

<xsd:attributeGroup name ="ModelRootAttribs">
  <xsd:attribute name="Version" default="1" type="xsd:int" />
  <xsd:attribute name="MinorVersion" default="0" type="xsd:int" />
  <xsd:attribute name="Name" type="xsd:string" use="required" />
  <xsd:attribute name="Test" type="xsd:string" use="required" />
</xsd:attributeGroup>

...should produce the following output:

Validation Error: The required attribute 'Test' is missing.
Exception The required attribute 'Test' is missing. at line 2 position 2

Conclusions

The .Net framework also supports the ISerializable interface for serializing objects into a binary format. This form of serialization was not part of this article since it is not interoperable and is otherwise very similar to the IXmlSerializable interface. I have not actually tried this, but I would expect that using the IXmlSerializable interface together with a zip container can yield similar performance and space requirements (especially when reading data) as the ISerializable interface. Please have a look at the 04_sqlite_tut.zip demo application attached to this article, if you want a hint at the performance of reading interoperable XML from a zipped data container.
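As a rough sketch of the compressed container idea, XML can be written through a compression stream. Note that this example uses GZipStream (a single compressed stream) rather than a real zip archive, and the Payload class is a hypothetical placeholder; it makes no claim about the resulting performance.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Serialization;

// Hypothetical placeholder model.
public class Payload
{
    public string Name { get; set; }
}

public static class CompressedXml
{
    public static byte[] Write<T>(T model)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var buffer = new MemoryStream())
        {
            using (var zip = new GZipStream(buffer, CompressionMode.Compress))
            {
                serializer.Serialize(zip, model);
            }                                  // disposing the GZipStream flushes the compressed data
            return buffer.ToArray();           // ToArray still works after the stream was closed
        }
    }

    public static T Read<T>(byte[] data)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var buffer = new MemoryStream(data))
        using (var zip = new GZipStream(buffer, CompressionMode.Decompress))
        {
            return (T)serializer.Deserialize(zip);
        }
    }

    public static void Main()
    {
        byte[] data = Write(new Payload { Name = "My Test Tree" });
        Payload copy = Read<Payload>(data);
        Console.WriteLine("{0} ({1} compressed bytes)", copy.Name, data.Length);
    }
}
```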

The support for XML in the .Net framework is huge and the interfaces and techniques mentioned in this article are by no means complete. A tool that is often useful for generating XSDs, or model classes from XSDs, is for example the XML Schema Definition Tool (Xsd.exe). This tool is also useful when it comes to quickly generating an object model with complex constraints, but its application details are certainly the topic of another article. A similar tool for the DataContractSerializer is the ServiceModel Metadata Utility Tool (Svcutil.exe), which is also not covered in this article.

The .Net support for XML serialization is so wide that the question of whether something can be done with XML is quickly replaced by the question of how it can be done, which in turn leads to the search for a working code sample. I hope this article sheds some light on these questions and gives you a better overview of the serialization techniques with XML.

Any feedback on important items that were missed, or improvements that might be applicable, is as always very welcome.

References

[1] Advanced WPF TreeViews in C#/VB.Net - Part 5
[5] Custom Serialization - Part 2
[6] XmlAttributes
[7] How to Implement IXmlSerializable Correctly
[8] IXmlSerializable Interface
XmlWriter Class
XmlReader Class
[9] Windows Communication Foundation (WCF)
Data Transfer and Serialization
Using XML Schema Import and Export for DataContractSerializer
XsdDataContractExporter Class to generate XSD
XsdDataContractImporter Class
[10] Data Contract Schema Reference

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)