A Deep XmlSerializer, Supporting Complex Classes, Enumerations, Structs, Collections, and Arrays

15 Dec 2007, CPOL, 16 min read
A deep XmlSerializer, supporting complex classes, enumerations, structs, collections, generics, and arrays

Introduction and Background

Serializing application or configuration data or object state, and reconstructing it at a later time or in another place, is one of the most common tasks in software development. Sometimes the data to be serialized has a simple structure; sometimes it is nastily complex.

Besides proprietary formats and mechanisms, the .NET Framework provides three main classes to serialize and deserialize objects:

  • XmlSerializer (System.Xml.Serialization namespace)
  • BinaryFormatter (System.Runtime.Serialization.Formatters.Binary namespace)
  • SoapFormatter (System.Runtime.Serialization.Formatters.Soap namespace)

Unfortunately, all of them have their limitations. For example, the XmlSerializer does not support dictionary types such as Hashtable (classes implementing IDictionary), and the BinaryFormatter and the SoapFormatter serialize only classes marked as serializable. Furthermore, objects serialized with the BinaryFormatter cannot be edited manually (which is sometimes desirable). Another aspect is the possibility of serializing a group of objects into one document.

The article "A Simple Serializer / Deserializer" by Marc Clifton on CodeProject inspired me to write a deep XML-serializer and deserializer that match my specific needs.

This article targets deep XML-serialization, supporting complex classes, enumerations, structs, collections, arrays, generic types, binary data, and some more. The classes XmlSerializer and XmlDeserializer described in this article "do not claim to be complete or perfect"! Rather, they try to show one more approach to solving the problem of deep XML serialization.

Using the Code

XmlSerializer and XmlDeserializer are based on reflection and recursive calls. How it works, in short: loop over the properties of an object, determine their values, type names, and assembly names, and write them to XmlNodes (public fields are not considered). Properties of complex types are processed in the same way and are nested in their parent class's XmlNodes. Collections and arrays are also looped over item by item, considering each item's type (complex, collection, array). Follow the code; it is commented.
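The recursive walk described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the article's implementation: the class name PropertyWalkSketch and its Write method are hypothetical, only primitives, enums, strings, and decimals are treated as simple values, and there is no protection against circular references.

```csharp
using System;
using System.Collections;
using System.Globalization;
using System.Reflection;
using System.Xml;

// Hypothetical helper for illustration; not the article's XmlSerializer.
public static class PropertyWalkSketch
{
    // Writes one value as an element carrying name, type, and assembly attributes.
    public static XmlElement Write(XmlDocument doc, string tag, string name, object value)
    {
        Type type = value.GetType();
        XmlElement node = doc.CreateElement(tag);
        node.SetAttribute("name", name);
        node.SetAttribute("type", type.FullName);
        node.SetAttribute("assembly", type.Assembly.FullName);

        if (type.IsPrimitive || type.IsEnum || value is string || value is decimal)
        {
            // Simple value: store it as the element's text.
            node.InnerText = Convert.ToString(value, CultureInfo.InvariantCulture);
        }
        else if (value is IEnumerable)
        {
            // Collection or array: one <item> per element, nested in <items>.
            XmlElement items = doc.CreateElement("items");
            foreach (object item in (IEnumerable)value)
            {
                if (item != null) items.AppendChild(Write(doc, "item", "", item));
            }
            node.AppendChild(items);
        }
        else
        {
            // Complex type: recurse into readable and writable public properties.
            XmlElement props = doc.CreateElement("properties");
            foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
            {
                if (!p.CanRead || !p.CanWrite || p.GetIndexParameters().Length > 0) continue;
                object v = p.GetValue(value, null);
                if (v != null) props.AppendChild(Write(doc, "property", p.Name, v));
            }
            node.AppendChild(props);
        }
        return node;
    }
}
```

Calling PropertyWalkSketch.Write(doc, "object", "", someInstance) returns the root element, which can then be appended to the document.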

The XML Format

I decided to name each element by its role. A collection of properties is called properties, a single property is called property. Elements of collections are called item, and are nested in an items element. The root element of a serialized object is the object element. Each property or item element contains the attributes type and assembly to describe its Type for deserialization purposes. A property's name is stored in the name attribute, and the value of a property or item is the element's value. The following example will give you an impression of the structure (note that the assembly attributes are simplified: they are left blank):

XML
<object name="" type="XmlSerializerDemo.ComplexDummyClass" assembly="">
  <properties>
    <property name="Name" type="System.String"
              assembly="">CodeProject</property>
    <property name="Number" type="System.Int32"
              assembly="">1234</property>
    <property name="Value"
              type="System.Collections.Hashtable" assembly="">
      <items>
        <item>
          <properties>
            <property name="Key" type="System.String"
                      assembly="">my super key</property>
            <property name="Value" type="System.Double"
                      assembly="">100.4512</property>
          </properties>
        </item>
        <item>
          <properties>
            <property name="Key" type="System.String"
                      assembly="">Klaus</property>
            <property name="Value" type="System.Int32"
                      assembly="">1234</property>
          </properties>
        </item>
      </items>
    </property>
  </properties>
</object>

Naming of XML Tags

As some may not agree with the naming conventions, a little flexibility was added: by writing a custom IXmlSerializationTag implementation, the XML nodes can be named arbitrarily. If you wrote a custom IXmlSerializationTag implementation, just assign an instance of it to the TagLib property of the XmlSerializer AND the XmlDeserializer.
Caution: Always use the same IXmlSerializationTag implementation for serialization and deserialization!

This feature is not strictly necessary and is likely to cause problems, so there should be no need to use it. It was implemented anyway, for one reason or another.

The Type Dictionary

As you may see from the example above, the type information (Type and Assembly) is stored for each property and its values. Of course, this has an impact on the file size. Imagine a String array of 10,000 items to be serialized: the same information about type and assembly will be written 10,000 times. Obviously, this is not optimal. The solution is a type dictionary.

During serialization each Type found is (internally) added to a dictionary and gets a unique key. Instead of the complete type information, this key will be written in the type attribute of a property's XmlNode. The assembly attribute is left blank. When all properties are processed, this type dictionary is appended to the object's root node.

Using the type dictionary is optional but recommended.

At deserialization, the XmlDeserializer checks whether the object's root node contains a type dictionary and, if this is the case, deserializes the type dictionary first of all. During deserialization, it tries to resolve the properties' Types from the dictionary. In case of failure, it tries to resolve the Types directly from the type and assembly attributes.
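The key assignment and the lookup with fallback can be pictured roughly as follows. This is a hedged sketch rather than the article's code: TypeDictionarySketch, GetKey, and Resolve are hypothetical names, the sketch maps keys directly to Type objects instead of TypeInfo objects, and the fallback uses only the type name (not the assembly attribute).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the article stores TypeInfo objects in a Hashtable instead.
public class TypeDictionarySketch
{
    private readonly Dictionary<Type, string> keysByType = new Dictionary<Type, string>();
    private readonly Dictionary<string, Type> typesByKey = new Dictionary<string, Type>();

    // Serialization side: returns the existing key for a type,
    // or assigns the next free one (TK0, TK1, ...).
    public string GetKey(Type type)
    {
        string key;
        if (!keysByType.TryGetValue(type, out key))
        {
            key = "TK" + keysByType.Count;
            keysByType.Add(type, key);
            typesByKey.Add(key, type);
        }
        return key;
    }

    // Deserialization side: try the dictionary first, then treat the
    // attribute as a plain type name. Returns null if neither works.
    public Type Resolve(string typeAttribute)
    {
        Type type;
        if (typesByKey.TryGetValue(typeAttribute, out type)) return type;
        return Type.GetType(typeAttribute);
    }
}
```

With this scheme, a String array of 10,000 items needs only two dictionary entries (one for the array type, one for String), and every item carries just a short key.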

The type information is stored in TypeInfo objects which hold the type name as well as the assembly name. The type dictionary is serialized in the same way as the object's properties. The following example shows the simplified structure.

XML
<object name="" type="TK0" assembly="">
  <items>
    <item name="0" type="TK1" assembly="">Item 0</item>
    <item name="1" type="TK1" assembly="">Item 1</item>

     <!-- To be continued... -->

    <item name="9999" type="TK1" assembly="">Item 9999</item>
  </items>

  <typedictionary name="" type="System.Collections.Hashtable" assembly="">
    <items>

      <item>
        <properties>
          <property name="Key"
                    type="System.String"
                    assembly="">TK0</property>
          <property name="Value"
                    type="Yaowi.Common.Serialization.TypeInfo"
                    assembly="">
             <properties>

               <!-- The type information is stored here -->

             </properties>
          </property>
        </properties>
      </item>

      <item>
        <properties>
          <property name="Key"
                    type="System.String"
                    assembly="">TK1</property>
          <property name="Value"
                    type="Yaowi.Common.Serialization.TypeInfo"
                    assembly="">
             <properties>

                <!-- The type information is stored here -->

             </properties>
          </property>
        </properties>
      </item>

    </items>
  </typedictionary>

</object>

In my tests, the file size could be reduced by up to 50% by using a type dictionary.

The usage of a type dictionary is optional and can be set by the XmlSerializer's UseTypeDictionary property. It is set to true by default.

Circular References

Circular references are annoying and can cause infinite loops. Unfortunately, they are not rare. Just think about a System.Windows.Forms container control (e.g. Form) and its Controls collection property, which holds references to all nested controls. That's one direction. The opposite direction is each control's Parent property. The parent references its children and the children reference their parent.

When looping over all properties, you will meet an instance more than once and process it again and again. To avoid these circular references during serialization, a collection is built in which all processed instances are stored. Before the value of a property is processed, the XmlSerializer checks whether this instance was processed before. If this is the case, the XmlSerializer skips that property.

This procedure ensures that infinite loops won't occur, but, consequently, circular references cannot be rebuilt by the XmlDeserializer either (the demo application provides an example for this behavior).
I must admit this is not perfectly solved yet. Solving this problem is a task for the future.

Note: the XmlSerializer implements the IDisposable interface to clear the collection of instances. Make sure Dispose() is called.
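The bookkeeping described in this section could look roughly like the sketch below. The names VisitedInstancesSketch and TryEnter are hypothetical, and two design choices are assumptions of the sketch rather than facts about the article's code: instances are compared by reference (not by Equals), and value types and strings are never tracked.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical sketch of the "already processed" collection.
public sealed class VisitedInstancesSketch : IDisposable
{
    // Compares objects by identity so that two equal-but-distinct
    // instances are still treated as different objects.
    private sealed class ByReference : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    private readonly HashSet<object> seen = new HashSet<object>(new ByReference());

    // Returns true if the instance should be processed,
    // false if it was already processed and must be skipped.
    public bool TryEnter(object instance)
    {
        if (instance == null || instance is ValueType || instance is string) return true;
        return seen.Add(instance);
    }

    // Releases the references held by the collection.
    public void Dispose()
    {
        seen.Clear();
    }
}
```

Wrapping the tracker in a using block ensures the stored references are released once serialization has finished.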

Generics

In the earlier versions of this article, I stated that generic types are not supported. That is not the case. Generics are supported and can be serialized as well as deserialized. But they have to implement the ISerializable and IObjectReference interfaces (most generic .NET types do so).

The demo includes an example of how to implement these interfaces in custom generic types (note the Serializable attribute):

C#
[Serializable]
public class GenericDummyClass<T> : ISerializable, IObjectReference
{
  private T instanceoftype;

  // Example property
  public T InstanceOfType
  {
    get { return instanceoftype; }
    set { instanceoftype = value; }
  }

  // ISerializable member
  public void GetObjectData(SerializationInfo info, StreamingContext context)
  {
    info.SetType(typeof(T));
  }

  // IObjectReference member
  public object GetRealObject(StreamingContext context)
  {
    return this.instanceoftype;
  }
}

Binary Data

A property may hold binary data, or the object to be serialized may itself have a binary format. It is obvious that binary data must be handled differently. But how can we determine whether a Type contains data that needs binary serialization? For example, an Image (Bitmap) has some properties, but ultimately it has to be serialized in a binary format to restore it properly. So what indicates that a Bitmap needs binary serialization?
I decided to determine this by two indications: first, the existence of a constructor with exactly one parameter of the type byte[] or Stream, and second, the ability of the object's TypeConverter to convert to byte[] or Stream. To stick to the example, the Bitmap has the constructor Bitmap(Stream) and its TypeConverter (System.Drawing.ImageConverter) can convert to byte[]. That should be enough evidence that an instance of this type can be serialized and deserialized in binary form as byte[].
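The two indications can be checked with reflection and TypeDescriptor. The following is a minimal sketch of such a test, assuming both conditions must hold; BinaryDetectionSketch and LooksBinary are hypothetical names, not members of the article's classes.

```csharp
using System;
using System.ComponentModel;
using System.IO;

// Hypothetical sketch of the binary-detection heuristic.
public static class BinaryDetectionSketch
{
    public static bool LooksBinary(Type type)
    {
        // Indication 1: a public constructor taking exactly one byte[] or one Stream.
        bool hasConstructor =
            type.GetConstructor(new Type[] { typeof(byte[]) }) != null ||
            type.GetConstructor(new Type[] { typeof(Stream) }) != null;

        // Indication 2: the type's TypeConverter can produce byte[] or Stream.
        TypeConverter converter = TypeDescriptor.GetConverter(type);
        bool canConvert =
            converter.CanConvertTo(typeof(byte[])) ||
            converter.CanConvertTo(typeof(Stream));

        return hasConstructor && canConvert;
    }
}
```

A type such as MemoryStream satisfies the first indication (it has a byte[] constructor) but not the second, so it is not treated as binary by this sketch.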

Objects detected as binary are not serialized in the common <property><properties><property>... manner. Instead they get a <constructor> tag with the child <binarydata>, which holds the byte[] of the data converted to a Base64-encoded String (plus the mandatory type information attributes). Further properties are ignored, assuming they are included in the binary representation.
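As a rough illustration of this element layout, the sketch below wraps a byte[] in <constructor><binarydata> and reads it back. Base64Sketch, Wrap, and Unwrap are hypothetical names, and the type information attributes are omitted for brevity.

```csharp
using System;
using System.Xml;

// Hypothetical sketch of the <constructor>/<binarydata> layout.
public static class Base64Sketch
{
    // Encodes the bytes as Base64 text inside <constructor><binarydata>.
    public static XmlElement Wrap(XmlDocument doc, byte[] data)
    {
        XmlElement constructor = doc.CreateElement("constructor");
        XmlElement binary = doc.CreateElement("binarydata");
        binary.InnerText = Convert.ToBase64String(data);
        constructor.AppendChild(binary);
        return constructor;
    }

    // Decodes the Base64 text of the <binarydata> child back into bytes.
    public static byte[] Unwrap(XmlElement constructor)
    {
        return Convert.FromBase64String(constructor["binarydata"].InnerText);
    }
}
```

Base64 represents every 3 bytes as 4 characters, so the encoded text alone is about one third larger than the raw data.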

To serialize arbitrary binary data, the classes BinaryContainer and BinaryContainerTypeConverter were introduced. While the BinaryContainer functions as a container for binary data (the name says it), the BinaryContainerTypeConverter is the corresponding TypeConverter implementation to satisfy the needs specified above.

The BinaryContainerTypeConverter converts to and from byte[] or Stream.

The demo shows how to load an arbitrary binary file into a BinaryContainer and how it is converted into byte[] and Stream. In the demo it is the sample JPEG, but it could be an MP3, an MPG, or whatever.

C#
// Load a binary file and wrap it in a BinaryContainer
FileStream fs = new FileStream("MH_Wallpaper.jpg", FileMode.Open);
BinaryContainer bc = new BinaryContainer(fs);

// Do something with the BinaryContainer: serialize it to disk,
// chase it through the net...
// Eventually, deserialize it to the instance "bc"

// Get the BinaryContainerTypeConverter
TypeConverter tc = TypeDescriptor.GetConverter(bc.GetType());

// Convert the wrapped data (ConvertTo returns object, so a cast is needed)
Stream stream = (Stream)tc.ConvertTo(bc, typeof(Stream));
byte[] barr = (byte[])tc.ConvertTo(bc, typeof(byte[]));

// E.g. a Bitmap
Bitmap bmp = new Bitmap(stream);

XML serialization is not only meant for serializing to disk; XML is a transport mechanism as well. For example, you could make your photo or MP3 collection accessible through a web service (although the traffic you cause won't amuse everyone, since in my tests the file size increased immensely, by up to 70%).

Assembly Versions

From time to time, new versions of assemblies are released. Serialization also means that you can store an object on disk for an unlimited time or that an object can be distributed to another machine. Thus, you can never be sure that at the time or location of deserialization, exactly the same version of the assembly the object was serialized with (still or already) exists. Therefore the exact definition of the assembly to load can lead to errors if this assembly version does not exist.

For example a Color was serialized with the assembly information:

System.Drawing, Version=1.0.5000.0, Culture=neutral,
                PublicKeyToken=b03f5f7f11d50a3a 

and at the time or location of deserialization, there is already a newer assembly available:

System.Drawing, Version=2.0.0.0, Culture=neutral,
                PublicKeyToken=b03f5f7f11d50a3a 

... deserialization will fail with a FileNotFoundException because an assembly with the specified version could not be found.

If in the example above the assembly information is reduced to the assembly's name only (System.Drawing), deserialization will succeed. So, the solution could be to reduce the information used for deserialization. Unfortunately, there is another circumstance which forbids this approach: .NET allows the installation of more than one version of an assembly. That means that at deserialization, possibly multiple versions of an assembly are available. In this case the assembly to-be loaded must be exactly specified (Version, Culture, PublicKeyToken), otherwise an exception will be thrown.

As far as I can see, there is only one way to solve this problem: trial and error! Thus, the XmlDeserializer first tries to load the assembly with reduced information and, if this fails, tries to load it with the complete assembly information. If no version fits, there is no way to load the assembly, and deserialization of at least one property will fail.

Of course, this procedure is not satisfactory and costs some performance (these costs are minimized by caching the loaded assemblies during deserialization). But so far it is the only way I have found that works in almost all cases.
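The trial-and-error procedure with caching might be sketched as follows. This is an illustration only: AssemblyResolverSketch, Register, and Resolve are hypothetical names, and the choice of exceptions to swallow is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical sketch of "short name first, full name second" with a cache.
public class AssemblyResolverSketch
{
    private readonly Dictionary<string, Assembly> cache = new Dictionary<string, Assembly>();

    // Makes an already loaded assembly (e.g. a plug-in) known to the resolver.
    public void Register(Assembly assembly)
    {
        cache[assembly.FullName] = assembly;
        cache[assembly.GetName().Name] = assembly;
    }

    // Returns the assembly for the stored full name, or null if nothing fits.
    public Assembly Resolve(string fullName)
    {
        Assembly found;
        if (cache.TryGetValue(fullName, out found)) return found;

        string shortName = new AssemblyName(fullName).Name;
        if (cache.TryGetValue(shortName, out found)) return found;

        // Trial and error: reduced information first, complete information second.
        found = TryLoad(shortName) ?? TryLoad(fullName);
        if (found != null) cache[fullName] = found;
        return found;
    }

    private static Assembly TryLoad(string name)
    {
        try { return Assembly.Load(name); }
        catch (IOException) { return null; }            // includes FileNotFoundException and FileLoadException
        catch (BadImageFormatException) { return null; }
    }
}
```

The cache means the two load attempts are paid at most once per assembly name during a deserialization run.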

Class Versions

Sometimes classes have to be modified, and properties are added or removed. So it can happen that a serialized Type does not have identical properties at the time or location of deserialization. Except for the SoapFormatter, all serializers mentioned above tolerate this case. The XmlDeserializer tolerates this as well, since it uses Reflection to determine the available properties.

The SoapFormatter throws an exception, saying "Wrong number of Members or Member name 'xyz' not found", but this case can probably be handled by setting the SurrogateSelector property (I did not test this; it seems to require some coding).
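The tolerance of the reflection-based approach comes down to looking up each stored property by name and skipping it when the current class no longer has it. A minimal sketch of that idea, with the hypothetical names TolerantSetterSketch and TrySet:

```csharp
using System;
using System.Reflection;

// Hypothetical sketch: set a property only if the current class version has it.
public static class TolerantSetterSketch
{
    // Returns false (instead of throwing) when the property is missing or read-only.
    public static bool TrySet(object target, string propertyName, object value)
    {
        PropertyInfo property = target.GetType().GetProperty(
            propertyName, BindingFlags.Public | BindingFlags.Instance);

        if (property == null || !property.CanWrite) return false;

        property.SetValue(target, value, null);
        return true;
    }
}
```

A property that was removed from the class between serialization and deserialization simply yields false and is ignored; a newly added property keeps its default value because the file contains no entry for it.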

Unknown Assemblies

When an assembly is loaded from a file at runtime (Assembly.LoadFrom("...")) and an instance of one of its types is serialized, it is not guaranteed that this assembly can be loaded automatically at deserialization. Just think about plug-ins which are loaded at runtime, where it is unknown which assemblies will follow in the future. The applications themselves do not have a reference to these assemblies.

The XmlDeserializer provides a method RegisterAssembly which allows registering assemblies in the internal assembly cache. Thanks to this mechanism, assemblies referenced at runtime can be used during deserialization. For the plug-in example: just register all plug-in assemblies that were found.

Tolerance/Adjustment

The tolerance and adjustment of the XmlSerializer and the XmlDeserializer can be influenced by the following properties:

  • XmlSerializer.IgnoreSerializableAttribute - Gets or sets whether the Serializable attribute is ignored. If set to false, only properties marked as serializable are going to be serialized. Otherwise even "not serializable" properties will be serialized. Default is false.
  • XmlSerializer.SerializationIgnoredAttributeType - Gets or sets the Attribute type that, when applied to a property, disables its serialization. If null every property is serialized.
  • XmlSerializer.IgnoreSerialisationErrors - Gets or sets whether errors during serialization shall be ignored.
  • XmlSerializer.TagLib - Gets or sets the dictionary of XML-tags. If not explicitly set, the default implementation will be used.
  • XmlSerializer.UseTypeDictionary - Gets or sets whether a type dictionary is used to store Type information (see above).
  • XmlDeserializer.IgnoreCreationErrors - Gets or sets whether creation errors shall be ignored. Creation errors can occur if e.g. a type has no parameterless constructor and an instance cannot be instantiated from String. Or, just simply, an assembly cannot be found. Default is false.
  • XmlDeserializer.TagLib - Gets or sets the dictionary of XML-tags. If not explicitly set, the default implementation will be used.

Restrictions

During evaluation and testing, I found out that some types in the System.Collections.Specialized namespace cause problems. While most collections within this namespace can be serialized and deserialized smoothly, two of them behave differently.

Attempting to serialize instances of...

  • NameValueCollection ...nothing happens
  • StringDictionary ...an exception will be thrown

Since these classes date from a time before generics, it is advisable to replace them with Dictionary<String, String> (even though the NameValueCollection has this strange behavior when a key is added multiple times; I was never really convinced of this feature). I am not planning to make any effort to find a solution for this misbehavior.

Circular references cannot be rebuilt! The solution could probably be to register referencing properties and to serialize/deserialize them. Perhaps simple, but a hard slog.

As far as I know these are the only restrictions, but if you find others, feel free to let me know.

Disadvantages/Advantages

Due to the intensive use of Reflection, the solution in this article does not provide the highest performance. The SoapFormatter and the BinaryFormatter provide remarkably higher performance. The files generated by the XmlSerializer are also larger for small serialization operations. Leaving aside the BinaryFormatter, which generates the smallest output (down to 10%?), the files generated by the XmlSerializer seem to be up to 100% larger than files generated by the SoapFormatter for "small objects" (62/36 KB for the "String Array with 1,000 items" test). But that's relative: the "Hashtable (String, SimpleDummyClass) with 1,000 items" test shows a ratio of 509/500 KB, which is nearly equal.

Except for the not yet implemented rebuilding of circular references and the (deliberately) unimplemented serialization of fields, the XmlSerializer and XmlDeserializer combine almost all features provided by the other serializers, plus some more, like support for unknown assemblies or merging serialized objects into one document.

Using the Classes

To serialize an arbitrary object to an XML file, it takes two lines of code:

C#
ComplexDummyClass dummy = new ComplexDummyClass();
// Sample class from the demo

// ... Set some properties
XmlSerializer xs = new XmlSerializer();
xs.Serialize(dummy, filename);

To deserialize an arbitrary object from an XML file, it also takes only two lines of code:

C#
XmlDeserializer xd = new XmlDeserializer();
Object obj = xd.Deserialize(filename);

And if you are sure of what Type the object will be:

C#
XmlDeserializer xd = new XmlDeserializer();
ComplexDummyClass dummy = (ComplexDummyClass)xd.Deserialize(filename);

Saving the object directly to disk is just one overload of the Serialize method. Other overloads allow appending the object's XML elements to existing XmlDocuments or XmlNodes.

The following example shows how to append an object's XML elements to an XmlNode:

C#
ComplexDummyClass dummy = new ComplexDummyClass();

// ...

XmlDocument doc = new XmlDocument();
XmlNode root = doc.CreateElement("configuration");
doc.AppendChild(root);

XmlSerializer xs = new XmlSerializer();
xs.Serialize(dummy, "dummyconfig", root);

This method allows you to store multiple objects in one XmlDocument. For example, you can save your database and GUI settings in one file to reload them - typesafe and strictly separated - from this file at the startup of your application.

The Deserialize method provides various overloads to deserialize from XmlNodes or XmlDocuments, as well.

How to register an assembly at runtime is demonstrated here:

C#
XmlDeserializer xd = new XmlDeserializer();

Assembly a = Assembly.LoadFrom(@"..\..\DummyExternalAssembly.dll");
xd.RegisterAssembly(a);

Object obj = xd.Deserialize(filename);

To exclude a property from serialization, apply an (arbitrary) Attribute to that property and assign this Attribute's Type to the XmlSerializer's SerializationIgnoredAttributeType property:

C#
public class SimpleDummyClass
{
  private String name;

  // Apply the attribute that marks the property as ignored
  [XmlIgnore]
  public String Name
  {
    get { return this.name; }
    set { this.name = value; }
  }
}

...

XmlSerializer xd = new XmlSerializer();

// Assign the Attribute to the XmlSerializers property
xd.SerializationIgnoredAttributeType = typeof(XmlIgnoreAttribute);

Now serialization will ignore the SimpleDummyClass's Name property.
Note: the XmlSerializer's SerializationIgnoredAttributeType property accepts any Type, but if the assigned Type is not an Attribute, this setting will be ignored.
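How such a check could work internally is sketched below; this is an assumption about the mechanism, not the article's code. IgnoreAttributeSketch, IsIgnored, and the Sample class are hypothetical names introduced for illustration.

```csharp
using System;
using System.Reflection;
using System.Xml.Serialization;

// Hypothetical sample class with one ignored and one regular property.
public class Sample
{
    [XmlIgnore]
    public string Name { get; set; }

    public int Number { get; set; }
}

// Hypothetical sketch of the "ignored attribute" check.
public static class IgnoreAttributeSketch
{
    // True if the property carries the configured attribute.
    // A null type or a non-Attribute type disables the check.
    public static bool IsIgnored(PropertyInfo property, Type ignoredAttributeType)
    {
        if (ignoredAttributeType == null) return false;
        if (!typeof(Attribute).IsAssignableFrom(ignoredAttributeType)) return false;
        return Attribute.IsDefined(property, ignoredAttributeType, true);
    }
}
```

The guard against non-Attribute types matters because Attribute.IsDefined throws an ArgumentException when the given type does not derive from Attribute.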

The Demo Application

The sample application lets you create instances of the sample classes SimpleDummyClass, ComplexDummyClass, and ComplexDummyClassExt. The latter extends ComplexDummyClass with a GraphicsPath property, which is not marked as serializable (resulting in a BinaryFormatter exception at serialization). There are also examples for circular references, arrays, Hashtables, and generics. The created instances are assigned to a PropertyGrid to display their properties. I did not invest any effort in displaying collection values in the PropertyGrid, but there are articles on CodeProject which explain how to do so.

A TabControl contains commands to serialize the PropertyGrid's selected object and to deserialize objects from file for all mentioned serializers.

The type ComplexDummyClassExt (item EXTENDED ComplexDummyClass) is the most complex class to be serialized and deserialized and contains various sample properties. The properties are these:

  • ArrayList (items of different types)
  • System.Windows.Forms.ArrowDirection
  • DecimalNumber (Decimal)
  • IDictionary (set to a generic dictionary at runtime)
  • Color
  • GenericDummyClass<String> (a custom generic class)
  • GraphicsPath (not marked as serializable)
  • Hashtable (items of different types)
  • HybridDictionary (in Collections.Specialized namespace)
  • Image (binary)
  • Name (String)
  • Number (int)
  • ObjectArray (object[])
  • SimpleDummyClass (a simple custom class)
  • Stream (binary, Bitmap loaded from file)

The assembly Demo\XmlSerializerDemo\XmlSerializerDemo\DummyExternalAssembly.dll contains a class ExternalDummy (code not provided) which is used to demonstrate an unknown assembly loaded at runtime.

To demonstrate a modified class, the class SimpleDummyClass is prepared with a property AdditionalProperty. To modify this class, just comment or uncomment this property between serialization and deserialization.

The item Binary File (wrapped in a XmlBinaryContainer) demonstrates how to wrap an arbitrary binary file in an XmlBinaryContainer and serialize it.

A control can be serialized as well (item Control), but note that Controls have the property Controls, which is read-only and therefore will not be serialized. That is why a Windows Form (which inherits from Control) cannot be serialized together with its controls.

In a lot of cases, the standard serialization mechanisms will throw exceptions because not all kinds of objects can be serialized or deserialized with all serializers. But this may be a helpful capability comparison of the various serializers.

The demo application uses my GenericTypeConverter<T> class for display purposes of nested classes in the PropertyGrid. Since this class is not the subject of this article, it is mentioned but not explained (perhaps worth a short article.... later!).

The SimpleDummyClass has a property Dummy of the type SimpleDummyClass to which the attribute XmlIgnore is applied. When the "Set Ignore Attribute" checkbox is checked, this attribute is assigned to the XmlSerializer's SerializationIgnoredAttributeType property, so this property will not be serialized.

Points of Interest

Rebuilding circular references: The problem is not solved, yet. The solution could be to serialize indexes of referencing properties. Any suggestions?

I would like to "apologize" for naming the serializer like I did: XmlSerializer. Despite the naming overlap with the .NET class, I decided to call it in a manner which expresses its purpose. In an environment of thousands of classes, it is going to be impossible to find unique and expressive names soon. .NET supports import aliases, thus, it should be quite easy to avoid conflicts. In the demo application, I used the following alias:

C#
using SYSTEMXML = System.Xml.Serialization;

...

SYSTEMXML.XmlSerializer serializer = new SYSTEMXML.XmlSerializer();

History

  • 20-09-2006 - First published
  • 04-10-2006 - Fixed a bug (if the to-be serialized object is a collection...). Extended the demo
  • 12-10-2006 - Some changes in XmlSerializer and XmlDeserializer
    • XmlSerializer: Avoided circular calls
    • XmlSerializer: IDisposable interface implemented
    • XmlSerializer: Optional ignoring of the Serializable attribute implemented
    • XmlDeserializer: Optional ignoring of creation errors implemented
  • 20-10-2006 - Major enhancements!
    • Optional type dictionary
    • XmlDeserializer: Assembly version independence
    • XmlDeserializer: IDisposable interface implemented
    • XmlDeserializer: Caching of loaded assemblies
    • XmlDeserializer: Registering custom (unknown) assemblies
    • Extended the demo
    • Article updated to match the enhancements
  • 01-04-2007 - Code enhancements, article updated
    • IXmlSerializationTag introduced
    • XmlSerializer and XmlDeserializer: TagLib property added
    • Extended the demo (especially generics)
    • Article updated due to new insights (e.g. generics, Collections.Specialized)
  • 12-04-2007 - Support for binary data, article revised and updated
    • BinaryContainer and BinaryContainerTypeConverter introduced
  • 10-12-2007 - Exclusion of properties by Attribute introduced, some code improvements
    • XmlSerializer: SerializationIgnoredAttributeType property added

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)