Introduction and Background
Serializing application or configuration data and object states, and reconstructing them at another time or place, is one of the most common tasks in software development. Sometimes the data to be serialized has a simple structure; sometimes it is nastily complex.
Besides proprietary formats and mechanisms, the .NET Framework mainly provides three classes to serialize and deserialize objects:
- XmlSerializer (System.Xml.Serialization namespace)
- BinaryFormatter (System.Runtime.Serialization.Formatters.Binary namespace)
- SoapFormatter (System.Runtime.Serialization.Formatters.Soap namespace)
Unfortunately, all of them have their limitations. For example, the XmlSerializer does not support dictionaries (types implementing IDictionary, such as Hashtable), and the BinaryFormatter and the SoapFormatter serialize only classes marked as serializable. Furthermore, objects serialized with the BinaryFormatter cannot be edited manually (which is sometimes desirable). Another aspect is the ability to serialize a group of objects into one document.
The article "A Simple Serializer / Deserializer" by Marc Clifton on CodeProject inspired me to write a deep XML-serializer and deserializer that match my specific needs.
This article targets deep XML serialization, supporting complex classes, enumerations, structs, collections, arrays, generic types, binary data, and more. The classes XmlSerializer and XmlDeserializer described in this article "do not claim to be complete or perfect"! Rather, they try to show one more approach to solving the problem of deep XML serialization.
Using the Code
XmlSerializer and XmlDeserializer are based on reflection with recursive calls. In short, this is how it works: loop over the properties of an object, determine their values, type names, and assembly names, and write them to XmlNodes (public fields are not considered). Properties of complex types are processed in the same way and are nested in their parent's XmlNode. Collections and arrays are also looped item by item, considering each item's type (complex, collection, array). For the details, follow the code; it is commented.
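The loop described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the article's XmlSerializer: the class name ReflectionXmlSketch is hypothetical, it uses the element names described in the next section (object, properties, property, items, item), writes simple values with the invariant culture, and has neither a type dictionary nor circular-reference detection, so a cyclic object graph would recurse forever.

```csharp
using System;
using System.Collections;
using System.Globalization;
using System.Reflection;
using System.Xml;

// Hypothetical sketch of recursive, reflection-based XML serialization.
public static class ReflectionXmlSketch
{
    public static XmlDocument Serialize(object obj)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("object");
        doc.AppendChild(root);
        WriteValue(doc, root, obj);
        return doc;
    }

    // Writes the type/assembly attributes and then the value as text, items, or properties.
    static void WriteValue(XmlDocument doc, XmlElement element, object value)
    {
        if (value == null) return;
        Type type = value.GetType();
        element.SetAttribute("type", type.FullName);
        element.SetAttribute("assembly", type.Assembly.FullName);

        if (type.IsPrimitive || type.IsEnum || value is string || value is decimal)
        {
            // Simple values become the element's text content
            element.InnerText = Convert.ToString(value, CultureInfo.InvariantCulture);
        }
        else if (value is IEnumerable items)
        {
            // Collections and arrays: one <item> per element, each processed recursively
            XmlElement itemsElement = doc.CreateElement("items");
            element.AppendChild(itemsElement);
            foreach (object item in items)
            {
                XmlElement itemElement = doc.CreateElement("item");
                itemsElement.AppendChild(itemElement);
                WriteValue(doc, itemElement, item);
            }
        }
        else
        {
            // Complex types: loop over the public, readable, non-indexed instance properties
            XmlElement props = doc.CreateElement("properties");
            element.AppendChild(props);
            foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
            {
                if (!p.CanRead || p.GetIndexParameters().Length > 0) continue;
                XmlElement prop = doc.CreateElement("property");
                prop.SetAttribute("name", p.Name);
                props.AppendChild(prop);
                WriteValue(doc, prop, p.GetValue(value, null));
            }
        }
    }
}
```

Serializing a Hashtable this way yields one item per DictionaryEntry, whose Key and Value properties become nested property elements, which is the shape of the example below.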
The XML Format
I decided to name each element by its role. A collection of properties is called properties, and a single property is called property. Elements of collections are called item and are nested in an items element. The root element of a serialized object is the object element. Each property or item element carries the attributes type and assembly, which describe its Type for deserialization purposes. The name of a property is stored in the name attribute, and the value of a property or item is the element's value. The following example gives an impression of the structure (note that the assembly attributes are simplified: they are left blank):
<object name="" type="XmlSerializerDemo.ComplexDummyClass" assembly="">
  <properties>
    <property name="Name" type="System.String" assembly="">CodeProject</property>
    <property name="Number" type="System.Int32" assembly="">1234</property>
    <property name="Value" type="System.Collections.Hashtable" assembly="">
      <items>
        <item>
          <properties>
            <property name="Key" type="System.String" assembly="">my super key</property>
            <property name="Value" type="System.Double" assembly="">100.4512</property>
          </properties>
        </item>
        <item>
          <properties>
            <property name="Key" type="System.String" assembly="">Klaus</property>
            <property name="Value" type="System.Int32" assembly="">1234</property>
          </properties>
        </item>
      </items>
    </property>
  </properties>
</object>
Naming of XML Tags
As some may not agree with the naming conventions, a little flexibility was added: with a custom IXmlSerializationTag implementation, the XML nodes can be named arbitrarily. If you wrote a custom IXmlSerializationTag implementation, just assign an instance of it to the TagLib property of both the XmlSerializer AND the XmlDeserializer.
Caution: Always use the same IXmlSerializationTag implementation for serialization and deserialization!
Since this feature is not strictly necessary and is likely to cause problems, there should be no need to use it. It was implemented anyway.
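The caution above can be illustrated with a much simplified tag abstraction. This is a sketch under assumptions: ITagNames, DefaultTagNames, PascalCaseTagNames and TagDemo are hypothetical names, and the actual members of IXmlSerializationTag are not listed in this article. Because XML element names are case-sensitive, a reader using one set of tag names finds nothing in a document written with another.

```csharp
using System.Xml;

// Hypothetical, reduced stand-in for a tag library (not the real IXmlSerializationTag)
public interface ITagNames
{
    string Object { get; }
    string Property { get; }
}

public sealed class DefaultTagNames : ITagNames
{
    public string Object => "object";
    public string Property => "property";
}

// Example of a custom naming convention
public sealed class PascalCaseTagNames : ITagNames
{
    public string Object => "Object";
    public string Property => "Property";
}

public static class TagDemo
{
    // Writes a single property using the given tag names
    public static XmlDocument Write(ITagNames tags, string propertyName, string value)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement(tags.Object);
        doc.AppendChild(root);
        XmlElement prop = doc.CreateElement(tags.Property);
        prop.SetAttribute("name", propertyName);
        prop.InnerText = value;
        root.AppendChild(prop);
        return doc;
    }

    // A reader only finds elements that were written with the same tag names
    public static int CountProperties(XmlDocument doc, ITagNames tags) =>
        doc.GetElementsByTagName(tags.Property).Count;
}
```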
The Type Dictionary
As you can see from the example above, the type information (Type and Assembly) is stored for each property and its values. Of course, this has an impact on the file size. Imagine a String array of 10,000 items to be serialized: the same type and assembly information would be written 10,000 times. Obviously, this is not optimal. The solution is a type dictionary.
During serialization, each Type found is (internally) added to a dictionary and gets a unique key. Instead of the complete type information, this key is written to the type attribute of a property's XmlNode, and the assembly attribute is left blank. When all properties are processed, the type dictionary is appended to the object's root node.
Using the type dictionary is an optional, but advised, setting of the XmlSerializer.
During deserialization, the XmlDeserializer checks whether the object's root node contains a type dictionary and, if so, deserializes the type dictionary first. It then tries to resolve the properties' Types from the dictionary. If that fails, it tries to resolve the Types directly from the type and assembly attributes.
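The two directions of the type dictionary can be sketched as follows. This is a minimal sketch, assuming keys of the form TK0, TK1, ... as in the example below; TypeDictionarySketch and its methods are hypothetical names, not the article's classes. Resolution tries the dictionary first and falls back to the type and assembly attributes, as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch of a type dictionary
public sealed class TypeDictionarySketch
{
    private readonly Dictionary<Type, string> keysByType = new Dictionary<Type, string>();
    private readonly Dictionary<string, Type> typesByKey = new Dictionary<string, Type>();

    // Serialization: return the existing key for a type or register a new one
    public string GetKey(Type type)
    {
        if (!keysByType.TryGetValue(type, out string key))
        {
            key = "TK" + keysByType.Count.ToString(CultureInfo.InvariantCulture);
            keysByType.Add(type, key);
            typesByKey.Add(key, type);
        }
        return key;
    }

    // Entries to append to the root node; a writer would store type.FullName and
    // type.Assembly.FullName for each key (the data a TypeInfo holds)
    public IEnumerable<KeyValuePair<string, Type>> Entries => typesByKey;

    // Deserialization: rebuild one entry from the stored type and assembly names
    public void Load(string key, string typeName, string assemblyName)
    {
        Type type = Type.GetType(typeName + ", " + assemblyName, throwOnError: true);
        keysByType[type] = key;
        typesByKey[key] = type;
    }

    // Resolve a type attribute: dictionary first, then the attributes themselves
    public Type Resolve(string typeAttribute, string assemblyAttribute)
    {
        if (typesByKey.TryGetValue(typeAttribute, out Type type)) return type;
        string name = string.IsNullOrEmpty(assemblyAttribute)
            ? typeAttribute
            : typeAttribute + ", " + assemblyAttribute;
        return Type.GetType(name, throwOnError: true);
    }
}
```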
The type information is stored in TypeInfo objects, which hold the type name as well as the assembly name. The type dictionary is serialized in the same way as the objects' properties. The following example shows the simplified structure:
<object name="" type="TK0" assembly="">
  <items>
    <item name="0" type="TK1" assembly="">Item 0</item>
    <item name="1" type="TK1" assembly="">Item 1</item>
    <item name="9999" type="TK1" assembly="">Item 9999</item>
  </items>
  <typedictionary name="" type="System.Collections.Hashtable" assembly="">
    <items>
      <item>
        <properties>
          <property name="Key" type="System.String" assembly="">TK0</property>
          <property name="Value" type="Yaowi.Common.Serialization.TypeInfo" assembly="">
            <properties>
            </properties>
          </property>
        </properties>
      </item>
      <item>
        <properties>
          <property name="Key" type="System.String" assembly="">TK1</property>
          <property name="Value" type="Yaowi.Common.Serialization.TypeInfo" assembly="">
            <properties>
            </properties>
          </property>
        </properties>
      </item>
    </items>
  </typedictionary>
</object>
In my tests, the file size could be reduced by up to 50% by using a type dictionary.
Using a type dictionary is optional and can be controlled by the XmlSerializer's UseTypeDictionary property. It is set to true by default.
Circular References
Circular references are annoying and can cause infinite loops. Unfortunately, they are anything but rare. Just think of a System.Windows.Forms container control (e.g. Form) and its Controls collection property, which holds references to all nested controls. That is one direction. The opposite direction is each control's Parent property. The parent references its children, and the children reference their parent.
When looping over all properties, you will meet an instance more than once and process it again and again. To avoid such circular references during serialization, a collection of all processed instances is built. Before the value of a property is processed, the XmlSerializer checks whether this instance was processed before. If so, the XmlSerializer skips that property.
This procedure ensures that no infinite loops occur, but, consequently, circular references cannot be rebuilt by the XmlDeserializer either (the demo application provides an example of this behavior).
I must admit this is not perfectly solved yet. Solving this problem is a task for the future.
Note: The XmlSerializer implements the IDisposable interface to clear the collection of instances. It is recommended to make sure that Dispose() is called (e.g. with a using block).
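A collection of processed instances like the one described above can be sketched with an identity-based set. This is a minimal sketch, assuming "processed before" means "the same object instance"; ProcessedInstances and ReferenceComparer are hypothetical names. Dispose clears the collection, mirroring the recommendation above.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by identity, ignoring any Equals/GetHashCode overrides
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
    int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical sketch of the "processed instances" collection
public sealed class ProcessedInstances : IDisposable
{
    private readonly HashSet<object> seen = new HashSet<object>(new ReferenceComparer());

    // True: process the instance now. False: it was processed before, so skip it.
    public bool TryEnter(object instance)
    {
        // Null, strings and boxed value types are not tracked: a boxed value is a new
        // object on every read, and equal strings may legitimately share one instance.
        if (instance == null || instance is string || instance.GetType().IsValueType)
            return true;
        return seen.Add(instance);
    }

    // Clears the collection of processed instances
    public void Dispose() => seen.Clear();
}
```

Calling TryEnter before processing a property value and skipping the value when it returns false reproduces the behavior described above, including its drawback: the second reference is simply dropped, so the cycle cannot be rebuilt.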
Generics
In earlier versions of this article, I stated that generic types are not supported. That is not the case: generics are supported and can be serialized as well as deserialized. But they have to implement the ISerializable and IObjectReference interfaces (most generic .NET types do so).
The demo includes an example of how to implement these interfaces in custom generic types (note the Serializable attribute):
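using System;
using System.Runtime.Serialization;

[Serializable]
public class GenericDummyClass<T> : ISerializable, IObjectReference
{
    private T instanceoftype;

    public T InstanceOfType
    {
        get { return instanceoftype; }
        set { instanceoftype = value; }
    }

    // ISerializable: records typeof(T) as the type to serialize (no member values are added)
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.SetType(typeof(T));
    }

    // IObjectReference: returns the wrapped T value to be used in place of this object
    public object GetRealObject(StreamingContext context)
    {
        return this.instanceoftype;
    }
}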
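Independently of the interfaces above, a deserializer must be able to recreate a closed generic type from the type information it stored. A minimal sketch, assuming the assembly-qualified name is what gets stored; GenericTypeNames is a hypothetical helper. The name encodes each type argument in double square brackets after the arity marker (List`1 for List<T>), and Type.GetType parses that form back.

```csharp
using System;

// Hypothetical helper: round-trips closed generic types through their names
public static class GenericTypeNames
{
    // The assembly-qualified name also qualifies every type argument
    public static string Write(Type type) => type.AssemblyQualifiedName;

    // Type.GetType understands the bracketed generic syntax
    public static Type Read(string name) => Type.GetType(name, throwOnError: true);

    // Recreates an instance of the stored type via its parameterless constructor
    public static object Create(string name) => Activator.CreateInstance(Read(name));
}
```

For the demo's GenericDummyClass<String>, such a name would identify both the generic definition and String as its type argument.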
Binary Data
A property may hold binary data, or the object to be serialized may itself have a binary format. Obviously, binary data must be handled differently. But how can you determine whether a Type contains data that needs binary serialization? For example, an Image (Bitmap) has some properties, but ultimately it has to be serialized in a binary format to be restored properly. So what indicates that a Bitmap needs binary serialization?
I decided to determine this by two indications: first, the existence of a constructor with exactly one parameter of type byte[] or Stream and, second, a TypeConverter for the type that can convert to byte[] or Stream. To stick with the example, the Bitmap has the constructor Bitmap(Stream), and its TypeConverter (System.Drawing.ImageConverter) can convert to byte[]. That should be enough evidence that an instance of this type can be serialized and deserialized in binary form as byte[].
Objects detected as binary are not serialized in the usual <property><properties><property>... manner. Instead, they get a <constructor> tag with a <binarydata> child, which holds the byte[] of the data as a Base64-encoded String (plus the mandatory type information attributes). Further properties are ignored, assuming they are included in the binary representation.
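The two indications can be checked with reflection and TypeDescriptor. A minimal sketch, assuming both indications must hold; BinaryDetection, Blob and BlobConverter are hypothetical names, where Blob merely stands in for a type such as Bitmap.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.IO;

public static class BinaryDetection
{
    // Indication 1: a public constructor with exactly one byte[] or Stream parameter.
    // Indication 2: the type's TypeConverter can convert to byte[] or Stream.
    public static bool IsBinary(Type type)
    {
        bool hasCtor = false;
        foreach (var ctor in type.GetConstructors())
        {
            var ps = ctor.GetParameters();
            if (ps.Length == 1 &&
                (ps[0].ParameterType == typeof(byte[]) || ps[0].ParameterType == typeof(Stream)))
            {
                hasCtor = true;
                break;
            }
        }
        if (!hasCtor) return false;
        TypeConverter tc = TypeDescriptor.GetConverter(type);
        return tc.CanConvertTo(typeof(byte[])) || tc.CanConvertTo(typeof(Stream));
    }

    // The text that would go into <binarydata>
    public static string ToBase64(object value)
    {
        var bytes = (byte[])TypeDescriptor.GetConverter(value).ConvertTo(value, typeof(byte[]));
        return Convert.ToBase64String(bytes);
    }
}

// Hypothetical type that satisfies both indications
[TypeConverter(typeof(BlobConverter))]
public class Blob
{
    public byte[] Data { get; }
    public Blob(byte[] data) { Data = data; }
}

public class BlobConverter : TypeConverter
{
    public override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType)
        => destinationType == typeof(byte[]) || base.CanConvertTo(context, destinationType);

    public override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture,
                                     object value, Type destinationType)
        => destinationType == typeof(byte[]) && value is Blob b
            ? b.Data
            : base.ConvertTo(context, culture, value, destinationType);
}
```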
To serialize arbitrary binary data, the classes BinaryContainer and BinaryContainerTypeConverter were introduced. While the BinaryContainer serves as a container for binary data (as the name says), the BinaryContainerTypeConverter is the corresponding TypeConverter implementation that satisfies the requirements specified above.
The BinaryContainerTypeConverter converts to and from byte[] or Stream.
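A container and converter of this kind might look like the following sketch. ByteContainer and ByteContainerConverter are hypothetical stand-ins for BinaryContainer and BinaryContainerTypeConverter, whose code is not shown here. The [TypeConverter] attribute is what lets TypeDescriptor.GetConverter find the converter, and the byte[] and Stream constructors plus the converter satisfy both binary indications described above.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.IO;

// Hypothetical container for an arbitrary byte payload
[TypeConverter(typeof(ByteContainerConverter))]
public sealed class ByteContainer
{
    public byte[] Data { get; }

    public ByteContainer(byte[] data) { Data = data; }

    public ByteContainer(Stream stream)
    {
        using (var ms = new MemoryStream())
        {
            stream.CopyTo(ms);   // copies from the stream's current position to its end
            Data = ms.ToArray();
        }
    }
}

// Converts the container to and from byte[] and Stream
public sealed class ByteContainerConverter : TypeConverter
{
    public override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType)
        => destinationType == typeof(byte[]) || destinationType == typeof(Stream)
           || base.CanConvertTo(context, destinationType);

    public override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture,
                                     object value, Type destinationType)
    {
        if (value is ByteContainer c)
        {
            if (destinationType == typeof(byte[])) return (byte[])c.Data.Clone();
            if (destinationType == typeof(Stream)) return new MemoryStream(c.Data, false);
        }
        return base.ConvertTo(context, culture, value, destinationType);
    }

    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
        => sourceType == typeof(byte[]) || typeof(Stream).IsAssignableFrom(sourceType)
           || base.CanConvertFrom(context, sourceType);

    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        if (value is byte[] bytes) return new ByteContainer(bytes);
        if (value is Stream stream) return new ByteContainer(stream);
        return base.ConvertFrom(context, culture, value);
    }
}
```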
The demo shows how to load an arbitrary binary file into a BinaryContainer and how to convert it into byte[] and Stream. In the demo, it is the sample JPEG, but it could be an MP3, MPG, or any other file.
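using System.ComponentModel;
using System.Drawing;
using System.IO;

// Load an arbitrary binary file (here a JPEG) into a BinaryContainer
FileStream fs = new FileStream("MH_Wallpaper.jpg", FileMode.Open);
BinaryContainer bc = new BinaryContainer(fs);
// The BinaryContainerTypeConverter converts the container to Stream and byte[]
TypeConverter tc = TypeDescriptor.GetConverter(bc.GetType());
Stream stream = (Stream)tc.ConvertTo(bc, typeof(Stream));
byte[] barr = (byte[])tc.ConvertTo(bc, typeof(byte[]));
Bitmap bmp = new Bitmap(stream);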
XML serialization is not only meant for storing data on disk; XML is a transport format as well. For example, you could make your photo or MP3 collection accessible through a web service (although the resulting traffic may not please everyone: in my tests, the file size increased by up to 70%).
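Part of that growth is inherent in Base64, which encodes every started group of 3 bytes as 4 characters, i.e. roughly 33% overhead before any XML markup or type information is added; a 3,000-byte file becomes 4,000 characters. The hypothetical helper below computes the encoded length, 4 * ceil(n / 3), which matches Convert.ToBase64String (it inserts no line breaks by default).

```csharp
using System;

// Hypothetical helper for estimating Base64 size
public static class Base64Size
{
    // Every started group of 3 input bytes becomes 4 output characters (with '=' padding)
    public static long EncodedLength(long byteCount) => 4 * ((byteCount + 2) / 3);

    // Relative growth caused by the encoding alone
    public static double Overhead(long byteCount) =>
        byteCount == 0 ? 0.0 : (double)(EncodedLength(byteCount) - byteCount) / byteCount;
}
```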
Assembly Versions
From time to time, new versions of assemblies are released. Serialization also means that an object can be stored on disk for an unlimited time or distributed to another machine. Thus, you can never be sure that exactly the assembly version the object was serialized with (still or already) exists at the time or location of deserialization. Therefore, specifying the exact assembly to load can lead to errors if this assembly version does not exist.
For example, suppose a Color was serialized with the assembly information
System.Drawing, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
and at the time or location of deserialization, a newer assembly is already available:
System.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
Then deserialization will fail with a FileNotFoundException, because an assembly with the specified version could not be found.
If, in the example above, the assembly information is reduced to the assembly's name only (System.Drawing), deserialization will succeed. So the solution could be to reduce the information used for deserialization. Unfortunately, another circumstance rules out this approach: .NET allows the installation of more than one version of an assembly. That means that at deserialization, multiple versions of an assembly may be available. In this case, the assembly to be loaded must be specified exactly (Version, Culture, PublicKeyToken); otherwise, an exception will be thrown.
As far as I can see, there is only one way to solve this problem: trial and error! Thus, the XmlDeserializer first tries to load the assembly with the reduced information and, if this fails, with the complete assembly information. If neither fits, the assembly cannot be loaded, and deserialization of at least one property will fail.
Of course, this procedure is not satisfying and costs some performance (caching the loaded assemblies during deserialization minimizes these costs). But so far, it is the only way I have found that works in almost all cases.
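The trial-and-error strategy with an assembly cache can be sketched as follows. TolerantAssemblyLoader is a hypothetical name, and catching every exception is a simplification; what matters is the order (simple name first, then the complete name) and that each result is cached by the stored full name.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch of version-tolerant assembly loading with a cache
public sealed class TolerantAssemblyLoader
{
    private readonly Dictionary<string, Assembly> cache = new Dictionary<string, Assembly>();

    public Assembly Load(string fullName)
    {
        if (cache.TryGetValue(fullName, out Assembly cached)) return cached;

        Assembly assembly;
        try
        {
            // Reduced information: the simple name only, so another installed version is accepted
            assembly = Assembly.Load(new AssemblyName(fullName).Name);
        }
        catch (Exception)   // e.g. FileNotFoundException or FileLoadException
        {
            // Complete information: Version, Culture and PublicKeyToken must match exactly
            assembly = Assembly.Load(fullName);
        }
        cache[fullName] = assembly;
        return assembly;
    }

    public Type ResolveType(string typeName, string assemblyFullName) =>
        Load(assemblyFullName).GetType(typeName, throwOnError: true);
}
```

With this order, a stored name whose version is not installed still resolves as long as the simple name can be loaded.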
Class Versions
Sometimes classes have to be modified: properties are added or removed. So it can happen that a serialized Type no longer has identical properties at the time or location of deserialization. Except for the SoapFormatter, all serializers mentioned above tolerate this case. The XmlDeserializer tolerates it as well, since it uses Reflection to determine the available properties.
The SoapFormatter throws an exception saying "Wrong number of Members or Member name 'xyz' not found", but this case can probably be handled by setting its SurrogateSelector property (I did not test this; it seems to require some coding).
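Reflection-based tolerance of this kind can be sketched as follows, assuming the stored values are available as name/text pairs and can be converted with the property type's TypeConverter. TolerantPropertyApplier is a hypothetical name. Stored values for removed properties are reported and skipped; properties added since serialization keep their default values.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Reflection;

// Hypothetical sketch of version-tolerant property assignment
public static class TolerantPropertyApplier
{
    // Applies the stored values and returns the names that could not be applied
    public static List<string> Apply(object target, IDictionary<string, string> values)
    {
        var skipped = new List<string>();
        Type type = target.GetType();
        foreach (var pair in values)
        {
            PropertyInfo p = type.GetProperty(pair.Key, BindingFlags.Public | BindingFlags.Instance);
            if (p == null || !p.CanWrite)
            {
                skipped.Add(pair.Key);   // removed or read-only property: ignore the stored value
                continue;
            }
            TypeConverter tc = TypeDescriptor.GetConverter(p.PropertyType);
            p.SetValue(target, tc.ConvertFromInvariantString(pair.Value), null);
        }
        return skipped;
    }
}
```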
Unknown Assemblies
When an assembly is loaded from a file at runtime (Assembly.LoadFrom("...")) and an instance of one of its types is serialized, there is no guarantee that this assembly can be loaded automatically at deserialization. Just think of plug-ins that are loaded at runtime, where it is unknown which assemblies will follow in the future. The applications themselves have no reference to these assemblies.
The XmlDeserializer provides a method RegisterAssembly, which registers assemblies in the internal assembly cache. Thanks to this mechanism, assemblies loaded at runtime can be used during deserialization. For the plug-in example: just register all plug-in assemblies found.
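A registry of this kind can be sketched as a dictionary keyed by simple assembly name that is consulted before the runtime's own resolution. AssemblyRegistry and its methods are hypothetical names, not the XmlDeserializer API.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch: explicitly registered assemblies win over normal resolution
public sealed class AssemblyRegistry
{
    private readonly Dictionary<string, Assembly> registered =
        new Dictionary<string, Assembly>(StringComparer.OrdinalIgnoreCase);

    // E.g. for every plug-in assembly loaded with Assembly.LoadFrom
    public void Register(Assembly assembly) => registered[assembly.GetName().Name] = assembly;

    public Type ResolveType(string typeName, string assemblyName)
    {
        string simpleName = new AssemblyName(assemblyName).Name;
        if (registered.TryGetValue(simpleName, out Assembly assembly))
            return assembly.GetType(typeName, throwOnError: true);
        // Fall back to the runtime's normal resolution
        return Type.GetType(typeName + ", " + assemblyName, throwOnError: true);
    }
}
```

.NET also offers a built-in hook for a similar purpose: the AppDomain.AssemblyResolve event, which is raised when the runtime fails to resolve an assembly.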
Tolerance/Adjustment
The tolerance and behavior of the XmlSerializer and the XmlDeserializer can be adjusted with the following properties:
- XmlSerializer.IgnoreSerializableAttribute - Gets or sets whether the Serializable attribute is ignored. If set to false, only properties whose types are marked as serializable are serialized; otherwise, even "non-serializable" properties are serialized. Default is false.
- XmlSerializer.SerializationIgnoredAttributeType - Gets or sets the Attribute type that, when applied to a property, disables its serialization. If null, every property is serialized.
- XmlSerializer.IgnoreSerialisationErrors - Gets or sets whether errors during serialization are ignored.
- XmlSerializer.TagLib - Gets or sets the dictionary of XML tags. If not explicitly set, the default implementation is used.
- XmlSerializer.UseTypeDictionary - Gets or sets whether a type dictionary is used to store type information (see above). Default is true.
- XmlDeserializer.IgnoreCreationErrors - Gets or sets whether creation errors are ignored. Creation errors can occur if, e.g., a type has no parameterless constructor and an instance cannot be created from a String, or simply if an assembly cannot be found. Default is false.
- XmlDeserializer.TagLib - Gets or sets the dictionary of XML tags. If not explicitly set, the default implementation is used.
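What ignoring creation errors might mean for a single value is sketched below, assuming values are created from their string form through a TypeConverter. ValueFactory.CreateFromString is a hypothetical helper, not the XmlDeserializer's code; the exception filter swallows errors only when the switch is on.

```csharp
using System;
using System.ComponentModel;

// Hypothetical helper illustrating an IgnoreCreationErrors-style switch
public static class ValueFactory
{
    public static object CreateFromString(Type type, string text, bool ignoreCreationErrors)
    {
        try
        {
            // TypeConverter handles primitives, enums, Guid, DateTime, ... (invariant culture)
            return TypeDescriptor.GetConverter(type).ConvertFromInvariantString(text);
        }
        catch (Exception) when (ignoreCreationErrors)
        {
            return null;   // leave the property at its default instead of aborting
        }
    }
}
```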
Restrictions
During evaluation and testing, I found that some types in the System.Collections.Specialized namespace cause problems. While most collections in this namespace can be serialized and deserialized smoothly, two of them behave differently.
Attempting to serialize instances of
- NameValueCollection: nothing happens
- StringDictionary: an exception is thrown
Since these classes date from the time before generics, it is good advice to replace them with Dictionary<String, String> (besides, the NameValueCollection behaves strangely if a key is added multiple times; I was never really convinced of this feature). I do not plan to make any effort to find a solution for this misbehavior.
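Replacing these collections is mostly mechanical. The hypothetical helpers below copy both types into a Dictionary<String, String> and make two quirks visible: a NameValueCollection returns the values of a repeated key as one comma-separated string, and a StringDictionary stores its keys in lower case.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical migration helpers
public static class SpecializedMigration
{
    public static Dictionary<string, string> ToDictionary(NameValueCollection source)
    {
        var result = new Dictionary<string, string>();
        foreach (string key in source.AllKeys)
        {
            if (key == null) continue;   // NameValueCollection allows a null key, Dictionary does not
            result[key] = source[key];   // values of a repeated key come back comma-joined, e.g. "a,b"
        }
        return result;
    }

    public static Dictionary<string, string> ToDictionary(StringDictionary source)
    {
        var result = new Dictionary<string, string>();
        foreach (DictionaryEntry e in source)
            result[(string)e.Key] = (string)e.Value;   // keys arrive lower-cased
        return result;
    }
}
```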
Circular references cannot be rebuilt! A solution could be to register referencing properties and to serialize/deserialize them. Perhaps simple, but a hard slog.
As far as I know, these are the only restrictions, but if you find others, feel free to let me know.
Disadvantages/Advantages
Due to the intensive use of Reflection, the solution presented in this article does not provide the highest performance. The SoapFormatter and the BinaryFormatter are considerably faster. For small serialization operations, the files generated by the XmlSerializer are also larger. Leaving aside the BinaryFormatter, which generates the smallest output (down to 10%?), the files generated by the XmlSerializer seem to be up to 100% larger than those generated by the SoapFormatter for "small objects" (62 KB vs. 36 KB for the "String Array with 1,000 items" test). But that is relative: the "Hashtable (String, SimpleDummyClass) with 1,000 items" test shows a ratio of 509 KB to 500 KB, nearly equal.
Apart from the not-yet-realized rebuilding of circular references and the (deliberately) omitted serialization of fields, the XmlSerializer and XmlDeserializer combine almost all features provided by the other serializers, plus a few more, such as support for unknown assemblies and merging serialized objects into one document.
Using the Classes
To serialize an arbitrary object to an XML file, only two lines of code are needed:
ComplexDummyClass dummy = new ComplexDummyClass();
XmlSerializer xs = new XmlSerializer();
xs.Serialize(dummy, filename);
To deserialize an arbitrary object from an XML file, only two lines of code are needed as well:
XmlDeserializer xd = new XmlDeserializer();
Object obj = xd.Deserialize(filename);
If you know the type of the object in advance, simply cast the result:
XmlDeserializer xd = new XmlDeserializer();
ComplexDummyClass dummy = (ComplexDummyClass)xd.Deserialize(filename);
Saving the object directly to disk is just one overload of the Serialize method. Other overloads allow appending the object's XML elements to existing XmlDocuments or XmlNodes.
The following example shows how to append an object's XML elements to an XmlNode:
ComplexDummyClass dummy = new ComplexDummyClass();
XmlDocument doc = new XmlDocument();
XmlNode root = doc.CreateElement("configuration");
doc.AppendChild(root);
XmlSerializer xs = new XmlSerializer();
xs.Serialize(dummy, "dummyconfig", root);
This method allows you to store multiple objects in one XmlDocument. For example, you can save your database and GUI settings in one file and reload them (type-safe and strictly separated) from this file when your application starts.
The Deserialize method provides various overloads to deserialize from XmlNodes or XmlDocuments as well.
How to register an assembly at runtime is demonstrated here:
XmlDeserializer xd = new XmlDeserializer();
Assembly a = Assembly.LoadFrom(@"..\..\DummyExternalAssembly.dll");
xd.RegisterAssembly(a);
Object obj = xd.Deserialize(filename);
To exclude a property from serialization, apply an (arbitrary) Attribute to that property and assign this attribute's Type to the XmlSerializer's SerializationIgnoredAttributeType property:
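public class SimpleDummyClass
{
    private String name;

    // Any attribute type can serve as the marker. XmlIgnore is fully qualified here
    // because System.Xml.Serialization also contains a class named XmlSerializer.
    [System.Xml.Serialization.XmlIgnore]
    public String Name
    {
        get { return this.name; }
        set { this.name = value; }
    }
}
...
XmlSerializer xs = new XmlSerializer();
xs.SerializationIgnoredAttributeType = typeof(System.Xml.Serialization.XmlIgnoreAttribute);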
Now serialization will ignore the Name property of SimpleDummyClass.
Note: The XmlSerializer's SerializationIgnoredAttributeType property accepts any Type, but if the assigned Type is not an Attribute, the setting is ignored.
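The filtering described above can be sketched with Attribute.IsDefined. PropertyFilter, SkipAttribute and Sample are hypothetical names; as in the note above, a Type that does not derive from Attribute disables the filter instead of causing an error.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch of attribute-based property exclusion
public static class PropertyFilter
{
    public static List<PropertyInfo> SerializableProperties(Type type, Type ignoredAttributeType)
    {
        // Only a real Attribute type activates the filter
        bool useFilter = ignoredAttributeType != null
                         && typeof(Attribute).IsAssignableFrom(ignoredAttributeType);
        var result = new List<PropertyInfo>();
        foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (useFilter && Attribute.IsDefined(p, ignoredAttributeType, inherit: true)) continue;
            result.Add(p);
        }
        return result;
    }
}

// Hypothetical marker attribute and sample class
[AttributeUsage(AttributeTargets.Property)]
public sealed class SkipAttribute : Attribute { }

public class Sample
{
    [Skip] public string Hidden { get; set; }
    public string Visible { get; set; }
}
```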
The Demo Application
The sample application lets you create instances of the sample classes SimpleDummyClass, ComplexDummyClass and ComplexDummyClassExt, which extends ComplexDummyClass with a GraphicsPath property that is not marked as serializable (resulting in a BinaryFormatter exception at serialization), as well as examples for circular references, arrays, Hashtables and generics. The created instances are assigned to a PropertyGrid, which displays their properties. I did not invest effort in displaying collection values in the PropertyGrid, but there are articles on CodeProject that explain how to do so.
A TabControl contains commands to serialize the PropertyGrid's selected object and to deserialize objects from file, for all of the serializers mentioned.
The type ComplexDummyClassExt (item EXTENDED ComplexDummyClass) is the most complex of the sample classes to be serialized and deserialized. It contains the following properties:
- ArrayList (items of different types)
- System.Windows.Forms.ArrowDirection
- DecimalNumber (Decimal)
- IDictionary (set to a generic dictionary at runtime)
- Color
- GenericDummyClass<String> (a custom generic class)
- GraphicsPath (not marked as serializable)
- Hashtable (items of different types)
- HybridDictionary (in the Collections.Specialized namespace)
- Image (binary)
- Name (String)
- Number (int)
- ObjectArray (object[])
- SimpleDummyClass (a simple custom class)
- Stream (binary, Bitmap loaded from file)
The assembly Demo\XmlSerializerDemo\XmlSerializerDemo\DummyExternalAssembly.dll contains a class ExternalDummy (code not provided), which is used to demonstrate an unknown assembly loaded at runtime.
To demonstrate a modified class, the class SimpleDummyClass is prepared with a property AdditionalProperty. To modify this class, just comment out or uncomment this property between serialization and deserialization.
The item "Binary File (wrapped in a XmlBinaryContainer)" demonstrates how to wrap an arbitrary binary file in an XmlBinaryContainer and serialize it.
A Control can be serialized as well (item Control), but note that Controls have the property Controls, which is read-only and therefore will not be serialized. That is why Windows Forms (which inherit from Control) cannot be serialized together with their child controls.
In many cases, the standard serialization mechanisms will throw exceptions, because not every kind of object can be serialized or deserialized by every serializer. This makes the demo a helpful capability comparison of the various serializers.
The demo application uses my GenericTypeConverter<T> class to display nested classes in the PropertyGrid. Since this class is not the subject of this article, it is only mentioned, not explained (perhaps it is worth a short article later!).
The SimpleDummyClass has a property Dummy of type SimpleDummyClass, to which the XmlIgnore attribute is applied. If the "Set Ignore Attribute" checkbox is checked, this attribute's type is assigned to the XmlSerializer's SerializationIgnoredAttributeType property, so this property will not be serialized.
Points of Interest
Rebuilding circular references: The problem is not solved yet. A solution could be to serialize indexes of referencing properties. Any suggestions?
I would like to "apologize" for naming the serializer the way I did: XmlSerializer. Despite the name clash with the .NET class, I chose a name that expresses its purpose. In an environment of thousands of classes, it will soon be impossible to find names that are both unique and expressive. .NET languages support namespace aliases, so conflicts are easy to avoid. In the demo application, I used the following alias:
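using SYSTEMXML = System.Xml.Serialization;
...
// The .NET XmlSerializer has no public parameterless constructor; it needs the type to serialize
SYSTEMXML.XmlSerializer serializer = new SYSTEMXML.XmlSerializer(typeof(SimpleDummyClass));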
History
- 20-09-2006 - First published
- 04-10-2006 - Fixed a bug (if the to-be serialized object is a collection...). Extended the demo
- 12-10-2006 - Some changes in XmlSerializer and XmlDeserializer
  - XmlSerializer: Avoided circular calls
  - XmlSerializer: IDisposable interface implemented
  - XmlSerializer: Optional ignoring of the Serializable attribute implemented
  - XmlDeserializer: Optional ignoring of creation errors implemented
- 20-10-2006 - Major enhancements!
  - Optional type dictionary
  - XmlDeserializer: Assembly version independence
  - XmlDeserializer: IDisposable interface implemented
  - XmlDeserializer: Caching of loaded assemblies
  - XmlDeserializer: Registering custom (unknown) assemblies
  - Extended the demo
  - Article updated to match the enhancements
- 01-04-2007 - Code enhancements, article updated
  - IXmlSerializationTag introduced
  - XmlSerializer and XmlDeserializer: TagLib property added
  - Extended the demo (especially generics)
  - Article updated due to new insights (e.g. generics, Collections.Specialized)
- 12-04-2007 - Support for binary data, article revised and updated
  - BinaryContainer and BinaryContainerTypeConverter introduced
- 10-12-2007 - Exclusion of properties by Attribute introduced, some code improvements
  - XmlSerializer: SerializationIgnoredAttributeType property added