Introduction
When I originally sat down to write this code, I wanted to build a client/server application. I looked at using XML serialization to send the server state to the client. XML serialization in .NET is extremely flexible: it can serialize and deserialize almost any .NET type, with a lot of control over how the XML is formatted. However, sending the entire server state over a network to the client is inefficient; I wanted a way to send only the differences since the last update.
I needed the following features:
- The XML output had to conform to a fixed format with specific naming and ordering conventions, so any two XML sources could be compared for differences quickly and efficiently.
- In an XML difference document, the XML had to store the information about the difference state (additions, deletions, updates, and a "no change" placeholder, as well as the previous value of anything that was deleted).
- It had to be possible to recurse through an existing XML document or data structure, comparing or combining any differences.
I therefore wrote a simple XML serialization library called "Wml" to support these features, built on and trying to follow the same conventions as .NET's existing XML and serialization classes as much as possible. In the process of writing it, I found that being able to recurse over a Wml structure, and being able to calculate and combine the differences, had a few other uses as well.
In this article, I cover some of the basics of Wml, how to make .NET types serializable to Wml, and some uses of the Wml serialization library.
Wml
The restricted form of XML which the Wml library serializes to is called "Wml".
An example:
<Wml>
  <DirectReports I="2" V="Wxv.WmlDemo.JobPosition">
    <FirstName V="Daniel" />
    <Id V="2" />
    <LastName V="Taylor" />
  </DirectReports>
  <FirstName V="Christopher" />
  <Id V="0" />
  <LastName V="Wilson" />
  <zz. I="1" V="Wxv.WmlDemo.JobPosition">
    <FirstName V="Isabella" />
    <Id V="1" />
    <LastName V="Jones" />
  </zz.>
</Wml>
An example including difference state:
<Wml>
  <JobTitle V="Director" D="Chairperson" S="U" />
  <zz. I="1" V="Wxv.WmlDemo.JobPosition" S="A">
    <FirstName V="Isabella" S="A" />
    <LastName V="Jones" S="A" />
  </zz.>
</Wml>
Its basic structure is:
- The root element name is always "Wml".
- Element names correspond to field or property names, or "zz." for collection items.
- The two main attributes are:
  - "Identity" (I) - An integer representing the unique object identity. Default is -1.
  - "Value" (V) - Either the string representation of a value, or the full type name for a .NET type. Default is null.
- Child nodes must be ordered by, and be unique on, their name and ID.
- Difference documents use two other attributes:
  - "State" (S) - Either "Added" (A), "Deleted" (D), "Updated" (U), or "No Change" (N). Default is "No Change".
  - "DeletedValue" (D) - The previous value of the member, used when rolling back any differences. Default is null.
Wml has this fixed structure and constraints so that any two Wml sources can be compared for differences efficiently.
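To illustrate why the ordering constraint helps, here is a minimal sketch of a single-pass comparison over two child lists that are already sorted by name and identity. It is not the library's code: SketchNode, SortedDiffSketch, and the "Name#Id" result format are hypothetical names invented for this example, and ordinal string ordering is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Wml child node: name, identity and value only.
public sealed class SketchNode
{
    public readonly string Name;
    public readonly int Id;
    public readonly string Value;

    public SketchNode(string name, int id, string value)
    {
        Name = name;
        Id = id;
        Value = value;
    }

    // Orders by name (ordinal comparison is an assumption), then by identity.
    public static int CompareKeys(SketchNode a, SketchNode b)
    {
        int byName = string.CompareOrdinal(a.Name, b.Name);
        return byName != 0 ? byName : a.Id.CompareTo(b.Id);
    }
}

public static class SortedDiffSketch
{
    // Walks both sorted lists once and returns entries such as "U JobTitle#-1",
    // where the first letter is the state: A, D, U or N.
    public static List<string> Diff(IList<SketchNode> before, IList<SketchNode> after)
    {
        var result = new List<string>();
        int i = 0, j = 0;
        while (i < before.Count || j < after.Count)
        {
            int order;
            if (i >= before.Count) order = 1;        // only "after" has nodes left
            else if (j >= after.Count) order = -1;   // only "before" has nodes left
            else order = SketchNode.CompareKeys(before[i], after[j]);

            if (order < 0)
            {
                result.Add("D " + Key(before[i]));   // present before, missing after
                i++;
            }
            else if (order > 0)
            {
                result.Add("A " + Key(after[j]));    // missing before, present after
                j++;
            }
            else
            {
                bool same = string.Equals(before[i].Value, after[j].Value);
                result.Add((same ? "N " : "U ") + Key(after[j]));
                i++;
                j++;
            }
        }
        return result;
    }

    private static string Key(SketchNode node)
    {
        return node.Name + "#" + node.Id;
    }
}
```

Because both lists are sorted on the same key, each node is visited once and no lookups are needed; a real implementation would also recurse into matching child nodes.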
Like XML:
- It can be held in a DOM, using the class WmlDocument, which holds WmlDomNodes.
- It can be read or written to a text source or target using WmlTextReader and WmlTextWriter.
- A Wml DOM itself can be read or written to using a WmlDomNodeReader or WmlDomNodeWriter.
- The nodes implement an IWmlNode interface, similar in concept to IXPathNavigable, which allows a WmlNodeReader or WmlNodeWriter (which WmlDomNodeReader or WmlDomNodeWriter inherit from) to access the nodes without having to know the underlying structure or types in which the Wml node information is held.
Serializing Types
For a type (class or struct) to be serializable, it must implement the IWmlSerializable interface:
public interface IWmlSerializable
{
    int GetHashCode();
}
There is only one method declared on the interface, IWmlSerializable.GetHashCode(), which signals that any implementer is expected to override object.GetHashCode() (the compiler does not enforce this, because the inherited object.GetHashCode() already satisfies the interface). This function should return an integer that corresponds to the unique identity of the instance compared to any other instance or a null value (which is treated as having the default hash code or identity value of -1) in the place where it is held. XML serialization in .NET does not need this concept, as it always deserializes a new data structure from scratch, but it is important in Wml when merging differences into an existing structure. The hash code result should not change over the instance's lifetime, and its value should be based on a member value which is also serialized, so it can be reliably recalculated after deserialization.
The simplest way to have a unique identity for an object is to simply have an auto-incrementing integer value assigned to an "ID" field or property when it's instantiated. It could also be calculated from a unique and constant data value that it holds, or based on the index of the collection or array it's held in (as long as it's the only reference ever held at that index, including nulls).
Like XML serialization, any IWmlSerializable type must also have a parameter-less constructor defined, so the types can be automatically instantiated during deserialization.
By default, Wml serialization serializes any non-static public member (fields or properties) defined on the IWmlSerializable type whose value can be read and written to. This includes IWmlSerializable references, and any other "primitive" type whose value can be converted to and from a string by the .NET class TypeConverter.
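As a small illustration of the string round trip that TypeConverter provides, the sketch below converts a value to text and back using only standard System.ComponentModel calls. The ConverterSketch class and its method names are hypothetical, and this is not necessarily how the Wml library calls the converter internally.

```csharp
using System;
using System.ComponentModel;

public static class ConverterSketch
{
    // Converts a value to its culture-invariant string form.
    public static string ToText(object value)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(value.GetType());
        return converter.ConvertToInvariantString(value);
    }

    // Parses a culture-invariant string back into an instance of the given type.
    public static object FromText(Type type, string text)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(type);
        return converter.ConvertFromInvariantString(text);
    }
}
```

Using the invariant culture on both sides keeps the text form stable regardless of the regional settings of the machine that wrote or reads it.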
The members you don't want to be serialized (e.g., they hold temporary or derived information) can be marked with the WmlIgnore attribute.
For example:
public class JobPosition : IWmlSerializable
{
    private static int MaxId = 0;
    private int id = -1;

    public int Id
    {
        get { if (id == -1) id = MaxId++; return id; }
        set { id = value; }
    }

    public override int GetHashCode()
    {
        return Id;
    }

    public string FirstName;
    public string LastName;
    public string JobTitle;
    public DateTime DateStarted;

    public enum GenderEnum { Male = 0, Female = 1 }
    public GenderEnum Gender;

    public JobPosition DirectReports;

    [WmlIgnore()]
    public int Tag;
}
If a collection type needs to serialize any child data objects that it holds, it can implement the IWmlSerializableCollection interface:
public interface IWmlSerializableCollection : IWmlSerializable,
    IEnumerable
{
    IWmlSerializable Get (int id);
    void Remove (int id);
    void Add (IWmlSerializable item);
}
The id parameter corresponds to the GetHashCode() value that the IWmlSerializable instance returns. IWmlSerializableCollection instances may not contain null items, and their enumerator must return the IWmlSerializable collection items in "id" order.
For example:
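using System.Collections;

public class JobPosition : IWmlSerializable, IWmlSerializableCollection
{
    // Child items keyed by their identity (GetHashCode()), so that the
    // enumerator returns them in "id" order.
    private SortedList manages = new SortedList();

    public IWmlSerializable Get (int id)
    {
        // Look the item up by id; the indexer returns null if it is missing.
        return (IWmlSerializable) manages[id];
    }

    public void Remove (int id)
    {
        manages.Remove (id);
    }

    public void Add (IWmlSerializable item)
    {
        manages.Add (item.GetHashCode(), item);
    }

    public IEnumerator GetEnumerator()
    {
        return manages.Values.GetEnumerator();
    }
}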
(Though generally you provide type safe versions of these methods and hide the IWmlSerializableCollection implementation.)
When a type implements IWmlSerializable and optionally IWmlSerializableCollection, it allows the Wml code to build a view over it in a class called WmlSerializableNode. Like Wml DOM nodes, this class implements IWmlNode which allows it to be treated the same way by WmlNodeReader and WmlNodeWriter as a Wml DOM. The overridden WmlNodeReader and WmlNodeWriter classes for IWmlSerializable instances are WmlSerializableNodeReader and WmlSerializableNodeWriter.
Every instance in an IWmlSerializable structure should be referenced only once by a serialized member or collection; that is, there must be no duplicate references to the same instance and no circular references. This is validated by the WmlSerializableNodeReader, which raises an exception if you try to serialize an invalid data structure.
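The single-reference rule can be checked by remembering every instance already visited and comparing by reference rather than by value. The following is a rough sketch of that idea, not the WmlSerializableNodeReader's actual code: GraphNode, ReferenceComparer, and SingleReferenceCheck are hypothetical names, and the sketch returns false where the library raises an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical object graph node with child references.
public sealed class GraphNode
{
    public readonly List<GraphNode> Children = new List<GraphNode>();
}

// Compares objects by reference, ignoring any overridden Equals/GetHashCode.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y);
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

public static class SingleReferenceCheck
{
    // Returns true only if every instance reachable from root is referenced
    // exactly once, which rules out both shared instances and cycles.
    public static bool IsTree(GraphNode root)
    {
        var seen = new HashSet<object>(new ReferenceComparer());
        var pending = new Stack<GraphNode>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            GraphNode node = pending.Pop();
            if (!seen.Add(node)) return false;   // reached a second time
            foreach (GraphNode child in node.Children)
            {
                if (child != null) pending.Push(child);
            }
        }
        return true;
    }
}
```

Reference comparison matters here because IWmlSerializable types override GetHashCode() to return their identity, so two distinct instances could otherwise look like the same object.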
WmlSerializer
This is an abstract class containing static methods that perform various Wml related utility functions, mostly utilizing Wml readers and writers. There are eight kinds of methods (overloaded to support WmlDocument or IWmlSerializable instances, or Wml readers and writers). In this list, "Wml structure" refers to both WmlDocument and IWmlSerializable instances:
- Equals - Compare two Wml structures for equality
- Compare - Compare two Wml structures and produce a Wml difference document
- Combine - Combine a Wml difference into an existing Wml structure
- Copy - Copies the input from a WmlReader source to a target WmlWriter
- Clone - Creates a deep copy of a Wml structure
- Serialize - Saves a Wml structure to a writer or document
- Deserialize - Loads a Wml structure from a reader or document
- ToString - Saves a Wml structure to a string
Functionality like "Equals" and "Clone" is a side effect of having a WmlNodeReader and WmlNodeWriter that can recurse through both a DOM and an IWmlSerializable structure, node by node. This is not possible in .NET XML serialization unless you generate a complete intermediate XML output first. Combining a difference document into an existing IWmlSerializable instance or a Wml DOM only adds, removes, or updates the parts that differ; no other part of the data structure is changed.
Transactions
One of the main benefits of being able to calculate the difference between a before and after state is that you can roll back (or forward) any operation you make on a data object. The only proviso is that the changes must keep the data structure in a valid serializable state, which mostly means that you have to ensure an object has a valid identity (GetHashCode() result) before you add it to an existing data structure.
Rolling an object's state back and forth can be done manually using different methods on the WmlSerializer class (e.g., Serialize, Compare, and Combine), but the Wml library makes this simpler for an IWmlSerializable instance with the WmlTransaction class.
For example:
WmlTransaction transaction = new WmlTransaction (myDataObject,
    "my transaction name");
try
{
    myDataObject.MakeChanges();
    transaction.Commit();
}
catch (Exception)
{
    transaction.RollBack();
}
When created, the transaction object serializes the IWmlSerializable to a WmlDocument to hold the "previous state". The differences to the data object are calculated when the transaction is finished. If "RollBack" is called, they are used to roll the data object back and are then immediately discarded; if "Commit" is called, they are stored to provide a difference history.
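The snapshot, compare, and roll back cycle can be sketched on a much simpler structure than a Wml document. The example below uses a flat dictionary in place of a data object and records the previous value alongside each change, in the spirit of the "DeletedValue" attribute. Change and RollbackSketch are hypothetical names; this is an illustration of the idea rather than the WmlTransaction implementation.

```csharp
using System;
using System.Collections.Generic;

// One recorded difference between the "before" and "after" state.
public sealed class Change
{
    public string Key;
    public char State;        // 'A' added, 'D' deleted, 'U' updated
    public string Value;      // value after the change (null when deleted)
    public string OldValue;   // value before the change (null when added)
}

public static class RollbackSketch
{
    // Lists what was deleted, updated or added between the two states.
    public static List<Change> Compare(IDictionary<string, string> before,
                                       IDictionary<string, string> after)
    {
        var changes = new List<Change>();
        foreach (KeyValuePair<string, string> pair in before)
        {
            string now;
            if (!after.TryGetValue(pair.Key, out now))
                changes.Add(new Change { Key = pair.Key, State = 'D', OldValue = pair.Value });
            else if (now != pair.Value)
                changes.Add(new Change { Key = pair.Key, State = 'U', Value = now, OldValue = pair.Value });
        }
        foreach (KeyValuePair<string, string> pair in after)
        {
            if (!before.ContainsKey(pair.Key))
                changes.Add(new Change { Key = pair.Key, State = 'A', Value = pair.Value });
        }
        return changes;
    }

    // Undoes the changes in place, touching only the entries that differ.
    public static void RollBack(IDictionary<string, string> target, IEnumerable<Change> changes)
    {
        foreach (Change change in changes)
        {
            if (change.State == 'A') target.Remove(change.Key);
            else target[change.Key] = change.OldValue;   // restores deleted or updated entries
        }
    }
}
```

Keeping the old value inside the difference is what allows the roll back to work from the difference alone, without the original snapshot.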
Using individual transactions is somewhat inefficient, since the WmlTransaction has to serialize a copy of the previous state of the data object when it is instantiated. If you are performing multiple transactions on your data, a better technique is to use the WmlTransactionLog class, which caches and automatically updates the previous state on any changes, and keeps a copy of every committed transaction so that you can roll back and forth for Undo/Redo.
For example:
WmlTransactionLog transactionLog = new WmlTransactionLog();
transactionLog.CurrentState = myDataObject;

WmlTransaction transaction1 =
    transactionLog.BeginTransaction ("Change 1");
myDataObject.MakeChanges();

WmlTransaction transaction1a =
    transactionLog.BeginTransaction ("Change 1 a");
myDataObject.MakeChanges();
transaction1a.RollBack();

transaction1.Commit();

WmlTransaction transaction2 =
    transactionLog.BeginTransaction ("Change 2");
myDataObject.MakeChanges();
transaction2.Commit();

transactionLog.RollBack();
transactionLog.RollBack();
transactionLog.RollForward();
transactionLog.RollForward();
As this example shows, transactions can be nested, provided they are rolled back or committed in the reverse order. Nested transactions are more expensive though, since a new "previous state" has to be calculated, as the cached version the transaction log keeps cannot be used. The transaction log also has a "Modified" event, which is raised every time its current state changes (e.g., when a transaction is committed); it is useful for knowing when to refresh a user interface.
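The roll back / roll forward behaviour of a log with a change notification can be sketched with two stacks. This is only an illustration of the general undo/redo pattern: HistorySketch and its members are hypothetical, each entry holds a pair of delegates instead of a Wml difference document, and clearing the redo history on a new commit is an assumption rather than documented WmlTransactionLog behaviour.

```csharp
using System;
using System.Collections.Generic;

public sealed class HistorySketch
{
    // One committed change, expressed as how to undo it and how to redo it.
    private sealed class Entry
    {
        public Action Undo;
        public Action Redo;
    }

    private readonly Stack<Entry> done = new Stack<Entry>();
    private readonly Stack<Entry> undone = new Stack<Entry>();

    // Raised whenever the current state changes.
    public event EventHandler Modified;

    public void Commit(Action undo, Action redo)
    {
        done.Push(new Entry { Undo = undo, Redo = redo });
        undone.Clear();   // assumption: a new commit discards the redo history
        OnModified();
    }

    public bool RollBack()
    {
        if (done.Count == 0) return false;
        Entry entry = done.Pop();
        entry.Undo();
        undone.Push(entry);
        OnModified();
        return true;
    }

    public bool RollForward()
    {
        if (undone.Count == 0) return false;
        Entry entry = undone.Pop();
        entry.Redo();
        done.Push(entry);
        OnModified();
        return true;
    }

    private void OnModified()
    {
        EventHandler handler = Modified;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

A user interface would subscribe to Modified and refresh itself there, rather than after each individual call site that changes the data.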
Demo Application
The sample application illustrates how a simple class, "JobPosition", which was used in the previous examples, can be made Wml serializable. The demo user interface only lets you perform one operation on it, "Modify", which randomly shuffles the data, adding and removing JobPositions in the parent's collection and "DirectReports" reference, and modifying the descriptive information.
It demonstrates the following Wml serialization features:
- Loading (Deserialize) and saving (Serialize) an IWmlSerializable data structure to a file using WmlSerializer.
- Keeping a WmlTransactionLog instance to provide undo/redo functionality, and to notify the user interface when the data object was modified.
- Using "WmlSerializer.Equals()" to test the data object against its previous state stored in a Wml document in the transaction log until it has been modified (because the randomize operation doesn't always change the data structure).
All this with very little code...
Conclusion
The Wml library was not written to replace .NET XML serialization. It's not as fast, flexible, or robust (sorry). Even for its intended purpose, keeping track of the changes to data structures, it might not be as good as a custom solution, since every "Compare" for differences has to recurse over the entire data structure. It is probably more efficient to build a change log manually as you make your data structure modifications. Also, keeping the difference state for any data that changes a large percentage of its state in every operation is counterproductive.
However, the Wml library does its job with very little support from the programmer. Any type that's currently XML serializable can be made serializable to Wml without much effort.
Some other uses that I have found for it:
- Deep copies and equality tests.
- Keeping a smart client synchronized by sending only the differences since the last update from the server.
- Allowing validation code to be placed in the data structure itself, rather than having changes pre-validated before they are applied, since if there are any validation errors, it can be rolled back.
- Undo/Redo functionality in applications.
- Building multi-level modal dialogs that work safely on the main data object (because the main data object changes can be rolled back, or changes on a cloned structure merged in.)
- Change logs.
I developed the Wml library on version 1.1 of the .NET Framework, and I have not yet tested it on generic types in version 2.0 of the .NET Framework. I don't foresee any problems, as long as reflection behaves on generic types as it does on non-generic types, though I cannot confirm this.
Acknowledgements
- Chris Beckett, for his article and code on the use of his nifty custom extender class for menu images for the demo.
- Marc Clifton, for his simple serializer/deserializer article, for pointing me to use TypeConverter (I had been using reflection to clumsily look for a ToString() method and a static Parse() method on a primitive type before).
History
- 26th January, 2006 - Version 1.0
License
This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.