Introduction
Serialization is everywhere in .NET. Every parameter you pass to or from a remoted object, web service, or WCF service gets serialized at one end and deserialized at the other. So why write about fast serialization? Surely the standard BinaryFormatter and SoapFormatter are pretty quick, aren't they?
Well, no. When passing a reasonably substantial object from one process to another using Remoting, we found that throughput was topping out at around 300 calls per second. Investigation showed that each serialization/deserialization cycle was taking 360 microseconds, which would be fine except that at 300 calls per second it means roughly 11% of the CPU is being consumed by serialization alone!
Background
Some form of custom serialization would be an option. An object knows exactly which fields, of which types, it wants to serialize. It doesn't need all the general-purpose overhead and Reflection to work this out and extract the data - it can do it all by itself, much more efficiently, and the result is generally much more compact. There is an example in Shoaib's article which demonstrates these benefits.
The problem with custom serialization is that the interface is different, requiring the calling code to be changed. It also doesn't help the automated serialization in .NET's remote access mechanisms, unless you manually serialize to a byte array and then pass this as a parameter. This isn't very type-safe!
What I cover below is a simple way to gain the benefits of custom serialization while retaining the standard serialization interface and all the advantages that confers.
Using the code
As is often the case in matters of complex serialization, the solution lies in implementing the ISerializable interface (see here for a primer). Here's a much simplified version of the object we are using:
[Serializable]
public class TestObject : ISerializable {
    public long id1;
    public long id2;
    public long id3;
    public string s1;
    public string s2;
    public string s3;
    public string s4;
    public DateTime dt1;
    public DateTime dt2;
    public bool b1;
    public bool b2;
    public bool b3;
    public byte e1;
    public IDictionary<string,object> d1;
}
To serialize an object, ISerializable requires us to implement GetObjectData to define the set of data to be serialized. The trick here is to use custom serialization to merge all the fields into a single buffer, then to add this buffer to the SerializationInfo parameter to be serialized by the standard formatters. This is how it's done:
public void GetObjectData (SerializationInfo info, StreamingContext ctxt) {
    SerializationWriter sw = SerializationWriter.GetWriter ();
    sw.Write (id1);
    sw.Write (id2);
    sw.Write (id3);
    sw.Write (s1);
    sw.Write (s2);
    sw.Write (s3);
    sw.Write (s4);
    sw.Write (dt1);
    sw.Write (dt2);
    sw.Write (b1);
    sw.Write (b2);
    sw.Write (b3);
    sw.Write (e1);
    sw.Write<string,object> (d1);
    sw.AddToInfo (info);
}
The SerializationWriter class extends BinaryWriter to add support for additional data types (DateTime and Dictionary) and to simplify the interface to SerializationInfo. It also overrides BinaryWriter's Write(string) method to allow for null strings. I won't go into the implementation detail here - there is lots of explanation in the code for those who are interested.
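To give a flavour of the approach, here is a minimal sketch of a writer along these lines. Note that the "buffer" info key, the initial buffer size, the boolean null marker and the ticks-based DateTime encoding are my own illustrative choices, not necessarily what the downloadable code does:

using System;
using System.IO;
using System.Runtime.Serialization;

public class SerializationWriter : BinaryWriter {
    // Backing stream that accumulates everything written.
    private readonly MemoryStream stream;

    private SerializationWriter (MemoryStream ms) : base (ms) {
        stream = ms;
    }

    public static SerializationWriter GetWriter () {
        return new SerializationWriter (new MemoryStream (1024));
    }

    // Null-tolerant string write: a boolean marker precedes the value.
    public override void Write (string value) {
        if (value == null) {
            Write (false);
        } else {
            Write (true);
            base.Write (value);
        }
    }

    // DateTime is stored as its 64-bit tick count.
    public void Write (DateTime value) {
        Write (value.Ticks);
    }

    // Hands the accumulated buffer to the standard formatter as a single byte array.
    public void AddToInfo (SerializationInfo info) {
        info.AddValue ("buffer", stream.ToArray (), typeof (byte[]));
    }
}

Presumably the generic dictionary overload follows the same pattern: write the count, then each key/value pair in turn.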
ISerializable also requires us to define a constructor to deserialize a stream to a new object. The process here is just as simple as that above:
public TestObject (SerializationInfo info, StreamingContext ctxt) {
    SerializationReader sr = SerializationReader.GetReader (info);
    id1 = sr.ReadInt64 ();
    id2 = sr.ReadInt64 ();
    id3 = sr.ReadInt64 ();
    s1 = sr.ReadString ();
    s2 = sr.ReadString ();
    s3 = sr.ReadString ();
    s4 = sr.ReadString ();
    dt1 = sr.ReadDateTime ();
    dt2 = sr.ReadDateTime ();
    b1 = sr.ReadBoolean ();
    b2 = sr.ReadBoolean ();
    b3 = sr.ReadBoolean ();
    e1 = sr.ReadByte ();
    d1 = sr.ReadDictionary<string,object> ();
}
Similarly, SerializationReader extends BinaryReader for the same reasons as above.
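Mirroring the same assumptions as the writer sketch above (including the made-up "buffer" key name and null-marker format, and the same using directives), a matching reader sketch might be:

public class SerializationReader : BinaryReader {
    private SerializationReader (Stream stream) : base (stream) {
    }

    // Pulls the byte array stored by AddToInfo back out of the SerializationInfo.
    public static SerializationReader GetReader (SerializationInfo info) {
        byte[] buffer = (byte[]) info.GetValue ("buffer", typeof (byte[]));
        return new SerializationReader (new MemoryStream (buffer));
    }

    // Mirrors the writer's null-tolerant string format.
    public override string ReadString () {
        return ReadBoolean () ? base.ReadString () : null;
    }

    // DateTime was written as its tick count.
    public DateTime ReadDateTime () {
        return new DateTime (ReadInt64 ());
    }
}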
Over time, I'll probably be extending the set of types which the writer and reader can handle efficiently. There are already WriteObject() and ReadObject() methods which will write any arbitrary type, but these just fall back to standard binary serialization (unless it's one of the supported fast types).
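The fallback code itself isn't shown in this article, but the usual shape of such a method is a type-code byte followed by either a fast-path write or a BinaryFormatter blob. Purely as an illustration (the type codes are invented, and BinaryFormatter needs System.Runtime.Serialization.Formatters.Binary), a method added to the writer sketched above could look like this:

// Illustrative only: a fast path for known types, BinaryFormatter for the rest.
public void WriteObject (object value) {
    if (value == null) {
        Write ((byte) 0);                    // type code 0: null
    } else if (value is string) {
        Write ((byte) 1);                    // type code 1: string (null-tolerant write above)
        Write ((string) value);
    } else if (value is DateTime) {
        Write ((byte) 2);                    // type code 2: DateTime as ticks
        Write ((DateTime) value);
    } else {
        Write ((byte) 255);                  // type code 255: anything else
        new BinaryFormatter ().Serialize (BaseStream, value);
    }
}

ReadObject() would read the type-code byte back and dispatch to the corresponding read.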
Results
The test program included in the download simply creates and populates the TestObject, and times its serialization and deserialization, in microseconds per cycle, averaged over 250K cycles. All timings are done on a 1.5GHz Pentium M laptop. The results are:
| Serialization          | Formatter | Size (bytes) | Time (µs) |
|------------------------|-----------|--------------|-----------|
| Standard serialization | Binary    | 2080         | 364       |
| Fast serialization     | Binary    | 421          | 74        |
| Fast serialization     | SOAP      | 1086         | 308       |
So, the fast serialization technique above cuts both the size and the serialization/deserialization time to about a fifth of the out-of-the-box figures. Even SOAP serialization (normally 2 to 3 times slower than binary) is faster than the standard binary serialization.
Summary
Combining custom serialization with ISerializable in this way delivers major performance gains without any change to the handling of the objects in question. It allows fast serialization to be transparently added to specific objects where a performance issue has been identified.
In our own case, throughput increased from 300 Remoting calls per second to over 700, just by changing this for one key object. No other changes were necessary.
There is also one other unexpected benefit. You'll notice that there are no comparative figures above for standard serialization with the SoapFormatter; that's because MS has not equipped the SoapFormatter to handle generic types. Using the technique above means that the SoapFormatter never sees the generic type, which has been custom serialized to a byte array, so this restriction is removed.
Combining custom serialization with ISerializable is never going to be as fast as pure custom serialization alone. However, the added benefit of remaining within the standard serialization framework makes this a useful technique for boosting performance without impacting other code.
History
- First version - 19 May 2005.
This is my first post on CodeProject - so please be gentle!