Introduction
Serialization is everywhere in .NET. Every parameter you pass to or from a remoted object, web service, or WCF service gets serialized at one end and deserialized at the other. So why write about fast serialization? Surely the standard BinaryFormatter and SoapFormatter are pretty quick, aren't they?
Well, no. When passing a reasonably substantial object from one process to another using Remoting, we found that throughput was topping out at around 300 calls per second. Investigation showed that each serialization/deserialization cycle was taking 360 microseconds, which would be fine except that at 300 calls per second it means roughly 11% of the CPU is being consumed by serialization alone!
Background
Some form of custom serialization would be an option. An object knows exactly which fields, of which types, it wants to serialize. It doesn't need all the general-purpose overhead and Reflection to work this out and extract the data - it can do it all by itself, much more efficiently, and the result is generally much more compact. There is an example in Shoaib's article which demonstrates these benefits.
The problem with custom serialization is that the interface is different, requiring the calling code to be changed. It also doesn't help the automated serialization in .NET's remote access mechanisms, unless you manually serialize to a byte array and then pass this as a parameter. This isn't very type-safe!
What I cover below is a simple way to gain the benefits of custom serialization while retaining the standard serialization interface and all the advantages that confers.
Using the code
As is often the case in matters of complex serialization, the solution lies in implementing the ISerializable interface (see here for a primer). Here's a much simplified version of the object we are using:
[Serializable]
public class TestObject : ISerializable {
    public long id1;
    public long id2;
    public long id3;
    public string s1;
    public string s2;
    public string s3;
    public string s4;
    public DateTime dt1;
    public DateTime dt2;
    public bool b1;
    public bool b2;
    public bool b3;
    public byte e1;
    public IDictionary<string,object> d1;
}
To serialize an object, ISerializable requires us to implement GetObjectData to define the set of data to be serialized. The trick here is to use custom serialization to merge all the fields into a single buffer, then to add this buffer to the SerializationInfo parameter to be serialized by the standard formatters. This is how it's done:
public void GetObjectData (SerializationInfo info, StreamingContext ctxt) {
    SerializationWriter sw = SerializationWriter.GetWriter ();
    sw.Write (id1);
    sw.Write (id2);
    sw.Write (id3);
    sw.Write (s1);
    sw.Write (s2);
    sw.Write (s3);
    sw.Write (s4);
    sw.Write (dt1);
    sw.Write (dt2);
    sw.Write (b1);
    sw.Write (b2);
    sw.Write (b3);
    sw.Write (e1);
    sw.Write<string,object> (d1);
    sw.AddToInfo (info);
}
The SerializationWriter class extends BinaryWriter to add support for additional data types (DateTime and Dictionary) and to simplify the interface to SerializationInfo. It also overrides BinaryWriter's Write(string) method to allow for null strings. I won't go into the implementation detail here - there is lots of explanation in the code for those who are interested.
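To give a flavour of the approach, here is a minimal sketch of a writer along these lines. Note that the "buffer" info key, the initial buffer size, the boolean null marker and the ticks-based DateTime encoding are my own illustrative choices, not necessarily what the downloadable code does:

using System;
using System.IO;
using System.Runtime.Serialization;

public class SerializationWriter : BinaryWriter {
    // Backing stream that accumulates everything written.
    private readonly MemoryStream stream;

    private SerializationWriter (MemoryStream ms) : base (ms) {
        stream = ms;
    }

    public static SerializationWriter GetWriter () {
        return new SerializationWriter (new MemoryStream (1024));
    }

    // Null-tolerant string write: a boolean marker precedes the value.
    public override void Write (string value) {
        if (value == null) {
            Write (false);
        } else {
            Write (true);
            base.Write (value);
        }
    }

    // DateTime is stored as its 64-bit tick count.
    public void Write (DateTime value) {
        Write (value.Ticks);
    }

    // Hands the accumulated buffer to the standard formatter as a single byte array.
    public void AddToInfo (SerializationInfo info) {
        info.AddValue ("buffer", stream.ToArray (), typeof (byte[]));
    }
}

Presumably the generic dictionary overload follows the same pattern: write the count, then each key/value pair in turn.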
ISerializable also requires us to define a constructor to deserialize a stream to a new object. The process here is just as simple as that above:
public TestObject (SerializationInfo info, StreamingContext ctxt) {
    SerializationReader sr = SerializationReader.GetReader (info);
    id1 = sr.ReadInt64 ();
    id2 = sr.ReadInt64 ();
    id3 = sr.ReadInt64 ();
    s1 = sr.ReadString ();
    s2 = sr.ReadString ();
    s3 = sr.ReadString ();
    s4 = sr.ReadString ();
    dt1 = sr.ReadDateTime ();
    dt2 = sr.ReadDateTime ();
    b1 = sr.ReadBoolean ();
    b2 = sr.ReadBoolean ();
    b3 = sr.ReadBoolean ();
    e1 = sr.ReadByte ();
    d1 = sr.ReadDictionary<string,object> ();
}
Similarly, SerializationReader extends BinaryReader for the same reasons as above.
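Mirroring the same assumptions as the writer sketch above (including the made-up "buffer" key name and null-marker format, and the same using directives), a matching reader sketch might be:

public class SerializationReader : BinaryReader {
    private SerializationReader (Stream stream) : base (stream) {
    }

    // Pulls the byte array stored by AddToInfo back out of the SerializationInfo.
    public static SerializationReader GetReader (SerializationInfo info) {
        byte[] buffer = (byte[]) info.GetValue ("buffer", typeof (byte[]));
        return new SerializationReader (new MemoryStream (buffer));
    }

    // Mirrors the writer's null-tolerant string format.
    public override string ReadString () {
        return ReadBoolean () ? base.ReadString () : null;
    }

    // DateTime was written as its tick count.
    public DateTime ReadDateTime () {
        return new DateTime (ReadInt64 ());
    }
}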
Over time, I'll probably be extending the set of types which the writer and reader can handle efficiently. There are already WriteObject() and ReadObject() methods which will write any arbitrary type, but these just fall back to standard binary serialization (unless it's one of the supported fast types).
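The fallback code itself isn't shown in this article, but the usual shape of such a method is a type-code byte followed by either a fast-path write or a BinaryFormatter blob. Purely as an illustration (the type codes are invented, and BinaryFormatter needs System.Runtime.Serialization.Formatters.Binary), a method added to the writer sketched above could look like this:

// Illustrative only: a fast path for known types, BinaryFormatter for the rest.
public void WriteObject (object value) {
    if (value == null) {
        Write ((byte) 0);                    // type code 0: null
    } else if (value is string) {
        Write ((byte) 1);                    // type code 1: string (null-tolerant write above)
        Write ((string) value);
    } else if (value is DateTime) {
        Write ((byte) 2);                    // type code 2: DateTime as ticks
        Write ((DateTime) value);
    } else {
        Write ((byte) 255);                  // type code 255: anything else
        new BinaryFormatter ().Serialize (BaseStream, value);
    }
}

ReadObject() would read the type-code byte back and dispatch to the corresponding read.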
Results
The test program included in the download simply creates and populates the TestObject, and times its serialization and deserialization, in microseconds per cycle, averaged over 250K cycles. All timings are done on a 1.5GHz Pentium M laptop. The results are:
| Serialization          | Formatter | Size (bytes) | Time (µs) |
|------------------------|-----------|--------------|-----------|
| Standard serialization | Binary    | 2080         | 364       |
| Fast serialization     | Binary    | 421          | 74        |
| Fast serialization     | SOAP      | 1086         | 308       |
So, the fast serialization technique above cuts both the size and the serialization/deserialization time to about a fifth of the out-of-the-box figures. Even SOAP serialization (normally 2 to 3 times slower than binary) is faster than the standard binary serialization.
Summary
Combining custom serialization with ISerializable in this way delivers major performance gains without any change to the handling of the objects in question. It allows fast serialization to be transparently added to specific objects where a performance issue has been identified.
In our own case, throughput increased from 300 Remoting calls per second to over 700, just by changing this for one key object. No other changes were necessary.
There is also one other unexpected benefit. You'll notice that there are no comparative figures above for standard serialization with the SoapFormatter; that's because MS has not equipped the SoapFormatter to handle generic types. Using the technique above means that the SoapFormatter never sees the generic type, which has been custom serialized to a byte array, so this restriction is removed.
Combining custom serialization with ISerializable is never going to be as fast as pure custom serialization alone. However, the added benefit of remaining within the standard serialization framework makes this a useful technique for boosting performance without impacting other code.
History
- First version - 19 May 2005.
This is my first post on CodeProject - so please be gentle!