(untagged)

BinaryFormatter vs. Manual Serializing

kosmoh

0.00/5 (No votes)

13 Jan 2012

Comparison of serializing with BinaryFormatter (standard .NET class) against manual per-byte serializing; some pros and cons before selecting the right method for you.

Introduction

First of all, let's clear up the term Serialization:

In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and "resurrected" later in the same or another computer environment. (Wikipedia)

Serialization is the process of converting the state of an object into a form that can be persisted or transported. The complement of serialization is deserialization, which converts a stream into an object. (MSDN)

The serialization technique may roughly be divided into two types: Human-readable and Binary. The .NET Framework provides the BinaryFormatter class for binary serialization and the XmlSerializer class for human-readable technique.

Most sources recommend using BinaryFormatter instead of XmlSerializer, if we need smaller size of serialized data. I almost never use XmlSerializer - it brings some restrictions to your classes. However, BinaryFormatter is used rather often and I have something to say here about it.

Background

I started using BinaryFormatter when .NET Framework 2.0 was released. There was no WCF, and binary serializing was used by Remoting. We developed an application where there should have been a special protocol with the system of messages. We had the right to choose the way of message encoding. We decided to try BinaryFormatter, and it worked OK. For some time. When we tried sending a thousand messages in a loop - we got a problem of TCP Zero window size - that meant that the receiving side didn't manage to deserialize messages in time.

I was extremely surprised when examining the network traffic, I found out that the size of each message was about 1,000 bytes, while encoding them in a per-byte manner manually would have taken at most 70 bytes.

BinaryFormatter and manual approach

In the example application, we've got two "useful" classes - Person aggregating CreditCard, like it's shown in the picture:

We're going to serialize the objects of the Person class. To serialize it with BinaryFormatter, we need to add the Serializable attribute before the Person class definition, and before all classes that Person aggregates. Then we're allowed to write:

BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(stream, toSerialize);

where stream and toSerialize are variables of type Stream and Person, respectively.

Things become a bit harder when implementing manual synchronization. Instead of putting a Serializable attribute, we have to provide some methods for calling them later. In the example, such methods are the static members WriteToStream and ReadFromStream inside the serialized classes. Then we write:

Person.WriteToStream(toSerialize, stream);

and that's it.

Using the code

Now that we briefly explored the two techniques, let's examine the testing code.

static byte[] getBytesWithBinaryFormatter(Person toSerialize)
{ 
    MemoryStream stream = new MemoryStream();
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(stream, toSerialize);
    return stream.ToArray();
}

static byte[] getBytesWithManualWrite(Person toSerialize)
{
    MemoryStream stream = new MemoryStream();
    Person.WriteToStream(toSerialize, stream);
    return stream.ToArray();
}

static void testSize(Person toTest)
{
    byte[] fromBinaryFormatter = getBytesWithBinaryFormatter(toTest);
    Console.WriteLine("\tWith binary formatter: {0} bytes", 
                      fromBinaryFormatter.Length);
    byte[] fromManualWrite = getBytesWithManualWrite(toTest);
    Console.WriteLine("\tManual serializing: {0} bytes", fromManualWrite.Length);
}

static void testTime(Person toTest, int timesToTest)
{
    List<byte[]> binaryFormatterResultArrays = new List<byte[]>();
    List<byte[]> manualWritingResultArrays = new List<byte[]>();

    DateTime start = DateTime.Now;
    for (int i = 0; i < timesToTest; ++i )
        binaryFormatterResultArrays.Add(getBytesWithBinaryFormatter(toTest));
    TimeSpan binaryFormatterSerializingTime = DateTime.Now - start;

    Console.WriteLine("\tBinary formatter serializing: {0} seconds", 
                      binaryFormatterSerializingTime.TotalSeconds);

    start = DateTime.Now;
    for (int i = 0; i < timesToTest; i++)
        manualWritingResultArrays.Add(getBytesWithManualWrite(toTest) );
    TimeSpan manualSerializingTime = DateTime.Now - start;
    
    Console.WriteLine("\tManual serializing: {0} seconds", 
                      manualSerializingTime.TotalSeconds);
    BinaryFormatter formatter = new BinaryFormatter();
    start = DateTime.Now;
    foreach (var next in binaryFormatterResultArrays)
        formatter.Deserialize(new MemoryStream(next));
    TimeSpan binaryFormatterDeserializingTime = DateTime.Now - start;

    Console.WriteLine("\tBinaryFormatter desrializing: {0} seconds", 
                      binaryFormatterDeserializingTime.TotalSeconds );

    start = DateTime.Now;
    foreach (var next in manualWritingResultArrays)
        Person.ReadFromStream(new MemoryStream(next));
    TimeSpan manualDeserializingTime = DateTime.Now - start;

    Console.WriteLine("\tManual desrializing: {0} seconds", 
                      manualDeserializingTime.TotalSeconds );
}

static void addCards(Person where, int howMany)
{
    for (int i = 0; i < howMany; ++i)
    { 
        where.AddNewCard(new CreditCard(Guid.NewGuid().ToString(), 
                         new DateTime(2012, 12, 24 )));
    }
}

static void Main(string[] args)
{
    Person aPerson = new Person("Kate", new DateTime(1986, 2, 1), 
      new CreditCard[] { new CreditCard(Guid.NewGuid().ToString(), 
      new DateTime(2012, 12, 24 )) });
    Console.WriteLine("Size test: 1 card");
    testSize(aPerson);
    Console.WriteLine("Time test: 1 card, 100 000 times");
    testTime(aPerson, 100000);

    addCards(aPerson, 10000);
    Console.WriteLine("Size test: 10 000 cards");
    testSize(aPerson);

    Console.WriteLine("Time test: 10 000 cards, 100 times");
    testTime(aPerson, 100);

    Console.ReadKey();
}

First, size and time tests are performed on a Person with one CreditCard object. Then the same test is performed on a person with 10,000 credit cards :). And here is the output from my machine:

Size test: 1 card
        With binary formatter: 896 bytes
        Manual serializing: 62 bytes
Time test: 1 card, 100 000 times
        Binary formatter serializing: 4,24 seconds
        Manual serializing: 0,28 seconds
        BinaryFormatter desrializing: 4,57 seconds
        Manual desrializing: 0,169 seconds
Size test: 10 000 cards
        With binary formatter: 640899 bytes
        Manual serializing: 450062 bytes
Time test: 10 000 cards, 100 times
        Binary formatter serializing: 4,713 seconds
        Manual serializing: 0,527 seconds
        BinaryFormatter desrializing: 4,519 seconds
        Manual desrializing: 0,57 seconds

Based on these results and on the knowledge we have, let's compare the two techniques.

Size

Serializing a message with BinaryFormatter is expensive because of the metadata present. Manual serializing gives a real benefit there. We receive a nearly 14 times smaller byte array result in the example. However, when we talk about serializing an array of objects - then the difference in resulting array size is not so huge - the result array of manual synchronization is 1.4 times smaller in our example.

Speed

We've got a nice benefit in speed for both one object and array of objects serialization. For one Person and one CreditCard object, we've got 4.24 / 0.28 = 15.14 times faster speed. And manual approach is almost 10 times faster for 10,000 CreditCard objects inside a Person object.

The speed tests usually require mentioning the machine parameters, a table which shows some dependencies - well you are welcome to test this yourself and draw a table. I'm not trying to give precise results, but the common picture.

Versioning

Well, protocols change, and your start message format may be very different from the one three years later. Fields could be removed and added. Because we are looking at serializing in a networking context, we should keep in mind: all peers should react in a good way on changes in the message format.

In manual serializing approach, we have two ways: either update all peers when the format changes, or provide our own way to handle the situation when messages with other versions come.

In .NET, the new fields can be marked as optional - and they may be present in the message - or may not. Of course, that doesn't solve all problems, and the new-version host, when receiving a message, should handle the presence and absence of optional fields.

Development speed

Manual serialization does take time. For developing and for testing. If you have version control logic - than it'll take even more time. Standard .NET methods save all this time for you.

Technologies

BinaryFormatter is used in WCF when you're using TCP binding. Theoretically there's an opportunity to provide your own formatter, however I've never heard of any good realization.

Structure complexity impact

BinaryFormatter is capable of serializing an object of any complexity. This means, that even if the objects from your system from a graph with cycles - it will be serialized correctly. Again, serializing (and deserializing!) a leaf class from class hierarhy is not a problem.

Things are not so shiny about Manual serialization. Having a class not aggregating any others is a trivial case. Then, when you have a tree of objects (Person with 100 CreditCard objects inside is a tree) - we've got some complications. Having a class hierarchy brings even more complexity, and independence of cycles makes your code completely hard to study. Personally, I've never used Manual synchronization for object graphs that might contain cycles.

Summary

Pros for BinaryFormatter:

ability to use WCF or other technologies
no need to spend time for implementing the Serializing mechanism
versioning support present
object of any structure complexity may be processed

Pros for manual serializing:

best size complexity may be achieved
best performance may be achieved

Points of interest

Manual serializing: leaf class serializing

Many aspects appear in serializing and especially deserializing code.

Manual serializing: where to locate serialization code?

Actually, this is a continuation of the previous point of interest. In this example I put the code as a static method into the respective classes. This approach may be explained by the Information Expert GRASP pattern. But everything is neat in the example project because there is no class hierarchy. The approach with static methods looks dirty when a class hierarchy appears. Maybe this point could be covered in the next article.

Protobuff: http://protobuf-net.googlecode.com. Interesting project, really hard work. I made the same tests with their serializer - and got the same size characteristic as for Manual approach! However, time consumption is even worse (in Debug build - thanks to Sam Cragg's remark ) than for BinaryFormatter. You could download this code - 92.5 KB - the same project as in the example, but also with Protobuff check to see that for yourself. The Release build for Protobuff-net tests is available here - 86.6 KB. Release build has shown 3-5 times better time characteristics than BinaryFormatter, and worse time characteristics than Manual approach.

Why is BinaryFormatter so slow and space-consuming?

History

10.01.2012 - First upload.
13.01.2012 - An update to reflect remarks from feedback.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here