Introduction
To improve network bandwidth utilization in a high-performance application, I found that compressing a large-ish object (6 MB) prior to transmission sped up the network call by roughly a factor of ten. However, the MSDN sample code for the GZipStream class left a lot to be desired: it is arcane and poorly written, and it took me far too long to understand it.
When I had finally figured out what was going on, I was left with, IMHO, a fairly useful little utility for generalized compression and decompression of byte arrays. It uses the GZipStream class that comes standard as part of the System.IO.Compression namespace.
My utility consists of a single class, Compressor, with two static methods, Compress() and Decompress(). Both methods take a byte array as a parameter and return a byte array. For Compress(), the parameter is the uncompressed byte array and the return value is the compressed byte array; for Decompress(), the reverse.
During compression, the compressed bytes are prepended with an Int32 header containing the number of bytes in the uncompressed byte array. This header is used during decompression to allocate the byte array returned by Decompress().
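The Compressor class itself is not reproduced in this section, but a minimal sketch consistent with the description above might look like the following. It assumes the Int32 length header is written in front of the GZip data; the author's actual implementation may differ in details.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Compressor
{
    public static byte[] Compress(byte[] raw)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // Prepend the uncompressed length so Decompress() can
            // pre-allocate the output array.
            ms.Write(BitConverter.GetBytes(raw.Length), 0, 4);
            using (GZipStream gz =
                new GZipStream(ms, CompressionMode.Compress, true))
            {
                gz.Write(raw, 0, raw.Length);
            }
            return ms.ToArray();
        }
    }

    public static byte[] Decompress(byte[] gzBuffer)
    {
        // Read the Int32 header to size the output buffer.
        int length = BitConverter.ToInt32(gzBuffer, 0);
        byte[] result = new byte[length];
        using (MemoryStream ms =
            new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
        using (GZipStream gz =
            new GZipStream(ms, CompressionMode.Decompress))
        {
            // GZipStream.Read() may return fewer bytes than requested,
            // so loop until the whole array is filled.
            int read = 0;
            while (read < length)
            {
                int n = gz.Read(result, read, length - read);
                if (n == 0) break;
                read += n;
            }
        }
        return result;
    }
}
```

Note the read loop in Decompress(): relying on a single Read() call is a common bug, since the stream is free to return data in smaller chunks.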
Using the Code
Simply convert the object (or collection of objects) you wish to compress into a byte array. I find that a bit of custom serialization using the BitConverter and/or Buffer classes works well for this. For types with a fixed record size (i.e., containing only value types and no strings), you can also dip down into the Marshal class (see the example below) to obtain a pointer to an unmanaged copy of the object and then copy the memory it points to into your buffer.
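As an illustration of the BitConverter/Buffer approach, here is a hypothetical Pack/Unpack pair for a record consisting of an int and a double. The names are mine, not part of the utility; it simply shows the manual-serialization pattern the text describes.

```csharp
using System;

public static class PackingExample
{
    // Serialize an (id, price) pair into a 12-byte array:
    // 4 bytes for the int, 8 for the double.
    public static byte[] Pack(int id, double price)
    {
        byte[] buffer = new byte[sizeof(int) + sizeof(double)];
        Buffer.BlockCopy(BitConverter.GetBytes(id), 0,
                         buffer, 0, sizeof(int));
        Buffer.BlockCopy(BitConverter.GetBytes(price), 0,
                         buffer, sizeof(int), sizeof(double));
        return buffer;
    }

    // Recover the pair from the byte array.
    public static void Unpack(byte[] buffer, out int id, out double price)
    {
        id = BitConverter.ToInt32(buffer, 0);
        price = BitConverter.ToDouble(buffer, sizeof(int));
    }
}
```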
Once you have your byte array, simply pass it to Compressor.Compress() to get a compressed array for transmission. On the far end, pass the compressed byte array to Decompress() and recover the original byte array. Voila!
public struct StockPrice
{
    public int ID;
    public double bidPrice;
    public double askPrice;
    public double lastPrice;

    // Unmanaged size of this struct, in bytes.
    public static int sz = Marshal.SizeOf(typeof(StockPrice));

    // Copies this struct into buffer at startIndex via unmanaged memory.
    public void CopyToBuffer(byte[] buffer, int startIndex)
    {
        IntPtr ptr = Marshal.AllocHGlobal(sz);
        Marshal.StructureToPtr(this, ptr, false);
        Marshal.Copy(ptr, buffer, startIndex, sz);
        Marshal.FreeHGlobal(ptr);
    }

    // Reconstructs a StockPrice from buffer at startIndex.
    public static StockPrice CopyFromBuffer(byte[] buffer, int startIndex)
    {
        IntPtr ptr = Marshal.AllocHGlobal(sz);
        Marshal.Copy(buffer, startIndex, ptr, sz);
        StockPrice stockPrice =
            (StockPrice)Marshal.PtrToStructure(ptr, typeof(StockPrice));
        Marshal.FreeHGlobal(ptr);
        return stockPrice;
    }
}
static void Main()
{
    // StockPriceDict is assumed to be a populated
    // Dictionary<int, StockPrice>, keyed by ID.
    byte[] buffer = new byte[StockPriceDict.Count * StockPrice.sz];
    int startIndex = 0;
    foreach (StockPrice price in StockPriceDict.Values)
    {
        price.CopyToBuffer(buffer, startIndex);
        startIndex += StockPrice.sz;
    }

    // Compress for transmission...
    byte[] gzBuffer = Compressor.Compress(buffer);

    // ...and rebuild the dictionary on the far end.
    Dictionary<int, StockPrice> newStockPriceDict =
        new Dictionary<int, StockPrice>();
    byte[] buffer1 = Compressor.Decompress(gzBuffer);
    startIndex = 0;
    while (startIndex < buffer1.Length)
    {
        StockPrice stockPrice = StockPrice.CopyFromBuffer(buffer1, startIndex);
        newStockPriceDict[stockPrice.ID] = stockPrice;
        startIndex += StockPrice.sz;   // advance to the next record
    }
}
Points of Interest
If there is one thing I would improve about C#, it is its ability to manipulate objects as byte arrays. This capability is absolutely critical for high-performance computing and doesn't get enough respect from the C# product team; it seems the functionality was included only for backwards compatibility with COM. Even so, it's probably the bit of code I rely on most when working in high-performance areas usually reserved for C++.
History
- v1.0 - 10th January, 2007