Introduction
To improve network bandwidth utilization in a high-performance application, I found that compressing a large-ish object (6 MB) prior to transmission sped up the network call by roughly a factor of ten. However, the MSDN sample code for the GZipStream class left a lot to be desired: it is arcane and poorly written, and it took me far too long to understand it.
When I had finally figured out what was going on, I was left with, IMHO, a fairly useful little utility for generalized compression and decompression of byte arrays. It uses the GZipStream class that comes standard as part of the System.IO.Compression namespace.
My utility consists of a single class, Compressor, with two static methods, Compress() and Decompress(). Both methods take a byte array as a parameter and return a byte array. For Compress(), the parameter is the uncompressed byte array and the return value is the compressed byte array; for Decompress(), the reverse.
During compression, the compressed bytes are prepended with an Int32 header containing the number of bytes in the uncompressed byte array. This header is used during decompression to allocate the byte array returned by Decompress().
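The Compressor class itself is not reproduced in this section, but a minimal sketch consistent with the description above might look like the following. It assumes the Int32 length header is written in front of the GZip data; the author's actual implementation may differ in details.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Compressor
{
    public static byte[] Compress(byte[] raw)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // Prepend the uncompressed length so Decompress() can
            // pre-allocate the output array.
            ms.Write(BitConverter.GetBytes(raw.Length), 0, 4);
            using (GZipStream gz =
                new GZipStream(ms, CompressionMode.Compress, true))
            {
                gz.Write(raw, 0, raw.Length);
            }
            return ms.ToArray();
        }
    }

    public static byte[] Decompress(byte[] gzBuffer)
    {
        // Read the Int32 header to size the output buffer.
        int length = BitConverter.ToInt32(gzBuffer, 0);
        byte[] result = new byte[length];
        using (MemoryStream ms =
            new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
        using (GZipStream gz =
            new GZipStream(ms, CompressionMode.Decompress))
        {
            // GZipStream.Read() may return fewer bytes than requested,
            // so loop until the whole array is filled.
            int read = 0;
            while (read < length)
            {
                int n = gz.Read(result, read, length - read);
                if (n == 0) break;
                read += n;
            }
        }
        return result;
    }
}
```

Note the read loop in Decompress(): relying on a single Read() call is a common bug, since the stream is free to return data in smaller chunks.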
Using the Code
Simply convert the object (or collection of objects) you wish to compress into a byte array. I find that a bit of custom serialization using the BitConverter and/or Buffer classes works well for this. For types with a fixed record size (i.e., containing only value types and no strings), you can also dip down into the Marshal class (see the example below) to obtain a pointer to an unmanaged copy of the object and then copy the memory it points to into your buffer.
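As an illustration of the BitConverter/Buffer approach, here is a hypothetical Pack/Unpack pair for a record consisting of an int and a double. The names are mine, not part of the utility; it simply shows the manual-serialization pattern the text describes.

```csharp
using System;

public static class PackingExample
{
    // Serialize an (id, price) pair into a 12-byte array:
    // 4 bytes for the int, 8 for the double.
    public static byte[] Pack(int id, double price)
    {
        byte[] buffer = new byte[sizeof(int) + sizeof(double)];
        Buffer.BlockCopy(BitConverter.GetBytes(id), 0,
                         buffer, 0, sizeof(int));
        Buffer.BlockCopy(BitConverter.GetBytes(price), 0,
                         buffer, sizeof(int), sizeof(double));
        return buffer;
    }

    // Recover the pair from the byte array.
    public static void Unpack(byte[] buffer, out int id, out double price)
    {
        id = BitConverter.ToInt32(buffer, 0);
        price = BitConverter.ToDouble(buffer, sizeof(int));
    }
}
```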
Once you have your byte array, simply pass it to Compressor.Compress() to get a compressed array for transmission. On the far end, pass the compressed byte array to Decompress() and recover the original byte array. Voila!
public struct StockPrice
{
    public int ID;
    public double bidPrice;
    public double askPrice;
    public double lastPrice;

    // Unmanaged size of this struct, in bytes.
    public static int sz = Marshal.SizeOf(typeof(StockPrice));

    // Copies this struct into buffer at startIndex via unmanaged memory.
    public void CopyToBuffer(byte[] buffer, int startIndex)
    {
        IntPtr ptr = Marshal.AllocHGlobal(sz);
        Marshal.StructureToPtr(this, ptr, false);
        Marshal.Copy(ptr, buffer, startIndex, sz);
        Marshal.FreeHGlobal(ptr);
    }

    // Reconstructs a StockPrice from buffer at startIndex.
    public static StockPrice CopyFromBuffer(byte[] buffer, int startIndex)
    {
        IntPtr ptr = Marshal.AllocHGlobal(sz);
        Marshal.Copy(buffer, startIndex, ptr, sz);
        StockPrice stockPrice =
            (StockPrice)Marshal.PtrToStructure(ptr, typeof(StockPrice));
        Marshal.FreeHGlobal(ptr);
        return stockPrice;
    }
}
static void Main()
{
    // StockPriceDict is assumed to be a populated
    // Dictionary<int, StockPrice>, keyed by ID.
    byte[] buffer = new byte[StockPriceDict.Count * StockPrice.sz];
    int startIndex = 0;
    foreach (StockPrice price in StockPriceDict.Values)
    {
        price.CopyToBuffer(buffer, startIndex);
        startIndex += StockPrice.sz;
    }

    // Compress for transmission...
    byte[] gzBuffer = Compressor.Compress(buffer);

    // ...and rebuild the dictionary on the far end.
    Dictionary<int, StockPrice> newStockPriceDict =
        new Dictionary<int, StockPrice>();
    byte[] buffer1 = Compressor.Decompress(gzBuffer);
    startIndex = 0;
    while (startIndex < buffer1.Length)
    {
        StockPrice stockPrice = StockPrice.CopyFromBuffer(buffer1, startIndex);
        newStockPriceDict[stockPrice.ID] = stockPrice;
        startIndex += StockPrice.sz;   // advance to the next record
    }
}
Points of Interest
If there is one thing I would improve about C#, it is its ability to manipulate objects as byte arrays. This capability is absolutely critical for high-performance computing and doesn't get enough respect from the C# product team; it seems the functionality was included only for backwards compatibility with COM. Even so, it's probably the bit of code I rely on most when working in high-performance areas usually reserved for C++.
History
- v1.0 - 10th January, 2007