Introduction
Sometimes you may need to compress data in memory without creating disk files. For example, it is possible to compress binary data in a CDATA
section of an XML file. Also, you may want to compress data transmitted between the client and server machines over a low-bandwidth connection.
The proposed classes implement the original compression algorithm. This method is based on the "DEFLATE Compressed Data Format Specification" (RFC 1951) with some enhancements. It works slightly better than the generally used zip compression. The algorithm is especially effective on Centrino and other platforms with 2 MB L2 cache. The above picture shows results of the mscorlib.xml file compression/decompression on a machine with Pentium M processor (1.8 GHz). Running the same test on a machine with AMD Athlon 2100+ processor (with 256 KB L2 cache) gives 721 ms as the compression time and 40 ms as the decompression time. It is interesting that Intel and AMD platforms have such an evident difference in performance for various operations (here 110 ms on the Pentium M is quite imperfect data, however, because of the big inaccuracy).
You can compare this compression engine with the standard DeflateStream
compressor added in .NET 2.0. To do so, please open the source code for the demo project (Form1.cs) and comment/uncomment several code blocks. They are marked in the source code. Then, try the demo project with various data files and compare results with the MemCompressor engine. You will see, the standard DeflateStream
engine is rather poor.
Using MemCompressor
The MemCompressor assembly includes two main classes: AcedDeflator
and AcedInflator
. The AcedDeflator
class has the Compress
method. This method accepts a byte array and returns new byte array with the same data in the compressed form. For example:
private int _originalSize;
private byte[] _originalData;
private int _originalChecksum;
private byte[] _compData;
private int _compSize;
private void CompressData()
{
_originalChecksum = AcedUtils.Adler32(_originalData, 0, _originalSize);
_compData = AcedDeflator.Instance.Compress(_originalData, 0, _originalSize,
AcedCompressionLevel.Normal, 0, 0);
_compSize = _compData.Length;
}
The AcedInflator
class works quite the contrary. It has the Decompress
method which accepts a binary array with compressed data. This method restores the original data in the output array.
private byte[] _decompData;
private int _decompSize;
private int _decompChecksum;
private void DecompressData()
{
_decompData = AcedInflator.Instance.Decompress(_compData, 0, 0, 0);
_decompSize = _decompData.Length;
_decompChecksum = AcedUtils.Adler32(_decompData, 0, _decompSize);
Debug.Assert(_decompChecksum == _originalChecksum);
}
The AcedDeflator.Compress
method accepts, among other, the compressionLevel
parameter. This parameter chooses balance between the speed and the compression ratio. Here you may pass one of the following values:
Store
(simple copying data to the output array without compression)
Fastest
(poor compression ratio but the best speed)
Fast
(optimal for less-redundant data)
Normal
(better choice for higher-redundant data, such as texts)
Maximum
(slow but has the best compression ratio)
AcedDeflator Class
public class AcedDeflator
Provides methods for compressing binary data with a combination of the LZ77 algorithm and Huffman coding similar to the method described in RFC 1951. You should synchronize calls to the methods of this class in a multithreaded application.
public AcedDeflator()
Creates a new instance of the AcedDeflator
class. It is not recommended to call this constructor directly. When possible, please use a single cached instance of the AcedDeflator
class which is returned by the Instance static
property.
public static AcedDeflator Instance { get; }
Returns a cached instance of the AcedDeflator
class. Such an instance occupies a lot of memory (a little more than 2 MB). So, a single instance is created and cached in an internal static
field of this class. This instance can be used each time when you want to compress some data. The only exception may be a multithreaded application where you compress data in several threads simultaneously. If so, you should create each instance with the regular constructor.
public byte[] Compress(byte[] sourceBytes, int sourceIndex, int sourceLength,
AcedCompressionLevel compressionLevel, int beforeGap, int afterGap)
Compresses binary data passed in the sourceBytes
array starting from sourceIndex
with sourceLength
bytes in length. The compression ratio depends on the compressionLevel
parameter. This method creates a new byte array which is enough to store the compressed data. You may also reserve beforeGap
bytes before the compressed data block and afterGap
bytes after the compressed data block. This may be useful if you want to supply a custom header and/or tail to the compressed data and you don't want to reallocate memory unnecessarily. The result byte array consists of (beforeGap
+ Compressed_Data_Length + afterGap
) bytes.
public int Compress(byte[] sourceBytes, int sourceIndex, int sourceLength,
AcedCompressionLevel compressionLevel, byte[] destinationBytes,
int destinationIndex)
This method is similar to the previous one but it doesn't allocate the memory for the output array. The output array is passed as the destinationBytes
argument. The compressed data will be stored in that array starting from the destinationIndex
. The maximum length of the compressed data is (sourceLength
+ 4) bytes. The output array must have enough space. This method returns the number of bytes stored in the output array. If you pass null
in the destinationBytes
argument, the method performs "dummy" compression. In such a way, you can find out the length of the output data. However, the "dummy" compression takes almost the same time as the regular compression.
public static void Release()
Call this method to release an internal static
reference to the cached instance of the AcedDeflator
class. After that, you may call GC.Collect()
to free memory which was occupied by that cached instance.
AcedInflator Class
public class AcedInflator
Provides methods for decompressing binary data which was compressed using the AcedDeflator
class. You should synchronize calls to the methods of this class in a multithreaded application.
public AcedInflator()
Creates a new instance of the AcedInflator
class. It is not recommended to call this constructor directly. When possible, to avoid unnecessary memory reallocation, please use a single cached instance of the AcedInflator
class which is returned by the Instance static
property.
public static AcedInflator Instance { get; }
Returns a cached instance of the AcedInflator
class. Such an instance is created when you access this property for the first time. Then, the instance is cached in an internal static
field of this class. You can use it to decompress any data in a single-threaded application. In a multithreaded application, please create an AcedInflator
for each the thread with the regular constructor.
public static int GetDecompressedLength(byte[] sourceBytes, int sourceIndex)
Returns the minimum length, in bytes, of the binary array where you can store the decompressed data for the specified compressed binary array (sourceBytes
). The sourceIndex
argument specifies the start index in the sourceBytes
array from which the compressed data begins.
public byte[] Decompress(byte[] sourceBytes, int sourceIndex,
int beforeGap, int afterGap)
Decompresses binary data passed in the sourceBytes
array starting from sourceIndex
. This method creates a new byte array which is enough to store the decompressed data. You may also reserve beforeGap
bytes before the decompressed data block and afterGap
bytes after the decompressed data block. This may be useful if you want to supply a custom header and/or tail to decompressed data and you don't want to reallocate memory unnecessarily. The result array will consist of (beforeGap
+ Decompressed_Data_Length
+ afterGap
) bytes.
public int Decompress(byte[] sourceBytes, int sourceIndex,
byte[] destinationBytes, int destinationIndex)
The same as the previous method but the memory for the output array must be allocated before calling this method. You should pass the destination array in the destinationBytes
argument. The destinationIndex
specifies offset in that array from which the decompressed data will be stored. The output array must have enough space to store all the data. To obtain the decompressed data size, please call the GetDecompressedLength
method of this class. The Decompress
method returns the number of bytes stored in the output (destinationBytes
) array. If you pass null
in the destinationBytes
argument, this method does nothing, just returns the decompressed data length, in bytes.
public static void Release()
Call this method to release an internal static
reference to the cached instance of the AcedInflator
class. After that, you may call GC.Collect()
to free memory which was occupied by that cached instance.
AcedUtils Class
public class AcedUtils
This class has one public static
method to compute the Adler-32 checksum.
public static int Adler32(byte[] bytes, int offset, int length)
Computes the Adler-32 checksum in accordance with RFC 1950 for length
bytes of the bytes
array starting from the offset
index.
Conclusion
Just have these simple classes in mind when you will consider adding some compression capabilities to your application.
History
- 23rd May, 2006: Initial post