Managed C++ wrapper for ZLib

Alberto Ferrazzoli

3 Mar 2005

.NET wrapper for ZLib, written in MC++

Introduction

This article presents a namespace of managed types that wrap some of the standard functionality exported by ZLib. ZLib is a well-known, free, general-purpose, lossless data-compression library for use on any operating system (1).

Background

Visual C++ lets you combine managed and unmanaged code in the same assembly. The code presented here demonstrates this mix of unmanaged (native) and managed code. The technique is very useful for building managed types that wrap unmanaged ones, because it lets you migrate code while maintaining good efficiency. Another advantage is that, as in this case, bug fixes or improvements in the native layer (especially when provided by other vendors) are easy to pick up: in most cases they require only a rebuild of the project.

The price to pay is that such mixed-mode VC++ .NET code cannot be made verifiably type-safe (try the peverify.exe tool). Another quirk of mixing code comes from the initialization of the static structures of the standard libraries that are often linked with native modules, such as the CRT, ATL and MFC. The solution to this problem comes from an MSDN article (2), but it is rather convoluted. I have heard that the problem will be addressed in the next release of the .NET Framework. Until then, it is best not to use the static structures in these libraries or, better still (though more difficult), not to use the libraries at all.

Calling unmanaged code

A note on calling unmanaged code from managed code. The ZStream class uses an internal (managed) buffer of Bytes to implement the stream behavior. To use the managed buffer with ZLib library functions, we must pass the function a pinning pointer into the managed heap. This prevents the garbage collector from moving the managed buffer. Pinning a part of a managed object has the effect of pinning the entire object, so if any element of an array is pinned, the whole array is pinned as well. This leads to the following code:

BYTE __pin * pBuffer = &buffer[0];
BYTE __pin * pSource = &source[0];

int nRes = compress2(pBuffer, & length, pSource, sourceLen, level);

The managed object is unpinned once the pinning pointer goes out of scope or when the pointer is set to zero.

Using the code

The DevelopDotNet.Compression namespace contains many types dedicated to compression tasks. To use one of these types, reference the component in your project and insert the following declaration at the beginning of the source file:

//
// Compression types
//
using DevelopDotNet.Compression;

ZStream encapsulates the compressed-stream functionality. It derives from the .NET Framework System.IO.Stream class and can be used both to compress streams and to decompress compressed streams. The class constructors take the base Stream to manage and, optionally, other parameters that determine whether the ZStream can Read (decompress) data or Write (compress) data. Note that ZStream is a sequential Stream, so it does not support Seek.

// Read only Stream (it can only decompress)
ZStream(Stream stream);


// Read or Write Stream depending on the boolean parameter write
// If write = true this stream can only Write 
ZStream(Stream stream, bool write);


// Write only Stream  (it can only compress)
ZStream(Stream stream, CompressionLevel level);
ZStream(Stream stream, CompressionLevel level, CompressionStrategy strategy);

The following lines of code show the standard way to compress binary data into a file.

//
// Serializing dataset object
//

FibonacciDataSet ds = (FibonacciDataSet) GenerateFibonacciData();

fs = new FileStream(sFileName, FileMode.Create);

ZStream compressor = new ZStream(fs, true);

BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(compressor, ds);

// Close the stream so that any buffered compressed data is flushed.
compressor.Close();

To regenerate the dataset object from a compressed stream, open the compressed data file, attach it to a ZStream object, and use the Deserialize method of the BinaryFormatter class.

//
// Deserializing data
//
fs = new FileStream(sFileName, FileMode.Open);


ZStream decompressor = new ZStream(fs);


BinaryFormatter bf = new BinaryFormatter();
FibonacciDataSet ds = (FibonacciDataSet) bf.Deserialize(decompressor);

dataGrid1.DataSource = ds;

ZCompressor exports two static methods, Compress and Uncompress, to quickly compress and decompress a buffer of Bytes. Beware that while Compress requires about as much memory as the data to compress occupies, Uncompress may require very large memory allocations, depending on the origin of the compressed data. This is because the decompression algorithm allocates buffers to accommodate the decompressed data on the fly as decompression proceeds.

string sData = txtData.Text;

try
{
    Encoding encoder = Encoding.UTF7;
    byte [] compressedData = ZCompressor.Compress(encoder.GetBytes(sData), 
        CompressionLevel.BestCompression);

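    // Compressed bytes are arbitrary binary data, so display them as Base64;
    // decoding them with a text encoding would corrupt them.
    txtData.Text = Convert.ToBase64String(compressedData);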
}
catch(ZException err)
{
    txtData.Text = err.Message;
}

The DevelopDotNet.Compression namespace also exports types suited to computing checksums: the Adler32 and CRC32 classes and the IChecksum interface. Both Adler32 and CRC32 implement IChecksum. Writing different concrete types that implement the IChecksum interface gives us polymorphism: both Adler32 and CRC32 compute a checksum over a data buffer, but how the sum is computed depends only on the concrete type, while all of them are driven through the same interface.

public __gc __interface IChecksum
{
    __property unsigned long get_Checksum();

    unsigned long Update(unsigned char buffer __gc[]);
    unsigned long Update(unsigned char buffer __gc[], int offset, int count);
};

This lets us easily configure an application to use the checksum facilities by writing code against the generic IChecksum interface, regardless of how the checksum is actually computed.

Adler32 adler = new Adler32();
DoChecksum(txtFile.Text, adler);
lblAdler.Text = adler.Checksum.ToString("X");

CRC32 crc = new CRC32();
DoChecksum(txtFile.Text, crc);
lblCrc.Text = crc.Checksum.ToString("X");

// ...

private long DoChecksum(string sFile, IChecksum chk)
{
    FileStream fs = null;
    long checksum = 0;

    try
    {
        fs = new FileStream(sFile, FileMode.Open);

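        Byte [] data = new Byte[16384];
        int bytesRead;

        // Update the checksum only with the bytes actually read,
        // since the last block is usually shorter than the buffer.
        while((bytesRead = fs.Read(data, 0, data.Length)) > 0)
        {
            checksum = chk.Update(data, 0, bytesRead);
        }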
    }
    catch(Exception err)
    {
        MessageBox.Show(err.Message, "Error", MessageBoxButtons.OK, 
                                MessageBoxIcon.Error);
    }
    finally
    {
        if(null != fs)
            fs.Close();
    }

    return checksum;
}
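
The same idea can be sketched in standard C++, outside the managed world. This is only an illustration and not the library's actual source: the class and function names (IChecksum, Adler32, Crc32, checksumOf) are assumptions chosen to mirror the managed types, and the algorithms are the textbook Adler-32 and bitwise CRC-32 rather than calls into ZLib.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical native counterpart of the managed IChecksum interface.
class IChecksum {
public:
    virtual ~IChecksum() = default;
    // Current value of the running checksum.
    virtual std::uint32_t checksum() const = 0;
    // Feeds `count` bytes into the checksum and returns the new value.
    virtual std::uint32_t update(const std::uint8_t* data, std::size_t count) = 0;
};

// Textbook Adler-32: two running sums modulo 65521.
class Adler32 final : public IChecksum {
public:
    std::uint32_t checksum() const override { return (b_ << 16) | a_; }
    std::uint32_t update(const std::uint8_t* data, std::size_t count) override {
        for (std::size_t i = 0; i < count; ++i) {
            a_ = (a_ + data[i]) % 65521u;
            b_ = (b_ + a_) % 65521u;
        }
        return checksum();
    }
private:
    std::uint32_t a_ = 1;
    std::uint32_t b_ = 0;
};

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
class Crc32 final : public IChecksum {
public:
    std::uint32_t checksum() const override { return ~state_; }
    std::uint32_t update(const std::uint8_t* data, std::size_t count) override {
        for (std::size_t i = 0; i < count; ++i) {
            state_ ^= data[i];
            for (int bit = 0; bit < 8; ++bit) {
                // Shift right; XOR in the polynomial when the low bit was set.
                state_ = (state_ >> 1) ^ (0xEDB88320u & (0u - (state_ & 1u)));
            }
        }
        return checksum();
    }
private:
    std::uint32_t state_ = 0xFFFFFFFFu;
};

// Feeds the text to any IChecksum in small chunks, passing only the bytes
// that are really available in each chunk (like DoChecksum does with a file).
std::uint32_t checksumOf(IChecksum& chk, const std::string& text,
                         std::size_t chunkSize = 4) {
    assert(chunkSize > 0);
    const std::uint8_t* bytes = reinterpret_cast<const std::uint8_t*>(text.data());
    for (std::size_t pos = 0; pos < text.size(); pos += chunkSize) {
        const std::size_t count = std::min(chunkSize, text.size() - pos);
        chk.update(bytes + pos, count);
    }
    return chk.checksum();
}
```

Calling checksumOf with an Adler32 instance on the string "Wikipedia" yields 0x11E60398, and with a Crc32 instance on "123456789" yields 0xCBF43926. The caller never needs to know which algorithm sits behind the interface.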

ZException extends the System.Exception class to give the Compression namespace a mechanism for translating ZLib errors into the managed world. Every type exported by the Compression namespace that uses the ZLib library directly should throw a ZException to let the application know that something has gone wrong deep inside the library.

Finally, I am proud to introduce the last type, ladies and gentlemen: ZLib. This little type has only two public static properties. The first is Version, which returns the version string of the ZLib library built into the managed code; the second is CompiledFlags. CompiledFlags is a value of the CompileOptions enumeration, which carries the Flags attribute, and it represents the options chosen at compile time for the ZLib library source code.

Zip File Support

Starting from release 1.1.0.0, the library supports Zip archive files according to the PKWARE standard, as stated in the zip file format specification document (3). Because the library is built on ZLib, only the deflate compression method is currently supported; this is, however, the method most commonly used for zip archives. The Compression library exposes the following managed types related to zip archives: ZipCompressionMethod, ZipFileEncryption, ZipArchive, ZipEntry, ZipEntryCollection, ZipException, ZipBadCrc32Exception, ZipFailEventHandler, ZipFailEventArgs, ZipProgressEventHandler, ZipProgressEventArgs, ZipRecoveryAction and ZipStoreFilePath.

The ZipArchive class reports errors that occur during archive operations through the Fail event. A client that registers for this event can decide whether to stop the operation, give it a second chance, or even hide the error from the user. See the ZTest application to get a picture of what can be done with the Fail event. The Progress event is raised during the normal flow of an operation. It carries information about the current operation state and the completion percentage, as well as a flag that can be used to abort the operation.
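
The stop, retry or hide choice offered by the failure notification can be pictured with a small standard C++ sketch. Everything in it is hypothetical: RecoveryAction, FailHandler, EntryOperation, RunResult and processEntries are names invented for this illustration, and the real ZipRecoveryAction values and event arguments may well differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the choices a failure handler can make.
enum class RecoveryAction { Abort, Retry, Ignore };

// Called when an operation on an entry fails; `attempt` starts at 1.
using FailHandler = std::function<RecoveryAction(const std::string& entry, int attempt)>;
// The work done on a single entry; returns false on failure.
using EntryOperation = std::function<bool(const std::string& entry)>;

struct RunResult {
    int succeeded = 0;    // entries processed successfully
    int ignored = 0;      // failures the handler chose to hide
    bool aborted = false; // true if the handler stopped the whole run
};

// Runs `operation` on every entry and asks `onFail` what to do on failure.
RunResult processEntries(const std::vector<std::string>& entries,
                         const EntryOperation& operation,
                         const FailHandler& onFail) {
    assert(operation && onFail);
    RunResult result;
    for (const std::string& entry : entries) {
        int attempt = 1;
        for (;;) {
            if (operation(entry)) {
                ++result.succeeded;
                break;
            }
            const RecoveryAction action = onFail(entry, attempt);
            if (action == RecoveryAction::Retry) {
                ++attempt;          // give the entry a second chance
                continue;
            }
            if (action == RecoveryAction::Ignore) {
                ++result.ignored;   // hide the error and move on
                break;
            }
            result.aborted = true;  // stop the whole operation
            return result;
        }
    }
    return result;
}
```

A handler that returns Retry on the first attempt and Abort afterwards gives each entry exactly one second chance, while a handler that always returns Ignore lets the run finish and simply counts the hidden failures.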

Here are some code snippets that show typical tasks the library can perform on zip archives:

Adding files to Zip archive:

string archiveFile = "test.zip";
using(ZipArchive archive = new ZipArchive(archiveFile, FileMode.OpenOrCreate, 
                                          FileAccess.ReadWrite))
{
    // The simplest way to add a file to the archive.
    archive.Add("file.txt");

    bool recursive = true;
    // Add all the files contained in a folder.
    archive.Add(@"C:\Temp", recursive);

    // Another way to add files.
    ZipEntry entry = new ZipEntry(@"C:\Temp");
    archive.Entries.Add(entry, recursive);
}

Removing files from Zip archive:

string archiveFile = "test.zip";
using(ZipArchive archive = new ZipArchive(archiveFile, FileMode.Open, 
                                          FileAccess.ReadWrite))
{
    // Remove from the zip archive all the entries contained in the given folder
    archive.Remove("Temp", true);
}

Testing the archive:

string archiveFile = "test.zip";
using(ZipArchive archive = new ZipArchive(archiveFile, FileMode.Open, 
                                          FileAccess.Read))
{
    archive.Fail += new ZipFailEventHandler(OnZipFail);
    archive.Progress += new ZipProgressEventHandler(OnZipProgress);
    if(archive.Test())
    {
        Console.WriteLine("{0} test successfully completed.", archiveFile);
    }
}

Extracting the archive content:

string archiveFile = "test.zip";
using(ZipArchive archive = new ZipArchive(archiveFile, FileMode.Open, 
                                          FileAccess.Read))
{
    // Set the extraction options.
    archive.OverwriteExisting = true;
    archive.BuildTree = true;
    // Get error notifications.
    archive.Fail += new ZipFailEventHandler(OnZipFail);
    // Extract the archive content.
    archive.ExtractTo(@"C:\Temp");
}

The ZipArchive class uses seekable .NET streams to access Zip archives. This makes it possible to create in-memory zip archives on the fly, and it speeds things up when multiple operations must be performed sequentially.

MemoryStream archiveStream = new MemoryStream();

ZipArchive archive = new ZipArchive(archiveStream);

try
{
    archive.Add("file1.txt");
    archive.Add("file2.pdf");
    archive.Comment = "Memory Archive Test";
}
catch(ZipException)
{
    MessageBox.Show("Something went wrong while adding files to the archive.");
}
//
// Copy memory stream to file
//

using(FileStream file = new FileStream("memory.zip", FileMode.Create, 
                                       FileAccess.Write))
{
    StreamUtilities.StreamCopy(archiveStream, 0, file);
}

Improvements

There is always room for improvement, so what about letting the library manage SFX archives, passwords, and other stuff? The library is freeware and comes with the source code. It is currently under development at DevelopDotNet, http://www.developdotnet.com (4). There you will find a forum about the library, where you can report bugs or request changes.

Points of Interest

Once I had the power of ZStream in my hands, I wanted something to squeeze in order to do some testing. Searching high and low, I came across an old school book where I found a way to get a lot of numbers: the Fibonacci series. I hope you will not be bored by the following little story about it. In 1202, Leonardo Pisano, also called Leonardo Fibonacci, published a great book of mathematics: Liber Abaci. In this book, Fibonacci discusses the famous rabbit population problem. Here is Prof. Sigler's (5) translation of Fibonacci's original statement:

How Many Pairs of Rabbits Are Created by One Pair in One Year.

A certain man had one pair of rabbits together in a certain enclosed place, and one wishes to know how many are created from the pair in one year when it is the nature of them in a single month to bear another pair, and in the second month those born to bear also. Because the above written pair in the first month bore, you will double it; there will be two pairs in one month. One of these, namely the first, bears in the second month, and thus there are in the second month 3 pairs; of these in one month two are pregnant, and in the third month 2 pairs of rabbits are born, and thus there are 5 pairs in the month; in this month 3 pairs are pregnant, and in the fourth month there are 8 pairs, of which 5 pairs bear another 5 pairs; these are added to the 8 pairs making 13 pairs in the fifth month; these 5 pairs that are born in this month do not mate in this month, but another 8 pairs are pregnant, and thus there are in the sixth month 21 pairs; to these are added the 13 pairs that are born in the seventh month; there will be 34 pairs in this month; to this are added the 21 pairs that are born in the eighth month; there will be 55 pairs in this month; to these are added the 34 pairs that are born in the ninth month; there will be 89 pairs in this month; to these are added again the 55 pairs that are born in the tenth month; there will be 144 pairs in this month; to these are added again the 89 pairs that are born in the eleventh month; there will be 233 pairs in this month.

To these are still added the 144 pairs that are born in the last month; there will be 377 pairs, and this many pairs are produced from the above written pair in the mentioned place at the end of the one year.

You can indeed see in the margin how we operated, namely that we added the first number to the second, namely the 1 to the 2, and the second to the third, and the third to the fourth, and the fourth to the fifth, and thus one after another until we added the tenth to the eleventh, namely the 144 to the 233, and we had the above written sum of rabbits, namely 377, and thus you can in order find it for an unending number of months.

The solution to this problem comes from the series whose nth term is given by:

F(n) = F(n-1) + F(n-2), with the initial conditions F(0) = 0 and F(1) = 1.

1, 1, 2, 3, 5, 8, 13, ...

Interesting, isn't it? A generic term of this series can be calculated using recursion, but this gets expensive in terms of computation time, so I have reduced the problem to a simple loop that saves the last two terms of the series and sums them to obtain the next one.
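
A minimal sketch of that loop in standard C++ could look like the following. The function name fibonacci and the choice of a 64-bit unsigned result are assumptions made for this illustration, not the code used to generate the test dataset.

```cpp
#include <cassert>
#include <cstdint>

// Iterative Fibonacci with F(0) = 0, F(1) = 1 and F(n) = F(n-1) + F(n-2).
// Only the last two terms are kept, so the cost is linear in n.
std::uint64_t fibonacci(unsigned n) {
    assert(n <= 93); // F(93) is the largest term that fits in 64 bits
    std::uint64_t previous = 0; // F(0)
    std::uint64_t current = 1;  // F(1)
    if (n == 0) {
        return previous;
    }
    for (unsigned i = 2; i <= n; ++i) {
        const std::uint64_t next = previous + current;
        previous = current;
        current = next;
    }
    return current;
}
```

With this indexing, fibonacci(12) is 144 and fibonacci(14) is 377, the number of pairs Fibonacci counts at the end of the year.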

References

(1) ZLib home site http://www.gzip.org/zlib/
(2) Converting Managed Extensions for C++ Projects from Pure Intermediate Language to Mixed Mode - Managed Extensions for C++ Reference
(3) PKWARE White Papers ZIP File Format Specification
(4) DevelopDotNet download latest source code at http://www.developdotnet.com/.
(5) L. E. (Laurence E.) Sigler: "Fibonacci's Liber Abaci: A Translation into Modern English of Leonardo Pisano's Book of Calculation", Springer.

History

Feb 28, 2005: Version 1.1.0.0 (Added Zip file support and StreamUtilities. Strings on Resource.)
Jun 21, 2004: Version 1.0.0.0

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
