I recently had to work around a problem in a particularly ugly way (which I won't detail), so after that painful experience I opted to create a class that solves my specific issue in a sane and reusable manner! Out of this unexpected need, the class “GZipHelper” was born. It is really just a wrapper around the base .NET System.IO.Compression.GZipStream. It was kind of a sad day, as I really didn’t want to be writing this type of wrapper code. I was hoping the functionality would have been natively available in the existing GZipStream class so I could get on with solving the real business problem at hand.
Firstly, it should be said that the standard GZipStream class provides the functionality I’m sure the MS engineers intended, which was HTTP-based compression (at least I think that was its expected purpose). However, it is certainly not a fully featured class that is easy to use for programmers looking for quick and helpful access to GZip compression.
Specifically, I needed to know how big any given “.GZ” file would be once decompressed, without fully reading and decompressing it. It seemed trivial enough, since “gzip.exe -l” does exactly what I needed, but no amount of hunting within MSDN helped. So it was on to the ever handy GZip Wikipedia entry, which detailed enough of the file format and provided the reference to the “GZIP file format specification version 4.3”.
So armed with this information, we can start to decode the GZip file format to extract the length. In fact, this class checks whether the file is GZip compressed and returns the decompressed length if it is, or the regular file length if it is not.
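To make the idea concrete, here is a minimal sketch of how the length can be read from the file format. It is not the GZipHelper source; the class and method names (GZipLengthSketch, GetDecompressedLength) are made up for illustration. It relies on two facts from the GZIP specification: a GZip file starts with the bytes 0x1f 0x8b, and its last four bytes (ISIZE) hold the uncompressed size, stored little-endian.

```csharp
using System;
using System.IO;

public static class GZipLengthSketch
{
    // Hypothetical helper: returns the size recorded in the GZip trailer,
    // or the plain stream length when the GZip magic bytes are missing.
    public static long GetDecompressedLength(Stream stream)
    {
        // A GZip member needs at least a 10-byte header and an 8-byte trailer.
        if (stream.Length < 18)
            return stream.Length;

        stream.Seek(0, SeekOrigin.Begin);
        int id1 = stream.ReadByte();
        int id2 = stream.ReadByte();
        if (id1 != 0x1f || id2 != 0x8b)
            return stream.Length; // not GZip: report the regular length

        // ISIZE is the final four bytes of the file, least significant byte first.
        stream.Seek(-4, SeekOrigin.End);
        byte[] isize = new byte[4];
        int read = 0;
        while (read < 4)
        {
            int n = stream.Read(isize, read, 4 - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        return (uint)isize[0]
             | ((uint)isize[1] << 8)
             | ((uint)isize[2] << 16)
             | ((uint)isize[3] << 24);
    }
}
```

Two caveats apply to this approach: ISIZE is the size modulo 2^32, so it wraps for data of 4 GB or more, and for a file made of several concatenated GZip members it only describes the last one.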
The following class functions have been implemented (see the bottom of the article for the link to the full project):
public class GZipHelper
{
    public bool GetFileDetails(string filename);
    public void GetFileInformation(FileStream fileStream);
    public void CompressFile(string filename, bool overWriteExisting);
    public bool DecompressFile(string filename, bool overWriteExisting);
    public Stream GetSeekableStream(string filename);
}
In addition, the following properties are available:
- CompressedLength: Size of the compressed file (or regular file size if not compressed)
- DecompressedLength: Size of the file if it were uncompressed (or regular file size if not compressed)
- IsTextFile: Indicates if GZip thought the file was text based, potentially leading to better compression
- CompressionModeValue: Numeric indication of the compression mode used
- CRC16Present: Indicates a CRC16 is available for the file
- ExtraFieldsPresent: Additional meta fields are available in the file
- FileNamePresent: GZip contains the original file name
- FileCommentPresent: Compressed file has a comment associated with it
- IsCompressed: Indicates if the file is GZip compressed or not
- CompressedDate: If stored, this is the date the file was compressed
- CRC32: CRC32 value associated with the file
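Most of these properties come from the fixed 10-byte GZip header. The sketch below shows one way to decode it according to the GZIP specification. The type GZipHeaderSketch is hypothetical, and the mapping of the article's property names onto the specification's flag bits (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT) is my assumption about what the class does, not a copy of its code.

```csharp
using System;

public sealed class GZipHeaderSketch
{
    public bool IsCompressed;
    public int CompressionModeValue;   // CM byte; 8 means deflate
    public bool IsTextFile;            // FTEXT
    public bool CRC16Present;          // FHCRC
    public bool ExtraFieldsPresent;    // FEXTRA
    public bool FileNamePresent;       // FNAME
    public bool FileCommentPresent;    // FCOMMENT
    public DateTime? CompressedDate;   // MTIME; null when stored as 0

    // Decodes the first 10 bytes of a file; anything else yields IsCompressed == false.
    public static GZipHeaderSketch Parse(byte[] header)
    {
        var info = new GZipHeaderSketch();
        if (header == null || header.Length < 10 || header[0] != 0x1f || header[1] != 0x8b)
            return info;

        info.IsCompressed = true;
        info.CompressionModeValue = header[2];

        byte flags = header[3];
        info.IsTextFile = (flags & 0x01) != 0;
        info.CRC16Present = (flags & 0x02) != 0;
        info.ExtraFieldsPresent = (flags & 0x04) != 0;
        info.FileNamePresent = (flags & 0x08) != 0;
        info.FileCommentPresent = (flags & 0x10) != 0;

        // MTIME is a little-endian count of seconds since 1970-01-01 UTC.
        uint mtime = (uint)header[4]
                   | ((uint)header[5] << 8)
                   | ((uint)header[6] << 16)
                   | ((uint)header[7] << 24);
        if (mtime != 0)
            info.CompressedDate = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(mtime);

        return info;
    }
}
```

The CRC32 value is not in the header: the specification places it in the trailer, in the four bytes immediately before the ISIZE field.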
The project also includes MSTest harnesses (trivial implementations) to test the class. The features of the class are:
- Can trivially determine a true file size (regardless of whether the file was compressed via GZip or is uncompressed). This makes your code path much more readable if you are dealing with mixed file types.
- Provides a seekable stream into the compressed file via a MemoryStream. The key is that you don't need to worry about the compression (unless you are reading in BIG files), as you will get back a Stream for either a regular file or a compressed file, and both support seeking. This can be handy if your program assumes it can Seek in the stream and you need to access GZip files!
- Trivial decompress file, which also honors the CompressedDate. If that date is set, then the decompressed file gets that creation date.
- Trivial compress file. Unfortunately at the time of writing, I’ve not updated the header to include the date of the compressed file. This may come in a later version (and if so I’ll update the blog – but definitely no promises!).
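The seekable stream idea can be sketched roughly as follows. This illustrates the technique described above (decompress everything into a MemoryStream, or hand back the FileStream untouched) and is not the actual GetSeekableStream code; SeekableGZipSketch and OpenSeekable are hypothetical names, and Stream.CopyTo assumes .NET 4 or later.

```csharp
using System.IO;
using System.IO.Compression;

public static class SeekableGZipSketch
{
    // Returns a seekable stream over the file's uncompressed content.
    // The caller owns the returned stream and should dispose it.
    public static Stream OpenSeekable(string path)
    {
        FileStream file = File.OpenRead(path);

        // Check for the GZip magic bytes, then rewind.
        bool isGZip = file.Length >= 18
                      && file.ReadByte() == 0x1f
                      && file.ReadByte() == 0x8b;
        file.Seek(0, SeekOrigin.Begin);

        if (!isGZip)
            return file; // a FileStream already supports seeking

        // GZipStream itself cannot seek, so buffer the whole payload in memory.
        var memory = new MemoryStream();
        using (file)
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        {
            gzip.CopyTo(memory);
        }
        memory.Seek(0, SeekOrigin.Begin);
        return memory;
    }
}
```

The trade-off is the one noted in the list: the entire decompressed file lives in memory, so this only suits files that comfortably fit there.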
Simple example usages are (taken straight from the unit tests!):
// Compress a file, overwriting any existing output
GZipHelper actual = new GZipHelper();
actual.CompressFile(_fileName, true);

// Decompress a file, overwriting any existing output
GZipHelper actual = new GZipHelper();
string fileName = "CSharpHackerSmallTest.txt.gz";
actual.DecompressFile(fileName, true);

// Read a compressed file through a seekable stream
GZipHelper actual = new GZipHelper();
using (Stream dataStream = actual.GetSeekableStream("CSharpHackerSmallTest.txt.gz"))
{
    dataStream.Seek(0, SeekOrigin.Begin);
    StreamReader sr = new StreamReader(dataStream);
    string contents = sr.ReadToEnd();
    Assert.AreEqual(119, contents.Length);
}

// Query the decompressed length; GetFileDetails is the overload that takes a file name
GZipHelper actual = new GZipHelper();
actual.GetFileDetails("CSharpHackerSmallTest.txt.gz");
Assert.AreEqual(119, actual.DecompressedLength);
Finally, it should be noted that, by all accounts, the standard implementation of GZipStream in the base .NET libraries (actually the underlying DeflateStream) has a problem when attempting to compress random or already compressed data. There is a Microsoft Connect article [http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93930] that details the issue:
The GZipStream and DeflateStream classes can _significantly_ increase the size of “compressed” data. That means, they don’t just add a few header bytes as stand-alone compressors do, but they _inflate_ the data by as much as 50%. This is apparently because these classes do not check for incompressible data which is a standard feature of all stand-alone compressors. Both classes work fine when the data actually can be compressed.
Please refer to this thread for more details:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=179704&SiteID=1
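If you want to see how your own runtime behaves, a small measurement like the sketch below is enough. The helper names (CompressionRatioSketch, CompressedSize, RandomBytes) are made up for this example, and I am deliberately not quoting any expected numbers, since the result depends on the .NET version you run it on.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionRatioSketch
{
    // Compresses the buffer in memory and returns the size of the GZip output.
    public static long CompressedSize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true so the length can be read after the GZip trailer is written.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    // Produces pseudo-random bytes, which stand in for incompressible data.
    public static byte[] RandomBytes(int count, int seed)
    {
        var data = new byte[count];
        new Random(seed).NextBytes(data);
        return data;
    }
}
```

Comparing CompressedSize(RandomBytes(1000000, 42)) against the 1,000,000 input bytes shows how much overhead the built-in classes add for data that cannot be compressed.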
The base implementation worked for me and met my specific needs without bringing in any third-party DLLs. Incidentally, that also has the nice benefit, for those looking to bring this into proprietary software, of avoiding any licensing discussions with supervisors! If you want a more robust GZipStream implementation, you can check out http://dotnetzip.codeplex.com/. It apparently offers a drop-in replacement, but this class could still be useful even if you use that drop-in replacement.
I hope this helps someone out there.
[Download GZipHelper (Source + Project) Here]
This download link will always have the latest and greatest version.
Gareth
I'm Gareth, a guy who loves software! My day job is working for a retail company, where I am involved in a large-scale C# project that processes large amounts of data into upstream data repositories.
My work rule of thumb is that everyone spends much more time working than not, so you better enjoy what you do!
Needless to say - I'm having a blast.
Have fun,
Gareth