Introduction
One of the new changes in .NET 4.5 is the vast improvement in the System.IO.Compression namespace. Now, in a very simple manner, we can perform zip and unzip actions. I will demonstrate how to perform these actions and how to avoid the pitfalls in using some of these methods. Finally, I will go over how to extend the current functionality in order to shore up some of the simpler weaknesses that we will discuss.
Getting Started
In order to use the new functionality of the Compression namespace, you need to have .NET 4.5 installed. I built the examples we are about to see in the beta of Visual Studio 11, so if you download the solution file attached to this article you will need to use the beta as well. Also, you will need to add references to both the System.IO.Compression and System.IO.Compression.FileSystem assemblies (the ZipArchive class lives in the first, the ZipFile class in the second). Simply adding System.IO.Compression as a using statement will not work until you add the references too. I also added using statements for System.IO.Compression (both assemblies share this one namespace) and for System.IO for the sake of simplicity. With these steps completed, we are ready to start coding. Whenever you come across an example that is numbered below, you can see that same example in the source code attached to this article. The code will have a corresponding example number above it in the comments.
Zipping Files and Folders
To zip up the contents (including sub-folders) of a folder, simply call the CreateFromDirectory method of ZipFile. You need to pass in first the root folder to zip and then the full name of the zip file to be created (which includes a relative or absolute path). Here is an example call (this is Example 1):
ZipFile.CreateFromDirectory(@"C:\Temp\Logs", @"C:\Temp\LogFiles.zip");
Let’s break this down a bit. First, in case you are not aware, the @ symbol tells the compiler that the following string is a verbatim string literal (backslashes are not treated as escape characters, so whatever is between the quotes is used as-is). That allows us to put in only one backslash for each path segment. Next, note the simplicity here. We are telling the method just two pieces of information: what to zip and where to put it. Finally, note that there is no mention of compression. By default, the method assumes Optimal compression (the other options are Fastest and NoCompression). The other default here is that we will not include the root directory in the zip file. We do include all of the folders and files in the root directory, just not the directory itself.
Another overload of the CreateFromDirectory method includes two additional options: the compression level and whether to include the root directory. These are fairly simple to understand. Here is an example to help you visualize (this is Example 2):
ZipFile.CreateFromDirectory(@"C:\Temp\Logs", @"C:\Temp\LogFiles.zip", CompressionLevel.Optimal, true);
This example is the same as above except that it includes the root directory in the zip file. If you think this through, you may see a shortcoming with our current examples: we didn’t check for the existence of the output zip file. If you run both of these examples together, the second call will throw an IOException, since the output file already exists (both examples use the same file name). One solution would be to check for the existence of the file before creating it. The problem I see with this is that it will require you to write the same code every time you use this class. That doesn’t seem very DRY to me. I will show you below how to extend this functionality to improve how this method works.
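As a rough illustration of the "check before creating" approach, here is a minimal sketch. The class name ZipSketch and the method name CreateFromDirectoryOverwrite are hypothetical (they are not part of the framework or of the library attached to this article), and the sketch assumes that replacing an existing archive is the behavior you want.

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipSketch
{
    // Hypothetical helper: removes an existing archive with the same name,
    // then zips the folder using the two-argument overload from Example 1.
    public static void CreateFromDirectoryOverwrite(string sourceDirectory, string zipPath)
    {
        if (File.Exists(zipPath))
        {
            // Assumption: overwriting is acceptable. Delete the old archive so
            // that CreateFromDirectory does not throw an IOException.
            File.Delete(zipPath);
        }
        ZipFile.CreateFromDirectory(sourceDirectory, zipPath);
    }
}
```

With a wrapper like this, Examples 1 and 2 could run back to back against the same output file without the second one failing.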
Unzipping Files
The basic implementation of this functionality is just as simple as zipping a file. To unzip a zip file to a directory, simply execute the following example (this is Example 3):
ZipFile.ExtractToDirectory(@"C:\Temp\LogFiles.zip", @"C:\Temp\Logs");
This will unzip the LogFiles.zip file to the Logs directory. Here we immediately see another issue: what if some of the files already exist? The short answer is that the extract will fail and throw an error. Whoops. That doesn’t seem terribly useful. The reality is that this is fine for everyday use. Most of the time you are unzipping files to new root folders. However, if you are trying to restore a backup from zip, you may run into a serious issue. What that means, though, is that this is not the correct method for all cases. If you need to conditionally restore files (overwrite if newer, if not exists, etc.), you need to read through your zip file and perform actions on each file. We will see how to do that next.
Opening a Zip File
Sometimes you will need to open up a zip file and read the contents. The code to do this gets a bit more complex, but only because we need to loop through the files contained in the zip file. Here is an example (this is Example 4):
string fileUnzipFullPath;
string fileUnzipFullName;
// Open the archive read-only; the using block disposes it when we are done.
using (ZipArchive archive = ZipFile.OpenRead(zipName))
{
    foreach (ZipArchiveEntry file in archive.Entries)
    {
        Console.WriteLine("File Name: {0}", file.Name);
        Console.WriteLine("File Size: {0} bytes", file.Length);
        Console.WriteLine("Compression Ratio: {0}", ((double)file.CompressedLength / file.Length).ToString("0.0%"));
        // Build the destination path and extract only if the file is not already there.
        fileUnzipFullName = Path.Combine(dirToUnzipTo, file.FullName);
        if (!System.IO.File.Exists(fileUnzipFullName))
        {
            fileUnzipFullPath = Path.GetDirectoryName(fileUnzipFullName);
            Directory.CreateDirectory(fileUnzipFullPath);
            file.ExtractToFile(fileUnzipFullName);
        }
    }
}
One really cool thing here is that ZipArchive implements IDisposable, which allows us to use the using statement. This ensures that we properly dispose of our class when we are done with it. The OpenRead method is the same as the Open method in Read mode. We will get to the Open method next.
This code opens a zip file in read-only mode (it cannot modify the zip file contents but it can extract them). It then loops through each file and gives us the properties of the file. I added a fun output titled “Compression Ratio”, where I divide the compressed size by the original size and display the result as a percentage. Finally, I extracted the file and put it in the specified root directory with the original path. This involves a couple of extra steps that aren’t directly related to Compression. I build the new path (without file name) for the file, then I create that directory (which will do nothing if it exists already), and I then extract the file to the new location. The existing ExtractToFile method does not create the path if it does not exist (it just throws an error) and it will throw an error if the file already exists.
This is another area of the Compression namespace that I wish were a little more robust. However, I have written a couple of methods below that will extend the functionality of the Compression namespace and make it a little easier to work with. First, however, let’s go over the rest of the Compression namespace.
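Two details of Example 4 are worth guarding against when the archive comes from somewhere you do not control: an entry whose FullName contains ".." segments (or a rooted path) can resolve to a location outside the destination folder, and entries that represent folders have an empty Name and no file content to extract. The following is a minimal sketch of both checks. The class and method names are hypothetical and are not part of the framework or of the library attached to this article.

```csharp
using System;
using System.IO;

public static class ZipPathSketch
{
    // Hypothetical helper: returns the full destination path for an entry,
    // or throws if that path would fall outside destinationDirectory.
    public static string ResolveEntryPath(string destinationDirectory, string entryFullName)
    {
        string root = Path.GetFullPath(destinationDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
        {
            root += Path.DirectorySeparatorChar;
        }

        // GetFullPath collapses ".." segments, so the prefix check below
        // compares the real target location against the destination root.
        string target = Path.GetFullPath(Path.Combine(root, entryFullName));
        if (!target.StartsWith(root, StringComparison.Ordinal))
        {
            throw new IOException("Entry is outside the destination directory: " + entryFullName);
        }
        return target;
    }

    // Folder entries (FullName ending in a slash) report an empty Name.
    public static bool IsDirectoryEntry(string entryName)
    {
        return string.IsNullOrEmpty(entryName);
    }
}
```

In Example 4, ResolveEntryPath(dirToUnzipTo, file.FullName) would take the place of the Path.Combine call, and entries for which IsDirectoryEntry(file.Name) returns true would be skipped instead of being passed to ExtractToFile.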
Manually Creating Zip Files
Above I showed you how to create a zip file by passing in a root folder and the name of a zip file. Sometimes, though, you want to specifically choose which files to include in an archive. We have seen above how to loop through an existing archive. We will do the same type of thing here, only we will add the files instead of viewing them. Here is an example (this is Example 5):
using (ZipArchive newFile = ZipFile.Open(zipName, ZipArchiveMode.Create))
{
    newFile.CreateEntryFromFile(@"C:\Temp\File1.txt", "File1.txt");
    newFile.CreateEntryFromFile(@"C:\Temp\File2.txt", "File2.txt", CompressionLevel.Fastest);
}
Basically, we are creating this file, adding two new files to it, and then the zip file is getting committed implicitly (at the end of the using statement). Notice that when I added the second txt file, I set the CompressionLevel for that specific file. This can be a choice for each file. Note that we are using the Open method with the mode set to Create. The mode options are Read, Create, and Update. I had mentioned above that when you use the Read mode, it is the same as the OpenRead method. Here we covered the Create mode. The last mode is Update, which we will cover next.
Adding or Removing Files from an Existing Zip File
The last mode of the Open method we have to cover is the Update mode. This is both a read and a write mode on an existing zip file. The way you interact with the file is no different from the two modes above. For example, if we wanted to add two new files to the ManualBackup.zip file we just created, we would do so like this (this is Example 6):
using (ZipArchive modFile = ZipFile.Open(zipName, ZipArchiveMode.Update))
{
    modFile.CreateEntryFromFile(@"C:\Temp\File1.txt", "File10.txt");
    modFile.CreateEntryFromFile(@"C:\Temp\File2.txt", "File20.txt", CompressionLevel.Fastest);
}
Notice that I renamed the files I was adding, just so they would be different.
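Example 6 only adds files. Removing a file in Update mode follows the same pattern: look up the entry and call its Delete method. Here is a minimal sketch; the class name ZipRemoveSketch and the method name RemoveEntry are hypothetical, not something from the attached library.

```csharp
using System.IO.Compression;

public static class ZipRemoveSketch
{
    // Hypothetical helper: deletes one entry from an existing archive.
    // Returns true if the entry was found and removed, false otherwise.
    public static bool RemoveEntry(string zipPath, string entryName)
    {
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Update))
        {
            // GetEntry returns null when no entry has this name.
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return false;
            }
            entry.Delete();
            return true;
        }
    }
}
```

For instance, RemoveEntry(zipName, "File10.txt") would undo the first addition made in Example 6. As with adding, the change is committed when the using block ends.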
Extending the Basic Functionality
Now that we have learned how to work with zip files using the Compression namespace, let’s look at how to make the features of this namespace even better. Above we pointed out a few shortcomings in the existing system, mostly revolving around how to handle unexpected situations (files already exist, folders don’t exist, etc.). If we want to properly work with archives, we really need to add a bunch of code each time to handle all possible situations. Instead of making you recreate this code each time you want to work with archives, I have added a Helper namespace that adds functionality to the standard methods provided by Microsoft.
I added three new methods to the standard set you get with the Compression namespace. I will explain what each method does and then I will put the code below it. If you want the entire project (which is a DLL for easy use in your projects), simply download the code for this article. I have attached both the full library and the examples on how to use both the Compression namespace and my improved Compression methods.
The ImprovedExtractToDirectory Method
This method improves upon the ExtractToDirectory method. Instead of blindly extracting the files and hoping there are no existing files that match with the files we are extracting, this method loops through each file and compares it against the destination to see if it exists. If it does, it handles that specific file like we asked (Overwrite.Always, Overwrite.IfNewer, or Overwrite.Never). It also creates the destination path if it does not exist before it attempts to write the file. These two new features ensure that we deal with each file in the archive individually and that we don’t fail the whole extraction because of an issue with one file in the archive.
To call this method, use the following line (this is Example 7):
Compression.ImprovedExtractToDirectory(zipName, dirToUnzipTo, Compression.Overwrite.IfNewer);
Note how simple that line was. In fact, it is patterned after Example 3, which also means you could leave off the last parameter and it would still work (the default Overwrite is IfNewer). The code for this method is as follows:
public static void ImprovedExtractToDirectory(string sourceArchiveFileName,
    string destinationDirectoryName,
    Overwrite overwriteMethod = Overwrite.IfNewer)
{
    using (ZipArchive archive = ZipFile.OpenRead(sourceArchiveFileName))
    {
        foreach (ZipArchiveEntry file in archive.Entries)
        {
            ImprovedExtractToFile(file, destinationDirectoryName, overwriteMethod);
        }
    }
}
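The helper methods rely on two option types, Overwrite and ArchiveAction, whose declarations appear only in the downloadable source. Judging from the call sites in this article (Compression.Overwrite.IfNewer, Compression.ArchiveAction.Replace), they could be declared roughly as follows. This is a sketch under assumptions: the nesting inside a static Compression class and the order of the members are guesses, and only the member names come from the article.

```csharp
public static class Compression
{
    // What to do when a file being written already exists at the destination.
    public enum Overwrite
    {
        Always,   // replace the existing file every time
        IfNewer,  // replace only if the incoming file has a later write time
        Never     // keep the existing file
    }

    // What to do when the target zip file already exists.
    public enum ArchiveAction
    {
        Merge,    // open the existing archive and add to it
        Replace,  // delete the existing archive and start over
        Error,    // throw an exception
        Ignore    // do nothing and return
    }
}
```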
The ImprovedExtractToFile Method
When you have opened up an archive and you are looping through the included files, you need a way to safely extract the files in a reasonable manner. In the Opening a Zip File section (Example 4), we added a bit of plumbing to ensure that we were only extracting the files if they did not already exist. However, we have too many lines of code and too much logic here for something so standard. Instead, I created a method called ImprovedExtractToFile. You pass in the ZipArchiveEntry reference (the file), the root destination directory (my method will calculate the full directory based upon the root directory and the relative path of the file we are extracting), and what we want to do if the file exists in the destination location. For an example of how to call this method, I have copied Example 4 and replaced the plumbing with my new call (this is Example 8):
using (ZipArchive archive = ZipFile.OpenRead(zipName))
{
    foreach (ZipArchiveEntry file in archive.Entries)
    {
        Console.WriteLine("File Name: {0}", file.Name);
        Console.WriteLine("File Size: {0} bytes", file.Length);
        Console.WriteLine("Compression Ratio: {0}", ((double)file.CompressedLength / file.Length).ToString("0.0%"));
        Compression.ImprovedExtractToFile(file, dirToUnzipTo, Compression.Overwrite.Always);
    }
}
Notice the comparative size of the two examples. This one is cleaner and has less logic we would need to repeat elsewhere. I did not, however, abstract the compression ratio code. If you find that you want to do that more than once, you could add this as a method to the library. It is also important to note that I use the ImprovedExtractToFile method in my ImprovedExtractToDirectory method. This way we do not duplicate logic where it is not needed. The code for this method is as follows:
public static void ImprovedExtractToFile(ZipArchiveEntry file,
    string destinationPath,
    Overwrite overwriteMethod = Overwrite.IfNewer)
{
    string destinationFileName = Path.Combine(destinationPath, file.FullName);
    string destinationFilePath = Path.GetDirectoryName(destinationFileName);
    Directory.CreateDirectory(destinationFilePath);
    switch (overwriteMethod)
    {
        case Overwrite.Always:
            file.ExtractToFile(destinationFileName, true);
            break;
        case Overwrite.IfNewer:
            if (!File.Exists(destinationFileName) || File.GetLastWriteTime(destinationFileName) < file.LastWriteTime)
            {
                file.ExtractToFile(destinationFileName, true);
            }
            break;
        case Overwrite.Never:
            if (!File.Exists(destinationFileName))
            {
                file.ExtractToFile(destinationFileName);
            }
            break;
        default:
            break;
    }
}
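The compression ratio calculation that Examples 4 and 8 repeat inline could be pulled into a helper of its own. Here is one possible sketch. The class name RatioSketch and the method name CompressionRatio are hypothetical, and returning 0 for an empty entry is a design choice I am assuming here (the inline version would produce NaN for a zero-length file).

```csharp
using System.IO.Compression;

public static class RatioSketch
{
    // Hypothetical helper: compressed size as a fraction of the original size.
    // Returns 0 for empty entries so that we never divide by zero.
    public static double CompressionRatio(long compressedLength, long length)
    {
        return length == 0 ? 0.0 : (double)compressedLength / length;
    }

    // Convenience overload for an entry read from an open archive.
    public static double CompressionRatio(ZipArchiveEntry entry)
    {
        return CompressionRatio(entry.CompressedLength, entry.Length);
    }
}
```

The output line in Example 8 would then become Console.WriteLine("Compression Ratio: {0}", RatioSketch.CompressionRatio(file).ToString("0.0%"));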
The AddToArchive Method
I created this new method to take a set of files and put them into a zip file. If the zip file exists, we can merge the files into the zip, we can overwrite the zip file, we can throw an error, or we can fail silently (you should avoid this last option – there are valid reasons to use it, but they aren’t as common as you would think). If we end up merging our files into the existing archive, we can then check if we want every file to go in or not. We can overwrite every matching file, we can overwrite the matching files only if the files to be put in are newer, or we can ignore the file to be put in if there is already a match.
That is a lot of options for what used to be a fairly simple job. However, this method handles all of Example 5 and most of Example 6 (we don’t read through the existing files in the archive – that could be another helper method but I decided against it for the time being). Here is the improved call that we could use in both Example 5 and Example 6 to add files to an archive (this is Example 9):
List<string> filesToArchive = new List<string>();
filesToArchive.Add(@"C:\Temp\File1.txt");
filesToArchive.Add(@"C:\Temp\File2.txt");
Compression.AddToArchive(zipName,
    filesToArchive,
    Compression.ArchiveAction.Replace,
    Compression.Overwrite.IfNewer,
    CompressionLevel.Optimal);
Note that we have selected to Replace the zip file if one already exists and to overwrite any matching files inside if the ones we are inserting are newer. We are also setting the compression on each file in the archive to be Optimal. These last three parameters can be changed to fit your needs. They can also be omitted. The values I specified are the defaults. The code for this method is as follows:
// Requires using statements for System.Collections.Generic, System.IO,
// System.IO.Compression, and System.Linq.
public static void AddToArchive(string archiveFullName,
    List<string> files,
    ArchiveAction action = ArchiveAction.Replace,
    Overwrite fileOverwrite = Overwrite.IfNewer,
    CompressionLevel compression = CompressionLevel.Optimal)
{
    // Decide how to open the archive based on whether it already exists.
    ZipArchiveMode mode = ZipArchiveMode.Create;
    bool archiveExists = File.Exists(archiveFullName);
    switch (action)
    {
        case ArchiveAction.Merge:
            if (archiveExists)
            {
                mode = ZipArchiveMode.Update;
            }
            break;
        case ArchiveAction.Replace:
            if (archiveExists)
            {
                File.Delete(archiveFullName);
            }
            break;
        case ArchiveAction.Error:
            if (archiveExists)
            {
                throw new IOException(String.Format("The zip file {0} already exists.", archiveFullName));
            }
            break;
        case ArchiveAction.Ignore:
            if (archiveExists)
            {
                return;
            }
            break;
        default:
            break;
    }
    using (ZipArchive zipFile = ZipFile.Open(archiveFullName, mode))
    {
        if (mode == ZipArchiveMode.Create)
        {
            // New archive: nothing to compare against, so add every file.
            foreach (string file in files)
            {
                zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
            }
        }
        else
        {
            foreach (string file in files)
            {
                // Look for an existing entry with the same file name.
                var fileInZip = (from f in zipFile.Entries
                                 where f.Name == Path.GetFileName(file)
                                 select f).FirstOrDefault();
                switch (fileOverwrite)
                {
                    case Overwrite.Always:
                        if (fileInZip != null)
                        {
                            fileInZip.Delete();
                        }
                        zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                        break;
                    case Overwrite.IfNewer:
                        if (fileInZip != null)
                        {
                            if (fileInZip.LastWriteTime < File.GetLastWriteTime(file))
                            {
                                fileInZip.Delete();
                                zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                            }
                        }
                        else
                        {
                            zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                        }
                        break;
                    case Overwrite.Never:
                        // Skip files that already have a match; add the rest.
                        if (fileInZip == null)
                        {
                            zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                        }
                        break;
                    default:
                        break;
                }
            }
        }
    }
}
Compression Comparisons
It was brought up in the comments that it would be beneficial to see how these zip methods compare to other commonly available zip tools. I thought this was a great idea, so I started investigating. To create a test pool of data, I created 10,000 text files that each had 10,000 lines of text in them. Text files are easy to create and also easy to compress. This gave me a uniform base to work with. I then tested my code against the Windows "Send to Zip" feature and 7zip. Here are the results:
Compression Method | Compression Level | Time Elapsed | Final Size |
7zip (right-click and send to zip file) | Normal | 14 minutes 19 seconds | 66,698 KB |
Code (Debug Mode) | Optimal | 10 minutes 13 seconds | 62,938 KB |
Code (Release Mode) | Optimal | 9 minutes 36 seconds | 62,938 KB |
Windows Zip (right-click and send to zip file) | Normal | 8 minutes 31 seconds | 62,938 KB |
Code (Release Mode) | Fastest | 8 minutes 5 seconds | 121,600 KB |
7zip (zip file through UI) | Fastest | 8 minutes 0 seconds | 66,698 KB |
Note that my machine is fairly slow. I'm sure you can get your system to run faster. However, the thing to look at here is not the actual times but the comparisons between methods. It looks as though the Optimal compression is the same as Windows zip (and probably uses the same libraries). The actual compression is a bit better than 7zip but it is slower. The first 7zip entry does confuse me a bit. It seems to be doing the same task as the last entry, but it takes over six minutes longer. I did repeat the test just to verify. I also verified that I am using the latest version of 7zip.
I think the results of these tests indicate that the use of the Compression namespace in .NET 4.5 is a very viable candidate for all types of projects. It competes well with a great competitor and the fact that you do not need to rely on a third-party library to use it, in my mind, makes this the obvious choice for archiving files.
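If you want to run a comparison like this on your own data, a small timing wrapper is enough for the "Code" rows. The sketch below is not the harness used for the numbers above. It is a minimal illustration, with a hypothetical class ZipTimingSketch and method TimeZip, that times one CreateFromDirectory call with a Stopwatch and reports the size of the resulting archive.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public static class ZipTimingSketch
{
    // Hypothetical helper: zips sourceDirectory at the given compression level,
    // returns the elapsed time, and reports the archive size through sizeInBytes.
    public static TimeSpan TimeZip(string sourceDirectory, string zipPath,
                                   CompressionLevel level, out long sizeInBytes)
    {
        // Start from a clean slate so CreateFromDirectory does not throw.
        if (File.Exists(zipPath))
        {
            File.Delete(zipPath);
        }

        Stopwatch timer = Stopwatch.StartNew();
        ZipFile.CreateFromDirectory(sourceDirectory, zipPath, level, false);
        timer.Stop();

        sizeInBytes = new FileInfo(zipPath).Length;
        return timer.Elapsed;
    }
}
```

Calling it once with CompressionLevel.Optimal and once with CompressionLevel.Fastest against the same folder gives the two figures to compare. A single run is noisy, so repeating each measurement a few times is a good idea.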
Conclusion
The System.IO.Compression namespace in .NET 4.5 provides us with an easy way to work with zip files. We can create archives, update archives, and extract archives. This is a huge leap forward from what we had before. While there are a few shortcomings in the implementation, we saw how we can overcome them easily. I hope you found this article helpful. As always, I appreciate your constructive feedback.
History
- May 10, 2012 - Initial Version
- May 11, 2012 - Added Compression Comparisons section