Creating Zip Files Easily in .NET 4.5

11 May 2012
We will use the updated System.IO.Compression namespace to easily create, update, and extract zip files.

Introduction

One of the new changes in .NET 4.5 is the vast improvement in the System.IO.Compression namespace. Now, in a very simple manner, we can perform zip and unzip actions. I will demonstrate how to perform these actions and how to avoid the pitfalls in using some of these methods. Finally, I will go over how to extend the current functionality in order to shore up some of the simpler weaknesses that we will discuss.

Getting Started

In order to use the new functionality of the Compression namespace, you need to have .NET 4.5 installed. I built the examples we are about to see in the beta of Visual Studio 11, so if you download the solution file attached to this article you will need to use the beta as well. Also, you will need to add references to two assemblies: System.IO.Compression (which contains ZipArchive) and System.IO.Compression.FileSystem (which contains ZipFile). Simply adding System.IO.Compression as a using statement will not work until you add the references too. Both assemblies put their types in the System.IO.Compression namespace, so I added a using statement for it, as well as one for System.IO for the sake of simplicity. With these steps completed, we are ready to start coding. Whenever you come across an example that is numbered below, you can see that same example in the source code attached to this article. The code will have a corresponding example number above it in the comments.

Zipping Files and Folders

To zip up the contents (including sub-folders) of a folder, simply call the CreateFromDirectory method of ZipFile. You need to pass in first the root folder to zip and then the full name of the zip file to be created (which includes a relative or absolute path). Here is an example call (this is Example 1):

ZipFile.CreateFromDirectory(@"C:\Temp\Logs", @"C:\Temp\LogFiles.zip");

Let’s break this down a bit. First, in case you are not aware, the @ symbol marks the string as a verbatim string literal (backslashes are not treated as escape characters, so whatever is between the quotes is used as-is). That allows us to put in only one backslash for each path segment. Next, note the simplicity here. We are telling the method just two pieces of information: what to zip and where to put it. Finally, note that there is no mention of compression. By default, the method assumes Optimal compression (the other options are Fastest and NoCompression). The other default here is that we will not include the root directory in the zip file. We do include all of the folders and files in the root directory, just not the directory itself.

Another overload of the CreateFromDirectory method takes two additional options: the compression level and whether to include the root directory. These are fairly simple to understand. Here is an example to help you visualize (this is Example 2):

ZipFile.CreateFromDirectory(@"C:\Temp\Logs", @"C:\Temp\LogFiles.zip", CompressionLevel.Optimal, true);

This example is the same as above except that it includes the root directory in the zip file. If you think this through, you may see a shortcoming with our current examples: we didn’t check for the existence of the output zip file. If you run both of these examples together, the second call will throw an IOException, since the output file already exists (the two examples use the same name). One solution would be to check for the existence of the file before creating it. The problem I see with this is that it will require you to write the same code every time you use this class. That doesn’t seem very DRY to me. I will show you below how to extend this functionality to improve how this method works.
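In the meantime, the existence check mentioned above can be sketched in a few lines. This is only an illustration, not part of the library attached to this article: the class name ZipCreateSketch and the method name CreateFromDirectoryOverwriting are made up for the example, and it assumes that silently replacing an existing archive is what you want.

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipCreateSketch
{
    // Hypothetical helper (not a framework method): removes any existing
    // archive with the same name, then zips the folder.
    public static void CreateFromDirectoryOverwriting(string sourceDirectory, string zipPath)
    {
        // File.Delete does not throw if the file does not exist
        File.Delete(zipPath);

        // Same call as Example 1, now safe to run repeatedly
        ZipFile.CreateFromDirectory(sourceDirectory, zipPath);
    }
}
```

Calling ZipCreateSketch.CreateFromDirectoryOverwriting(@"C:\Temp\Logs", @"C:\Temp\LogFiles.zip") twice in a row should then succeed both times, at the cost of discarding the earlier archive.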

Unzipping Files

The basic implementation of this functionality is just as simple as zipping a file. To unzip a zip file to a directory, simply execute the following example (this is Example 3):

ZipFile.ExtractToDirectory(@"C:\Temp\LogFiles.zip", @"C:\Temp\Logs");

This will unzip the LogFiles.zip file to the Logs directory. Here we immediately see another issue: what if some of the files already exist? The short answer is that the extract will fail and throw an IOException. Whoops. That doesn’t seem terribly useful. The reality is that this is fine for everyday use, since most of the time you are unzipping files to new root folders. However, if you are trying to restore a backup from a zip file, you may run into a serious issue. In other words, this is not the correct method for all cases. If you need to conditionally restore files (overwrite if newer, write only if the file does not exist, etc.), you need to read through your zip file and perform actions on each file. We will see how to do that next.

Opening a Zip File

Sometimes you will need to open up a zip file and read the contents. The code to do this gets a bit more complex, but only because we need to loop through the files contained in the zip file. Here is an example (this is Example 4):

//This stores the path where the file should be unzipped to,
//including any subfolders that the file was originally in.
string fileUnzipFullPath;
 
//This is the full name of the destination file including
//the path
string fileUnzipFullName;
 
//Opens the zip file up to be read
using (ZipArchive archive = ZipFile.OpenRead(zipName))
{
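    //Loops through each file in the zip file
    foreach (ZipArchiveEntry file in archive.Entries)
    {
        //Skips directory entries (they have an empty Name), since
        //ExtractToFile cannot write them out as files
        if (file.Name.Length == 0) continue;
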
        //Outputs relevant file information to the console
        Console.WriteLine("File Name: {0}", file.Name);
        Console.WriteLine("File Size: {0} bytes", file.Length);
        Console.WriteLine("Compression Ratio: {0}", ((double)file.CompressedLength / file.Length).ToString("0.0%"));
 
        //Identifies the destination file name and path
        fileUnzipFullName = Path.Combine(dirToUnzipTo, file.FullName);
 
        //Extracts the files to the output folder in a safer manner
        if (!System.IO.File.Exists(fileUnzipFullName))
        {
            //Calculates what the new full path for the unzipped file should be
            fileUnzipFullPath = Path.GetDirectoryName(fileUnzipFullName);
                        
            //Creates the directory (if it doesn't exist) for the new path
            Directory.CreateDirectory(fileUnzipFullPath);
 
            //Extracts the file to (potentially new) path
            file.ExtractToFile(fileUnzipFullName);
        }
    }
}

One really cool thing here is that the ZipArchive implements IDisposable, which allows us to use the using statement. This ensures that we properly dispose of our class when we are done with it. The OpenRead method is the same as the Open method in Read mode. We will get to the Open method next.

This code opens a zip file in read-only mode (it cannot modify the zip file contents but it can extract them). It then loops through each file and gives us the properties of the file. I added a fun output titled “Compression Ratio”, where I divided the compressed size by the actual size and displayed the result as a percentage of the normal size. Finally, I extracted the file and put it in the specified root directory with the original path. This involves a couple of extra steps that aren’t directly related to Compression. I build the new path (without file name) for the file, then I create that directory (which will do nothing if it exists already), and I then extract the file to the new location. The existing ExtractToFile method does not create the path if it does not exist (it just throws an exception), and it will also throw an exception if the file already exists.

This is another area of the Compression namespace that I wish were a little more robust. However, I have written a couple of methods below that will extend the functionality of the Compression namespace and make it a little easier to work with. First, however, let’s go over the rest of the Compression namespace.
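As an aside on the “Compression Ratio” output in Example 4: an empty file has a Length of zero, so the division produces NaN rather than a percentage. A small helper could guard against that. This is a sketch only; the names ZipStats and CompressedFraction are invented for illustration, and returning 1.0 (no savings) for an empty file is an arbitrary choice.

```csharp
using System.IO.Compression;

public static class ZipStats
{
    // Hypothetical helper: compressed size as a fraction of the original
    // size (0.25 means the entry shrank to a quarter of its size).
    public static double CompressedFraction(long compressedLength, long length)
    {
        // Assumption: an empty file is reported as "no savings" instead
        // of dividing zero by zero
        if (length == 0)
        {
            return 1.0;
        }

        return (double)compressedLength / length;
    }

    // Convenience overload for an entry read from an open archive
    public static double CompressedFraction(ZipArchiveEntry entry)
    {
        return CompressedFraction(entry.CompressedLength, entry.Length);
    }
}
```

Inside the loop from Example 4, the console line could then use ZipStats.CompressedFraction(file).ToString("0.0%") in place of the inline division.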

Manually Creating Zip Files

Above I showed you how to create a zip file by passing in a root folder and the name of a zip file. Sometimes, though, you want to specifically choose which files to include in an archive. We have seen above how to loop through an existing archive. We will do the same type of thing here, only we will add the files instead of viewing them. Here is an example (this is Example 5):

//Creates a new, blank zip file to work with - the file will be
//finalized when the using statement completes
using (ZipArchive newFile = ZipFile.Open(zipName, ZipArchiveMode.Create))
{
    //Here are two hard-coded files that we will be adding to the zip
    //file.  If you don't have these files in your system, this will
    //fail.  Either create them or change the file names.
    newFile.CreateEntryFromFile(@"C:\Temp\File1.txt", "File1.txt");
    newFile.CreateEntryFromFile(@"C:\Temp\File2.txt", "File2.txt", CompressionLevel.Fastest);
}

Basically, we are creating this file, adding two new files to it, and then the zip file is getting committed implicitly (at the end of the using statement). Notice that when I added the second txt file, I set the CompressionLevel for that specific file. This can be a choice for each file. Note that we are using the Open method with the mode set to Create. The ZipArchiveMode options are Read, Create, and Update. I had mentioned above that using the Open method in Read mode is the same as calling the OpenRead method. Here we covered the Create mode. The last mode is Update, which we will cover next.
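To see the Create and Read modes side by side without touching the disk, the same ZipArchive class can be wrapped around a MemoryStream. The following is a minimal sketch, not code from the attached solution; ZipModesSketch and RoundTrip are names made up for this example.

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipModesSketch
{
    // Hypothetical helper: writes one text entry in Create mode, then
    // reopens the same bytes in Read mode and returns the entry's text.
    public static string RoundTrip(string entryName, string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            // Create mode is write-only; leaveOpen keeps the buffer
            // usable after the archive is disposed (which finalizes it)
            using (ZipArchive archive = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                ZipArchiveEntry entry = archive.CreateEntry(entryName);
                using (StreamWriter writer = new StreamWriter(entry.Open()))
                {
                    writer.Write(text);
                }
            }

            // Read mode is read-only: entries can be listed and opened,
            // but not added or deleted
            buffer.Position = 0;
            using (ZipArchive archive = new ZipArchive(buffer, ZipArchiveMode.Read))
            {
                ZipArchiveEntry entry = archive.GetEntry(entryName);
                using (StreamReader reader = new StreamReader(entry.Open()))
                {
                    return reader.ReadToEnd();
                }
            }
        }
    }
}
```

Because this sketch uses only ZipArchive and not ZipFile, it needs just the System.IO.Compression assembly reference.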

Adding or Removing Files from an Existing Zip File

The last Open mode we have to cover is the Update mode. This is both a read and a write mode on an existing zip file. The way you interact with the file is no different than the two above methods. For example, if we wanted to add two new files to the ManualBackup.zip file we just created, we would do so like this (this is Example 6):

//Opens the existing file like we opened the new file (just changed
//the ZipArchiveMode to Update)
using (ZipArchive modFile = ZipFile.Open(zipName, ZipArchiveMode.Update))
{
    //Here are two hard-coded files that we will be adding to the zip
    //file.  If you don't have these files in your system, this will
    //fail.  Either create them or change the file names.  Also, note
    //that their names are changed when they are put into the zip file.
    modFile.CreateEntryFromFile(@"C:\Temp\File1.txt", "File10.txt");
    modFile.CreateEntryFromFile(@"C:\Temp\File2.txt", "File20.txt", CompressionLevel.Fastest);
 
    //We could also add the code from Example 4 here to read
    //the contents of the open zip file as well.
}

Notice that I renamed the files I was adding, just so they would be different.
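Example 6 only adds files, but Update mode also allows removing them. A minimal sketch of the removal side follows, using the GetEntry and Delete methods; the wrapper names ZipRemoveSketch and RemoveEntry are hypothetical and not part of the attached library.

```csharp
using System.IO.Compression;

public static class ZipRemoveSketch
{
    // Hypothetical helper: deletes one entry from an existing zip file.
    // Returns true if the entry was found and removed.
    public static bool RemoveEntry(string zipPath, string entryName)
    {
        // Delete is only permitted when the archive is opened in Update mode
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Update))
        {
            // GetEntry returns null when no entry has that name
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return false;
            }

            entry.Delete();
            return true;
        }
    }
}
```

For instance, ZipRemoveSketch.RemoveEntry(zipName, "File10.txt") would undo the first addition made in Example 6. As with adding, the change is committed when the using statement completes.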

Extending the Basic Functionality

Now that we have learned how to work with zip files using the Compression namespace, let’s look at how to make the features of this namespace even better. Above we pointed out a few shortcomings in the existing system, mostly revolving around how to handle unexpected situations (files already exist, folders don’t exist, etc.). If we want to properly work with archives, we really need to add a bunch of code each time to handle all possible situations. Instead of making you recreate this code each time you want to work with archives, I have added a Helper namespace that adds functionality to the standard methods provided by Microsoft.

I added three new methods to the standard set you get with the Compression namespace. I will explain what each method does and then I will put the code below it. If you want the entire project (which is a DLL for easy use in your projects), simply download the code for this article. I have attached both the full library and the examples on how to use both the Compression namespace and my improved Compression methods.

The ImprovedExtractToDirectory Method

This method improves upon the ExtractToDirectory method. Instead of blindly extracting the files and hoping there are no existing files that match with the files we are extracting, this method loops through each file and compares it against the destination to see if it exists. If it does, it handles that specific file like we asked (Overwrite.Always, Overwrite.IfNewer, or Overwrite.Never). It also creates the destination path if it does not exist before it attempts to write the file. These two new features ensure that we deal with each file in the archive individually and that we don’t fail the whole extraction because of an issue with one file in the archive.

To call this method, use the following line (this is Example 7):

Compression.ImprovedExtractToDirectory(zipName, dirToUnzipTo, Compression.Overwrite.IfNewer);

Note how simple that line was. In fact, it is patterned after Example 3, which also means you could leave off the last parameter and it would still work (the default Overwrite is IfNewer). The code for this method is as follows:

public static void ImprovedExtractToDirectory(string sourceArchiveFileName, 
                                              string destinationDirectoryName, 
                                              Overwrite overwriteMethod = Overwrite.IfNewer)
{
    //Opens the zip file up to be read
    using (ZipArchive archive = ZipFile.OpenRead(sourceArchiveFileName))
    {
        //Loops through each file in the zip file
        foreach (ZipArchiveEntry file in archive.Entries)
        {
            ImprovedExtractToFile(file, destinationDirectoryName, overwriteMethod);
        }
    }
}

The ImprovedExtractToFile Method

When you have opened up an archive and you are looping through the included files, you need a way to safely extract the files in a reasonable manner. In the Opening a Zip File section (Example 4), we added a bit of plumbing to ensure that we were only extracting the files if they did not already exist. However, we have too many lines of code and too much logic here for something so standard. Instead, I created a method called ImprovedExtractToFile. You pass in the ZipArchiveEntry reference (the file), the root destination directory (my method will calculate the full directory based upon the root directory and the relative path of the file we are extracting), and what we want to do if the file exists in the destination location. For an example of how to call this method, I have copied Example 4 and replaced the plumbing with my new call (this is Example 8):

//Opens the zip file up to be read
using (ZipArchive archive = ZipFile.OpenRead(zipName))
{
    //Loops through each file in the zip file
    foreach (ZipArchiveEntry file in archive.Entries)
    {
        //Outputs relevant file information to the console
        Console.WriteLine("File Name: {0}", file.Name);
        Console.WriteLine("File Size: {0} bytes", file.Length);
        Console.WriteLine("Compression Ratio: {0}", ((double)file.CompressedLength / file.Length).ToString("0.0%"));
 
        //This is the new call
        Compression.ImprovedExtractToFile(file, dirToUnzipTo, Compression.Overwrite.Always);
    }
}

Notice the comparative size of the two examples. This one is cleaner and has less logic we would need to repeat elsewhere. I did not, however, abstract the compression ratio code. If you find that you want to do that more than once, you could add this as a method to the library. It is also important to note that I use the ImprovedExtractToFile method in my ImprovedExtractToDirectory method. This way we do not duplicate logic where it is not needed. The code for this method is as follows:

public static void ImprovedExtractToFile(ZipArchiveEntry file, 
                                         string destinationPath, 
                                         Overwrite overwriteMethod = Overwrite.IfNewer)
{
    //Directory entries have an empty Name and cannot be extracted as
    //files, so they are skipped
    if (file.Name.Length == 0) return;
 
    //Gets the complete path for the destination file, including any
    //relative paths that were in the zip file
    string destinationFileName = Path.Combine(destinationPath, file.FullName);
 
    //Gets just the new path, minus the file name so we can create the
    //directory if it does not exist
    string destinationFilePath = Path.GetDirectoryName(destinationFileName);
 
    //Creates the directory (if it doesn't exist) for the new path
    Directory.CreateDirectory(destinationFilePath);
 
    //Determines what to do with the file based upon the
    //method of overwriting chosen
    switch (overwriteMethod)
    {
        case Overwrite.Always:
            //Just put the file in and overwrite anything that is found
            file.ExtractToFile(destinationFileName, true);
            break;
        case Overwrite.IfNewer:
            //Checks to see if the file exists, and if so, if it should
            //be overwritten
            if (!File.Exists(destinationFileName) || File.GetLastWriteTime(destinationFileName) < file.LastWriteTime)
            {
                //Either the file didn't exist or this file is newer, so
                //we will extract it and overwrite any existing file
                file.ExtractToFile(destinationFileName, true);
            }
            break;
        case Overwrite.Never:
            //Put the file in if it is new but ignores the 
            //file if it already exists
            if (!File.Exists(destinationFileName))
            {
                file.ExtractToFile(destinationFileName);
            }
            break;
        default:
            break;
    }
}

The AddToArchive Method

I created this new method to take a set of files and put them into a zip file. If the zip file exists, we can merge the files into the zip, we can overwrite the zip file, we can throw an error, or we can fail silently (you should avoid this last option – there are valid reasons to use it, but they aren’t as common as you would think). If we end up merging our files into the existing archive, we can then check if we want every file to go in or not. We can overwrite every matching file, we can overwrite the matching files only if the files to be put in are newer, or we can ignore the file to be put in if there is already a match.

That is a lot of options for what used to be a fairly simple job. However, this method handles all of Example 5 and most of Example 6 (we don’t read through the existing files in the archive – that could be another helper method but I decided against it for the time being). Here is the improved call that we could use in both Example 5 and Example 6 to add files to an archive (this is Example 9):

//This creates our list of files to be added
List<string> filesToArchive = new List<string>();
 
//Here we are adding two hard-coded files to our list
filesToArchive.Add(@"C:\Temp\File1.txt");
filesToArchive.Add(@"C:\Temp\File2.txt");
 
Compression.AddToArchive(zipName, 
    filesToArchive, 
    Compression.ArchiveAction.Replace, 
    Compression.Overwrite.IfNewer,
    CompressionLevel.Optimal);


Note that we have chosen to Replace the zip file if one already exists and to overwrite any matching files inside if the ones we are inserting are newer. We are also setting the compression on each file in the archive to be Optimal. These last three parameters can be changed to fit your needs. They can also be omitted. The values I specified are the defaults. The code for this method is as follows:

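//Requires using statements for System.Collections.Generic, System.IO,
//System.IO.Compression, and System.Linq (for the query below)
public static void AddToArchive(string archiveFullName, 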
                                List<string> files, 
                                ArchiveAction action = ArchiveAction.Replace, 
                                Overwrite fileOverwrite = Overwrite.IfNewer,
                                CompressionLevel compression = CompressionLevel.Optimal)
{
    //Identifies the mode we will be using - the default is Create
    ZipArchiveMode mode = ZipArchiveMode.Create;
 
    //Determines if the zip file even exists
    bool archiveExists = File.Exists(archiveFullName);
 
    //Figures out what to do based upon our specified overwrite method
    switch (action)
    {
        case ArchiveAction.Merge:
            //Sets the mode to update if the file exists, otherwise
            //the default of Create is fine
            if (archiveExists)
            {
                mode = ZipArchiveMode.Update;
            }
            break;
        case ArchiveAction.Replace:
            //Deletes the file if it exists.  Either way, the default
            //mode of Create is fine
            if (archiveExists)
            {
                File.Delete(archiveFullName);
            }
            break;
        case ArchiveAction.Error:
            //Throws an error if the file exists
            if (archiveExists)
            {
                throw new IOException(String.Format("The zip file {0} already exists.", archiveFullName));
            }
            break;
        case ArchiveAction.Ignore:
            //Closes the method silently and does nothing
            if (archiveExists)
            {
                return;
            }
            break;
        default:
            break;
    }
 
    //Opens the zip file in the mode we specified
    using (ZipArchive zipFile = ZipFile.Open(archiveFullName, mode))
    {
        //This is a bit of a hack and should be refactored - I am
        //doing a similar foreach loop for both modes, but for Create
        //I am doing very little work while Update gets a lot of
        //code.  This also does not handle any other mode (of
        //which there currently wouldn't be one since we don't
        //use Read here).
        if (mode == ZipArchiveMode.Create)
        {
            foreach (string file in files)
            {
                //Adds the file to the archive
                zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
            }
        }
        else
        {
            foreach (string file in files)
            {
                var fileInZip = (from f in zipFile.Entries
                                    where f.Name == Path.GetFileName(file)
                                    select f).FirstOrDefault();
 
                switch (fileOverwrite)
                {
                    case Overwrite.Always:
                        //Deletes the file if it is found
                        if (fileInZip != null)
                        {
                            fileInZip.Delete();
                        }
 
                        //Adds the file to the archive
                        zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
 
                        break;
                    case Overwrite.IfNewer:
                        //This is a bit trickier - we only delete the file if it is
                        //newer, but if it is newer or if the file isn't already in
                        //the zip file, we will write it to the zip file
                        if (fileInZip != null)
                        {
                            //Deletes the file only if it is older than our file.
                            //Note that the file will be ignored if the existing file
                            //in the archive is newer.
                            if (fileInZip.LastWriteTime < File.GetLastWriteTime(file))
                            {
                                fileInZip.Delete();
 
                                //Adds the file to the archive
                                zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                            }
                        }
                        else
                        {
                            //The file wasn't already in the zip file so add it to the archive
                            zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                        }
                        break;
                    case Overwrite.Never:
                        //Leaves a matching file in the archive untouched and
                        //only adds the file if it is not already there. You
                        //could write a second copy to the zip with the same
                        //name (not sure that is wise, however).
                        if (fileInZip == null)
                        {
                            zipFile.CreateEntryFromFile(file, Path.GetFileName(file), compression);
                        }
                        break;
                    default:
                        break;
                }
            }
        }
    }
}
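The listings above rely on two enumerations, Overwrite and ArchiveAction, whose definitions live in the downloadable source rather than in this article. A minimal sketch consistent with the values named in the text might look like the following. The containing class name CompressionSketch and the order of the members are assumptions; in the real library the examples reach these types through the Compression class (for example Compression.Overwrite.IfNewer).

```csharp
public static class CompressionSketch
{
    // What to do with a single file that already exists at the destination
    public enum Overwrite
    {
        Always,   // replace the existing file unconditionally
        IfNewer,  // replace only if the incoming file is newer
        Never     // keep the existing file
    }

    // What to do when the target zip file itself already exists
    public enum ArchiveAction
    {
        Merge,    // open the existing zip and add to it
        Replace,  // delete the existing zip and create a new one
        Error,    // throw an exception
        Ignore    // return silently without doing anything
    }
}
```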

Compression Comparisons

It was brought up in the comments that it would be beneficial to see how these zip methods compare to other commonly available zip tools. I thought this was a great idea, so I started investigating. To create a test pool of data, I created 10,000 text files that each had 10,000 lines of text in them. Text files are easy to create and also easy to compress. This gave me a uniform base to work with. I then tested my code against the Windows "Send to Zip" feature and 7zip. Here are the results:

| Compression Method | Compression Level | Time Elapsed | Final Size |
|---|---|---|---|
| 7zip (right-click and send to zip file) | Normal | 14 minutes 19 seconds | 66,698 KB |
| Code (Debug Mode) | Optimal | 10 minutes 13 seconds | 62,938 KB |
| Code (Release Mode) | Optimal | 9 minutes 36 seconds | 62,938 KB |
| Windows Zip (right-click and send to zip file) | Normal | 8 minutes 31 seconds | 62,938 KB |
| Code (Release Mode) | Fastest | 8 minutes 5 seconds | 121,600 KB |
| 7zip (zip file through UI) | Fastest | 8 minutes 0 seconds | 66,698 KB |

Note that my machine is fairly slow. I'm sure you can get your system to run faster. However, the thing to look at here is not the actual times but the comparisons between methods. It looks as though the Optimal compression produces the same size as Windows zip (and probably uses the same libraries). Optimal compresses a bit better than 7zip (62,938 KB versus 66,698 KB), but it is slower than 7zip run through its UI (9 minutes 36 seconds versus 8 minutes 0 seconds). The first 7zip entry does confuse me a bit. It seems to be doing the same task as the last entry, but it takes over six minutes longer. I did repeat the test just to verify. I also verified that I am using the latest version of 7zip.

I think the results of these tests indicate that the use of the Compression namespace in .NET 4.5 is a very viable candidate for all types of projects.  It competes well with a great competitor and the fact that you do not need to rely on a third-party library to use it, in my mind, makes this the obvious choice for archiving files.
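If you want to run a similar comparison on your own machine, the timing of the code-based rows can be approximated with a Stopwatch around the CreateFromDirectory call. This is a rough sketch and not the harness used for the numbers above; the names ZipTimingSketch and TimeZip are invented for the example, and a single run like this is subject to disk caching and other noise.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public static class ZipTimingSketch
{
    // Hypothetical helper: zips a folder at the given compression level
    // and returns how long the call took.
    public static TimeSpan TimeZip(string sourceDirectory, string zipPath, CompressionLevel level)
    {
        // Removes any earlier output so the call does not throw
        File.Delete(zipPath);

        Stopwatch timer = Stopwatch.StartNew();
        ZipFile.CreateFromDirectory(sourceDirectory, zipPath, level, false);
        timer.Stop();

        return timer.Elapsed;
    }
}
```

Calling TimeZip once with CompressionLevel.Optimal and once with CompressionLevel.Fastest, and then checking the size of each output with new FileInfo(zipPath).Length, gives the two measurements reported per row in the table.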

Conclusion

The System.IO.Compression namespace in .NET 4.5 provides us with an easy way to work with zip files. We can create archives, update archives, and extract archives. This is a huge leap forward from what we had before. While there are a few shortcomings in the implementation, we saw how we can overcome them easily. I hope you found this article helpful. As always, I appreciate your constructive feedback.

History

  • May 10, 2012 - Initial Version 
  • May 11, 2012 - Added Compression Comparisons section

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.