Compression With Progress in VB.Net 4.5

2 Oct 2014
Compress files of any size with the .Net 4.5 Compression namespace

Introduction

There are many examples of compressing files in VB.Net. Many of them rely on third-party libraries, which is just not necessary. People were saying that files and archives exceeding 4 GB couldn't be handled... I hate restrictions and decided to "Stick it to Microsoft".

To get a better understanding of Zip files and their contents, I set about writing my own Zip file generator from the ground up (still using Phil Katz's Zip algorithm). Eventually I successfully built my own Zip and ZIP64 archives byte by byte.

I then looked at my work and thought: now that I have a grasp of the inner workings of a Zip file (with the PKZip algorithm), let's revisit the Compression namespace to make sure I wasn't simply wasting my time. It turns out that all I achieved from this project was a very solid working knowledge of Zip files and how to build them internally, as those rumours about 4 GB limits were totally bogus: the Compression namespace switches large files (on the fly) to Zip64 as required. So...

Here is my compression method, wrapped up nicely in an easy-to-use class. It uses the .Net 4.5 Compression namespace and reads\writes files directly from\to your disk. If you observe your system's performance during compression, you will see that no significant additional memory is used.

I hope this helps a lot of people out in the future.

Section 1: The Zipper Class Breakdown

Section 2: Copy, Paste and Run. The Full Source

The Zipper Class

The properties:

As most of the property names are self-descriptive I'm not going to go through each one. However, an addition could be made to the setter of each property to prevent changes during compression.

If we add a private member, say _Compressing As Boolean, to the Zipper class, set it to True when the Compress method is called and back to False when the process has finished, then adding the following check to each setter will prevent potential errors caused by a property being changed mid-compression.

        Private _Source As String
        Public Property SourceURL As String
            Get
                Return _Source
            End Get
            Set(value As String)
                'Add this code
                If _Compressing = True Then Exit Property
                _Source = value
            End Set
        End Property

The Methods

GetSessionLength
        Private Function GetSessionLength() As Int64
            Dim sLen As Int64 = 0
            For Each SessionFile As String In _SessionFiles
                sLen += New FileInfo(SessionFile).Length
                If Cancel = True Then Exit For
            Next
            Return sLen
        End Function

When the Compress method is called, all the file paths are added to the _SessionFiles string array. This method simply iterates through each entry and tallies the total length in bytes of all the files to be read and compressed. This information is later used to report the current progress of the compression process.
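As a rough, standalone illustration of the same tally, the sketch below sums the sizes of the files in a throwaway temp folder. The module name, the TotalLength function and the decision to count a missing file as 0 bytes are assumptions made for this example, not part of the Zipper class.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SessionLengthSketch
    ' Sums the sizes, in bytes, of every file in the list.
    ' Assumption: a file that has vanished is counted as 0 bytes instead of throwing.
    Public Function TotalLength(files As IEnumerable(Of String)) As Long
        Dim total As Long = 0
        For Each filePath As String In files
            Dim info As New FileInfo(filePath)
            If info.Exists Then total += info.Length
        Next
        Return total
    End Function

    Sub Main()
        ' Create a temp folder holding a 100-byte file and a 50-byte file
        Dim folder As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
        Directory.CreateDirectory(folder)
        File.WriteAllBytes(Path.Combine(folder, "a.bin"), New Byte(99) {})
        File.WriteAllBytes(Path.Combine(folder, "b.bin"), New Byte(49) {})

        Dim files As String() = Directory.GetFiles(folder, "*", SearchOption.AllDirectories)
        Console.WriteLine(TotalLength(files))   ' 150

        Directory.Delete(folder, True)
    End Sub
End Module
```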

IsDir
        Private Function IsDir(Source As String) As Int16
            If File.Exists(Source) Then
                Return 0
            ElseIf Directory.Exists(Source) Then
                Return -1
            Else
                Return 1
            End If
        End Function

This method returns a value that indicates whether Source is a file, a directory, or doesn't exist: -1 (True) if it is a directory, 0 (False) if it is a file, and 1 if it doesn't exist. It is used in the Zipper constructor, New.

        If IsDir(Source) <> 1 Then
            _IsDir = IsDir(Source)
            _Source = Source
        Else
            Throw New Exception("Source file or directory " & _
                    "doesn't exist or cannot be accessed.")
        End If

As you can see from this code, if the value returned is anything other than a Boolean value (-1 or 0) an exception is thrown. If the returned value is Boolean then the _IsDir and _Source private members are set.
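The -1 / 0 / 1 convention works, but the three states can also be spelled out by name. The sketch below is one possible alternative, not what the Zipper class does: SourceKind and Classify are hypothetical names introduced only for this example.

```vbnet
Imports System
Imports System.IO

Module SourceKindSketch
    ' Hypothetical enum standing in for the -1 / 0 / 1 return codes
    Public Enum SourceKind
        FileSource
        DirectorySource
        Missing
    End Enum

    ' Reports whether the path is an existing file, an existing directory, or neither
    Public Function Classify(source As String) As SourceKind
        If File.Exists(source) Then Return SourceKind.FileSource
        If Directory.Exists(source) Then Return SourceKind.DirectorySource
        Return SourceKind.Missing
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()   ' creates an empty file on disk
        Console.WriteLine(Classify(tempFile).ToString())             ' FileSource
        Console.WriteLine(Classify(Path.GetTempPath()).ToString())   ' DirectorySource
        File.Delete(tempFile)
        Console.WriteLine(Classify(tempFile).ToString())             ' Missing
    End Sub
End Module
```

With an enum, the constructor check would read as a comparison against SourceKind.Missing rather than against the number 1.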

InlineAssignHelper
        Private Function InlineAssignHelper(Of T)(ByRef target As T, _
        value As T) As T
            target = value
            Return value
        End Function

This helper is (I believe) what C#-to-VB code converters emit to stand in for C#'s assignment expressions, which have no direct VB equivalent, so it has to be added to the class manually. It simply sets the target reference to the value argument and returns that same value. It is used in the compression routine to capture how many bytes were read from a file: a single call delivers the same value in two places (the target variable and the return value).
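To see the helper on its own, the following sketch runs it against an in-memory stream instead of a file. ChunkSizes and the module name are made up for this illustration; only InlineAssignHelper itself comes from the article.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module InlineAssignSketch
    ' Same helper as in the article: assigns value to target and returns it
    Private Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
        target = value
        Return value
    End Function

    ' Hypothetical function: records the size of every chunk read from the stream
    Public Function ChunkSizes(source As Stream, bufferSize As Integer) As List(Of Integer)
        Dim sizes As New List(Of Integer)
        Dim chunk(bufferSize - 1) As Byte
        Dim bytesRead As Integer
        ' One call both stores the count in bytesRead and feeds it to the loop test
        While InlineAssignHelper(bytesRead, source.Read(chunk, 0, chunk.Length)) > 0
            sizes.Add(bytesRead)
        End While
        Return sizes
    End Function

    Sub Main()
        ' 2,500 bytes read through a 1,024-byte buffer
        Using source As New MemoryStream(New Byte(2499) {})
            For Each size As Integer In ChunkSizes(source, 1024)
                Console.WriteLine(size)   ' 1024, then 1024, then 452
            Next
        End Using
    End Sub
End Module
```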

Compress

This is the main Public method for the user, but it doesn't contain the compression routine. It prepares the information for the main compression method, ZipItUp.

First, the _SessionFiles array is populated. The user can pass a directory containing sub-directories and files, or a single file. If the source is a directory, we use the Directory.GetFiles method to scan for all the files contained within.

    _SessionFiles = Directory.GetFiles(SourceURL, "*", SearchOption.AllDirectories)

Other than the path argument, GetFiles also takes a search pattern (here I used "*", which matches all files) and a SearchOption (SearchOption.AllDirectories includes subfolders).

In this example I haven't made these arguments available to the user, but they could quite easily be exposed.

The pattern argument works as you might expect: "*.exe" returns all files that end with .exe, and so on.
SearchOption can also be set to TopDirectoryOnly, which ignores sub-directories.
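A small sketch of how those two arguments change the result, run against a throwaway folder tree. FindFiles and its includeSubfolders flag are hypothetical; they only show one way the options could be surfaced to the caller.

```vbnet
Imports System
Imports System.IO

Module FileSearchSketch
    ' Hypothetical wrapper exposing the two arguments the article hard-codes
    Public Function FindFiles(root As String, pattern As String, includeSubfolders As Boolean) As String()
        Dim scope As SearchOption = If(includeSubfolders, SearchOption.AllDirectories, SearchOption.TopDirectoryOnly)
        Return Directory.GetFiles(root, pattern, scope)
    End Function

    Sub Main()
        ' Build a temp tree containing a.txt, b.exe and Sub\c.txt
        Dim root As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
        Directory.CreateDirectory(Path.Combine(root, "Sub"))
        File.WriteAllText(Path.Combine(root, "a.txt"), "a")
        File.WriteAllText(Path.Combine(root, "b.exe"), "b")
        File.WriteAllText(Path.Combine(root, "Sub", "c.txt"), "c")

        Console.WriteLine(FindFiles(root, "*", True).Length)       ' 3
        Console.WriteLine(FindFiles(root, "*", False).Length)      ' 2
        Console.WriteLine(FindFiles(root, "*.txt", True).Length)   ' 2
        Console.WriteLine(FindFiles(root, "*.exe", True).Length)   ' 1

        Directory.Delete(root, True)
    End Sub
End Module
```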

If the user has selected a single file, I still populate the _SessionFiles array, but only with that individual file

    _SessionFiles = New String() {SourceURL}

Next I call the previously mentioned GetSessionLength method and store its value for later use

    _SessionLength = GetSessionLength()

The last thing I do before calling ZipItUp is set the root directory for the entries in the zip file.

    If SourceIsDirectory And IncludeRootDir = False Then
            _RootDir = SourceURL & "\"
    Else
            _RootDir = String.Join("\", SourceURL.Split("\").ToArray, _
                  0, SourceURL.Split("\").ToArray.Length - 1) & "\"
    End If  

For those familiar with Zip files, you can opt to include or exclude the root directory of the source object or objects. Because I don't want to include the entire file\directory path in the Zip entry name, I store the relevant root path, which is then removed later when creating entries in our Zip file.

For example if our File_To_Be_Zipped is "C:\Users\JoeBloggs\Documents\TestFile.doc" I only want the entry name to be "TestFile.doc" so I replace "C:\Users\JoeBloggs\Documents\" with ""

Or, for a directory such as "C:\Users\JoeBloggs\Documents\My Projects\", including the root directory, the zip file may contain several entries like so:

    My Projects\File1.doc   
    My Projects\File2.doc
    My Projects\File3.doc
    My Projects\SubDir\File1.doc    

or, NOT including the root directory, the zip file may contain several entries like so:

    File1.doc   
    File2.doc
    File3.doc
    SubDir\File1.doc
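The entry names above can be reproduced with a few lines of string handling. This sketch is an illustration only: EntryName is a hypothetical function, and it strips the root with StartsWith and Substring (so only a leading match is removed), whereas the Zipper class uses String.Replace.

```vbnet
Imports System

Module EntryNameSketch
    ' Hypothetical helper: removes rootDir from the front of fullPath, if present
    Public Function EntryName(fullPath As String, rootDir As String) As String
        If fullPath.StartsWith(rootDir, StringComparison.OrdinalIgnoreCase) Then
            Return fullPath.Substring(rootDir.Length)
        End If
        Return fullPath
    End Function

    Sub Main()
        Dim docs As String = "C:\Users\JoeBloggs\Documents\"
        Dim sample As String = docs & "My Projects\SubDir\File1.doc"

        ' Single file: only the file name remains
        Console.WriteLine(EntryName(docs & "TestFile.doc", docs))   ' TestFile.doc
        ' Root directory included: strip everything above "My Projects"
        Console.WriteLine(EntryName(sample, docs))                  ' My Projects\SubDir\File1.doc
        ' Root directory excluded: strip "My Projects\" as well
        Console.WriteLine(EntryName(sample, docs & "My Projects\")) ' SubDir\File1.doc
    End Sub
End Module
```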

Once all these details have been recorded, we move on to the main compression method, ZipItUp.

ZipItUp

I start by declaring a few local variables:

BlockSizeToRead is set to 1 MiB. This is the maximum number of bytes we attempt to read at a time

    Dim BlockSizeToRead As Int32 = 1048576 '1Mib Buffer

Buffer is the byte array that holds the bytes read in each repetition

    Dim Buffer As Byte() = New Byte(BlockSizeToRead - 1) {}

BytesRead and TotalBytesRead are used to build the progress report. BytesRead holds the number of bytes actually read in each repetition; TotalBytesRead is the running total

    Dim BytesRead As Int64, TotalBytesRead As Int64

LiveProg and PrevProg hold the current and previous progress percentages, so that progress is only reported when the percentage has actually changed. Bear in mind that with a whole-number percentage, a huge compression session on a slower machine can go a long time between updates, so the progress may not feel responsive

    Dim LiveProg As Int16 = 0
    Dim PrevProg As Int16 = 0

Next I determine whether the user wants to prevent an existing file from being overwritten, or to remove it

            If File.Exists(_Target) And OverwriteTarget = False Then
                Throw New Exception("Target File Already Exists.")
            Else
                File.Delete(_Target)
            End If

The Main Routine:

        Using FS As FileStream = New FileStream(_Target, _
                FileMode.CreateNew, FileAccess.Write)

First we create a new FileStream. It's important to note that the FileAccess is Write. In my experience, many examples online fail on large files simply because the wrong FileAccess has been set: with ReadWrite, for example, the archive is built up in memory before being committed to disk. Not only does this fill up your RAM very quickly, which can cause an OutOfMemoryException with larger files, it also renders the progress report useless, because the file is compressed to memory very fast. Only when the FileStream is closed with End Using is the data written to disk. For larger files this can take an additional 20 seconds, so the progress bar sits at 100% while the user is left waiting for an undetermined amount of time.

Next, a new archive is created and attached to our FileStream

        Using Archive As ZipArchive = New ZipArchive(FS, _
                ZipArchiveMode.Create)

And declare a ZipArchiveEntry variable

    Dim Entry As ZipArchiveEntry = Nothing

We then begin iterating through the files to be compressed.

    For Each SessionFile As String In _SessionFiles

SessionFile holds the current file to be added to the archive in each repetition. We create a new FileStream, this time to read the bytes of the current file to be added to the archive, again ensuring the FileAccess is Read

    Using Reader As FileStream = File.Open(SessionFile, _
            FileMode.Open, FileAccess.Read)

We now create a new ArchiveEntry handle

    Entry = Archive.CreateEntry(SessionFile.Replace(_RootDir, ""), _
                                    _Compression)

You can now see the _RootDir value in action. The entry name is stripped of the fully qualified path and switched to a relative value using the Replace method. The compression argument is set by the user; the options are Optimal, Fastest or NoCompression.

We then create a stream to our Entry within the archive

        Using Writer As Stream = Entry.Open()

And read the source file's bytes in chunks of up to 1 MiB

    While (InlineAssignHelper(BytesRead, _
                    Reader.Read(Buffer, 0, Buffer.Length))) > 0

Here you can see the InlineAssignHelper method in use. Its returned value is used by the While statement to see whether the end of the file has been reached. BytesRead is also populated with the same value, which will be used in just a second.

Immediately after we have read a chunk of bytes, we write it to the entry stream

    Writer.Write(Buffer, 0, BytesRead)

And the TotalBytesRead is updated

    TotalBytesRead += BytesRead

Checking and updating the progress:

    LiveProg = CInt((100 / _SessionLength) * TotalBytesRead)

LiveProg holds the current state of the progress. As this is an integer, for larger sessions (in fact, anything over 100 bytes) the value may be recalculated any number of times without the progress actually changing. There's no point in updating a UI object repeatedly unless the value has actually changed.

The next part checks whether the progress has changed and, if so, raises the Progress event so the UI can update, recording the new progress for later comparison.

    If LiveProg <> PrevProg Then
            PrevProg = LiveProg
            RaiseEvent Progress(LiveProg)
    End If
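To put a number on how many updates this check saves, here is a small simulation. ReportedPercents is a hypothetical function written for this example; note that it floors the percentage with integer division, whereas the article's CInt rounds to the nearest whole number.

```vbnet
Imports System
Imports System.Collections.Generic

Module ProgressSketch
    ' Hypothetical simulation: returns every percentage that would be reported
    ' for chunkCount reads of chunkSize bytes against a session of sessionLength bytes
    Public Function ReportedPercents(sessionLength As Long, chunkSize As Integer, chunkCount As Integer) As List(Of Integer)
        Dim reported As New List(Of Integer)
        Dim totalRead As Long = 0
        Dim prev As Integer = 0
        For i As Integer = 1 To chunkCount
            totalRead += chunkSize
            ' Integer division floors the percentage (an assumption of this sketch)
            Dim live As Integer = CInt((totalRead * 100) \ sessionLength)
            If live <> prev Then
                prev = live
                reported.Add(live)   ' stands in for RaiseEvent Progress(live)
            End If
        Next
        Return reported
    End Function

    Sub Main()
        ' 1,000 one-byte reads against a 1,000-byte session
        Dim reports As List(Of Integer) = ReportedPercents(1000, 1, 1000)
        Console.WriteLine(reports.Count)               ' 100
        Console.WriteLine(reports(0))                  ' 1
        Console.WriteLine(reports(reports.Count - 1))  ' 100
    End Sub
End Module
```

In this run 1,000 reads produce only 100 reports, one per percentage point.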

A Try\Catch clause has been implemented to capture troublesome files. It tends to catch files that are protected or opened by other processes on your system.

    Catch Ex As Exception
        TotalBytesRead += New FileInfo(SessionFile).Length
        Console.WriteLine(String.Format("Unable to add file to " & _
            "archive: {0} Error:{1}", SessionFile, Ex.Message))
    End Try

If an error does occur, it is important to update TotalBytesRead manually at this point. As the file is skipped (normally at the Read statement), the routine's progress won't otherwise be updated and the user will be left with an inaccurate progress position.

For example, if a session has 10 files, each 100 MiB in length, and one of those files is skipped, TotalBytesRead wouldn't be updated for it in the main loop. The progress would then report 70% complete when it's really 80% complete, or even 90% complete when the whole process has actually finished.

Cancel

Throughout the code you will see various references to the Cancel property. Even if an instance of Zipper is running on another thread, the user can set Cancel to True. If this happens, the current process and all subsequent processes will be skipped.
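The following sketch shows the general pattern of a flag set from one thread stopping a loop on another. Worker is a hypothetical stand-in for a long-running Zipper instance, and the Volatile.Read / Volatile.Write calls are an addition of this example; the Zipper class itself uses a plain Boolean field.

```vbnet
Imports System
Imports System.Threading

Module CancelSketch
    ' Hypothetical stand-in for a long-running Zipper instance
    Public Class Worker
        Private _cancel As Boolean

        Public Property Cancel As Boolean
            Get
                Return Volatile.Read(_cancel)
            End Get
            Set(value As Boolean)
                Volatile.Write(_cancel, value)
            End Set
        End Property

        ' Performs up to maxSteps units of work, checking Cancel before each one
        Public Function Run(maxSteps As Integer) As Integer
            Dim done As Integer = 0
            While done < maxSteps AndAlso Not Cancel
                Thread.Sleep(1)   ' placeholder for reading and writing one chunk
                done += 1
            End While
            Return done
        End Function
    End Class

    Sub Main()
        Dim w As New Worker()
        Dim steps As Integer = 0
        Dim t As New Thread(Sub() steps = w.Run(100000))
        t.Start()

        Thread.Sleep(100)   ' let the worker make some progress
        w.Cancel = True     ' set from the main thread
        t.Join()

        Console.WriteLine(steps < 100000)   ' True: the loop stopped early
    End Sub
End Module
```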

Well, that's pretty much it. I hope this has given you a good insight into my method for compressing files of any size using the Compression namespace in .Net 4.5.
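For readers who want the core of the walk-through in one place before the full class, here is a condensed sketch that streams a single file into a new archive and then reads the archive back to check it. ZipSingleFile is a hypothetical name, the loop is written without InlineAssignHelper, and progress, cancellation and error handling are left out. It assumes a reference to System.IO.Compression.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module StreamingZipSketch
    ' Hypothetical helper: streams one file into a new archive, 1 MiB at a time
    Public Sub ZipSingleFile(sourcePath As String, targetPath As String)
        Dim chunk(1048575) As Byte   ' 1 MiB buffer
        Using target As New FileStream(targetPath, FileMode.CreateNew, FileAccess.Write)
            Using archive As New ZipArchive(target, ZipArchiveMode.Create)
                Dim entry As ZipArchiveEntry = archive.CreateEntry(Path.GetFileName(sourcePath), CompressionLevel.Optimal)
                Using reader As FileStream = File.OpenRead(sourcePath)
                    Using writer As Stream = entry.Open()
                        Dim bytesRead As Integer = reader.Read(chunk, 0, chunk.Length)
                        Do While bytesRead > 0
                            writer.Write(chunk, 0, bytesRead)
                            bytesRead = reader.Read(chunk, 0, chunk.Length)
                        Loop
                    End Using
                End Using
            End Using
        End Using
    End Sub

    Sub Main()
        Dim sourcePath As String = Path.GetTempFileName()
        Dim targetPath As String = sourcePath & ".zip"
        File.WriteAllBytes(sourcePath, New Byte(2999999) {})   ' 3,000,000 zero bytes
        ZipSingleFile(sourcePath, targetPath)

        ' Re-open the archive in Read mode to confirm what was stored
        Using zipStream As FileStream = File.OpenRead(targetPath)
            Using archive As New ZipArchive(zipStream, ZipArchiveMode.Read)
                Console.WriteLine(archive.Entries.Count)       ' 1
                Console.WriteLine(archive.Entries(0).Length)   ' 3000000
            End Using
        End Using

        File.Delete(sourcePath)
        File.Delete(targetPath)
    End Sub
End Module
```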

 

Copy, Paste and Run.

The Full Source for the Zipper Class.

'INCLUDE FOLLOWING REFERENCES Requires .Net 4.5
'System.IO.Compression
'System.IO.Compression.FileSystem
Imports System.IO
Imports System.IO.Compression

Public Class Zipper

    Public Event Progress(Percent As Integer)
    Public Event Complete()
    Public Event StatusChanged(Status As String)

    Private _Cancel As Boolean
    Public Property Cancel As Boolean
        Get
            Return _Cancel
        End Get
        Set(value As Boolean)
            If _Cancel = True Then Exit Property
            _Cancel = value
        End Set
    End Property

    Private _Compression As CompressionLevel
    Public Property CompressionLevel As CompressionLevel
        Get
            Return _Compression
        End Get
        Set(value As CompressionLevel)
            _Compression = value
        End Set
    End Property

    Private _Target As String
    Public Property TargetURL As String
        Get
            Return _Target
        End Get
        Set(value As String)
            _Target = value
        End Set
    End Property

    Private _Source As String
    Public Property SourceURL As String
        Get
            Return _Source
        End Get
        Set(value As String)
            _Source = value
        End Set
    End Property

    Private _IsDir As Boolean
    Public ReadOnly Property SourceIsDirectory As Boolean
        Get
            Return _IsDir
        End Get
    End Property

    Private _Overwrite As Boolean
    Public Property OverwriteTarget As Boolean
        Get
            Return _Overwrite
        End Get
        Set(value As Boolean)
            _Overwrite = value
        End Set
    End Property

    Private _IncludeRootDir As Boolean
    Public Property IncludeRootDir As Boolean
        Get
            Return _IncludeRootDir
        End Get
        Set(value As Boolean)
            _IncludeRootDir = value
        End Set
    End Property

    Private _SessionLength As Int64
    Private _SessionFiles As String()
    Private _RootDir As String

    Public Sub New(Source As String, Target As String, CompressionLevel As CompressionLevel)
        _Overwrite = False
        _IncludeRootDir = True
        _Target = Target
        _Compression = CompressionLevel
        _Cancel = False
        If IsDir(Source) <> 1 Then
            _IsDir = IsDir(Source)
            _Source = Source
        Else
            Throw New Exception("Source file or directory doesn't exist or cannot be accessed.")
        End If
    End Sub

    Private Function GetSessionLength() As Int64
        Dim sLen As Int64 = 0
        For Each SessionFile As String In _SessionFiles
            sLen += New FileInfo(SessionFile).Length
            If Cancel = True Then Exit For
        Next
        Return sLen
    End Function

    Private Function IsDir(Source As String) As Int16
        If File.Exists(Source) Then
            Return 0
        ElseIf Directory.Exists(Source) Then
            Return -1
        Else
            Return 1
        End If
    End Function

    Public Sub Compress()
        RaiseEvent StatusChanged("Gathering Required Information.")
        If SourceIsDirectory Then
            _SessionFiles = Directory.GetFiles(SourceURL, "*", SearchOption.AllDirectories)
        Else
            _SessionFiles = New String() {SourceURL}
        End If

        RaiseEvent StatusChanged("Examining Files.")

        _SessionLength = GetSessionLength()

        If SourceIsDirectory And IncludeRootDir = False Then
            _RootDir = SourceURL & "\"
        Else
            _RootDir = String.Join("\", SourceURL.Split("\").ToArray, _
                                   0, SourceURL.Split("\").ToArray.Length - 1) & "\"
        End If

        RaiseEvent StatusChanged("Compressing.")

        Try
            ZipItUp()
        Catch ex As Exception
            MsgBox(ex.Message)
            Exit Sub
        End Try

        If Cancel = True Then
            RaiseEvent StatusChanged("Cancelled.")
            RaiseEvent Progress(100)
        Else
            RaiseEvent StatusChanged("Complete.")
        End If

        RaiseEvent Complete()

    End Sub

    Private Sub ZipItUp()


        If Cancel = True Then Exit Sub
        Dim BlockSizeToRead As Int32 = 1048576 '1Mib Buffer
        Dim Buffer As Byte() = New Byte(BlockSizeToRead - 1) {}
        Dim BytesRead As Int64, TotalBytesRead As Int64
        Dim LiveProg As Int16 = 0
        Dim PrevProg As Int16 = 0

        If File.Exists(_Target) And OverwriteTarget = False Then
            Throw New Exception("Target File Already Exists.")
        Else
            File.Delete(_Target)
        End If

        Using FS As FileStream = New FileStream(_Target, FileMode.CreateNew, FileAccess.Write)
            Using Archive As ZipArchive = New ZipArchive(FS, ZipArchiveMode.Create)
                Dim Entry As ZipArchiveEntry = Nothing
                For Each SessionFile As String In _SessionFiles
                    Try
                        Using Reader As FileStream = File.Open(SessionFile, FileMode.Open, _
                                                FileAccess.Read)
                            Entry = Archive.CreateEntry(SessionFile.Replace(_RootDir, ""), _
                                _Compression)
                            Using Writer As Stream = Entry.Open()
                                While (InlineAssignHelper(BytesRead, _
                                    Reader.Read(Buffer, 0, Buffer.Length))) > 0
                                    Writer.Write(Buffer, 0, BytesRead)
                                    TotalBytesRead += BytesRead
                                    LiveProg = CInt((100 / _SessionLength) * TotalBytesRead)
                                    If LiveProg <> PrevProg Then
                                        PrevProg = LiveProg
                                        RaiseEvent Progress(LiveProg)
                                    End If
                                    If Cancel = True Then Exit While
                                End While
                            End Using
                        End Using
                    Catch Ex As Exception
                        TotalBytesRead += New FileInfo(SessionFile).Length
                        Console.WriteLine(String.Format("Unable to add file to archive: " & _
                                  "{0} Error:{1}", SessionFile, Ex.Message))
                    End Try
                    If Cancel = True Then Exit For
                Next
            End Using
        End Using
        If Cancel = True Then
            File.Delete(_Target)
        End If
    End Sub

    Private Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
        target = value
        Return value
    End Function
End Class

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
