
Backup of Data files - Full and Incremental

16 May 2009, CPOL, 5 min read
Full and incremental backup of files.

[Screenshot: main program screen]

Introduction

We were backing up all of the files in a directory for an application. However, not all of the files change all of the time. Some of them are very large, and often only a couple of small files change. Because we backed up the entire data folder every time, we needed to reduce both the time and the storage capacity a backup requires. This article shows that a full backup followed by incremental backups cuts the backup time substantially. The additional cost is a longer restore, which is done only when needed.

Using the Code

To start, I needed a zip compression routine to zip the files and restore them, and this was the first problem I encountered. I found suitable routines and an article on how to use them: thanks to VB Rocks and his article Zip Files Easy!, I at least understood how to zip the files. This approach has limitations, however. The zip routines add an extra file to the archive that describes its contents. They can also uncompress only the zip files that they created themselves, not every zip file that Windows creates. For my purpose, that was fine, since this program is only a proof of concept. In the final version, we will probably use GZip rather than Zip.

The usual approach is to take a full backup first and then take incremental backups, each containing only the files changed since the previous backup. In my case, I planned to schedule the full backup on Sunday and the incremental backups Monday through Friday, after which the cycle starts over. To restore, you restore the full backup and then each incremental, in order, until the whole backup set has been applied.

To test this, the program takes a source directory and a directory in which to place the zip file. The resulting filename combines the type of backup (Full or Incremental) with its sequence number. I store the full and incremental sequence numbers in an XML settings file. The article Quick and Dirty Settings Persistence with XML by circumpunct describes a quick and easy way to save settings in an XML file, which I used. Each time a backup is done, the program updates the settings so that the next backup uses a different filename.

A full backup is named BackupFull-(Full backup sequence#).zip, and the full backup sequence number is incremented each time a full backup is done. An incremental backup is named BackupInc-(Full backup sequence#)-(Incremental sequence#).zip. The following screenshot shows several backups to illustrate this technique.

[Screenshot: examples of backup filenames]

This naming scheme lets a restore pick a full backup sequence number, restore that full backup, find all the incremental backups that carry the same full backup number, and restore each one in order. This is very similar to a tape backup system.
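The restore order described above can be sketched as follows. This is a minimal illustration, not code from the sample program: the RestorePlanner class and GetRestoreOrder method are hypothetical names, and the sketch assumes the filenames follow the BackupFull-N.zip and BackupInc-N-M.zip pattern exactly. It sorts the incremental numbers numerically, so that increment 10 is restored after increment 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original sample program.
public static class RestorePlanner
{
    // Returns the archives to restore for one backup set, in restore order:
    // the full backup first, then its incrementals sorted by sequence number.
    public static List<string> GetRestoreOrder(IEnumerable<string> fileNames, int fullNumber)
    {
        string fullName = string.Format("BackupFull-{0}.zip", fullNumber);

        // Matches BackupInc-<full#>-<inc#>.zip for this full backup only
        Regex incPattern = new Regex(
            string.Format(@"^BackupInc-{0}-(\d+)\.zip$", fullNumber),
            RegexOptions.IgnoreCase);

        List<string> names = fileNames.ToList();
        if (!names.Any(n => string.Equals(n, fullName, StringComparison.OrdinalIgnoreCase)))
        {
            throw new InvalidOperationException("Full backup not found: " + fullName);
        }

        List<string> order = new List<string> { fullName };

        // Sort by the parsed number, not by the text of the filename
        order.AddRange(
            names.Select(n => new { Name = n, Match = incPattern.Match(n) })
                 .Where(x => x.Match.Success)
                 .OrderBy(x => int.Parse(x.Match.Groups[1].Value))
                 .Select(x => x.Name));

        return order;
    }
}
```

The names passed in would typically come from listing the backup directory. Each returned name could then be handed, in order, to an unzip routine such as UnZipFiles.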

This sample program can back up by compressing or by FTP. I implemented both since they were fairly easy and showed the concept. For the restore, I implemented only the uncompress, since an FTP restore would take longer to write and would not show much more of the concept. The code below shows how the compress is done depending on whether the backup is full or incremental. For an incremental backup, the program checks the archive bit of each file. If the bit is set, the file has been created or changed since the last backup, so it is backed up; if the bit is clear, the file is skipped. Once a file has been backed up, its archive bit is cleared so that it is not backed up again next time unless it actually changes.

C#
/// <summary>
/// Zip the files up, depending on whether this is a full or incremental backup.
/// </summary>
/// <param name="zipPath">The full path of the zip file to create</param>
/// <param name="sourceDirectory">The directory that contains the source files to zip</param>
/// <param name="bcktyp">The type of backup</param>
public void ZipFiles(string zipPath, string sourceDirectory, BackupType bcktyp)
{
    DirectoryInfo di = new DirectoryInfo(sourceDirectory);
    FileInfo[] files = di.GetFiles();

    // Open the zip file if it exists, else create a new one.
    // The using block closes the package even if an exception is thrown.
    using (Package zip = ZipPackage.Open(zipPath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
    {
        foreach (FileInfo file in files)
        {
            // Get the attributes once and reuse them
            FileAttributes attributes = File.GetAttributes(file.FullName);

            // The archive bit is set when a file is created or modified
            bool isArchive = (attributes & FileAttributes.Archive) == FileAttributes.Archive;

            // A full backup takes every file; an incremental takes only changed files
            if (bcktyp == BackupType.Full || isArchive)
            {
                // Add to the archive file
                AddToArchive(zip, file.FullName);

                // Clear only the archive bit now that the file is backed up,
                // leaving other attributes (read-only, hidden) untouched
                File.SetAttributes(file.FullName, attributes & ~FileAttributes.Archive);
            }
        }
    }
}
/// <summary>
/// Add the file to the zip package
/// </summary>
/// <param name="zip">The package to add the file to</param>
/// <param name="fileToAdd">The filename to add</param>
private void AddToArchive(Package zip, string fileToAdd)
{
    // Replace spaces with an underscore (_) 
    string uriFileName = Path.GetFileName(fileToAdd).Replace(" ", "_");

    // A part Uri always starts with a forward slash "/" 
    string zipUri = string.Concat("/", uriFileName);

    Uri partUri = new Uri(zipUri, UriKind.Relative);
    string contentType = MediaTypeNames.Application.Zip;

    // The PackagePart contains the information: 
    // Where to extract the file when it's extracted (partUri) 
    // The type of content stream (MIME type):  (contentType) 
    // The type of compression:  (CompressionOption.Normal)   
    PackagePart pkgPart = zip.CreatePart(partUri, contentType, CompressionOption.Normal);

    // Read all of the bytes from the file to add to the zip file 
    byte[] bytes = File.ReadAllBytes(fileToAdd);

    // Compress and write the bytes to the zip file, then release the stream
    using (Stream partStream = pkgPart.GetStream())
    {
        partStream.Write(bytes, 0, bytes.Length);
    }
}

/// <summary>
/// Unzip the files
/// </summary>
/// <param name="zipPath">The file to unzip</param>
/// <param name="destinationDirectory">The destination directory</param>
public void UnZipFiles(string zipPath, string destinationDirectory)
{
    // Open the existing zip file for reading
    using (Package zip = ZipPackage.Open(zipPath, FileMode.Open, FileAccess.Read))
    {
        foreach (PackagePart pp in zip.GetParts())
        {
            // Gets the complete path without the leading "/"
            string fileName = pp.Uri.OriginalString.Substring(1);

            // Replace underscore with space to undo the change made when zipping.
            // Note: this also alters names that originally contained an underscore.
            fileName = fileName.Replace("_", " ");

            using (Stream stream = pp.GetStream())
            using (FileStream output = File.Create(Path.Combine(destinationDirectory, fileName)))
            {
                // Copy in chunks until the end of the stream. A single Read call
                // may return fewer bytes than requested, and sizing the buffer
                // as Length - 1 would drop the last byte of every file.
                byte[] buffer = new byte[81920];
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    output.Write(buffer, 0, read);
                }
            }
        }
    }
}
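The archive-bit test that drives the incremental backup can be isolated into small helpers. The sketch below is illustrative only: the ArchiveBit class and its method names are assumptions and do not appear in the sample program. It clears just the archive flag with a bit mask, so that other attributes such as read-only or hidden stay as they were, whereas setting FileAttributes.Normal would remove them all.

```csharp
using System;
using System.IO;

// Hypothetical helpers, not part of the original sample program.
public static class ArchiveBit
{
    // True when the archive flag is present in the attribute set
    public static bool IsSet(FileAttributes attributes)
    {
        return (attributes & FileAttributes.Archive) == FileAttributes.Archive;
    }

    // Returns the same attributes with only the archive flag removed
    public static FileAttributes Clear(FileAttributes attributes)
    {
        return attributes & ~FileAttributes.Archive;
    }

    // A full backup takes every file; an incremental takes only flagged files
    public static bool NeedsBackup(bool isFullBackup, FileAttributes attributes)
    {
        return isFullBackup || IsSet(attributes);
    }

    // Clears the archive flag of a file on disk after it has been backed up
    public static void ClearOnFile(string path)
    {
        File.SetAttributes(path, Clear(File.GetAttributes(path)));
    }
}
```

Keeping the decision in NeedsBackup separate from the file system calls makes the rule easy to check without touching real files.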

The following code saves and retrieves the full backup sequence number and the incremental number:

C#
private void GetSettings()
{
    Settings settings;
    fullbacknum = 1;
    Incnumber = 1;

    settings = new Settings();

    fullbacknum = settings.GetSetting("BackupNumber", fullbacknum);
    Incnumber = settings.GetSetting("Incnumber", Incnumber);
}

private void SaveSettings()
{
    Settings settings;

    settings = new Settings();

    settings.PutSetting("BackupNumber", fullbacknum);
    settings.PutSetting("Incnumber", Incnumber);
}

private void btnCompress_Click(object sender, EventArgs e)
{
    string strsourcedir = "";    // the directory we want to compress
    string strzipdir = "";      // where the zip file is to be created
    string strfilename ="";     // the final filename we want to create
    HCompress   hc;
    BackupType bt = BackupType.Full;

    GetSettings();

    errorProvider1.Clear();
    if (txtSourceDir.Text.Length == 0)
    {
        // Attach the error to the control that failed validation
        errorProvider1.SetError(txtSourceDir, "Enter Source dir");
        return;
    }
    if (txtZipDir.Text.Length == 0)
    {
        errorProvider1.SetError(txtZipDir, "Enter zip dir");
        return;
    }

    // Get the directory locations
    strsourcedir = txtSourceDir.Text;
    strzipdir = txtZipDir.Text;

    // check which radio button is checked
    if (rdobtnFull.Checked)
    {
        bt = BackupType.Full;
        fullbacknum++;
        strfilename = string.Format("BackupFull-{0}.zip", fullbacknum);
        Incnumber = 0;  // On a full backup reset the increment number
    }
    if (rdobtnInc.Checked)
    {
        bt = BackupType.Incremental;
        Incnumber++;
        strfilename = string.Format("BackupInc-{0}-{1}.zip", fullbacknum, Incnumber);
    }
    strfilename = Path.Combine(strzipdir, strfilename);
    txtFilename.Text = strfilename;

    hc = new HCompress();
    hc.ZipFiles(strfilename, strsourcedir, bt);

    SaveSettings();
}

The following chart estimates the amount of space we will save. I used 1000 files as an example of the total number of files there might be, together with the percentage of the total in each size category, to calculate the total disk storage used. This shows that a typical backup is 7.3 GB every night when every backup is a full backup. The second set of numbers shows what percentage of each file size category might change every night. You can change the percentages to explore other scenarios; I used these only as an example. The incremental backup is only about 1.7 GB (1,680 MB), which saves a sizable amount of disk storage, not to mention the time needed to process that much data. Over five nightly backups, a full backup every night costs a total of 36.5 GB, compared with about 14 GB (14,020 MB) for one full backup followed by four incrementals. These are very good numbers. Now you can see why tape backup companies have always used these methodologies.

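| % of files | # Files in data folder | Size (MB) | Total size (MB) |
|------------|------------------------|-----------|-----------------|
| 10         | 100                    | 1         | 100             |
| 30         | 300                    | 4         | 1200            |
| 30         | 300                    | 5         | 1500            |
| 20         | 200                    | 10        | 2000            |
| 10         | 100                    | 25        | 2500            |
| Total: 100 | 1000                   |           | 7300            |

| % change   | Total size (MB) | Change size (MB) |
|------------|-----------------|------------------|
| 10         | 100             | 10               |
| 10         | 1200            | 120              |
| 30         | 1500            | 450              |
| 30         | 2000            | 600              |
| 20         | 2500            | 500              |
| Total: 100 |                 | 1680             |

| Summary            | Size (MB) |
|--------------------|-----------|
| Full Backup        | 7300      |
| Incremental backup | 1680      |
| ALL FULL           | 36500     |
| Full/Inc           | 14020     |
| Restore Full       | 7300      |
| Restore Inc        | 14020     |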
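The totals in the chart can be reproduced with a few lines of arithmetic. The sketch below is only a worked example: the BackupSizeEstimate class and its member names are made up for illustration, and the figures are the example values from the chart, not measurements. The five-night period is inferred from the chart totals (36500 = 5 × 7300).

```csharp
using System;

// Hypothetical calculator that reproduces the chart figures.
public static class BackupSizeEstimate
{
    // Example file counts and per-file sizes (MB) taken from the chart
    static readonly int[] FileCounts = { 100, 300, 300, 200, 100 };
    static readonly int[] FileSizesMb = { 1, 4, 5, 10, 25 };

    // Percentage of each size category assumed to change per night
    static readonly int[] PercentChanged = { 10, 10, 30, 30, 20 };

    // Size of one full backup: every file in every category
    public static int FullBackupMb()
    {
        int total = 0;
        for (int i = 0; i < FileCounts.Length; i++)
        {
            total += FileCounts[i] * FileSizesMb[i];
        }
        return total;
    }

    // Size of one incremental backup: only the changed share of each category
    public static int IncrementalBackupMb()
    {
        int total = 0;
        for (int i = 0; i < FileCounts.Length; i++)
        {
            total += FileCounts[i] * FileSizesMb[i] * PercentChanged[i] / 100;
        }
        return total;
    }

    // Storage when every nightly backup is a full backup
    public static int AllFullMb(int nights)
    {
        return nights * FullBackupMb();
    }

    // Storage for one full backup followed by (nights - 1) incrementals
    public static int FullPlusIncrementalMb(int nights)
    {
        return FullBackupMb() + (nights - 1) * IncrementalBackupMb();
    }
}
```

With five nights, AllFullMb(5) and FullPlusIncrementalMb(5) correspond to the ALL FULL and Full/Inc rows. A restore of the whole set has to read the same amount of data that the full and incremental backups wrote, which is why the Restore Inc figure matches Full/Inc.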

Points of Interest

I have not found a fully working example for the zip file compression routines. From the articles I have read, there are two compression libraries that I will be looking at: SharpZipLib and DotNetZip. Both are well reviewed, and both are free to use.

History

  • 16 May 2009 - Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).