Introduction
We were backing up all of the files in an application's data directory on every run. However, not all of the files change every time. Some of them are very large, and occasionally only a couple of small files change. Because we backed up the entire data folder every time, we needed to reduce both the time and the storage capacity a backup takes. This article shows that a full backup followed by incremental backups cuts the backup time substantially. The additional cost is a longer restore, which is paid only when a restore is actually needed.
Using the Code
To start, I needed a zip compression routine to zip the files up and restore them, and this was the first problem I encountered. I was able to find the routines and an article on how to use them. Thanks to VB Rocks, whose article Zip Files Easy! at least got me to understand how to zip the files. There are issues with this methodology, however. The zip routines add an extra file to the archive that describes what is in the compressed file. They will also only uncompress files that were created with these same routines, not every zip file that Windows creates. For my purpose, that was fine, since I was only using this program as a proof of concept. In the final version, we will probably use GZip and not Zip.
The normal backup methodology is to do a full backup first and then do incremental backups (only the files changed since the previous backup) every time after that. In my case, I was going to schedule the full backup on Sunday and the incremental ones for Monday through Friday, after which the process starts all over. To do a restore, you restore the full backup and then each incremental (in order) until the entire incremental backup set has been applied.
To test this, the program takes a source directory and a directory to place the zip file into. The resulting filename is a combination of the type of backup (Full or Incremental) and its sequence number. To store the full and incremental sequence numbers, I use an XML settings file. The article Quick and Dirty Settings Persistence with XML by circumpunct describes a quick and easy way to save settings in an XML file, which I used. Each time a backup is done, the program updates the settings so that the next backup gets a different file name.
A full backup has a name like BackupFull-(Full backup sequence#).zip, and the sequence number is incremented each time a full backup is done. An incremental backup has a name like BackupInc-(Full backup sequence#)-(Incremental sequence#).zip. The following screenshot shows several backups to illustrate this technique.
This naming scheme allows a restore to pick a sequence number, take the matching full backup, collect all the incremental backups that carry the same full backup number, and then restore each one in the proper order. This is very similar to a tape backup system.
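The selection step described above can be sketched as follows. This is only an illustration, not code from the sample program: the class name BackupNames and its methods are hypothetical, and it assumes the file names follow the BackupFull-N.zip and BackupInc-N-M.zip patterns exactly. The incrementals are sorted by their numeric sequence rather than by name, so that number 10 is restored after number 2.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of the sample program.
public static class BackupNames
{
    // File name for a full backup, e.g. BackupFull-3.zip
    public static string FullName(int fullSeq)
    {
        return string.Format(CultureInfo.InvariantCulture, "BackupFull-{0}.zip", fullSeq);
    }

    // File name for an incremental backup, e.g. BackupInc-3-2.zip
    public static string IncrementalName(int fullSeq, int incSeq)
    {
        return string.Format(CultureInfo.InvariantCulture, "BackupInc-{0}-{1}.zip", fullSeq, incSeq);
    }

    // Returns the archives to restore for one full-backup sequence number:
    // the full backup first, then its incrementals in ascending numeric order.
    public static List<string> RestoreOrder(IEnumerable<string> fileNames, int fullSeq)
    {
        List<string> names = fileNames.ToList();
        string full = FullName(fullSeq);
        if (!names.Contains(full, StringComparer.OrdinalIgnoreCase))
            throw new InvalidOperationException("Full backup " + full + " not found.");

        List<string> result = new List<string>();
        result.Add(full);

        string prefix = "BackupInc-" + fullSeq.ToString(CultureInfo.InvariantCulture) + "-";
        List<KeyValuePair<int, string>> incrementals = new List<KeyValuePair<int, string>>();
        foreach (string name in names)
        {
            if (!name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ||
                !name.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                continue;
            // The text between the prefix and ".zip" must be the incremental number.
            string number = name.Substring(prefix.Length, name.Length - prefix.Length - 4);
            int incSeq;
            if (int.TryParse(number, NumberStyles.None, CultureInfo.InvariantCulture, out incSeq))
                incrementals.Add(new KeyValuePair<int, string>(incSeq, name));
        }

        // Sort numerically so that BackupInc-3-10.zip comes after BackupInc-3-2.zip.
        result.AddRange(incrementals.OrderBy(p => p.Key).Select(p => p.Value));
        return result;
    }
}
```

Given the names BackupInc-3-10.zip, BackupFull-3.zip, BackupInc-3-2.zip, BackupInc-2-1.zip and BackupInc-3-1.zip, a call to RestoreOrder with sequence number 3 returns BackupFull-3.zip, BackupInc-3-1.zip, BackupInc-3-2.zip, BackupInc-3-10.zip, in that order.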
This sample program does a compress and an FTP backup. I did both since they were fairly easy to implement and showed the concept. On the restore, I only implemented the uncompress, since the FTP restore would be more time consuming to write and would not show much more of the concept. The code below shows how the compress is done depending on whether it is a full backup or not. The incremental backup checks the archive bit of each file. If the bit is set, the file has changed since the last backup, so it is added to the archive; if it is not set, the file is skipped. Once the files are backed up, the archive bit is cleared so that a file is not backed up again the next time unless it has actually changed.
// Requires: using System.IO; using System.IO.Packaging; using System.Net.Mime;
// System.IO.Packaging lives in WindowsBase.dll, which must be referenced.
public void ZipFiles(string zipPath, string sourceDirectory, BackupType bcktyp)
{
    DirectoryInfo di = new DirectoryInfo(sourceDirectory);
    FileInfo[] filess = di.GetFiles();
    // The using block closes the package even if adding a file throws.
    using (Package zip = ZipPackage.Open(zipPath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
    {
        for (int ii = 0; ii < filess.Length; ii++)
        {
            string path = filess[ii].FullName;
            FileAttributes fileAttributes = File.GetAttributes(path);
            bool isArchive = (fileAttributes & FileAttributes.Archive) == FileAttributes.Archive;
            // A full backup takes every file; an incremental takes only files
            // whose archive bit is set, i.e. files changed since the last backup.
            if (bcktyp == BackupType.Full || isArchive)
            {
                AddToArchive(zip, path);
            }
            // Clear only the archive bit so other attributes (ReadOnly, Hidden) survive.
            File.SetAttributes(path, fileAttributes & ~FileAttributes.Archive);
        }
    }
}
private void AddToArchive(Package zip, string fileToAdd)
{
    // Spaces are replaced with underscores to keep the part URI valid.
    string uriFileName = fileToAdd.Replace(" ", "_");
    string zipUri = string.Concat("/", Path.GetFileName(uriFileName));
    Uri partUri = new Uri(zipUri, UriKind.Relative);
    string contentType = MediaTypeNames.Application.Zip;
    PackagePart pkgPart = zip.CreatePart(partUri, contentType, CompressionOption.Normal);
    Byte[] bites = File.ReadAllBytes(fileToAdd);
    // Dispose the part stream once the file contents have been written.
    using (Stream partStream = pkgPart.GetStream())
    {
        partStream.Write(bites, 0, bites.Length);
    }
}
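public void UnZipFiles(string zipPath, string destinationDirectory)
{
    using (Package zip = ZipPackage.Open(zipPath, FileMode.Open, FileAccess.Read))
    {
        foreach (PackagePart pp in zip.GetParts())
        {
            // Drop the leading "/" of the part URI to get the file name.
            string fileName = pp.Uri.OriginalString.Substring(1);
            // Undo the space replacement done in AddToArchive. Note that this also
            // turns underscores that were in the original name into spaces.
            fileName = fileName.Replace("_", " ");
            using (Stream stream = pp.GetStream())
            {
                // Allocate the full length; Length - 1 would drop the last byte.
                Byte[] bites = new Byte[(int)stream.Length];
                int offset = 0;
                // Stream.Read may return fewer bytes than requested, so loop.
                while (offset < bites.Length)
                {
                    int read = stream.Read(bites, offset, bites.Length - offset);
                    if (read == 0) break;
                    offset += read;
                }
                File.WriteAllBytes(Path.Combine(destinationDirectory, fileName), bites);
            }
        }
    }
}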
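The archive-bit test that drives the incremental backup can be isolated into a few small helpers. The sketch below is only illustrative: the class name ArchiveBit and its method names are hypothetical and are not part of the sample program. It removes only the Archive flag and leaves any other attributes in place. The archive attribute is a Windows file system feature, so the two file-based methods assume the code runs on Windows.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the sample program.
public static class ArchiveBit
{
    // True when the Archive flag is present in the given attribute set.
    public static bool IsSet(FileAttributes attributes)
    {
        return (attributes & FileAttributes.Archive) == FileAttributes.Archive;
    }

    // Returns the same attributes with only the Archive flag removed.
    public static FileAttributes Clear(FileAttributes attributes)
    {
        return attributes & ~FileAttributes.Archive;
    }

    // True when the file is flagged as changed since its archive bit was last cleared.
    public static bool NeedsBackup(string path)
    {
        return IsSet(File.GetAttributes(path));
    }

    // Clears the Archive flag on disk, leaving ReadOnly, Hidden, etc. untouched.
    public static void MarkBackedUp(string path)
    {
        File.SetAttributes(path, Clear(File.GetAttributes(path)));
    }
}
```

An incremental pass would call NeedsBackup for each file, add the file to the archive when it returns true, and call MarkBackedUp afterwards.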
The following is the code that I use to save and get the full backup sequence number and the incremental number:
private void GetSettings()
{
    // Default both counters to 1 when no value has been saved yet.
    fullbacknum = 1;
    Incnumber = 1;
    Settings settings = new Settings();
    fullbacknum = settings.GetSetting("BackupNumber", fullbacknum);
    Incnumber = settings.GetSetting("Incnumber", Incnumber);
}
private void SaveSettings()
{
    Settings settings = new Settings();
    settings.PutSetting("BackupNumber", fullbacknum);
    settings.PutSetting("Incnumber", Incnumber);
}
private void btnCompress_Click(object sender, EventArgs e)
{
    string strfilename = "";
    BackupType bt = BackupType.Full;
    GetSettings();
    errorProvider1.Clear();
    // Flag the text box that is actually empty.
    if (txtSourceDir.Text.Length == 0)
    {
        errorProvider1.SetError(txtSourceDir, "Enter Source dir");
        return;
    }
    if (txtZipDir.Text.Length == 0)
    {
        errorProvider1.SetError(txtZipDir, "Enter zip dir");
        return;
    }
    string strsourcedir = txtSourceDir.Text;
    string strzipdir = txtZipDir.Text;
    if (rdobtnFull.Checked)
    {
        // A new full backup starts a new set, so the incremental counter resets.
        bt = BackupType.Full;
        fullbacknum++;
        strfilename = string.Format("BackupFull-{0}.zip", fullbacknum);
        Incnumber = 0;
    }
    if (rdobtnInc.Checked)
    {
        bt = BackupType.Incremental;
        Incnumber++;
        strfilename = string.Format("BackupInc-{0}-{1}.zip", fullbacknum, Incnumber);
    }
    strfilename = Path.Combine(strzipdir, strfilename);
    txtFilename.Text = strfilename;
    HCompress hc = new HCompress();
    hc.ZipFiles(strfilename, strsourcedir, bt);
    SaveSettings();
}
The following chart shows the amount of space we expect to save. I used 1000 files as an example of the total number of files there might be, along with the percentage of the total that falls into each size category, to calculate the total disk storage used. This shows that a typical backup is 7.3 Gig every night using a full backup methodology. The second set of numbers shows what percentage of each file size category might change every night. You can change the percentages to explore other scenarios, but I used these as an example. The incremental backup is only about 1.7 Gig (1680 Meg), which is a sizable amount of disk storage saved, not to mention the time needed to process that much data. Over five backups, doing a full backup every time costs a total of 36.5 Gig (5 x 7300 Meg), compared to about 14 Gig for one full backup plus four incrementals (7300 + 4 x 1680 = 14020 Meg). These are very good numbers. Now you can see why tape backup companies have always used these methodologies.
Files in the data folder (1000 files in total):
| % of files | # Files in data folder | Size (Meg) | Tot Size (Meg) |
|---|---|---|---|
| 10 | 100 | 1 | 100 |
| 30 | 300 | 4 | 1200 |
| 30 | 300 | 5 | 1500 |
| 20 | 200 | 10 | 2000 |
| 10 | 100 | 25 | 2500 |
| 100 (total) | 1000 | | 7300 |
Data changed per night:
| % change | Tot Size (Meg) | Change Size (Meg) |
|---|---|---|
| 10 | 100 | 10 |
| 10 | 1200 | 120 |
| 30 | 1500 | 450 |
| 30 | 2000 | 600 |
| 20 | 2500 | 500 |
| Total | | 1680 |
Summary:
| Scenario | Meg |
|---|---|
| Full Backup | 7300 |
| Incremental backup | 1680 |
| ALL FULL | 36500 |
| Full/Inc | 14020 |
| Restore Full | 7300 |
| Restore Inc | 14020 |
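The figures in the chart can be reproduced with a few lines of arithmetic. The sketch below is only an illustration; the class name BackupSizing and its methods are hypothetical. It assumes a cycle of five backups (one full plus four incrementals), which is what the 36500 and 14020 totals imply.

```csharp
using System;

// Hypothetical helper that reproduces the arithmetic behind the chart.
public static class BackupSizing
{
    // Size in Meg of one backup: the sum over all size categories of
    // (file count * size per file * percentage of that category included).
    public static int TotalMeg(int[] fileCounts, int[] sizeMeg, int[] percentIncluded)
    {
        int total = 0;
        for (int i = 0; i < fileCounts.Length; i++)
            total += fileCounts[i] * sizeMeg[i] * percentIncluded[i] / 100;
        return total;
    }

    // Storage for a cycle when every backup is a full backup.
    public static int AllFullMeg(int fullMeg, int backups)
    {
        return fullMeg * backups;
    }

    // Storage for a cycle of one full backup followed by incrementals.
    public static int FullPlusIncrementalMeg(int fullMeg, int incrementalMeg, int backups)
    {
        return fullMeg + (backups - 1) * incrementalMeg;
    }
}
```

With file counts {100, 300, 300, 200, 100} and sizes {1, 4, 5, 10, 25}, TotalMeg returns 7300 when every category is included at 100 percent and 1680 for the change percentages {10, 10, 30, 30, 20}. For five backups, AllFullMeg(7300, 5) gives 36500 and FullPlusIncrementalMeg(7300, 1680, 5) gives 14020.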
Points of Interest
I have not found a fully working example for the zip file compression routines. Of all the articles I have read, two compression libraries stand out, and I will be looking at both. The first is SharpZipLib and the second is DotNetZip. Both have many very good comments about them, and they are free to use.
History
- 16 May 2009 - Initial release.