Introduction
When we need to detect changes in a directory of files, we generally reach for the FileSystemWatcher class provided by .NET. However, after learning about its side effects, I came to feel that its notifications are only suggestive and offer little real benefit for this purpose. Another reason not to use FileSystemWatcher is that it reports file system events in general and does not look at file content. So I have found hashing to be a better way.
Background
In this article, I will try to answer some common questions programmers ask about hashing: How long will it take to compute the hash of the files in my directory? What if the parent folder has subfolders? Will it be fast enough for a normal application deployment structure holding a few MB of files? To answer these questions, I wrote a small utility and ran it on a directory of around 45 files with a total size of a few MB. The result was fast enough: it took only 50-60 milliseconds to compute the hashes, and about the same time to validate them.
Using the Code
Consider the code below. I tried computing the hash with both the MD5 and SHA1 algorithms (the listing uses SHA1), and in my test both took about the same time to hash the file content. Please note that we are hashing the actual file content here. If anything in the content changes, even a single added space or character, the hash of the whole file changes. However, it is also important to note that a change in file attributes, such as the last modification time, won't affect the hash.
using System;
using System.IO;
using System.Security.Cryptography;

public class DeploymentFile
{
    public string FilePath { get; set; }
    public bool IsFilePathValid { get; set; }
    public string HashedValue { get; set; }
    public bool IsFileModified { get; set; }

    public DeploymentFile(string filePath)
    {
        FilePath = filePath;
        IsFileModified = false;
        // Only hash files that exist; a missing file is flagged instead.
        IsFilePathValid = File.Exists(filePath);
        if (IsFilePathValid)
            HashedValue = ComputeHashSHA(filePath);
    }

    public bool IsExist(string filePath)
    {
        return File.Exists(filePath);
    }

    public string ComputeHashSHA(string filename)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(filename))
        {
            // Render the raw hash bytes as hexadecimal text. Decoding them
            // with a text encoding can be lossy and yields unprintable output.
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }
}
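To illustrate the point about content versus attributes, here is a minimal console sketch. It assumes a scratch file in the temp folder; the class name ContentHashDemo and the HashFile helper are hypothetical names used only for this illustration and are not part of the utility.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentHashDemo
{
    // Hashes the bytes of the file and returns the digest as hexadecimal text.
    public static string HashFile(string path)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    public static void Main()
    {
        // Hypothetical scratch file, created only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "hello");
            string original = HashFile(path);

            // Changing only the last-write timestamp leaves the content alone.
            File.SetLastWriteTimeUtc(path, DateTime.UtcNow.AddDays(-1));
            Console.WriteLine("Same hash after timestamp change: " + (HashFile(path) == original));

            // Appending a single space changes the content.
            File.AppendAllText(path, " ");
            Console.WriteLine("Same hash after appending a space: " + (HashFile(path) == original));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The first comparison should report that the hash is unchanged and the second that it differs, since only the second step touches the bytes being hashed.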
Shown below is the code for the Form that hosts all the controls. You may observe that I am using a stopwatch to measure the time taken by the whole hash validation process.
IMPORTANT: Please note that if the message box appears, the stopwatch keeps running until the user clicks and closes it. So, to measure accurately, one may disable the message box.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Windows.Forms;

public partial class FileValidator : Form
{
    public FileValidator()
    {
        InitializeComponent();
    }

    List<DeploymentFile> DeployList;
    List<DeploymentFile> ValidationList;
    string filePath;

    #region ComputeHash
    private void ComputeHash_Click(object sender, EventArgs e)
    {
        // Take the baseline snapshot: one hashed entry per file.
        DeployList = new List<DeploymentFile>();
        foreach (var item in GetListOfFilesInDeployFolder())
            DeployList.Add(new DeploymentFile(item));
        FilesGrid.DataSource = DeployList;
    }
    #endregion ComputeHash

    #region ValidateFileHash
    private void ValidateHash_Click(object sender, EventArgs e)
    {
        // There is nothing to validate against until a baseline exists.
        if (DeployList == null)
        {
            MessageBox.Show("Compute the hash first.");
            return;
        }
        Stopwatch stopwatch = Stopwatch.StartNew();
        List<string> filesList = GetListOfFilesInDeployFolder();
        // Re-hash every file recorded in the baseline.
        ValidationList = new List<DeploymentFile>();
        foreach (var item in DeployList)
            ValidationList.Add(new DeploymentFile(item.FilePath));
        // Compare the counts first so the index below never runs past filesList.
        bool abort = ValidationList.Count != filesList.Count;
        for (int i = 0; !abort && i < ValidationList.Count; i++)
            if (ValidationList[i].FilePath != filesList[i]) abort = true;
        if (!abort && ValidationList.Exists(x => !x.IsFilePathValid))
            abort = true;
        if (abort)
        {
            MessageBox.Show("Files/Folder structure changed or modified since last check");
        }
        else
        {
            // Same structure: flag each file whose content hash differs.
            for (int i = 0; i < ValidationList.Count; i++)
                if (ValidationList[i].HashedValue != DeployList[i].HashedValue)
                    ValidationList[i].IsFileModified = true;
        }
        FilesGrid.DataSource = ValidationList;
        stopwatch.Stop();
        label1.Text = "Time taken in Validation : " + stopwatch.Elapsed;
    }
    #endregion

    private List<string> GetListOfFilesInDeployFolder()
    {
        filePath = textBox1.Text;
        // Directory.GetFiles does not guarantee an order, so sort the paths
        // to keep the index-by-index comparison above reliable.
        return Directory.GetFiles(filePath, "*", SearchOption.AllDirectories)
                        .OrderBy(p => p, StringComparer.Ordinal)
                        .ToList();
    }

    private void FileValidator_Load(object sender, EventArgs e)
    {
        FilesGrid.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.DisplayedCells;
    }
}
The above screenshot displays the time taken for computation and validation of the hash. If any file is modified between the click on the compute hash button and the click on the check for modifications button, that modification shows up in the IsFileModified column. I am also recording the file structure and comparing it with the current file structure; any change in a file path is shown in the IsFilePathValid column.
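The same compare-two-snapshots idea can be sketched without any UI. The following is only an illustration under my own assumptions: the FolderSnapshot class and its Capture, Added, Removed and Modified methods are hypothetical names, and the snapshot is kept as a dictionary from full path to hash rather than as the list used in the form.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class FolderSnapshot
{
    // Maps the full path of every file under root to the hex SHA1 of its content.
    public static Dictionary<string, string> Capture(string root)
    {
        var snapshot = new Dictionary<string, string>(StringComparer.Ordinal);
        using (var sha = SHA1.Create())
        {
            foreach (string path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
            {
                using (var stream = File.OpenRead(path))
                {
                    snapshot[path] = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
                }
            }
        }
        return snapshot;
    }

    // Paths present now that were not in the baseline.
    public static List<string> Added(Dictionary<string, string> before, Dictionary<string, string> after)
    {
        return after.Keys.Where(p => !before.ContainsKey(p)).ToList();
    }

    // Paths in the baseline that no longer exist.
    public static List<string> Removed(Dictionary<string, string> before, Dictionary<string, string> after)
    {
        return before.Keys.Where(p => !after.ContainsKey(p)).ToList();
    }

    // Paths present in both snapshots whose content hash differs.
    public static List<string> Modified(Dictionary<string, string> before, Dictionary<string, string> after)
    {
        var result = new List<string>();
        foreach (var pair in before)
        {
            string current;
            if (after.TryGetValue(pair.Key, out current) && current != pair.Value)
                result.Add(pair.Key);
        }
        return result;
    }
}
```

One would call Capture once right after deployment, call it again later, and pass both dictionaries to the three comparison methods. Keying by path means added and removed files are reported individually instead of aborting the whole check, which is a design choice of this sketch and not how the form above behaves.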
Points of Interest
It is interesting to find that the SHA1 and MD5 algorithms take similar time for a small number of files. As the file count and file sizes increase, I found MD5 to be more efficient than SHA1. However, SHA1 is more trusted in developer circles. I think MD5 is the better choice here because we are not really dealing with security; we are more concerned with the integrity of the file content.
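To repeat the comparison on your own folder, a rough timing sketch could look like the following. The HashTiming class and its TimeHashing method are hypothetical names of mine, the folder is taken from the first command-line argument (falling back to the current directory), and the numbers will depend on the machine, the file sizes and disk caching.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

public static class HashTiming
{
    // Hashes every file under root with the supplied algorithm
    // and returns the time spent reading and hashing.
    public static TimeSpan TimeHashing(string root, Func<HashAlgorithm> createAlgorithm)
    {
        string[] files = Directory.GetFiles(root, "*", SearchOption.AllDirectories);
        Stopwatch stopwatch = Stopwatch.StartNew();
        using (HashAlgorithm algorithm = createAlgorithm())
        {
            foreach (string file in files)
            {
                using (var stream = File.OpenRead(file))
                {
                    algorithm.ComputeHash(stream);
                }
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        // Warm-up pass so that disk caching does not favor whichever algorithm runs second.
        TimeHashing(root, () => MD5.Create());

        Console.WriteLine("MD5 : " + TimeHashing(root, () => MD5.Create()));
        Console.WriteLine("SHA1: " + TimeHashing(root, () => SHA1.Create()));
    }
}
```

A single run is only indicative; repeating the measurement several times and on larger folders gives a fairer picture of the difference between the two algorithms.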