Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / web / ASP.NET
Print

Checksum Verification

5.00/5 (3 votes)
22 Sep 2011CPOL3 min read 53.8K   1.4K  
Loop through all files in a folder/subfolders and run a checksum that is stored in a database. E-mail descrepancies and results.

Introduction

More and more we are seeing the importance of storage, archiving, and conservation of digital assets, potentially for decades or centuries. Digital Asset Management systems are mostly focused on ingest and user access, but what about the assets themselves? People are spending a lot of time creating videos, images, and other media. It makes sense that we should verify the data integrity of digital assets on a regular basis and alert when there are discrepancies. The program below does that using SHA-256 checksum. This is a console application written for .NET 4.0.

Background

Prerequisites

  • MySQL Server (and GUI Tools)* available at www.mysql.com
  • MySQL Connector/NET
  • *The database platform is MySQL but it could easily function with SQL Server. One of the things I like about MySQL Connector/NET is that the syntax for working with the database is almost the exact same as MS SQL Server (i.e., SqlCommand becomes MySqlCommand).

  • .NET 4.0/Visual Studio 2010.
  • Gmail account or other for authenticating and sending e-mail via SMTP.

Use Cases

The core Use Case is to check all files in a folder/subfolders and verify that the checksum value for a file has not changed. If it has, then a tech or business owner should be notified by e-mail. The program below does this; it also reports the total number of files, the number of unchanged files, and the number of new files.

The program writes a summary report into the body of the e-mail and attaches two spreadsheets (.csv) with detailed information.

Certain parts of the program are configurable through the app.config file, namely:

  • The folder/subfolders to verify
  • E-mail send to address
  • E-mail sent from address
  • Names and location of file attachments
  • Name and location of log files

The example uses a folder structure that looks like this:

folders.jpg

Using the code

First we will need a database. The attached file named DRMC.sql can be restored to MySQL or you can open the file in Notepad++ and see 1 table, 1 view, and 5 Stored Procedures. The program assumes that the database is called "drmc".

The entire program is in the download file. A few areas that are useful to point out are below:

Connecting and working with the MySQL database via a Stored Procedure is not always straightforward. The Stored Procedures contain some OUT parameters in order to get total counts.

The code below shows how to enumerate all the files in a folder and subfolders, run the checksum, and insert the results into a table:

C#
// Connect to database
string m_conn = 
  ConfigurationManager.ConnectionStrings["MySqlConnectionString"].ConnectionString;
MySqlConnection conn = new MySqlConnection(m_conn);
conn.Open();

try
{
    // Get the checksum and path for all files in directory.
    // Use stored proc to insert data. Log results and errors.
    MySqlCommand cmd = new MySqlCommand("drmc.proc_checksum", conn);
    cmd.CommandType = CommandType.StoredProcedure;

    string[] m_files = Directory.GetFiles(m_path, "*.*", 
                                 SearchOption.AllDirectories);

    foreach (string m_file in m_files)
    {
        m_filename = Path.GetFileName(m_file);
        cmd.Parameters.Clear();
        cmd.Parameters.Add(new MySqlParameter(@"M_FILEPATH", 
            MySqlDbType.VarChar) { Value = m_file.Replace("\\", "\\\\") });
        cmd.Parameters.Add(new MySqlParameter(@"M_FILENAME", 
            MySqlDbType.VarChar) { Value = m_filename });
        cmd.Parameters.Add(new MySqlParameter(@"M_SHA256", 
            MySqlDbType.VarChar) { Value = GetChecksum(m_file).ToString() });
        cmd.ExecuteNonQuery();

        using (StreamWriter sw = File.AppendText(m_results))
        {
            Logger.LogMessage("File " + m_file + 
                   " inserted in database.", sw);
            sw.Close();
        }
    }
  ...

It is also good to know how to work with a MySqlDataAdapter as well as how to get an OUT parameter from ExecuteNonQuery. The snippet below shows how to do this.

C#
// Get number and listing of new files
MySqlCommand cmd_newfiles = new MySqlCommand("drmc.proc_newfiles", conn);
cmd_newfiles.CommandType = CommandType.StoredProcedure;

cmd_newfiles.Parameters.AddWithValue("@M_NEWCOUNT", MySqlDbType.Int32);
cmd_newfiles.Parameters["@M_NEWCOUNT"].Direction = ParameterDirection.Output;
cmd_newfiles.ExecuteNonQuery();

Console.WriteLine("New files: " + cmd_newfiles.Parameters["@M_NEWCOUNT"].Value);
Console.WriteLine("\r\n");

string str_newfiles = cmd_newfiles.Parameters["@M_NEWCOUNT"].Value.ToString();

MySqlDataAdapter sda_newfiles = new MySqlDataAdapter(cmd_newfiles);
DataSet ds_newfiles = new DataSet();
ds_newfiles.DataSetName = "New Files";
sda_newfiles.Fill(ds_newfiles);
sda_newfiles.Dispose();

Finally, we start to build an e-mail with the results and send that, including two attachments created with the FileGenerator class.

C#
// Begin creating email content and attachments
DataTable dt_newfiles = ds_newfiles.Tables[0];
DataTable dt_changedfiles = ds_changedfiles.Tables[0];
DataTable dt_changedfiles1 = ds_changedfiles.Tables[1];

FileGenerator.CreateFile(dt_changedfiles, m_changedfiles).ToString();

string m_emailbody = "This e-mail is a summary of checksum file integrity " + 
                     "for files located here: \r\n\r\n" + m_path + "\r\n\r\n";
m_emailbody = m_emailbody + "There are a total of " + 
              str_totalfiles + " files. \r\n\r\n";
m_emailbody = m_emailbody + 
              "The file location, file name, and checksum are the same for " + 
              str_samefiles + " files. \r\n\r\n";
m_emailbody = m_emailbody + "There are " + str_newfiles + 
              " new files. These are listed below, if any.  " + 
              "Detailed information is in the attached " + 
              "file checksum_new_files.csv\r\n\r\n";
m_emailbody = m_emailbody + "There are " + str_changedfiles + 
              " files where the CHECKSUM HAS BEEN CHANGED. " + 
              "The integrity of the file is in doubt, " + 
              "or it has been changed by a user.  " + 
              "The files are listed below, if any.  Detailed information " + 
              "is in the attached file checksum_changed_files.csv\r\n\r\n";

string m_emailnewbody = "New Files: \r\n" + 
       FileGenerator.CreateFile(dt_newfiles, m_newfiles).ToString();
string m_emailchangedbody = "Changed Files: \r\n" + 
       FileGenerator.CreateFile(dt_changedfiles1).ToString();

m_emailbody = m_emailbody + m_emailnewbody + "\r\n" + m_emailchangedbody;

#endregion

#region Email configuration and send

var client = new SmtpClient("smtp.gmail.com", 587)
{
	Credentials = new NetworkCredential("gmailuserhere", 
                      "gmailpasswordhere"),
	EnableSsl = true
};

MailMessage m_message = new MailMessage(
m_notification_to,
m_notification_from,
m_notification_title + DateTime.Today.ToShortDateString(),
m_emailbody);

//...see downloads for code on creating full attchments

m_message.Attachments.Add(m_new_attachment);
m_message.Attachments.Add(m_changed_attachment);

client.Send(m_message);

The real work in the code is performed by the function below. This piece comes almost verbatim from Jeff Barnes, MS MVP, and his blog article here: http://jeffbarnes.net/blog/post/2007/01/12/File-Checksum-using-NET.aspx.

C#
private static string GetChecksum(string m_fileinput)
{
    try
    {
        string m_checksum;
        using (FileStream stream = File.OpenRead(m_fileinput))
        {
            SHA256Managed sha = new SHA256Managed();
            byte[] checksum = sha.ComputeHash(stream);
            m_checksum = 
              BitConverter.ToString(checksum).Replace("-", String.Empty);
        }

        return m_checksum;

    }
    catch (Exception ex)
    {
        using (StreamWriter swerr = File.AppendText(m_errors))
        {
            Logger.LogMessage(ex.Message.ToString(), swerr);
            swerr.Close();
        }

        return "unable to retrieve checksum";

    }
}

One of the issues you may run into with SHA-256 is that, because of its extended algorithm, it takes longer to create the checksum than say MD5. MD5 will run in less than half the time, which is important if you have terabytes to scan and you are more concerned with the checksum value and not the encryption. The snippet below shows how to use MD5 instead:

C#
MD5 m_md5 = new MD5CryptoServiceProvider();
//SHA256Managed sha = new SHA256Managed();
byte[] checksum = m_md5.ComputeHash(stream);
m_checksum = BitConverter.ToString(checksum).Replace("-", String.Empty);

History

  • 23 September 2011 - Added the MD5 comment to allow for quicker scanning of files.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)