Introduction
More and more we are seeing the importance of storage, archiving, and conservation of digital assets, potentially for decades or centuries. Digital Asset Management systems
are mostly focused on ingest and user access, but what about the assets themselves? People are spending a lot of time creating videos, images, and other media. It makes sense that
we should verify the data integrity of digital assets on a regular basis and alert when there are discrepancies. The program below does that using SHA-256 checksum.
This is a console application written for .NET 4.0.
Background
Prerequisites
- MySQL Server (and GUI Tools)* available at www.mysql.com
- MySQL Connector/NET
*The database platform is MySQL but it could easily function with SQL Server. One of the things I like about MySQL Connector/NET is that the syntax for working
with the database is almost the exact same as MS SQL Server (i.e., SqlCommand
becomes MySqlCommand
).
- .NET 4.0/Visual Studio 2010.
- Gmail account or other for authenticating and sending e-mail via SMTP.
Use Cases
The core Use Case is to check all files in a folder/subfolders and verify that the checksum value for a file has not changed. If it has, then a tech or business
owner should be notified by e-mail. The program below does this; it also reports the total number of files, the number of unchanged files, and the number of new files.
The program writes a summary report into the body of the e-mail and attaches two spreadsheets (.csv) with detailed information.
Certain parts of the program are configurable through the app.config file, namely:
- The folder/subfolders to verify
- E-mail send to address
- E-mail sent from address
- Names and location of file attachments
- Name and location of log files
The example uses a folder structure that looks like this:
Using the code
First we will need a database. The attached file named DRMC.sql can be restored to MySQL or you can open the file in Notepad++ and see 1 table,
1 view, and 5 Stored Procedures. The program assumes that the database is called "drmc".
The entire program is in the download file. A few areas that are useful to point out are below:
Connecting and working with the MySQL database via a Stored Procedure is not always straightforward.
The Stored Procedures contain some OUT
parameters in order to get total counts.
The code below shows how to enumerate all the files in a folder and subfolders, run the checksum, and insert the results into a table:
string m_conn =
ConfigurationManager.ConnectionStrings["MySqlConnectionString"].ConnectionString;
MySqlConnection conn = new MySqlConnection(m_conn);
conn.Open();
try
{
MySqlCommand cmd = new MySqlCommand("drmc.proc_checksum", conn);
cmd.CommandType = CommandType.StoredProcedure;
string[] m_files = Directory.GetFiles(m_path, "*.*",
SearchOption.AllDirectories);
foreach (string m_file in m_files)
{
m_filename = Path.GetFileName(m_file);
cmd.Parameters.Clear();
cmd.Parameters.Add(new MySqlParameter(@"M_FILEPATH",
MySqlDbType.VarChar) { Value = m_file.Replace("\\", "\\\\") });
cmd.Parameters.Add(new MySqlParameter(@"M_FILENAME",
MySqlDbType.VarChar) { Value = m_filename });
cmd.Parameters.Add(new MySqlParameter(@"M_SHA256",
MySqlDbType.VarChar) { Value = GetChecksum(m_file).ToString() });
cmd.ExecuteNonQuery();
using (StreamWriter sw = File.AppendText(m_results))
{
Logger.LogMessage("File " + m_file +
" inserted in database.", sw);
sw.Close();
}
}
...
It is also good to know how to work with a MySqlDataAdapter
as well as how to get an OUT
parameter from ExecuteNonQuery
.
The snippet below shows how to do this.
MySqlCommand cmd_newfiles = new MySqlCommand("drmc.proc_newfiles", conn);
cmd_newfiles.CommandType = CommandType.StoredProcedure;
cmd_newfiles.Parameters.AddWithValue("@M_NEWCOUNT", MySqlDbType.Int32);
cmd_newfiles.Parameters["@M_NEWCOUNT"].Direction = ParameterDirection.Output;
cmd_newfiles.ExecuteNonQuery();
Console.WriteLine("New files: " + cmd_newfiles.Parameters["@M_NEWCOUNT"].Value);
Console.WriteLine("\r\n");
string str_newfiles = cmd_newfiles.Parameters["@M_NEWCOUNT"].Value.ToString();
MySqlDataAdapter sda_newfiles = new MySqlDataAdapter(cmd_newfiles);
DataSet ds_newfiles = new DataSet();
ds_newfiles.DataSetName = "New Files";
sda_newfiles.Fill(ds_newfiles);
sda_newfiles.Dispose();
Finally, we start to build an e-mail with the results and send that, including two attachments created with the FileGenerator
class.
DataTable dt_newfiles = ds_newfiles.Tables[0];
DataTable dt_changedfiles = ds_changedfiles.Tables[0];
DataTable dt_changedfiles1 = ds_changedfiles.Tables[1];
FileGenerator.CreateFile(dt_changedfiles, m_changedfiles).ToString();
string m_emailbody = "This e-mail is a summary of checksum file integrity " +
"for files located here: \r\n\r\n" + m_path + "\r\n\r\n";
m_emailbody = m_emailbody + "There are a total of " +
str_totalfiles + " files. \r\n\r\n";
m_emailbody = m_emailbody +
"The file location, file name, and checksum are the same for " +
str_samefiles + " files. \r\n\r\n";
m_emailbody = m_emailbody + "There are " + str_newfiles +
" new files. These are listed below, if any. " +
"Detailed information is in the attached " +
"file checksum_new_files.csv\r\n\r\n";
m_emailbody = m_emailbody + "There are " + str_changedfiles +
" files where the CHECKSUM HAS BEEN CHANGED. " +
"The integrity of the file is in doubt, " +
"or it has been changed by a user. " +
"The files are listed below, if any. Detailed information " +
"is in the attached file checksum_changed_files.csv\r\n\r\n";
string m_emailnewbody = "New Files: \r\n" +
FileGenerator.CreateFile(dt_newfiles, m_newfiles).ToString();
string m_emailchangedbody = "Changed Files: \r\n" +
FileGenerator.CreateFile(dt_changedfiles1).ToString();
m_emailbody = m_emailbody + m_emailnewbody + "\r\n" + m_emailchangedbody;
#endregion
#region Email configuration and send
var client = new SmtpClient("smtp.gmail.com", 587)
{
Credentials = new NetworkCredential("gmailuserhere",
"gmailpasswordhere"),
EnableSsl = true
};
MailMessage m_message = new MailMessage(
m_notification_to,
m_notification_from,
m_notification_title + DateTime.Today.ToShortDateString(),
m_emailbody);
m_message.Attachments.Add(m_new_attachment);
m_message.Attachments.Add(m_changed_attachment);
client.Send(m_message);
The real work in the code is performed by the function below. This piece comes almost verbatim from Jeff Barnes, MS MVP,
and his blog article here: http://jeffbarnes.net/blog/post/2007/01/12/File-Checksum-using-NET.aspx.
private static string GetChecksum(string m_fileinput)
{
try
{
string m_checksum;
using (FileStream stream = File.OpenRead(m_fileinput))
{
SHA256Managed sha = new SHA256Managed();
byte[] checksum = sha.ComputeHash(stream);
m_checksum =
BitConverter.ToString(checksum).Replace("-", String.Empty);
}
return m_checksum;
}
catch (Exception ex)
{
using (StreamWriter swerr = File.AppendText(m_errors))
{
Logger.LogMessage(ex.Message.ToString(), swerr);
swerr.Close();
}
return "unable to retrieve checksum";
}
}
One of the issues you may run into with SHA-256 is that, because of its extended algorithm, it takes longer to create the checksum than say MD5. MD5 will run in less than
half the time, which is important if you have terabytes to scan and you are more concerned with the checksum value and not the encryption. The snippet below shows how to use MD5 instead:
MD5 m_md5 = new MD5CryptoServiceProvider();
byte[] checksum = m_md5.ComputeHash(stream);
m_checksum = BitConverter.ToString(checksum).Replace("-", String.Empty);
History
- 23 September 2011 - Added the MD5 comment to allow for quicker scanning of files.