Introduction
Many of my projects involve a need to track files -- when we see them, where they go, and what happens to them along the way. We also need to know when we have duplicate contents under different names and/or from different sources. For both of these goals, generating a summary of the file in the form of a cryptographic hash can help. (For an introduction to the concept, see Wikipedia's Cryptographic hash function article.) Keeping track of and comparing hash values quickly tells us whether we have seen a file's contents before.
A number of the examples I've seen in search results for file hashing with VB.NET left me wanting something more like what follows in the How To section below. Early on, I ran into a problem from reading the file as a Byte
array and passing that.
MyByteArray = MySHA.ComputeHash(My.Computer.FileSystem.ReadAllBytes(FileUNCPath))
This worked great for small files, but failed badly for larger files. By contrast, using a stream (as demonstrated below) works fine, even for files in the GB range.
After creating the hash digest, most other examples convert the byte array to a string representation before sending the result to a database table. In many implementations (e.g., encoding with hexadecimal, Base64, etc.), this will increase the amount of storage space required for the digest. And, for this extra space, all that seems to be gained is the ability to send the data through a SQL statement without parameters. Using parameters is worthwhile here, since sending and storing a binary value is more efficient for this purpose than using a string representation.
How To
These examples assume the following namespace imports (and that your project imports "System
", as per the default Windows Application project):
Imports System.IO
Imports System.Data.SqlClient
The following example demonstrates how to create the hash value. This example uses the a class for the SHA256 algorithm. Another of the algorithms implemented by System.Security.Cryptography
could be substituted with slight modifications:
Dim MyFilePath As String = "c:\test\example.pdf"
Dim MyByteArray As Byte()
Dim MySHA As New Security.Cryptography.SHA256Managed
MyByteArray = MySHA.ComputeHash(File.OpenRead(MyFilePath))
The following example demonstrates how to store the hash value (assuming "db
" is an open SqlConnection
and the Files table contains a column HashVal
of type Binary
with length 32).
Dim MyCmd As SqlCommand = db.CreateCommand
MyCmd.CommandText = "INSERT INTO dbo.Files (HashVal) VALUES (@HashVal);"
Dim SHAParameter As New SqlParameter( _
"@HashVal", Data.SqlDbType.Binary, 32, Data.ParameterDirection.Input)
SHAParameter.Value = MyByteArray
MyCmd.Parameters.Add(SHAParameter)
MyCmd.ExecuteNonQuery()
The following example demonstrates how to compare the hash value and return a count based on whether it exists in the table:
Dim MyCmd As SqlCommand = db.CreateCommand
MyCmd.CommandText = "SELECT COUNT(*) FROM dbo.Files WHERE HashVal=@HashVal;"
Dim SHAParameter As New SqlParameter( _
"@HashVal", Data.SqlDbType.Binary, 32, Data.ParameterDirection.Input)
SHAParameter.Value = MyByteArray
MyCmd.Parameters.Add(SHAParameter)
Dim MyCount As Int32 = MyCmd.ExecuteScalar
Summary
These basic steps can serve as building blocks for utilizing cryptographic hashes as a message digest for tracking and duplicate-checking of files in coordination with SQL Server.