Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB

Basic Cryptographic Hashing of a File and Storage to SQL Server

2.85/5 (4 votes)
6 Oct 2006CPOL2 min read 2  
Basic VB.NET example for generating a hash value and using SQLClient to store to a binary column in SQL Server.

Introduction

Many of my projects involve a need to track files -- when we see them, where they go, and what happens to them along the way. We also need to know when we have duplicate contents under different names and/or from different sources. For both of these goals, generating a summary of the file in the form of a cryptographic hash can help. (For an introduction to the concept, see Wikipedia's Cryptographic hash function article.) Keeping track of and comparing hash values quickly tells us whether we have seen a file's contents before.

A number of the examples I've seen in search results for file hashing with VB.NET left me wanting something more like what follows in the How To section below. Early on, I ran into a problem from reading the file as a Byte array and passing that.

VB
MyByteArray = MySHA.ComputeHash(My.Computer.FileSystem.ReadAllBytes(FileUNCPath))

This worked great for small files, but failed badly for larger files. By contrast, using a stream (as demonstrated below) works fine, even for files in the GB range.

After creating the hash digest, most other examples convert the byte array to a string representation before sending the result to a database table. In many implementations (e.g., encoding with hexadecimal, Base64, etc.), this will increase the amount of storage space required for the digest. And, for this extra space, all that seems to be gained is the ability to send the data through a SQL statement without parameters. Using parameters is worthwhile here, since sending and storing a binary value is more efficient for this purpose than using a string representation.

How To

These examples assume the following namespace imports (and that your project imports "System", as per the default Windows Application project):

VB
Imports System.IO
Imports System.Data.SqlClient

The following example demonstrates how to create the hash value. This example uses the a class for the SHA256 algorithm. Another of the algorithms implemented by System.Security.Cryptography could be substituted with slight modifications:

VB
' example filename
Dim MyFilePath As String = "c:\test\example.pdf"
' define a Byte Array to hold the result of the hashing algorithm
Dim MyByteArray As Byte()
' make an object for the implementation of the HashAlgorithm
Dim MySHA As New Security.Cryptography.SHA256Managed
' calculate the hash value and keep it for later
MyByteArray = MySHA.ComputeHash(File.OpenRead(MyFilePath))

The following example demonstrates how to store the hash value (assuming "db" is an open SqlConnection and the Files table contains a column HashVal of type Binary with length 32).

VB
Dim MyCmd As SqlCommand = db.CreateCommand
MyCmd.CommandText = "INSERT INTO dbo.Files (HashVal) VALUES (@HashVal);"
Dim SHAParameter As New SqlParameter( _
"@HashVal", Data.SqlDbType.Binary, 32, Data.ParameterDirection.Input)
SHAParameter.Value = MyByteArray
MyCmd.Parameters.Add(SHAParameter)
MyCmd.ExecuteNonQuery()

The following example demonstrates how to compare the hash value and return a count based on whether it exists in the table:

VB
Dim MyCmd As SqlCommand = db.CreateCommand
MyCmd.CommandText = "SELECT COUNT(*) FROM dbo.Files WHERE HashVal=@HashVal;"
Dim SHAParameter As New SqlParameter( _
"@HashVal", Data.SqlDbType.Binary, 32, Data.ParameterDirection.Input)
SHAParameter.Value = MyByteArray
MyCmd.Parameters.Add(SHAParameter)
Dim MyCount As Int32 = MyCmd.ExecuteScalar

Summary

These basic steps can serve as building blocks for utilizing cryptographic hashes as a message digest for tracking and duplicate-checking of files in coordination with SQL Server.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)