Introduction
The virtual device interface of SQL Server allows developers to define "virtual devices" for SQL Server to backup and restore databases and transaction logs. Having access to these virtual devices gives you access to the raw data that is passed into and out of SQL Server during a backup or restore operation. This access allows you to perform any number of tasks on this data though the most common tasks are compressing and/or encrypting the data before it is written to disk. Microsoft has published an API to access this functionality of SQL Server. The only catch is that to the best of my knowledge, there is no easy way to access this API via the .NET Framework. With that being the case, I set out to develop a Managed C++ component which will act as a bridge between the "Virtual Device Interface" (VDI) API and the .NET Framework.
Background
This article is meant to cover the basics of the VDI API and how to integrate it into a .NET application. As such, only a very small subset of the VDI API is covered in this article. There are many other capabilities of the VDI such as multiple input and output streams, and various performance tweaking opportunities. Once you have a firm handle on the technologies outlined here, you can see this reference for more details on the VDI specification.
Using the Code
General Approach
Since the API for the VDI is designed to be consumed by C/C++, it made sense to develop a .NET wrapper for the VDI using Managed C++. Since the nature of the VDI is streaming data into and out of the SQL Server, it made sense to me to use a System.IO.Stream
for communication between the .NET application and VDI.
Interface
ExecuteCommand Method
The component in this article has one method named ExecuteCommand
which, as the name implies, executes a backup or restore command to the VDI. This method takes two parameters as input and returns a void
. The first argument for this method is a string
which represents a normal SQL Server backup or restore command with one requirement. When specifying the device to which you will backup or from which you will restore, you must specify a VIRTUAL_DEVICE
. This virtual device will be named within the component so you will need to use the {} string
format syntax to specify the actual device name. For example:
BACKUP DATABASE AdventureWorks TO VIRTUAL_DEVICE='{0}' WITH STATS = 1
The purpose of this functionality is to allow for maximum flexibility when issuing commands to the SQL Server. Of course, in a production grade application, you would likely lock this functionality down, but for the educational purposes of this article, I left it open.
I would like to point out at this point that while the example above backs up a database, you can also backup transaction logs, perform differential backups, verify a backup with restore verifyonly or any other command that you would normally issue to SQL Server via T-SQL.
The second parameter to the method is any object that derives from System.IO.Stream
. In the case of a backup, this will be the stream to which the raw bytes from SQL Server are written. The most basic usage would be passing a FileStream
object for this parameter which would just write the entire backup directly to disk. The resulting file will be exactly as if you issued a normal BACKUP DATABASE
command via Query Analyzer.
Another option might be to wrap your FileStream
object in a DeflateStream
object and pass the DeflateStream
object as your parameter. What you will end up with is a compressed SQL Server backup. The advantage here is that rather than writing the entire backup to disk and then compressing that backup, you can simply write compressed data directly to disk which will save a step, disk I/O, time and disk space requirements.
In order to restore the database, you simply reverse the process by passing a DeflateStream
object wrapped around another FileStream
object that reads from your compressed file as the second argument along with the appropriate restore command.
CommandIssued Event
This event is fired whenever a command is issued to the component. If you intercept this event, you will see the exact command that is being issued to the VDI.
InfoMessageReceived Event
This event is fired whenever SQL Server sends back an informational message. For example, when backing up or restoring a database using the WITH STATS
option, the "1 percent processed, 2 percent processed…" messages will show up in real time to allow you to monitor long running operations.
Internals
The ExecuteCommand Method
The first thing the ExecuteCommand
method does is setup and configure the VDI for our use and creates one virtual device. VDI supports up to 64 virtual devices. This means that on the .NET end of things, you could modify this component to support up to 64 separate streams. This would be of use if you were planning to write the output to multiple disks, or if you would like to take advantage of multiple processors by processing data in multiple streams (and therefore multiple threads). The current ExecuteCommand
method only supports one device and one thread for the sake of simplicity.
The next thing the ExecuteCommand
method does is format the command. As I mentioned earlier, the command that you pass to this method must be in a "format string
" format and specify VIRTUAL_DEVICE
as the output medium. At this point, we need to generate a unique name for the virtual device. A great mechanism for this is to generate a GUID
which is exactly how we name our virtual device in this article. Once we apply the newly created name of our device to the command, we spawn a thread to execute the command.
You may be wondering why we are not just using the asynchronous BeginExecuteNonQuery
method of the SqlCommand
object to execute the command. You could do this, but the InfoMessage
event of the SqlConnection
object which handles all messages back from SQL Server does not fire until the command is complete. This is a problem when you want real-time status updates in your client application. The workaround to this is to use the OdbcCommand
in its own thread as the OdbcConnection
object's InfoMessage
event fires immediately.
After the command has been issued to the SQL Server, we finish configuring the device set and open our one device.
Once the command has been issued and our devices are configured, we call the ExecuteDataTransfer
method from within the ExecuteCommand
method.
ExecuteDataTransfer
Once the device has been configured and the command has been sent to SQL Server to perform the backup or restore operation, the ExecuteDataTransfer
method handles the rest.
There are two parameters that are passed to this method. The first parameter is the virtual device that we created previously in our code. The second parameter is the Stream
object that will either contain the data needed for a restore or will accept the data passed to us from a backup.
The main loop in this function simply calls the GetCommand
method of the virtual device, and then analyzes the commandCode
property of the command structure to determine the next appropriate steps.
The first command we look for is VDC_Read
. This is the command we would expect to see during a restore operation. This is SQL Server's request for data. In this particular component, we first read the requested number of bytes from the stream that the .NET client application passed. We store the number of bytes read from the stream in the variable bytesTransferred
for later use. Then we copy the ".NET bytes" to the memory location specified by cmd->buffer
using the System.Runtime.InteropServices.Marshal.Copy
method. At this point, we make sure that bytesTransferred
is equal to the cmd->size
parameter. This ensures that we transferred all of the data that SQL Server sent to us. If we get a match, then we can set the completionCode
to ERROR_SUCCESS
. If for some reason, we were not able to move that many bytes, we need to specify a completion code of something else to indicate the appropriate error circumstances. For the purposes of this exercise, we'll just use the ERROR_READ_FAULT completionCode
if we cannot read the appropriate number of bytes.
The next command we look for is VDC_Write
. This is the command we expect to see during a backup operation. In this particular case, we first copy the data from the cmd->buffer
into a .NET array, and then we write that .NET array to the stream. In this case, we don't expect bytesTransferred
to be anything other than cmd->size
because if there was a problem writing to the stream, an exception would be raised. As such, we always set the bytesTransferred
parameter to cmd->size
.
The other commands are VDC_Flush
which is just a cue for us to flush the stream; VDC_ClearError
doesn't do much, but is required by the VDI specification and finally a default case to handle any unknown command that may come to us from VDI.
Once we've handled the command that has been passed to us, we call the CompleteCommand
method of the virtual device. This gives feedback to the SQL Server and lets it know that we are ready for another command. This loop will continue until SQL Server is finished processing the T-SQL command (i.e. the backup or restore command issued to SQL Server) at which time it will issue the VD_E_CLOSE HRESULT
indicating that it is finished and closing down the virtual device. At this point, the back up or restore operation is complete.
Points of Interest
One of the greatest opportunities access to the VDI affords us is the ability to compress SQL Server backups in memory before they are written to disk. By simply passing a compressed file stream as the input to the ExecuteCommand
method, you will achieve the following benefits:
- Reduced disk I/O as less data is written to the disk.
- Reduced disk space requirements as the backup is compressed.
- Possible backup speed improvements if your compression engine is sufficiently fast.
- Almost certain restore speed improvements as there will be less disk I/O and decompression is almost always faster than compression.
The easiest way to implement compression is by using the DeflateStream
and GZipStream
objects in the .NET Framework. While there are certainly faster compression algorithms and implementations available, these are free and do a reasonable job of compressing your data.
There are two warnings when using these classes however. The first is that if your database has a lot of compressed data already (i.e. image files, compressed binary files, etc.) you may in fact see an increase in size. This is due to a verbatim implementation of the Deflate algorithm (GZipStream
is simply a wrapper around Deflate
with a CRC32 checksum). I won't go into the details of this, but suffice it to say that other implementations of Deflate
optimize their code to minimize these effects.
The second issue which applies to both the DeflateStream
and GZipStream
classes is file size. The Microsoft documentation says the following about both classes:
This class cannot be used to compress files larger than 4 GB.
Now, with that said, I don't know if they mean the input file cannot be larger than 4 GB or the output file cannot be larger than 4 GB. In either case, I have successfully backed up and restored databases that are much larger than 4 GB. I have also created compressed backup files much larger than 4 GB using at least the DeflateStream
. I have not tested this on the GZipStream
, but I would expect much the same result. The point is that while this seems to work in my tests, Microsoft says that it doesn't. So if you have databases / backups larger than 4 GB, you may want to look for an alternate compression stream.
There are countless other potential opportunities available for example:
You can also use the CryptoStream
object to easily encrypt any SQL Server backup before it is even written to disk ensuring that the unencrypted bytes of the backup never even hit the disk.
You could create an encrypted TCP stream and backup a database directly across any network without fear of the backup being intercepted.
The bottom line is that you can backup and restore from any .NET stream. This gives .NET developers of any language a very powerful tool to add to their tool chest.
History
- 3rd July, 2007: Initial post