Introduction
Let's assume you are developing a desktop application for managing a fitness center. The members information is, of course, to be stored in a database, so you define a "Members" table with fields such as name, birth date, telephone, and the like. But, your happiness ends when you are faced with the requirement of recording the member's photo together with the rest of the data. Then, the second most transcendental question in human history (right after the meaning of life) comes to your mind: Should I store the photo in the database for the sake of data consistency, or in the local file system for the sake of performance?
This article presents a C# class that will allow you to grab the best of two worlds.
Background
There are times when we need to store some large binary data as part of a larger, relational set of data. The debate about where this kind of data should be stored is not new. Basically, there are two alternatives, each one offering its own advantages: using a database, which offers transactional integrity and allows to tie all the data together in a consistent way; and using disk files, which can't be beaten when data retrieval speed is a must, and can indeed save a lot of network traffic.
This article presents a C# class, DatabaseFileCache
, that aims to get the best of the two approaches. The basic idea is to store the authoritative binary data in the database, and once retrieved, to cache this data in the local file system.
So, the first time a given binary object is requested by the application, it is retrieved from the database and cached in a local file. The next time the same object is requested, the cached file is returned instead, unless the object has changed since the last request. To keep track of changes, timestamp columns are used; the timestamp value is stored as part of the file name in the local cache.
How it works
To use the class, we first need a database table to store the binary objects (a "binary object" is any opaque sequence of bytes). The default name for this table is Objects ("default names" are the names used by the class unless you explicitly specify different ones), and it has the following columns:
- A string (
varchar
or similar) column to store the object name (default name is Name). - An image,
varbinary
, or similar type column to store the actual binary data of the object (default name is Value). - A timestamp column to keep track of object changes (default name is timestamp).
Some flexibility is allowed here. The value column may be of any type that ADO.NET can convert to a byte array, and the timestamp column may be of any type, provided that it automatically changes whenever the value column changes (the timestamp data type was created exactly for this purpose, by the way). Extra columns may be present in the table as long as they either allow null values, have a default value, or are automatically filled by the database engine when a new record is created.
We also need a directory in the local file system to store the cached files. The default is a directory named DatabaseCache placed under the directory specified by the DataDirectory
application domain property (yes, that's the same DataDirectory used by SQL Server 2005 Express connection strings; you can set it with the AppDomain.CurrentDomain.SetData
method). If DataDirectory
is not set, the DatabaseCache directory is assumed to be in the main application executable directory (Assembly.GetEntryAssembly().Location
is used to obtain this information).
The main operation provided by the class is to retrieve a binary object, given its name. This is the pseudo-code algorithm for this operation:
Is there a file in the local cache with a matching name?
No
Retrieve the object from the database
If no object with such name exists in database, return NULL
Store the object in a file in the local cache
Return the cached file
Yes
Obtain the current timestamp for the object in the database
If no object with such name exists in database
Delete the cached file
Return NULL
Obtain the timestamp of the cached file
Do both timestamps match?
Yes
Return the cached file
No
Delete the cached file
Retrieve the object from the database
Store the object in a file in the local cache
Return the cached file
As you can see, the binary object is retrieved from the database only when needed; that is, when the object is requested for the first time, and when it has changed since the time it was cached; in all other cases, only the timestamp (a few bytes) is retrieved from the database, and the cached file is directly returned.
Note also that dummy cached files (objects that exist in the local cache but not in the database) are appropriately detected and deleted. These dummies may appear, for example, when the database is accessed by multiple users, and one of them deletes an object in the database while another user had already cached the object.
About the object names
To name your binary objects in the database, any string can be used. When storing these objects in the local file system, the file name is composed from the object name in the following way. First, characters that are not valid for a file name (such as "*" or "/") are replaced by the sequence %uuuu, where uuuu is the Unicode representation of the character in hexadecimal; the "%" character itself and the dot are replaced as well. Next, the timestamp value is converted to its hexadecimal representation, and appended to the file name as the extension.
For example, assume you are naming your object as follows:
Fast retransmit *really* increases speed in 20% over TCP/IP.pdf
Then, the cached file name would be the following, assuming a fictitious 4 byte timestamp:
Fast retransmit %002Areally%002A increases speed
in 20%0025 over TCP%002FIP%002Epdf.0045AB7F
Such encoded strings can be easily converted back to their original un-encoded versions.
Using the code
To use this code, you need to create an instance of the DatabaseFileCache
class. Each instance is associated with a given table in a given database and with a given local cache path (you can specify these parameters in the class constructor, or later via class properties). Once the instance is created, you use its methods to store objects in the database and to retrieve them using the cache mechanism explained.
To instantiate the class, you have three constructors:
- A simple one which admits a single parameter: the connection object for the database. All other parameters (database table and column names, and the local cache location) are set to their default values.
- A mid-complex one, which admits as parameters the connection object, the database table name, and the local cache location. The table column names are set to their default values.
- The complete one, which admits as parameters the connection object, the database table name, the table column names, and the local cache location.
You can change all of the class parameters at any time, by accessing these public properties whose names are self-explaining: Connection
, CachePath
, TableName
, NameColumn
, ValueColumn
, and TimestampColumn
. Note that if you supply a relative path for CachePath
, it will be combined with either the DataDirectory path or with the application executable path, as explained above.
There is an extra property that you may want to adjust: CommandTimeout
, which controls the maximum time the SQL commands execution may take. It defaults to 30 seconds; you may want to set it to a larger value if you are dealing with very large objects or very slow networks.
As for the public methods exposed by the class, these are the ones:
SaveObject
: Will store in the database the specified byte array or the contents of the specified file, with the specified name or with the same name as the file itself (there are three method overloads). If there is already an object with the same name in the database, its contents will be overwritten. This method does not access the local cache at all.GetObject
: Will retrieve an object from either the database or the local cache, following the algorithm explained above. Returns the path to the cached file, or null
if the object with the specified name does not exist in the database.GetCachedFile
: Will return the path of the cached file for the specified object name, if it exists, or null
otherwise, without accessing the database at all. This method can be used as a "life vest" if the database becomes unreachable, but only if it is acceptable to use data that may be out of date.DeleteObject
: Deletes the object with the specified name from both the database and the local cache, if it exists.RenameObject
: Changes the name of an object in both the database and the local cache, if it exists.ObjectExists
: Returns a boolean value indicating whether an object exists in the database with the specified name or not.GetObjectNames
: Returns an array of strings with the names of all the objects stored in the database.PurgeCache
: Deletes from the local cache all the files that have no matching object in the database. If the database is accessed by multiple users, it may be convenient to execute this method once when the application execution begins and/or ends.
Note that you can, at any time, manually delete some or all of the cached files, besides/instead of using the PurgeCache
method. The class does not maintain any state information about the cached files, it only searches for existing files as needed.
Here are some simple examples of how to use the class. For example, for creating an instance, having the database table named Photos and using C:\PhotosCache as the cache directory, you would do:
SqlConnection connection=new SqlConnection(@"Data Source" +
@"=.\SQLEXPRESS;AttachDbFilename="|DataDirectory" +
@"|Data\MyDatabase.mdf";Integrated security=true;");
DatabaseFileCache cache=new DatabaseFileCache(connection, "Photos", @"C:\PhotosCache");
Then, to store objects in the database:
cache.SaveObject(@"C:\Photos\DSCF0100.jpg", "Kaito's first birtday.jpg");
cache.SaveObject(new byte[] { 1,3,5,7,11 }, "Some primes");
And to retrieve the data:
string filePath=cache.GetObject("Kaito's first birtday.jpg")
MyForm.MyPictureBox.Load(filePath);
filePath=cache.GetObject("Some primes");
byte[] somePrimes=File.ReadAllBytes(filePath);
Points of interest
Nobody is perfect (not even me!). The DatabaseCache
class could be improved/extended in a number of ways. I'll point here the ones that seem more interesting to me.
The class uses SQL Server as the database engine, hence all the connection, command, and parameter objects used belong to the System.Data.SqlClient
namespace. It should not be difficult to adapt it for another provider, or even better, to use the ADO.NET factory classes for supporting any provider with one single code base.
Binary data is stored to and retrieved from the database the crude way: by directly using parameters of byte[]
type in a single command execution. This will work fine for relatively small objects, but for very large objects, it would be more convenient to use some method for sending and retrieving data in small chunks.
Final note
This is my first submission to The Code Project, so please don't beat me too hard.
History
- 14 December 2007: First version.