MyCache: Distributed caching engine for an ASP.NET web farm - Part II: The internal details

Al-Farooque Shubho

27 Dec 2010, CPOL

Internal implementation of MyCache: A distributed caching engine for ASP.NET applications deployed under a load-balanced environment in web farms, which is built on top of WCF and the Microsoft Caching Application Block.

Download source code - 793 KB

Introduction

If you are reading this article, you may already have gone through the demonstration of MyCache in my previous article: MyCache: Distributed caching engine for an ASP.NET web farm - Part I: The demonstration.

If not, I suggest reading that article first to see the usage, capabilities, and high-level architecture of the distributed caching engine.

This is Part 2 of the article, which covers the building blocks of the caching engine in detail, along with performance and other issues. Here we go:

MyCache architecture

As you may already know, MyCache is built entirely on familiar and proven .NET technologies. Following are the three main building blocks of MyCache:

Microsoft Caching Application Block: This is used as the core caching engine for storing objects in memory and retrieving them. A class library has been built around the Caching Application Block to implement the basic caching functionality.
WCF with net.tcp binding: WCF is used as the communication medium between the caching service and the client applications. The WCF service uses a net.tcp binding to keep the communication as fast as possible.
IIS 7: The WCF service with net.tcp binding is hosted under an ASP.NET application in IIS to let the service be exposed to the outside world.

To recall, the following diagram depicts the high level architecture of MyCache:

[Figure: MyCache architecture]

Following is the basic working principle of the MyCache based caching management:

There is a cache server which is used by all servers in the web farm to store and retrieve cached data.
When a request arrives at a particular server in the load-balanced web farm, that server asks the cache server for the data.
The cache server serves the data from the cache if it is available, or lets the calling ASP.NET application know about the absence of the data. In that case, the application at the calling server loads the data from the data storage/service, stores it inside the cache server, and processes the output to the client.
A subsequent request arrives at a different server which needs the same data. So, it asks the cache server to get the data.
The cache server has that particular data in its cache now. So it serves the data.
For any subsequent request to any server in the web farm requiring the same data, the data is served fast from the cache server, and the ultimate performance makes everybody happy.
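
The steps above can be sketched as follows. This is only an illustration of the flow, not MyCache code: FakeCacheServer is a hypothetical in-process stand-in for the remote cache server, and WebServer, HandleRequest, and DataSourceHits are names made up for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the remote cache server (a dictionary in this process).
public class FakeCacheServer
{
    private readonly Dictionary<string, object> store = new Dictionary<string, object>();
    private readonly object sync = new object();

    // Returns the cached object, or null when the key is absent.
    public object Get(string key)
    {
        lock (sync)
        {
            object value;
            return store.TryGetValue(key, out value) ? value : null;
        }
    }

    public void Add(string key, object value)
    {
        lock (sync) { store[key] = value; }
    }
}

// Represents one web server in the load-balanced farm.
public class WebServer
{
    private readonly FakeCacheServer cache;

    // Counts how often this server had to go to the (pretend) data source.
    public int DataSourceHits { get; private set; }

    public WebServer(FakeCacheServer cache)
    {
        this.cache = cache;
    }

    public object HandleRequest(string key)
    {
        object data = cache.Get(key);
        if (data == null)
        {
            // Cache miss: load from the data source and store it for the whole farm.
            DataSourceHits++;
            data = "value-for-" + key;
            cache.Add(key, data);
        }
        return data;
    }
}
```

If two WebServer instances share one FakeCacheServer, the first request for a key hits the data source on whichever server receives it, and a later request for the same key on the other server is answered from the cache.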

A few technical issues

Although there are a few well-established distributed caching engines such as NCache, Memcached, and Velocity, I decided to develop MyCache.

But, why another one?

Fair question. To me, the answer is as follows:

MyCache is simple and Open Source. It uses technologies which are basic, stable, and known by most .NET developers. The overall implementation is familiar and easy to customize according to your needs. So, if you use MyCache, you won't get the feeling that you are using a third party service, component, or product which you don't have much control over, or which you are not sure about. MyCache is not really a "product", but rather an implementation of a simple idea which lets you build your own home-made distributed caching engine.

Why choose Caching Application Block as the core caching engine?

I could have tried to develop an in-memory caching engine from scratch, but that would have meant spending a lot of time on something which has already been developed, well tested, and well accepted across the community. So, my obvious choice was to use the "Caching Application Block" as the "in-memory" caching engine.

Why choose WCF with net.tcp binding in IIS?

The caching service is built around the Caching Application block, and the service has to be exposed to ASP.NET client applications to let them consume it. The following options were available:

Socket programming
.NET Remoting
Web Service
WCF

Socket programming or .NET Remoting were the fastest possible ways to let client applications consume the caching service. But these two were not chosen simply because they would require too much effort to implement a solid communication mechanism, which is already available in WCF via different binding options.

The Web Service (equivalent to basicHttpBinding or wsHttpBinding in WCF) was the easiest way to expose and consume the caching service, but it was not chosen simply because the SOAP based communication over HTTP protocol is the slowest among all WCF binding options.

So, WCF with net.tcp binding was the obvious choice to expose and consume the service in the fastest possible way, and in the most reliable manner. Even though hosting the WCF Service in a Windows Service was an alternative, IIS based hosting of the WCF Service with net.tcp binding was chosen because it lets everything be managed from within IIS (though only IIS 7.0 and later versions support net.tcp binding).
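
To make the choice more concrete, here is a minimal sketch of a WCF contract consumed over net.tcp. The ICacheService contract, the byte[] payload, and the CacheClientFactory helper are assumptions made for illustration; the actual MyCache service contract is in the downloadable source and may look different. The sketch needs a reference to System.ServiceModel.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract, for illustration only. A byte[] payload is assumed here
// so that the sketch does not depend on known-type configuration.
[ServiceContract]
public interface ICacheService
{
    [OperationContract]
    byte[] Get(string key);

    [OperationContract]
    void Add(string key, byte[] value);
}

public static class CacheClientFactory
{
    // Builds a channel factory that talks to the service over net.tcp.
    // The address is supplied by the caller, for example
    // "net.tcp://cacheserver/MyCache/CacheService.svc" (a made-up address).
    public static ChannelFactory<ICacheService> Create(string address)
    {
        var binding = new NetTcpBinding();
        return new ChannelFactory<ICacheService>(binding, new EndpointAddress(address));
    }
}
```

Calling CreateChannel() on the returned factory gives a proxy implementing ICacheService. In the article's setup, the same wiring is produced by adding a service reference and configuring the endpoint in web.config.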

OK, I want to use MyCache, what do I have to do?

Thanks, if you want to do so. Before deciding to use MyCache, you may need to know a few facts:

MyCache is built upon .NET Framework 4.0, and requires IIS 7 or higher on a Windows Vista, Windows 7, or Windows Server 2008 machine. If your deployment environment meets these criteria, you need to perform the following steps to use MyCache:

Deploy the MyCache WCF Service on IIS, enabling net.tcp binding (the previous article covers this in detail).
Add a reference to the MyCacheAPI.dll in your ASP.NET client application(s), and configure your application's identity (configure a "WebFarmId" variable in AppSettings).
Add a reference to the WCF Service (caching service) and configure the service in web.config if required (optional).
Start using MyCache (you already know how to do this; see Part 1 of the article).

MyCacheAPI: The only API you need to know

Yes. Once the caching service is configured properly in IIS, the only thing you need to know is MyCacheAPI. Simply adding MyCacheAPI.dll would let you start using the caching service straight away.

Following is the simplest code to use MyCache inside your ASP.NET applications:

//Declare an instance of MyCache
MyCache cache = new MyCache();

//Try to read the data from MyCache first
object data = cache.Get("Key");

if(data == null)
{
        //Data is not available in the cache, so retrieve it from the data source
        data = GetDataFromSystem();
        //Store data inside MyCache
        cache.Add("Key", data);
}

//Remove data from Cache
cache.Remove("Key");

//Add data to MyCache, specifying a FileDependency
cache.Add("Key", Value, dependencyFilePath, 
          Cache.NoAbsoluteExpiration, new TimeSpan(0, 5, 10), 
          CacheItemPriority.Normal, 
          new CacheItemRemovedCallback(onRemoveCallback));

//Reload the data from the dependency file and put it into MyCache in the callback
protected void onRemoveCallback(string Key, object Value, 
               CacheItemRemovedReason reason)
{

    if (cache == null)
    {
        cache = new MyCache();
    }

    if (reason == CacheItemRemovedReason.DependencyChanged)
    {
        //Acquire a lock on the MyCache service for the Key and proceed only if
        //no lock is currently set for this Key. This prevents multiple
        //load-balanced web applications from updating the same data on the
        //MyCache service simultaneously when the underlying file content is modified
        if (cache.SetLock(Key))
        {
            try
            {
                string dependencyFilePath = GetDependencyFilePath();
                object modifiedValue = GetObjectFromFile(dependencyFilePath);
                cache.Add(Key, modifiedValue, dependencyFilePath, 
                  Cache.NoAbsoluteExpiration, new TimeSpan(0, 5, 60), 
                  CacheItemPriority.Normal, 
                  new CacheItemRemovedCallback(onRemoveCallback));
            }
            finally
            {
                //Release the lock when done, even if reloading fails
                cache.ReleaseLock(Key);
            }
        }
    }
}

Serving multiple load-balanced web farms

MyCache is able to serve multiple load-balanced web farms, and objects stored by the web application(s) of one web farm are not accessible to web applications deployed in a different web farm. How does MyCache manage this?

Interestingly, this was very simple to implement. I just had to distinguish each web farm by a WebFarmId, which has to be configured in the web.config of each application, with a different value for each web farm.

Let us assume we have two different kinds of ASP.NET code bases (two different ASP.NET applications), and each one is deployed in its own load-balanced web farm in IIS. These two applications are configured to use MyCache (by adding service references to the MyCache WCF service and adding a reference to MyCacheAPI.dll).

As we do not want one application to be able to get access to another application's data in MyCache, we need to configure the WebFarmId parameter as follows:

In the web.config of Application1: <add key="WebFarmId" value="1"/>

In the web.config of Application2: <add key="WebFarmId" value="2"/>

Now, whenever the application provides a key to store an object in MyCache or retrieve one from it, MyCacheAPI prefixes the key with the WebFarmId before invoking the MyCache service methods.

MyCacheAPI.dll uses the following method to prefix the key with the WebFarmId value, building the appropriate key for the corresponding web farm before invoking any service method:

public string BuildKey(string Key)
{
    string WebFarmId = ConfigurationManager.AppSettings["WebFarmId"];
    Key = WebFarmId == null ? Key : 
             string.Format("{0}_{1}", WebFarmId, Key);
    return Key;
}

cache.Add(Key, Value);

This adds an object to the caching server using the provided key, and before adding the object to the MyCache service, it builds the appropriate key for distinguishing the web farm. Following is the function definition:

public void Add(string Key, object Value)
{
        //Builds the appropriate key for the corresponding web farm
        Key = BuildKey(Key);

        cacheService.Add(Key, Value);
}
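
As a small illustration of the key scheme, the following sketch mirrors BuildKey but takes the web farm ID as a parameter instead of reading it from web.config. The FarmKeys class and the parameterized signature are assumptions made so that the sketch is self-contained.

```csharp
using System;

public static class FarmKeys
{
    // Same formatting rule as BuildKey above: "<WebFarmId>_<Key>",
    // or the bare key when no WebFarmId is configured.
    public static string BuildKey(string webFarmId, string key)
    {
        return webFarmId == null ? key : string.Format("{0}_{1}", webFarmId, key);
    }
}
```

With this rule, FarmKeys.BuildKey("1", "Products") gives "1_Products" and FarmKeys.BuildKey("2", "Products") gives "2_Products", so the two farms never read each other's entries. One thing to keep in mind is that the separator is not escaped: farm "1" with key "2_X" and farm "1_2" with key "X" both map to "1_2_X", so it seems safest to keep underscores out of WebFarmId values.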

Implementation of FileDependency in MyCache

ASP.NET cache has a cool "FileDependency" feature which lets you add an object into the cache and specify a "file dependency" so that when the file content is modified in the file system, the callback method is invoked automatically and you can reload the object into the cache (possibly re-reading the content from the file) within the callback method.

There was a potential approach to implement this feature in MyCache. The approach was to use FileDependency of the Caching Application Block in the MyCache Service. But this approach wasn't successful because of the following reason:

Nature of the "Duplex" communication of WCF

Yes, it was the "duplex" communication nature of WCF, which didn't let us implement the FileDependency feature in an ideal and cleaner approach. The following sections have detailed explanations on this issue.

But, before that, we have a pre-condition. In order to use the FileDependency feature in MyCache, the caching service must be deployed within the same Local Area Network where the ASP.NET client applications are deployed. The following section explains this issue in detail:

The dependency file access requirement

Like the ASP.NET cache, the Microsoft Caching Application Block also has a CacheDependency feature. So, when an ASP.NET client application needs to add an object into MyCache along with specifying FileDependency, it is possible to send the necessary parameters to the MyCache WCF Service and specify a FileDependency while adding an object into the Caching Application Block within MyCache. But, there is a fundamental file access issue.

MyCache is a distributed caching service, and as long as the client ASP.NET applications can consume the caching service at a particular end point (URL), the caching service could be hosted on any machine on any network. But, in order to implement FileDependency, the caching service must be deployed on the same Local Area Network, so that both the ASP.NET client applications and the caching server application have access to a common network file location.

To understand this better, let us assume we have a web farm where a single ASP.NET application has been deployed on multiple load balanced web servers. All these web servers point to the same codebase, and they all reside in the same Local Area Network. As a result, they can access a file stored somewhere in the Local Area Network using a UNC path (say, \\Network1\Files\Cache\Data.xml).

Now, no matter where the MyCache WCF Service is deployed, the server application must have access to the same file (\\Network1\Files\Cache\Data.xml), stored in the same LAN that the load balanced servers can access. This allows the MyCache service application to detect a change in the underlying file.

The WCF "Duplex" communication issue

The caching service application shouldn't implement any logic which should belong to the client ASP.NET applications. So the responsibility of reading the dependency file and storing into the caching service actually belongs to the corresponding client application, and the caching service application should only worry about how to load and store data into the caching engine (Caching Application Block).

So, assuming that the dependency file is stored in a common network location which is accessible both by the caching service and the ASP.NET client applications on the load balanced servers, the client application should only send the file location and the necessary parameters to the server method indicating that FileDependency should be specified while adding the object in the cache. On the other hand, the caching service should add the object in the Caching Application Block by specifying CacheDependency (to the specified file location), and when the file content is modified, instead of reloading the file content from disk, the Caching Application Block should fire a callback to the corresponding client ASP.NET application to re-read the file content and store the updated data into the cache.

It seemed promising that WCF supports duplex communication where not only the client can invoke server functionality, but the server is also able to invoke functionality on the client application. But unfortunately, this is only possible (or feasible) as long as the client has a "live" communication status with the server application.

In our case, the flow of information should happen as follows:

The client application invokes a WCF method on the caching service by sending the necessary parameters to add the object in the cache with CacheDependency.
The WCF Service should add the object into the Caching Application Block specifying FileDependency and return (client-server conversation stops here and is no longer live).
The underlying file content is modified somehow (either manually, or by an external application).
Microsoft Caching Application Block at the server-end detects that change, and invokes the callback method at the server-end, which in turn tries to invoke the callback at the client-end.

Unfortunately, at this point, the client ASP.NET application no longer has a "live" communication channel with the WCF Service, and hence the client callback method invocation fails. Ultimately, the client ASP.NET application gets no signal from the server-end about the modification to the file, and hence it cannot reload the file content and store it into MyCache.

So how was CacheDependency implemented then?

It was implemented with a very simple approach. I just took the help of the ASP.NET Cache. Yes, you heard it right!

Even though we are using MyCache for our distributed cache management needs, we shouldn't forget that our good old friend, the ASP.NET Cache, is still available in the ASP.NET client applications. So, we can use it inside MyCacheAPI.dll purely to get a notification at the client-end when the file content is modified (because the server application is unable to send us such a signal). Once we get the event notification in the client ASP.NET applications, it is easy to re-read the file content and update the data in MyCache.

Here is how the ASP.NET Cache was used in conjunction with the MyCache WCF Service to implement the CacheDependency feature.

The object is added to the MyCache caching service with the usual cache.Add() method, along with specifying the FileDependency.

cache.Add("Key", Value, dependencyFilePath, 
      Cache.NoAbsoluteExpiration, new TimeSpan(0, 5, 10), 
      CacheItemPriority.Normal, 
      new CacheItemRemovedCallback(onRemoveCallback));

The MyCache server application adds the object to the Caching Application Block, along with specifying the CacheDependency file location. At the same time, MyCacheAPI.dll adds the key to the ASP.NET Cache (both as a key and a value) along with specifying FileDependency and a callback method.

//Set the Key (both as a key and a value) in the ASP.NET Cache,
//specifying a FileDependency and a callback method,
//to get an event notification when the underlying
//file content is modified
cache.Add(Key, Key, dependency, absoluteExpiration, 
          slidingExpiration, priority, onRemoveCallback);
cacheService.Insert(Key, objValue, dependencyFilePath, 
                    strPriority, absoluteExpiration, slidingExpiration);

At the MyCache Service end, when the file content is modified, the Microsoft Caching Application block removes the object from its cache, but doesn't call any callback method as no callback method is specified.
At the same time, at the ASP.NET client end, the callback method is invoked by the ASP.NET Cache in all load balanced sites. Each site tries to obtain a lock for the key. One site gets the lock and updates the object value in the MyCache Service, while the other sites fail to obtain the lock and hence don't proceed with the unnecessary update of the same object in MyCache (which is already being updated by one of the load balanced applications).

if (cache.SetLock(Key))
{
    try
    {
        string dependencyFilePath = GetDependencyFilePath();
        object modifiedValue = GetObjectFromFile(dependencyFilePath);
        cache.Add(Key, modifiedValue, dependencyFilePath, 
          Cache.NoAbsoluteExpiration, new TimeSpan(0, 5, 60), 
          CacheItemPriority.Normal, 
          new CacheItemRemovedCallback(onRemoveCallback));
    }
    finally
    {
        //Release the lock when done, even if reloading fails
        cache.ReleaseLock(Key);
    }
}
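
The client-side half of this flow, registering the key in the local ASP.NET Cache purely as a file-change sentinel, might look roughly like the sketch below. The FileChangeSentinel class and its Watch method are hypothetical names for this illustration; the Cache.Insert overload, CacheDependency, and CacheItemRemovedCallback are the standard System.Web.Caching types. The sketch needs a reference to System.Web.

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class FileChangeSentinel
{
    // Stores the key as both key and value in the local ASP.NET Cache.
    // The entry carries no real data; it exists only so that the ASP.NET Cache
    // invokes onRemoved when the dependency file changes.
    public static void Watch(string key, string dependencyFilePath,
                             TimeSpan slidingExpiration,
                             CacheItemRemovedCallback onRemoved)
    {
        HttpRuntime.Cache.Insert(
            key,
            key,
            new CacheDependency(dependencyFilePath),
            Cache.NoAbsoluteExpiration,
            slidingExpiration,
            CacheItemPriority.Normal,
            onRemoved);
    }
}
```

Because the sentinel also has a sliding expiration, the callback can fire with a reason other than DependencyChanged (for example, Expired), which is why the callback shown earlier checks the reason before reloading anything.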

Locking and unlocking

The distributed caching service is used by the load-balanced ASP.NET client sites, and multiple sites may try to update and read the same data (within the same web farm) on the MyCache service at the same time. So, it is important to maintain consistency of data in update operations so that:

An Update operation (cache.Add(Key,Value)) for a particular key does not overwrite another ongoing (unfinished) update operation for the same key.
A Read operation (cache.Get(Key)) for a particular key does not read data in a "dirty state" (read operation does not read data before finishing a current update operation for the same key).
An Update operation on CacheService from within CacheItemRemovedCallBack does not get called multiple times for each load balanced application.

Fortunately, each and every operation on the Microsoft Caching Application Block is "thread safe". That means, as long as a particular thread hasn't completed its operation, no other thread is allowed to access the shared data, and hence there will not be any occurrence of a "Dirty Read" or a "Dirty Write".

However, there was a need to implement some kind of locking to prevent the CacheItemRemovedCallBack method updating the same data multiple times (when the underlying file content is being modified) on the caching service for each load balanced application.

That's why LockManager was born.

What is LockManager?

LockManager is a class which encapsulates the locking/unlocking logic for the MyCache service. Basically, this class constructs a "Lock Key" for a particular key, and puts it into the cache (using the lock key both as the key and the value) to indicate that MyCache is currently locked for that particular key.

For example, let us assume the current key is 1_Key1 (WebFarmId_Key). LockManager builds the lock key "Lock_1_Key1" from it, and as long as the Microsoft Caching Application Block has "Lock_1_Key1" in its cache, MyCache is considered locked for the data with key 1_Key1.

When the underlying file content is modified, the CacheItemRemovedCallBack method is fired on each load balanced application, and each application tries to obtain a lock on the specified key. The first application which obtains the lock for that particular key gets the opportunity to update the data on the caching service, and the other applications simply don't do anything.

Such "key based" individual locking ensures that locking occurs at each individual object's operation level, and hence a locking for a particular object doesn't affect another. Ultimately, this reduces the chance of building up a long waiting queue of Read/Write operations on MyCache, and thus improves overall performance.

Please note that the following two methods are available on MyCacheAPI, which are meant only to be used inside the CacheItemRemovedCallBack method for ASP.NET applications (for ensuring that only one load balanced application updates the data on the MyCache server by setting a lock with the Key).

SetLock(string Key): Obtains an update lock for the key on MyCache
ReleaseLock(string Key): Releases the update lock

So, in normal Read and Write operations on MyCache (except the CacheItemRemovedCallBack method), the client code doesn't need to write any locking functionality, as the locking logic is implemented on the MyCache service-end.

The LockManager class is defined as follows:

/// <summary>
/// Manages locking functionality
/// </summary>
class LockManager
{
    ICacheManager cacheManager;

    public LockManager(ICacheManager cacheManager)
    {
        this.cacheManager = cacheManager;
    }
    
    /// <summary>
    /// Releases lock for the specified Key
    /// </summary>
    /// <param name="Key"></param>
    public void ReleaseLock(string Key)
    {
        Key = BuildKeyForLock(Key);
        if (cacheManager.Contains(Key))
        {
            cacheManager.Remove(Key);
        }
    }

    /// <summary>
    /// Obtains lock for the specified Key
    /// </summary>
    /// <param name="Key"></param>
    /// <returns></returns>
    public bool SetLock(string Key)
    {
        Key = BuildKeyForLock(Key);
        if (cacheManager.Contains(Key)) return false;

        cacheManager.Add(Key, Key);
        return true;
    }

    /// <summary>
    /// Builds Key for locking an object in Cache
    /// </summary>
    /// <param name="Key"></param>
    /// <returns></returns>
    private string BuildKeyForLock(string Key)
    {
        Key = string.Format("Lock_{0}", Key);
        return Key;
    }
}
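
Note that SetLock above performs Contains and Add as two separate calls. Whether two callers could both pass the Contains check at the same moment depends on how the service serializes its requests, which is not shown here. If that ever turns out to be a concern, one possible variation is to make "check and set" a single atomic step. The sketch below does this with ConcurrentDictionary.TryAdd as a stand-in for the cache; AtomicLockManager and LockDemo are hypothetical names and are not part of MyCache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class AtomicLockManager
{
    // Stand-in for the cache that holds the lock keys.
    private readonly ConcurrentDictionary<string, string> locks =
        new ConcurrentDictionary<string, string>();

    // TryAdd succeeds for exactly one caller per key until the key is removed.
    public bool SetLock(string key)
    {
        string lockKey = BuildKeyForLock(key);
        return locks.TryAdd(lockKey, lockKey);
    }

    public void ReleaseLock(string key)
    {
        string ignored;
        locks.TryRemove(BuildKeyForLock(key), out ignored);
    }

    private static string BuildKeyForLock(string key)
    {
        return string.Format("Lock_{0}", key);
    }
}

public static class LockDemo
{
    // Simulates several load balanced sites racing for the same key
    // and returns how many of them obtained the lock.
    public static int CountWinners(int siteCount, string key)
    {
        var manager = new AtomicLockManager();
        int winners = 0;
        Parallel.For(0, siteCount, i =>
        {
            if (manager.SetLock(key))
            {
                Interlocked.Increment(ref winners);
            }
        });
        return winners;
    }
}
```

Since no site releases the lock in this simulation, LockDemo.CountWinners returns 1 regardless of how many sites compete.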

Performance

MyCache manages cached data inside a different process, possibly on a different machine. So, it is obvious that its performance is nowhere near that of the ASP.NET Cache, which stores data in the memory of the web application's own process.

Given the distributed cache management requirement, "out of process" storage of cached data is a natural demand, and hence the inter-process communication (or inter-machine network communication) and data serialization/de-serialization overheads simply cannot be avoided. So, it's necessary to keep the communication and serialization/de-serialization overhead to a minimum.
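
To get a rough feeling for the serialization part of that overhead, a small measurement helper can be useful. The sketch below round-trips an object through DataContractSerializer and times it; the SerializationCost class is a hypothetical helper, and it uses the serializer's default text XML output, so the payload sizes will differ from what net.tcp actually sends with its binary encoding. It needs a reference to System.Runtime.Serialization.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization;

public static class SerializationCost
{
    // Serializes the value to memory and reads it back,
    // reporting the size of the serialized payload in bytes.
    public static T RoundTrip<T>(T value, out long payloadBytes)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            payloadBytes = stream.Length;
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }

    // Measures the total time taken by the given number of round trips.
    public static TimeSpan Measure<T>(T value, int iterations)
    {
        long ignored;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            RoundTrip(value, out ignored);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, SerializationCost.Measure(new[] { "a", "b", "c" }, 1000) returns the elapsed time for 1000 round trips of a small string array. Running it with objects of realistic size gives an idea of how much of a cache call is spent on serialization alone.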

The net.tcp binding, which is used in the MyCache architecture, is the fastest communication mechanism WCF offers across two different machines. Besides, the WCF Service and client applications can always be tuned to get the best possible performance out of the system.

I have developed a simple page (ViewPerformance.aspx) to demonstrate the performance of MyCache on my Core 2 Duo PC with 3 GB of RAM, running Windows Vista Premium. The client and server components are both deployed on the same machine, and here is a sample performance output:

[Figure: A sample performance measurement of MyCache]

Although the testing environment is far from convincing (no real server environment, no real load on the system, everything on a single PC), the above data suggests that the overall performance is promising enough for MyCache to be considered as a distributed caching engine. After all, if a retrieval operation from MyCache for moderately sized data completes within at most 1 second in a real application, I would be confident to use it as my next distributed caching engine.

Give it a try, and let me know about any issues or improvement suggestions. I'd be glad to hear from you.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)