The FactoryDictionary

3 Dec 2009 (CPOL)
Safely blocking heavyweight object instantiation in a dictionary without blocking unrelated requests.

Introduction

The purpose of the FactoryDictionary is to provide a thread-safe (in that operations are synchronized) implementation of a Dictionary<TKey, TValue> that can also block the calling thread when a new instance must be created for a particular key (or when an instance is already being created for that key), without blocking threads that are retrieving values for other keys. It is primarily targeted at heavyweight objects that are created infrequently and retrieved frequently. The FactoryDictionary offloads the heavy instantiation process to an internal wrapper class, which uses a user-supplied delegate to create the object. As a result, the lock on the internal dictionary (used for retrieving instances of the wrapper class) can be released relatively quickly, while the calling thread can still be blocked, when necessary, by a lock on an instance-specific synchronization object.

Background

The idea for this class originated while I was answering this StackOverflow question. To summarize, the poster was looking for a way to maintain a dictionary of heavyweight objects, and block subsequent calls for the same key while an instance was being created (rather than having subsequent calls create instances that aren't added to the dictionary) while not blocking calls for other objects.

Using the Code

The majority of the code in the FactoryDictionary is a boilerplate implementation of the IDictionary<TKey, TValue> interface (and the other associated interfaces), which I won't go into here since it's fairly straightforward. Through these members, the FactoryDictionary operates in the same manner as the Dictionary<TKey, TValue> class. The part we're actually interested in consists of two pieces: the ValueWrapper class and the GetOrCreate function. The relevant code appears below:

C#
private class ValueWrapper
{
    public TValue Value { get; private set; }
    private volatile object valueLock;
    private bool isCreated;
    private Func<TValue> constructor;

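    public void WaitForInitialization()
    {
        // Copy the volatile field once; it becomes null after creation.
        object sync = valueLock;

        if (sync != null)
        {
            // Callers for this key queue up here while the value is built.
            lock (sync)
            {
                // Only the first caller to get the lock creates the value.
                if (!isCreated)
                {
                    isCreated = true;

                    Value = constructor();

                    // Later calls see null and skip the lock entirely.
                    valueLock = null;
                }
            }
        }
    }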

    public ValueWrapper(TValue value)
    {
        Value = value;
        isCreated = true;
    }

    public ValueWrapper(Func<TValue> constructor)
    {
        valueLock = new object();
        isCreated = false;
        this.constructor = constructor;
    }
}

public TValue GetOrCreate(TKey key, Func<TValue> constructor)
{
    ValueWrapper wrapper;

    lock (backingLock)
    {
        if (!backing.TryGetValue(key, out wrapper))
        {
            wrapper = new FactoryDictionary<TKey, TValue>.ValueWrapper(constructor);

            backing.Add(key, wrapper);
        }
    }

    wrapper.WaitForInitialization();

    return wrapper.Value;
}

The implementation is pretty simple: the dictionary inspects its internal dictionary to see if a wrapper exists for the specified key. If one does not, it creates a wrapper, and supplies the user-specified delegate that will actually do the heavy lifting in creating the object. Once the wrapper is in hand, it calls WaitForInitialization to pause, if necessary, for object instantiation.

Upon the first call, an exclusive lock is acquired on the synchronization object, the creation flag is set, the object is created, the synchronization object is cleared, and the exclusive lock is released. If a subsequent call occurs before the object is created (or, more specifically, before the synchronization object is cleared), the thread blocks until the prior lock is released. By the time that thread enters the locked code block, the creation flag has already been set, so no action is performed and the lock is immediately released. Calls that occur after the object has been created return immediately, since the synchronization object has been cleared.
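To make that sequence easier to try out, here is a minimal sketch that packs the same pattern into a single file and then requests one key from several threads at once. MiniFactoryDictionary and SameKeyDemo are hypothetical names used only for this illustration, the IDictionary plumbing is left out, and the 100 ms sleep merely stands in for a heavyweight constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Trimmed-down copy of the article's pattern; MiniFactoryDictionary is a
// hypothetical name, and everything except GetOrCreate has been left out.
public class MiniFactoryDictionary<TKey, TValue>
{
    private class ValueWrapper
    {
        public TValue Value { get; private set; }
        private volatile object valueLock = new object();
        private bool isCreated;
        private readonly Func<TValue> constructor;

        public ValueWrapper(Func<TValue> constructor) { this.constructor = constructor; }

        public void WaitForInitialization()
        {
            object sync = valueLock;      // null once the value exists
            if (sync == null) return;     // later calls: no locking at all

            lock (sync)                   // concurrent calls for this key wait here
            {
                if (!isCreated)           // only the first caller gets past this
                {
                    isCreated = true;
                    Value = constructor();
                    valueLock = null;
                }
            }
        }
    }

    private readonly Dictionary<TKey, ValueWrapper> backing =
        new Dictionary<TKey, ValueWrapper>();
    private readonly object backingLock = new object();

    public TValue GetOrCreate(TKey key, Func<TValue> constructor)
    {
        ValueWrapper wrapper;
        lock (backingLock)                // held only for the lookup or insert
        {
            if (!backing.TryGetValue(key, out wrapper))
            {
                wrapper = new ValueWrapper(constructor);
                backing.Add(key, wrapper);
            }
        }
        wrapper.WaitForInitialization();
        return wrapper.Value;
    }
}

public static class SameKeyDemo
{
    // Starts several threads that all ask for the same key and returns how
    // many times the (deliberately slow) delegate actually ran.
    public static int CountConstructorCalls(int threadCount)
    {
        var dictionary = new MiniFactoryDictionary<string, object>();
        int calls = 0;
        var threads = new List<Thread>();

        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() => dictionary.GetOrCreate("key", () =>
            {
                Interlocked.Increment(ref calls);
                Thread.Sleep(100);        // simulate heavyweight construction
                return new object();
            }));
            threads.Add(thread);
            thread.Start();
        }

        foreach (var thread in threads) thread.Join();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine("Constructor calls: " + CountConstructorCalls(8));
    }
}
```

However many threads are started, the counter should end at 1: the first thread to obtain the wrapper's lock runs the delegate, and the others either wait on that lock or find the synchronization object already cleared.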

Taking this approach yields fairly transparent code, and consuming the FactoryDictionary is simple. Consider a dictionary with a string key and a heavyweight object value type named HeavyweightObject.

C#
FactoryDictionary<string, HeavyweightObject> dictionary;

All that is required to use the pseudo-double-checking dictionary functionality is the following:

C#
dictionary.GetOrCreate("key", () => new HeavyweightObject());

This will either retrieve or create the object associated with the "key" string, using the lambda-declared delegate to perform the actual instantiation. In this way, subsequent calls to GetOrCreate with "key" as the key will only block (rather than instantiate a throw-away instance), and calls for other keys will not be affected.

Points of Interest

The delegate approach was taken to support objects with parameterized constructors; while I could have specified the new() generic condition on the TValue argument, that would only allow objects with a parameterless constructor to be stored.
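The difference between the two options can be sketched in a few lines. Connection, CreationStyles, and the "db01" argument are hypothetical names invented for this illustration; they are not part of the FactoryDictionary source.

```csharp
using System;

// Hypothetical value type with no parameterless constructor.
public class Connection
{
    public string Server { get; private set; }
    public Connection(string server) { Server = server; }
}

public static class CreationStyles
{
    // With the new() constraint the container can build the value itself,
    // but only through a public parameterless constructor.
    public static T CreateWithConstraint<T>() where T : new()
    {
        return new T();
    }

    // With a delegate the caller decides how the value is built, so any
    // constructor (or factory method) can be used.
    public static T CreateWithDelegate<T>(Func<T> constructor)
    {
        return constructor();
    }

    public static void Main()
    {
        // CreateWithConstraint<Connection>() would not compile, because
        // Connection has no parameterless constructor.
        Connection connection = CreateWithDelegate(() => new Connection("db01"));
        Console.WriteLine(connection.Server);
    }
}
```

The delegate version costs the caller one lambda per call site, which seemed a reasonable price for not restricting what can be stored.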

Example Usage

Let's consider a mortgage company named XYZMortgage (XYZ for short). XYZ has an automated mortgage foreclosure analysis application that reviews a subset of their mortgage portfolio on a daily basis to determine eligibility for foreclosure, and it uses a fairly heavyweight object to define a particular mortgage type (FHA, VA, Conventional, etc.) and term. These mortgage profile objects have a relatively long creation time, since they have to retrieve information about current interest rates, revised business rules, and other ancillary information whenever they're created. Because there are too many profile objects to justify creating them all up front when only a subset of the company's mortgages is processed, XYZ has implemented a lazy-loading approach so that mortgage profiles that aren't needed are never created or loaded.

Every day, a pool of 100 threads is created for processing the loans concurrently. These threads are not specific to any particular mortgage profile; each one just processes a loan and then obtains the next one from a synchronized queue. Because of this, at any given time there will be multiple threads trying to retrieve a mortgage profile object from the shared FactoryDictionary that holds the cache of these heavyweight objects. For the sake of argument, we'll say that it takes an average of 700ms to create a given mortgage profile object, since there are multiple other systems that must be "touched" in order to fully populate its data, but once the object has been created it does not need any further updating for that day. Each loan is associated with a profile by a shared string key called ProfileKey.

C#
private static FactoryDictionary<string, LoanProfile> profiles =
    new FactoryDictionary<string, LoanProfile>();

So now we're getting somewhere; we at least have a declaration for our factory. But how do we get a profile? For this first version, let's say that all LoanProfile objects are the same except for some property/variable values, so we have a simple constructor that takes the ProfileKey and retrieves the necessary data itself. In that case, at the start of the body of the loop inside each thread method we would have something like this:

C#
// loop declaration code here, with "mortgage" as the loop variable
{
    LoanProfile profile = profiles.GetOrCreate(mortgage.ProfileKey, () =>
        new LoanProfile(mortgage.ProfileKey));
    
    // business logic depending on the values in the mortgage and the profile...
}

This will look for a LoanProfile object in the profiles object that corresponds to the supplied key. If one isn't found, it will call the supplied delegate (supplied here as a lambda expression) to create a new one. It's important to note that the delegate is only called when the key doesn't exist. If the key is present, then the delegate itself is never called, so an additional object is never actually created.
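The "delegate is only called when the key doesn't exist" behavior is easy to check by passing two different delegates for the same key. So that the sketch compiles on its own, it uses a small stand-in built from .NET 4's ConcurrentDictionary and Lazy<T> rather than the FactoryDictionary itself; LazyCache, DelegateCallDemo, and the "FHA-30" key are hypothetical names, and the stand-in is a different technique that only mirrors the GetOrCreate contract.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for the article's class, built on .NET 4 types purely so this
// example compiles on its own; LazyCache is a hypothetical name.
public class LazyCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> backing =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrCreate(TKey key, Func<TValue> constructor)
    {
        // Only one Lazy is ever stored per key, and Lazy<T> runs its
        // delegate at most once in its default thread-safety mode.
        return backing.GetOrAdd(key, k => new Lazy<TValue>(constructor)).Value;
    }
}

public static class DelegateCallDemo
{
    // Requests the same key twice with two different delegates and reports
    // how often each delegate ran. Returns the value seen by the second call.
    public static string Run(out int firstCalls, out int secondCalls)
    {
        var profiles = new LazyCache<string, string>();
        int first = 0, second = 0;

        profiles.GetOrCreate("FHA-30", () => { first++; return "built by first delegate"; });
        string seenBySecondCall =
            profiles.GetOrCreate("FHA-30", () => { second++; return "built by second delegate"; });

        firstCalls = first;
        secondCalls = second;
        return seenBySecondCall;
    }

    public static void Main()
    {
        int firstCalls, secondCalls;
        string value = Run(out firstCalls, out secondCalls);
        // prints "built by first delegate (first: 1, second: 0)"
        Console.WriteLine(value + " (first: " + firstCalls + ", second: " + secondCalls + ")");
    }
}
```

The second delegate is simply discarded, so any side effects it would have had (such as the 700ms of calls to other systems in the XYZ scenario) never happen.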

But, you say, it's unlikely that all LoanProfile objects can be represented without some form of polymorphism! That's probably true. We may have a subclass of LoanProfile for each of the various loan types (FHA, VA, Conventional, etc.) and a (true) factory object to create them. Ordinarily, we have something like this:

C#
public static class LoanProfileFactory
{
    public static LoanProfile CreateProfile(LoanType type)
    {
        // logic for creating the appropriate concrete class
    }
}

The beauty of the delegate-based approach is that it fits perfectly with a factory like this. All we need to do is:

C#
// loop declaration code here, with "mortgage" as the loop variable
{
    LoanProfile profile = profiles.GetOrCreate(mortgage.ProfileKey, () => 
    {
        LoanProfile created = LoanProfileFactory.CreateProfile(mortgage.LoanType);
        
        created.LoadProfile(mortgage.ProfileKey);

        // A statement lambda must return the value explicitly.
        return created;
    });
    
    // business logic depending on the values in the mortgage and the profile...
}

We've made the profile creation logic more complex than it was, but it still fits with the pattern. The newly complex logic is only called when the key doesn't exist.

Of course, thread safety could be implemented in an ordinary dictionary by simply locking a common object and blocking all threads during the creation process. In this scenario, however, there are enough profiles to warrant lazy loading, and they are reused often enough to warrant caching. If we block all threads while these heavy objects are being created, we could substantially increase the running time of the process without any real benefit: there's no legitimate reason why a thread requesting the profile for "abc" should have to wait for a thread requesting the profile for "def", since they have nothing to do with each other.

The advantage of the FactoryDictionary is that it takes care of all of the thread blocking for us. Even though the creation process is long, requests from other threads for a different key won't be blocked while this key is being created. Requests from other threads for this same key, on the other hand, will block until the creation process has finished, so that they can use the object we are in the process of creating.
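The contrast between the two locking strategies can be observed directly with a small experiment. The sketch below is not the FactoryDictionary: GlobalLockCache and PerKeyCache are hypothetical, simplified classes (the per-key one always takes the entry lock rather than clearing it), and the one-second timeout is an arbitrary choice. Each delegate signals that it has started and then waits briefly for the other key's delegate to start too.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class BlockingComparison
{
    // Naive approach: one lock is held for the whole creation, so every
    // caller waits, whatever key it asked for.
    public class GlobalLockCache
    {
        private readonly Dictionary<string, string> backing = new Dictionary<string, string>();

        public string GetOrCreate(string key, Func<string> constructor)
        {
            lock (backing)
            {
                string value;
                if (!backing.TryGetValue(key, out value))
                {
                    value = constructor();
                    backing.Add(key, value);
                }
                return value;
            }
        }
    }

    // Per-key approach in the spirit of the article: the shared lock covers
    // only the lookup, and creation happens under a lock owned by the entry.
    public class PerKeyCache
    {
        private class Entry { public string Value; public bool Created; }
        private readonly Dictionary<string, Entry> backing = new Dictionary<string, Entry>();

        public string GetOrCreate(string key, Func<string> constructor)
        {
            Entry entry;
            lock (backing)
            {
                if (!backing.TryGetValue(key, out entry))
                {
                    entry = new Entry();
                    backing.Add(key, entry);
                }
            }
            lock (entry)
            {
                if (!entry.Created)
                {
                    entry.Value = constructor();
                    entry.Created = true;
                }
                return entry.Value;
            }
        }
    }

    // Creates "abc" and "def" on two threads and reports whether both
    // delegates were observed running at the same time.
    public static bool CreationsOverlap(Func<string, Func<string>, string> getOrCreate)
    {
        var abcStarted = new ManualResetEventSlim(false);
        var defStarted = new ManualResetEventSlim(false);
        bool abcSawDef = false, defSawAbc = false;

        var abc = new Thread(() => getOrCreate("abc", () =>
        {
            abcStarted.Set();
            abcSawDef = defStarted.Wait(TimeSpan.FromSeconds(1));
            return "profile abc";
        }));
        var def = new Thread(() => getOrCreate("def", () =>
        {
            defStarted.Set();
            defSawAbc = abcStarted.Wait(TimeSpan.FromSeconds(1));
            return "profile def";
        }));

        abc.Start(); def.Start();
        abc.Join(); def.Join();
        return abcSawDef && defSawAbc;
    }

    public static void Main()
    {
        Console.WriteLine("Global lock overlaps: " +
            CreationsOverlap(new GlobalLockCache().GetOrCreate));
        Console.WriteLine("Per-key lock overlaps: " +
            CreationsOverlap(new PerKeyCache().GetOrCreate));
    }
}
```

With the global lock, whichever delegate runs first cannot see the other start before its wait times out, so the overlap check should come back false; with per-key locks both delegates run side by side and the check should come back true, barring an unusually slow thread start.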

History

  • Originally developed and submitted on 11/27/2009
  • Updated the code to use volatile on the synchronization object on 12/1/2009
  • Added Example Usage on 12/2/2009

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)