Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Creating PartitionedDictionary

0.00/5 (No votes)
2 Nov 2014 1  
A suggested solution to create Partition based Dictionary (reducing chances of OOM errors and adding capability to hold bigger KVP set in Dictionary).

Now, Serializable ConcurrentPartitionedDictionary is also available as Code Project article.

Introduction

This article presents a generic solution, called PartitionedDictionary, which is capable of creating partitions based dictionary. Each partition of this dictionary, can theoritically hold an entire dictionary of maximum possible size in .Net framework. And, theoritically, this dictionary itself can grow as a miximum size dictionary.

Thus, holding total of Square(Max.Size) dictionary values (ONLY IN IDEAL SITUATION, even for that you might require to compile with built flags, gcAllowVeryLargeObjects and LARGEADDRESSAWARE for x86 programs).

Proposed dictionary can be useful in situations, such as:

1. Data is too huge to fit in a single C# dictionary.

2. Data size is comparatively small but you are not sure about runtime memory fragmentation and OOM are very probable.

3. You just want to try it for some reason.

NOTE: Before using this dictionary, you must have a good partitioning function (design of which is not a big pain in most of the situation) available to you.

A good partition function:

1. Must be very simple.
2. Must not throw any exception for the range of possible keys.
3. Must map same key to same partition.
4. Is capable of distributing KeyValuePair, uniformly, over generated partitions.

Background

Most of the times, while creating huge dictionaries in C# application, we encounter OutOfMemory exception during dictionary resize operation. It happens due to the fact that C# dictionary, as it grows, tries to reallocate its internal array and unable to find a contiguous space big enough to hold the array.

Very large collection in .Net causes out-of-memory exception

Dictionary Out Of Memory Exception

In some extreme cases, such errors occurs for smaller size dictionaries. The reason is same, and its a after effect of extreme memory fragmentation. Of course, GC comes to help us here, but at times GC itself is helpless.

Large Object Heap fragmentation causes OutOfMemoryException

OutOfMemoryException Class

The proposed solution, creates a dictionary KeyValuePair and places it in a dedicated partition (identified by the supplied partition function to the Ctor). In fact, these partitions are nothing but a dictionary itself. Thus, PartitionedDictionary is a collection of smaller dictionaries which actually holds the KeyValuePairs based on partition logic. Such design, thus:

1. Resizes only the associated partition and not the whole dictionary.

2. Reduces the probability of encountering OOM based on the fact that partition is significantly smaller than the whole dictionary itself, thus, chances of finding required contiguous space in memory are higher.

3. Provides an opportunity to add more items (each partition can be as big as a single c# dictionary) than otherwise possible with a single c# dictionary.

NOTE: This design does not completely eliminate OOM problem.

Implementation

Ctor and Partition Function

PartitionedDictionary has three generic parameters:

TPartitionKey : Type of Partition values (Partition Names)
TKey : Type of Keys of Dictionary
TValue : Type of values of Dictionary

It internally holds a Dictionary of Dictionaries, as shown below:

public class PDictionary<TPartitionKey, TKey, TValue>
{
        private Dictionary<TPartitionKey, Dictionary<TKey, TValue>> _pDictionary;
        private readonly Func<TKey, TPartitionKey> _partitionFunc;

        public PDictionary(Func<TKey, TPartitionKey> partitionFunction)
        {
            _partitionFunc = partitionFunction;
            _pDictionary = new Dictionary<TPartitionKey, Dictionary<TKey, TValue>>();
        }
}

The Ctor asks for a function which will map each key to a partition of that key. Following are some examples of such functions:

Ex.1. Dictionary Key (TKey) is string and the goal is to define partitions based on the first letter of each KEY. Then, Partition names (TPartitionKey) are char, and partition function (as lambda expression) will look like:

x => x.Trim()[0]
//This function will return the first character of Key string, where x is TKey
(DO NOT USE THIS FUNCTION IN PRODUCTION, System.ArgumentOutOfRangeException may occur)

Ex.2. Dictionary Key (TKey) is string and the goal is to define partitions based on the first two (2) letters of each KEY. Then ,Partition names (TPartitionKey) are string, and partition function (as lambda expression) will look like:

x => x.Trim().Substring(0, 2); 
//This function will return the first two characters of Key string, where x is TKey 
(DO NOT USE THIS FUNCTION IN PRODUCTION, System.ArgumentOutOfRangeException may occur)

Ex.2. Dictionary Key (TKey) is NumberType (int, long etc) and the goal is to define partitions based on MOD 10 operation. Then ,Partition names (TPartitionKey) are int/int16, and partition function (as lambda expression) will look like:

x => x%10;//For Int
x => (int16)(x%10);//For Int16
//This function will return the modulo 10

Class Methods

For adding/updating values, following functions are available:

public bool TryAdd(TKey key, TValue value)
{
	var retVal = false;
	TPartitionKey pKey = _partitionFunc(key);
	if (_pDictionary.ContainsKey(pKey))
	{
		var pDict = _pDictionary[pKey];
		if (!pDict.ContainsKey(key))
		{
			pDict.Add(key, value);
			retVal = true;
		}
	}
	else
	{
		_pDictionary.Add(pKey, new Dictionary<TKey, TValue> { { key, value } });
		retVal = true;
	}

	return retVal;
}

public void AddOrUpdate(TKey key, TValue value)
{
	TPartitionKey pKey = _partitionFunc(key);
	if (_pDictionary.ContainsKey(pKey))
	{
		_pDictionary[pKey][key] = value;
	}
	else
	{
		_pDictionary.Add(pKey, new Dictionary<TKey, TValue> { { key, value } });
	}
}

public TValue this[TKey key]
{
    get
    {
        var pKey = _partitionFunc(key);
        return _pDictionary[pKey][key];
    }
    set
    {
        AddOrUpdate(key, value);
    }
}

Other misc. functions/properties are as follows:

public bool Remove(TKey key)
{
	var retVal = false;
	TPartitionKey pKey = _partitionFunc(key);
	if (_pDictionary.ContainsKey(pKey))
	{
		var pDict = _pDictionary[pKey];
		retVal = pDict.Remove(key);

		if (pDict.Count == 0)
			_pDictionary.Remove(pKey);
	}
	return retVal;
}

public void DropAllPartition()
{
	_pDictionary = new Dictionary<TPartitionKey, Dictionary<TKey, TValue>>();
}

public bool DropPartitionByPartitionKey(TPartitionKey pKey)
{
	return _pDictionary.Remove(pKey);
}

public bool DropPartitionWhichHasKey(TKey key)
{
	TPartitionKey pKey = _partitionFunc(key);
	return DropPartitionByPartitionKey(pKey);
}

public IEnumerable<TPartitionKey> AllPartitionKey
{
	get
	{
		return _pDictionary.Keys;
	}
}

/// <summary>
/// Gets all the existing Keys inside the partition identified by the partition key.
/// </summary>
/// <param name="pKey">Partition key</param>
/// <exception cref="System.Collections.Generic.KeyNotFoundException"></exception>
public IEnumerable<TKey> AllKeyInsidePartition(TPartitionKey pKey)
{
	return _pDictionary[pKey].Keys;
}

/// <summary>
/// Gets all the existing values inside the partition identified by the partition key.
/// </summary>
/// <param name="pKey">Partition key</param>
/// <exception cref="System.Collections.Generic.KeyNotFoundException"></exception>
public IEnumerable<TValue> AllValuesInsidePartition(TPartitionKey pKey)
{
	return _pDictionary[pKey].Values;
}

/// <summary>
/// Gets all the existing KeyValuePair inside the partition identified by the partition key.
/// </summary>
/// <param name="pKey">Partition key</param>
/// <exception cref="System.Collections.Generic.KeyNotFoundException"></exception>
public IEnumerable<KeyValuePair<TKey, TValue>> GetPartitionByPartitionKey(TPartitionKey pKey)
{
	return _pDictionary[pKey];
}

/// <summary>
/// This function also checks whether the partition key exists or not.
/// </summary>
/// <param name="key">Key on which partition key will be computed</param>
/// <param name="pKey">Parititon key computed from given key, if exists, else default value.</param>
public bool TryGetExistingPartitionKeyOfKey(TKey key, out TPartitionKey pKey)
{
	pKey = _partitionFunc(key);
	if (_pDictionary.ContainsKey(pKey))
	{
		return true;
	}
	else
	{
		pKey = default(TPartitionKey);
		return false;
	}
}

/// <summary>
/// Gets the count of all existing KeyValuePair inside the partition identified by the partition key.
/// </summary>
/// <param name="pKey">Partition key</param>
/// <exception cref="System.Collections.Generic.KeyNotFoundException"></exception>
public int RecordCountInsidePartitionByPartitionKey(TPartitionKey pKey)
{
	return _pDictionary[pKey].Count;
}

/// <summary>
/// Gets the count of all existing KeyValuePair inside the partition identified by given key.
/// </summary>
/// <param name="pKey">Partition key</param>
/// <exception cref="System.Collections.Generic.KeyNotFoundException"></exception>
public int RecordCountInsidePartitionWhichContainsKey(TKey key)
{
	var pKey = _partitionFunc(key);
	return RecordCountInsidePartitionByPartitionKey(pKey);
}

public long TotalRecordCount
{
	get
	{
		return _pDictionary.Sum(x => (long)x.Value.Count);
	}
}

public int TotalPartitionCount
{
	get
	{
		return _pDictionary.Count;
	}
}

public bool ContainsPartition(TPartitionKey pKey)
{
	return _pDictionary.ContainsKey(pKey);
}

public bool ContainsKey(TKey key)
{
	var pKey = _partitionFunc(key);
	return _pDictionary.ContainsKey(pKey) && _pDictionary[pKey].ContainsKey(key);
}

Using the code

Include the above code (or dll) in your project and start using PDictionary class. I commented the class wherever I felt comments are required (keep the PartitionedDictionary.XML file in the same folder as PartitionedDictionary.dll).

using System;
using PartitionedDictionary;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var partitionedDictionary = new PDictionary<char, string, int>(x => x.Trim()[0]);

            partitionedDictionary.TryAdd("one", 1);
            partitionedDictionary.TryAdd("two", 2);
            partitionedDictionary.TryAdd("three", 3);
            partitionedDictionary.TryAdd("four", 4);
            partitionedDictionary.TryAdd("five", 5);

            partitionedDictionary.AddOrUpdate("five", 7);
            partitionedDictionary["five"] = 10;

            Console.WriteLine(partitionedDictionary["one"]);
            Console.WriteLine(partitionedDictionary["two"]);
            Console.WriteLine(partitionedDictionary["three"]);
            Console.WriteLine(partitionedDictionary["four"]);
            Console.WriteLine(partitionedDictionary["five"]);//Prints 10

            //Must print 3 as 1 parititon for "o", one for "t", and one for "f"
            Console.WriteLine("Total Partition: " + partitionedDictionary.TotalPartitionCount);

            Console.ReadLine();
        }
    }
}

Points of Interest

The above article shows the implementation of NonConcurrent dictionary, in similar manner a ConcurrentDictionary can be implemented.

Now, Serializable ConcurrentPartitionedDictionary is also available as Code Project article.

NOTE: As it comes with some benefits, use of this dictionary may have some performance hits. Before using it, in production, you must perform stress tests (stats like look-up time, add operation etc) against regular C# dictionary.

History

Its is the V1 of the suggested solution.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here