Now, Serializable ConcurrentPartitionedDictionary is also available as Code Project article.
Introduction
This article presents a generic solution, called PartitionedDictionary, which is capable of creating partitions based dictionary. Each partition of this dictionary, can theoritically hold an entire dictionary of maximum possible size in .Net framework. And, theoritically, this dictionary itself can grow as a miximum size dictionary.
Thus, holding total of Square(Max.Size) dictionary values (ONLY IN IDEAL SITUATION, even for that you might require to compile with built flags, gcAllowVeryLargeObjects and LARGEADDRESSAWARE for x86 programs).
Proposed dictionary can be useful in situations, such as:
1. Data is too huge to fit in a single C# dictionary.
2. Data size is comparatively small but you are not sure about runtime memory fragmentation and OOM are very probable.
3. You just want to try it for some reason.
NOTE: Before using this dictionary, you must have a good partitioning function (design of which is not a big pain in most of the situation) available to you.
A good partition function:
1. Must be very simple.
2. Must not throw any exception for the range of possible keys.
3. Must map same key to same partition.
4. Is capable of distributing KeyValuePair, uniformly, over generated partitions.
Background
Most of the times, while creating huge dictionaries in C# application, we encounter OutOfMemory exception during dictionary resize operation. It happens due to the fact that C# dictionary, as it grows, tries to reallocate its internal array and unable to find a contiguous space big enough to hold the array.
Very large collection in .Net causes out-of-memory exception
Dictionary Out Of Memory Exception
In some extreme cases, such errors occurs for smaller size dictionaries. The reason is same, and its a after effect of extreme memory fragmentation. Of course, GC comes to help us here, but at times GC itself is helpless.
Large Object Heap fragmentation causes OutOfMemoryException
OutOfMemoryException Class
The proposed solution, creates a dictionary KeyValuePair and places it in a dedicated partition (identified by the supplied partition function to the Ctor). In fact, these partitions are nothing but a dictionary itself. Thus, PartitionedDictionary is a collection of smaller dictionaries which actually holds the KeyValuePairs based on partition logic. Such design, thus:
1. Resizes only the associated partition and not the whole dictionary.
2. Reduces the probability of encountering OOM based on the fact that partition is significantly smaller than the whole dictionary itself, thus, chances of finding required contiguous space in memory are higher.
3. Provides an opportunity to add more items (each partition can be as big as a single c# dictionary) than otherwise possible with a single c# dictionary.
NOTE: This design does not completely eliminate OOM problem.
Implementation
Ctor and Partition Function
PartitionedDictionary has three generic parameters:
TPartitionKey : Type of Partition values (Partition Names)
TKey : Type of Keys of Dictionary
TValue : Type of values of Dictionary
It internally holds a Dictionary of Dictionaries, as shown below:
public class PDictionary<TPartitionKey, TKey, TValue>
{
private Dictionary<TPartitionKey, Dictionary<TKey, TValue>> _pDictionary;
private readonly Func<TKey, TPartitionKey> _partitionFunc;
public PDictionary(Func<TKey, TPartitionKey> partitionFunction)
{
_partitionFunc = partitionFunction;
_pDictionary = new Dictionary<TPartitionKey, Dictionary<TKey, TValue>>();
}
}
The Ctor asks for a function which will map each key to a partition of that key. Following are some examples of such functions:
Ex.1. Dictionary Key (TKey
) is string
and the goal is to define partitions based on the first letter of each KEY. Then, Partition names (TPartitionKey
) are char
, and partition function (as lambda expression) will look like:
x => x.Trim()[0]
(DO NOT USE THIS FUNCTION IN PRODUCTION, System.ArgumentOutOfRangeException may occur)
Ex.2. Dictionary Key (TKey
) is string
and the goal is to define partitions based on the first two (2) letters of each KEY. Then ,Partition names (TPartitionKey
) are string
, and partition function (as lambda expression) will look like:
x => x.Trim().Substring(0, 2);
(DO NOT USE THIS FUNCTION IN PRODUCTION, System.ArgumentOutOfRangeException may occur)
Ex.2. Dictionary Key (TKey
) is NumberType (int, long
etc) and the goal is to define partitions based on MOD 10
operation. Then ,Partition names (TPartitionKey
) are int/int16
, and partition function (as lambda expression) will look like:
x => x%10;x => (int16)(x%10);
Class Methods
For adding/updating values, following functions are available:
public bool TryAdd(TKey key, TValue value)
{
var retVal = false;
TPartitionKey pKey = _partitionFunc(key);
if (_pDictionary.ContainsKey(pKey))
{
var pDict = _pDictionary[pKey];
if (!pDict.ContainsKey(key))
{
pDict.Add(key, value);
retVal = true;
}
}
else
{
_pDictionary.Add(pKey, new Dictionary<TKey, TValue> { { key, value } });
retVal = true;
}
return retVal;
}
public void AddOrUpdate(TKey key, TValue value)
{
TPartitionKey pKey = _partitionFunc(key);
if (_pDictionary.ContainsKey(pKey))
{
_pDictionary[pKey][key] = value;
}
else
{
_pDictionary.Add(pKey, new Dictionary<TKey, TValue> { { key, value } });
}
}
public TValue this[TKey key]
{
get
{
var pKey = _partitionFunc(key);
return _pDictionary[pKey][key];
}
set
{
AddOrUpdate(key, value);
}
}
Other misc. functions/properties are as follows:
public bool Remove(TKey key)
{
var retVal = false;
TPartitionKey pKey = _partitionFunc(key);
if (_pDictionary.ContainsKey(pKey))
{
var pDict = _pDictionary[pKey];
retVal = pDict.Remove(key);
if (pDict.Count == 0)
_pDictionary.Remove(pKey);
}
return retVal;
}
public void DropAllPartition()
{
_pDictionary = new Dictionary<TPartitionKey, Dictionary<TKey, TValue>>();
}
public bool DropPartitionByPartitionKey(TPartitionKey pKey)
{
return _pDictionary.Remove(pKey);
}
public bool DropPartitionWhichHasKey(TKey key)
{
TPartitionKey pKey = _partitionFunc(key);
return DropPartitionByPartitionKey(pKey);
}
public IEnumerable<TPartitionKey> AllPartitionKey
{
get
{
return _pDictionary.Keys;
}
}
public IEnumerable<TKey> AllKeyInsidePartition(TPartitionKey pKey)
{
return _pDictionary[pKey].Keys;
}
public IEnumerable<TValue> AllValuesInsidePartition(TPartitionKey pKey)
{
return _pDictionary[pKey].Values;
}
public IEnumerable<KeyValuePair<TKey, TValue>> GetPartitionByPartitionKey(TPartitionKey pKey)
{
return _pDictionary[pKey];
}
public bool TryGetExistingPartitionKeyOfKey(TKey key, out TPartitionKey pKey)
{
pKey = _partitionFunc(key);
if (_pDictionary.ContainsKey(pKey))
{
return true;
}
else
{
pKey = default(TPartitionKey);
return false;
}
}
public int RecordCountInsidePartitionByPartitionKey(TPartitionKey pKey)
{
return _pDictionary[pKey].Count;
}
public int RecordCountInsidePartitionWhichContainsKey(TKey key)
{
var pKey = _partitionFunc(key);
return RecordCountInsidePartitionByPartitionKey(pKey);
}
public long TotalRecordCount
{
get
{
return _pDictionary.Sum(x => (long)x.Value.Count);
}
}
public int TotalPartitionCount
{
get
{
return _pDictionary.Count;
}
}
public bool ContainsPartition(TPartitionKey pKey)
{
return _pDictionary.ContainsKey(pKey);
}
public bool ContainsKey(TKey key)
{
var pKey = _partitionFunc(key);
return _pDictionary.ContainsKey(pKey) && _pDictionary[pKey].ContainsKey(key);
}
Using the code
Include the above code (or dll) in your project and start using PDictionary class. I commented the class wherever I felt comments are required (keep the PartitionedDictionary.XML file in the same folder as PartitionedDictionary.dll).
using System;
using PartitionedDictionary;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var partitionedDictionary = new PDictionary<char, string, int>(x => x.Trim()[0]);
partitionedDictionary.TryAdd("one", 1);
partitionedDictionary.TryAdd("two", 2);
partitionedDictionary.TryAdd("three", 3);
partitionedDictionary.TryAdd("four", 4);
partitionedDictionary.TryAdd("five", 5);
partitionedDictionary.AddOrUpdate("five", 7);
partitionedDictionary["five"] = 10;
Console.WriteLine(partitionedDictionary["one"]);
Console.WriteLine(partitionedDictionary["two"]);
Console.WriteLine(partitionedDictionary["three"]);
Console.WriteLine(partitionedDictionary["four"]);
Console.WriteLine(partitionedDictionary["five"]);
Console.WriteLine("Total Partition: " + partitionedDictionary.TotalPartitionCount);
Console.ReadLine();
}
}
}
Points of Interest
The above article shows the implementation of NonConcurrent dictionary, in similar manner a ConcurrentDictionary can be implemented.
NOTE: As it comes with some benefits, use of this dictionary may have some performance hits. Before using it, in production, you must perform stress tests (stats like look-up time, add operation etc) against regular C# dictionary.
History
Its is the V1 of the suggested solution.