Before Starting
I initially started this work simply to exercise my C++ skills, as I had stopped using the language for some time. Then I compared the performance of my first version with the STL's unordered_map class and I was impressed that my solution was much faster and also consumed less memory (in fact, the .NET Dictionary is already much faster, even though it is managed code), so I decided to finish the job and have a fully functional Dictionary in C++.
I know, I am not using the C++ standard naming convention; I am really using the .NET naming convention here. Also, at work I use Visual Studio 2008, so I am not using C++11 features and, even though I could use C++11 as I have Visual Studio 2013 at home, I don't plan to update this code to be C++11 compliant anytime soon, as I know lots of places that still use old C++ versions. If you like the code and want to make it C++11 compliant, feel free to do it.
Introduction
I constantly see people affirming that C++ (native code) is much faster than C#/.NET/managed code, yet many times when I see the .NET ports of C++ applications they execute faster than their C++ counterparts. I attribute this to three main factors:
- Memory allocations;
- Better associative classes found in .NET;
- Better strings (this is explained at the end of the article and is related to the previous item).
In many performance comparisons people say that C++ is not slower than C#, that it only appears to be slower because it is actually freeing the used memory while .NET is accumulating memory in loops and will only free it sometime later. Well, if that "sometime later" is not seen by the users, then C# will be giving the results faster, even if it takes more time to free the memory. Yet the C++ problem is usually not the time spent deallocating memory, it is the time spent allocating it. The constructor and destructor bodies, in many cases, are so fast they aren't noticed; the allocations themselves are the bottleneck. This problem made me write the O(1) Object Pool in C++ article, which tries to give the performance advantage back to C++.
Then, there's the second problem: the classes used as "associative arrays". They are named maps in C++ and dictionaries in .NET. .NET was built on the idea of hash-codes (Java too, but I am not discussing it here). All the primitive types, strings and the most important structs expected to be used as keys have good hash-code generators, which are expected to create well distributed results and to return really fast. C++, unfortunately, doesn't come with such support and for a long time the only solution was the non-hashed maps, which are pretty slow. Now the STL has the unordered_map class, yet some of the default hash-generators are terrible and the memory consumption is big (at least in the Visual Studio implementations that I checked).
So, to try to reduce this problem, I am giving a C++ implementation of a .NET-like dictionary. Instead of providing many default hash-generators that can be slow, I don't give a default generator for those types that I don't know how to hash. In my opinion, it is better to be forced to implement a good hash-generator than to lose performance by using a bad one.
Hashing and buckets
The entire idea of dictionaries and unordered maps is to associate values with keys, using a hash-code as the "indexer". The generated hash-code must always be the same for the same key. It is expected that hash-codes are well distributed and so, if possible, two different keys should not generate the same hash-code. Yet, as the hash-code is a single numeric value, the dictionaries/maps must be prepared to deal with two or more different keys that generate the same hash-code.
When 32-bit values are used the hash-codes can have more than 4 billion different values (in 64 bits, multiply 4 billion by 4 billion), so the hash-code can't be used directly as a position in an array to find an item. Some math is done over the hash-code to choose a "bucket" where the key/value pair will be stored. Those buckets, then, can store many pairs.
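As a minimal illustration of that math (this is a sketch, not the actual Dictionary code), the bucket index is typically just the hash-code modulo the number of buckets:

#include <cstddef>

// Sketch: maps a hash-code to a bucket index. Because size_t is unsigned,
// no sign bit has to be discarded before the modulo.
size_t GetBucketIndex(size_t hashCode, size_t bucketCount)
{
    return hashCode % bucketCount; // same key -> same hash-code -> same bucket
}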
I don't know if this is how the STL's unordered_map must be implemented or if it is a Visual C++ specific detail, but there each bucket is a vector that can be resized independently, and then there are some other rules to actually increase the number of buckets. In .NET we have an array of buckets and, maybe because .NET doesn't allow references to structs (which could be solved differently), these buckets are indexes that point into an array of structs (the real data).
My implementation is similar to the .NET one but, instead of having two arrays, the array of buckets is already an array of structs. Then, if more items are placed in the same bucket, I allocate the new struct instances from an O(1) Object Pool and point to them, as C++ allows me to use pointers to structs. The rule for resizing the array is the same as the .NET one: when the number of items becomes bigger than the number of buckets, there's a resize. This means that with really well distributed hash-codes (like sequential ones) we will never have two items in the same bucket.
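A rough sketch of that layout (the names here are illustrative, not the actual internals of the class):

// Illustrative only: each bucket slot stores a pair directly; collisions
// chain through "next" pointers allocated from the O(1) Object Pool.
template<typename TKey, typename TValue>
struct DictionaryNode
{
    size_t hashCode;                    // full hash-code, cached for resizes
    TKey key;
    TValue value;
    DictionaryNode<TKey, TValue> *next; // overflow node, or NULL
};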
As a C++ specific detail, I don't allocate an array of structs directly. That would invoke the default constructor for every element if the TKey or TValue is a class (not a pointer) that has a default constructor. Instead, I simply allocate the block of memory and, when an item is added, I use placement new to initialize the struct there.
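A minimal sketch of the technique, reusing the hypothetical DictionaryNode above:

#include <cstdlib>
#include <new>

typedef DictionaryNode<int, int> Node;

// Sketch: raw allocation, so no TKey/TValue constructor runs for empty slots.
Node *AllocateBuckets(size_t bucketCount)
{
    return (Node *)std::malloc(bucketCount * sizeof(Node));
}

// Sketch: only when an item is added is the slot actually constructed.
void InitializeSlot(Node *buckets, size_t bucketIndex)
{
    new (buckets + bucketIndex) Node(); // placement new
}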
Using the class
If you have ever used the .NET dictionaries you will probably consider this class very easy to use. Assuming you don't want to change the hash-generator, you could declare a dictionary like this:
Dictionary<int, int> dictionary;
The parameters inside the < and > are the key type and the value type, respectively.
Then, there are the following methods:
Insertion and modification
- Add: Adds a new key/value pair to the dictionary. If there's already a pair that uses the same key, an exception is thrown;
- Set: Associates a value with a key, adding a new pair to the dictionary or replacing the value of an already existing pair;
- TryAdd: Tries to add a new key/value association to the dictionary, returning true if it was added or false, without changing anything, if there's already a pair that uses the given key.
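A short usage sketch of these three methods (signatures assumed from the descriptions above):

void InsertionExample()
{
    Dictionary<int, int> dictionary;

    dictionary.Add(1, 100);                 // OK: key 1 did not exist yet
    // dictionary.Add(1, 200);              // would throw: key 1 already exists
    dictionary.Set(1, 200);                 // OK: replaces the value of key 1
    bool added = dictionary.TryAdd(1, 300); // false: key 1 exists, nothing changes
}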
Reading
- GetCount: Returns the number of pairs that exist in the dictionary;
- ContainsKey: Verifies if a given key exists in the dictionary, returning true or false, but doesn't return the associated value;
- GetValue: Returns the value associated with a given key or throws an exception if such a key is not in the dictionary;
- GetValueOrDefault: Gets the value associated with a given key or returns a default value. This method is overloaded: one version returns the default for the TValue type while the other allows you to provide such a default value;
- TryGetValue: Actually the ContainsKey, GetValue and GetValueOrDefault methods all call this one. It returns a pointer to the value associated with the given key, or NULL if there's no such association. So, considering that ContainsKey and GetValue call it, prefer using this method instead of doing a ContainsKey and then a GetValue.
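For example, the following pattern does a single lookup where a ContainsKey followed by a GetValue would do two (a sketch; the pointer-returning signature is taken from the description above):

#include <cstdio>

void ReadingExample()
{
    Dictionary<int, int> dictionary;
    dictionary.Add(1, 100);

    int *value = dictionary.TryGetValue(1); // NULL when the key is absent
    if (value != NULL)
        printf("Found: %d\n", *value);      // one lookup, no exception risk
}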
Multi-action
- GetOrCreateValue: This method searches for a value for the given key. If one is found, it is returned directly. If not, it invokes a valueCreator to create such a value, then adds it to the dictionary before returning it. This method is overloaded: one of the overloads works pretty well with lambdas while the other receives a function pointer and an additional context pointer, which is passed to the function pointer when invoking it.
As I consider this method to be the hardest to use, I will give one example per overload.
With lambda:
string foundOrCreatedString = dictionary.GetOrCreateValue(57, [](int key){return to_string(key);});
In this case GetOrCreateValue will look for a string bound to the key 57. If one is found, it is returned (and if it was added by hand, it may not be "57"). If it is not found, one will be generated (using to_string(key)), added to the dictionary and finally returned.
With function pointer:
string StringFromIntCreator(const int &value, void *ignoredContext)
{
    return to_string(value);
}

string foundOrCreatedString = dictionary.GetOrCreateValue(57, &StringFromIntCreator, (void *)NULL);
Removal and clearance
- Clear: Removes all items from the dictionary. This method does not reduce the capacity of the dictionary.
- Remove: Tries to remove a pair from the dictionary. The search is done by the key only. Returns true if the item was found and removed, or false if the item was not found.
Memory Management
The dictionary allows fast insertions because of its indexing and because in many situations it can simply put new items directly inside an array. Yet, it starts with a small array and many "resizes" may be required, which consume time. So, if you know how many items you will put in the dictionary, you can set its capacity before doing any work (it can even be set in the constructor).
Also, if you just finished adding items, or if you cleared the dictionary and want it to free its inner array, you will probably want to set its capacity to a smaller value.
So, here are the methods:
- GetCapacity: Returns the size of the buckets array used to store items. While this value is greater than the count, new items can be added without causing the dictionary to resize;
- SetCapacity: Tries to set the size of the buckets array. The value passed as parameter may be increased to a bigger one that's not divisible by any value ranging from 2 to 31, and any value smaller than 31 is changed to 31;
- TrimExcess: Tries to change the capacity to be the same as the count.
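For example, when the final size is known in advance (a sketch using the methods from this list):

void CapacityExample()
{
    Dictionary<int, int> dictionary;
    dictionary.SetCapacity(1000000); // pre-size: no resizes while adding below this count

    for (int i = 0; i < 1000000; i++)
        dictionary.Add(i, i * 2);

    dictionary.TrimExcess(); // finished adding: shrink the capacity to the count
}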
Enumeration
The following three methods can be used to enumerate all the items added to the dictionary. It is your responsibility to delete the returned enumerator.
The results of the enumerations are "bucket ordered" and the items that fall into the same bucket may be reordered on resizes so, unless you are debugging the buckets or something like that, consider the enumerations to be unordered.
- CreateEnumerator: Creates an enumerator that enumerates all items in the dictionary as Pair<TKey, TValue>;
- CreateKeysEnumerator: Creates an enumerator that enumerates all the keys of the dictionary;
- CreateValuesEnumerator: Creates an enumerator that enumerates all the values of the dictionary.
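A usage sketch follows. The concrete enumerator type and the name of its step method are assumptions here (the article only states that stepping returns a pointer to the current item or NULL, as explained in the differences section below), and the caller must delete the enumerator:

void EnumerationExample(Dictionary<int, int> &dictionary)
{
    // "Enumerator" and "MoveNext" are hypothetical names used for illustration.
    Enumerator<Pair<int, int> > *enumerator = dictionary.CreateEnumerator();

    Pair<int, int> *pair;
    while ((pair = enumerator->MoveNext()) != NULL)
    {
        // Use pair->Key and pair->Value here (member names assumed).
    }

    delete enumerator; // the caller owns the returned enumerator
}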
Differences from .NET
Even though this dictionary was inspired by the .NET one, it has many differences:
- The equality comparer is a template parameter given to the class, not a normal parameter given to the constructor. I actually didn't find any performance loss by using "interfaces", yet I considered that the class became harder to use with them. Using the comparer as a template parameter may be bad in .NET, but in C++ it works great and conforms better with the STL classes (even if this was never my purpose);
- I added methods that I consider useful, like GetValueOrDefault, GetOrCreateValue (which is ideal for caches and has better performance than doing a search and later an add), SetCapacity and TrimExcess. The .NET dictionary doesn't offer those methods and I can only say that I consider that a shame, as they are extremely useful (and for the memory ones, .NET is expected to free memory automatically; it is pretty bad that the capacity is never reduced and that you can't do it by hand);
- The enumerators are the biggest difference. .NET uses enumerables, which can generate many enumerators. Well, I consider it useless for a method like EnumerateItems to create an IEnumerable whose only purpose is creating enumerators. The method can already create the enumerator and, if we really want an "enumerable" object, we can create one by using a lambda to invoke such a method. Actually this solves a problem, as many .NET collections are enumerable and also have other methods that generate different enumerables. In my opinion it is better to only generate enumerators;
- Continuing with the enumerators, the .NET version requires two virtual calls for a single enumerator step. It would be possible to use a single method with an out parameter, but this is not how it works. Well, in C++ returning pointers is common, so I decided to return a pointer to the current item or NULL. This reduces the number of calls needed and, in my opinion, simplifies writing those enumerators;
- The size_t type is used for the hash-codes, the count and the capacity. This actually simplifies the math done with the hash-codes as, in .NET, a mod operation over a negative value gives a negative result (used as the bucket number), so the .NET version always loses the first bit. As size_t is unsigned, it is not necessary to remove that bit. The use of size_t also means it is 32-bit on 32-bit computers and 64-bit on 64-bit computers. Aside from better hashing, this allows creating dictionaries that use more than 4 GB of memory (you can consider this a bad practice, but there are some specific caching scenarios that may benefit from it).
Differences from STL's unordered_map?
Well, too many. Even though both have the same purpose, they are used quite differently, and I really consider this one easier to use.
What I know is that, for very well distributed hash-codes, this implementation is really faster on additions, and in all my tests it was faster to remove items and to destroy the dictionary itself. It also consumes less memory to store items, yet it needs more memory at the exact moment it is doing a resize, as a single big array must be resized while the STL's unordered_map may resize each bucket individually. It is important to note that my comparisons were made against the STL that comes with Visual C++. I don't know how well Visual C++ optimizes the STL classes and I am not sure whether Visual C++'s STL library is a reference implementation or a Microsoft-specific version of the STL.
Creating a comparer/hasher
To create a comparer/hasher for your own types you must create a class that has two public static methods:
- GetHashCode: This function should receive an input (of the TKey type of the dictionary; it can be a reference if passing references is faster) and should generate a size_t hash-code for it. For integral numeric types up to the size of size_t, returning the input value directly is valid but, for bigger values, like 64-bit values on 32-bit computers, it is good to combine the high 32 bits with the low 32 bits of the value (see the sketch after this list);
- Equals: This function should receive two input parameters (again, of the TKey type used by the dictionary) and should return whether the values are equal. Note that they don't need to be the same "instance", only to be considered equal. Two identical strings can exist in memory and have different addresses, yet they should be considered equal (and should generate the same hash-code too).
If you create such a class as a specialization of the DefaultEqualityComparer template class, then it will be used as the default comparer for dictionaries that don't choose a different key comparer.
To create the default hash generators (which only exist for the small integer types and for void*) I used this #define (which you can use too):
#define IMPLEMENT_DEFAULT_EQUALITY_COMPARER_AS_DIRECT_COMPARISON_AND_CAST(T) \
template<> \
class DefaultEqualityComparer<T> \
{ \
public: \
    static size_t GetHashCode(T value) \
    { \
        return (size_t)value; \
    } \
    static bool Equals(T value1, T value2) \
    { \
        return value1 == value2; \
    } \
}
This will simply return the input value itself cast to size_t as the hash-code and will use the == operator to do the comparison.
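As an example of how the #define is used (the enum below is hypothetical, shown just for illustration):

// Any integral-like type that can be cast to size_t and compared with ==
// can reuse the macro to get a default comparer:
enum Color { Red, Green, Blue };
IMPLEMENT_DEFAULT_EQUALITY_COMPARER_AS_DIRECT_COMPARISON_AND_CAST(Color);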
If we have a different type, like a string, we need more complex logic. The following is an example of a hash-code generator for std::string:
template<>
class DefaultEqualityComparer<std::string>
{
public:
    static bool Equals(const std::string &value1, const std::string &value2)
    {
        return value1 == value2;
    }

    static size_t GetHashCode(const std::string &value)
    {
        size_t length = value.length();
        if (length < sizeof(size_t))
        {
            if (length == 0)
                return 0;

            // Short string: pack all of its bytes into a single size_t.
            const char *cString = value.c_str();
            size_t result = 0;
            for(size_t i=0; i<length; i++)
            {
                result <<= 8;
                result |= (unsigned char)*cString; // cast avoids sign-extension for chars >= 128
                cString++;
            }
            return result;
        }

        // Long string: start with the first sizeof(size_t) bytes...
        const char *cString = value.c_str();
        size_t *asSizeTPointer = (size_t *)cString;
        size_t result = *asSizeTPointer;

        // ...then mix in up to sizeof(size_t) bytes from the end.
        size_t lastCharactersToUseCount = length - sizeof(size_t);
        if (lastCharactersToUseCount > sizeof(size_t))
            lastCharactersToUseCount = sizeof(size_t);

        if (lastCharactersToUseCount > 0)
        {
            size_t otherResult;
            if (lastCharactersToUseCount == sizeof(size_t))
                otherResult = *((size_t *)(cString + length - sizeof(size_t)));
            else
            {
                otherResult = 0;
                cString += length;
                for(size_t i=0; i<lastCharactersToUseCount; i++)
                {
                    cString--;
                    otherResult <<= 8;
                    otherResult |= (unsigned char)*cString; // same sign-extension guard
                }
            }
            result ^= otherResult;
            result ^= length;
        }
        return result;
    }
};
It is important to note that it only considers the first and last bytes of the string when calculating the hash-code. This makes it an almost constant-time hash-code generator, but it surely doesn't generate the best possible hash-codes (so the hash-code may be generated very fast, yet the dictionary may become pretty slow because of bad hashing). If your strings have different contents but start and end with the same characters, it will do a terrible job.
Actually, strings are another point that may give an advantage to .NET, Java and some other languages considered "slow": their string objects are immutable (or have really good copying logic), avoiding a memory copy each time one string variable is assigned to another; all the strings written directly in the code are really saved as string objects, avoiding the conversion from char* to string (which may happen per call in C++); and they can store a pre-calculated hash-code.
Considering that C++ strings don't have a pre-calculated hash-code, some algorithms will need to read the entire string contents to generate it, which will become slow if the strings are big. In this situation, dictionaries/unordered_maps that have big string keys may become much slower than normal maps, as normal maps will stop reading the contents of the compared strings at the first difference while the hash generator will be forced to read the entire string before giving a result.
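If that cost matters, one possible workaround (a sketch, not part of the article's code; HashedString is a hypothetical name) is to wrap the string in a key type that computes its hash once, mimicking the pre-calculated hash-codes of .NET/Java strings:

#include <string>

// Sketch: the hash is computed once at construction, so repeated lookups
// don't have to re-read the whole string contents.
struct HashedString
{
    std::string Text;
    size_t Hash;

    explicit HashedString(const std::string &text):
        Text(text),
        Hash(DefaultEqualityComparer<std::string>::GetHashCode(text))
    {
    }
};

template<>
class DefaultEqualityComparer<HashedString>
{
public:
    static size_t GetHashCode(const HashedString &value)
    {
        return value.Hash; // already computed
    }
    static bool Equals(const HashedString &value1, const HashedString &value2)
    {
        // Cheap hash comparison first; full content comparison only on a match.
        return value1.Hash == value2.Hash && value1.Text == value2.Text;
    }
};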
The sample
The sample application is a performance test between the Dictionary and a map (not an unordered_map). I made it use a map because I wrote the solution in Visual Studio 2008, which doesn't come with an unordered_map, yet you can easily replace all references to map< with unordered_map< and check the results against it.
The sample will add some millions of items, search all of them by key, remove them and do some other actions, measuring how much time each step takes. It doesn't measure the memory consumption, yet I can say that on my computer the Dictionary consumed 575 MB for 20 million int/int pairs while the map consumed 670 MB for the same amount. Note: The sample doesn't have any exception handling and, as it creates lots of items, it may crash. If that's the case, try reducing the number of items from 20 million to 10 million (or another smaller value) or, if possible, try compiling it in 64-bit instead of 32-bit.
Also, even when compiled in Release mode, the code runs slower inside Visual Studio. The Dictionary code is only a little slower, but the map executed about 10 times slower on inserts and took 21 minutes for the removes (it takes only 5 seconds running outside Visual Studio). So, to get the correct results, compile in Release and run it outside Visual Studio. If you don't do that, you will only bias the results even further in favor of the Dictionary.
This is the result running it outside Visual Studio:
We will first compare the speed difference with sequential values:
Testing Dictionary...
Adding 20000000 items: 1045 milliseconds.
Searching all the items by key: 172 milliseconds.
Removing all items by key: 156 milliseconds.
Re-adding all items: 374 milliseconds.
Time to destroy the entire collection: 78 milliseconds.
Full test finished in 1825 milliseconds.
Testing Map...
Adding 20000000 items: 6302 milliseconds.
Searching all the items by key: 2372 milliseconds.
Removing all items by key: 4836 milliseconds.
Re-adding all items: 6178 milliseconds.
Time to destroy the entire collection: 1170 milliseconds.
Full test finished in 20858 milliseconds.
Now we will compare the speed of random values:
Dictionary
Adding: 7878 milliseconds.
Searching for random values: Found: 77317, Not Found: 19922683, Time: 4446 milliseconds.
Deleting dictionary: 312 milliseconds.
Map
Adding: 30623 milliseconds.
Searching for random values: Found: 77317, Not Found: 19922683, Time: 32574 milliseconds.
Deleting map: 5335 milliseconds.
The user Arild Fiskum ran the tests using Visual Studio 2012, comparing this dictionary with the unordered_map and using strings as keys. I was impressed that the unordered_map actually adds items faster, yet my implementation was faster in the overall tests and, in many cases, we only add items once and then read them many, many times, so I think my implementation is good enough. Here are his results:
Larger test set compiled with x64:
We will first compare the speed difference with sequential values:
Testing Dictionary...
Adding 20000000 items: 21090 milliseconds.
Searching all the items by key: 5846 milliseconds.
Removing all items by key: 6821 milliseconds.
Re-adding all items: 5742 milliseconds.
Time to destroy the entire collection: 2799 milliseconds.
Full test finished in 42305 milliseconds.
Testing Map...
Adding 20000000 items: 16752 milliseconds.
Searching all the items by key: 6070 milliseconds.
Removing all items by key: 7948 milliseconds.
Re-adding all items: 16845 milliseconds.
Time to destroy the entire collection: 4266 milliseconds.
Full test finished in 51886 milliseconds.
Bugs?
If you find a bug, let me know. I didn't have time to do extensive testing on the code. I tested all the methods individually but I didn't use many different combinations, so it is still possible that I missed something. I really hope there are no bugs but, if you find one, tell me and I will do my best to solve it.