Introduction
This article covers the implementation of a LargeListDictionary
class that keeps the order of the items like a ListDictionary
, but performs more like a Hashtable
as the list gets large.
Background
I don't quite know where to begin. I absolutely loved the article by Jason Smith: Add Support for "Set" Collections to .NET (Iesi.Collections
namespace). I was looking for an implementation of sets and his was very generic and elegant. Unfortunately, I needed to use his Iesi.Collections.ListSet
class to preserve the order of the items in my Iesi.Collections.Set
, and the performance became unbearable as the set became large because it used a ListDictionary
underneath, which performed horribly for both adding and removing items from the Iesi.Collections.Set
. (A key point here is that sets use keys lookups when adding and removing items, so key lookup was a requirement.)
I then began my search for an ordered list that implements IDictionary
so it could be used in a Iesi.Collections.DictionarySet
(example: Iesi.Collections.ListSet
inherits Iesi.Collections.DictionarySet
and uses a ListDictionary
). I needed a class with better performance than .NET's ListDictionary
. I found the following two implementations here on Code Project:
Both of these implementations basically used a Hashtable
and an ArrayList
internally, the Hashtable
for lookup performance, and the ArrayList
to keep the sequence of the items. They had two catches, though. Removing items from the list was very expensive as the list got large, due to the ArrayList
; and the Iesi.Collections.Set.Intersect
method called the Remove
method. The second was that enumerating them gave you the hashed order, not the order that you added the items.
To sum up, I then created LargeListDictionary
, which will perform similar to a Hashtable
for both adding and removing items by key. I then created LargeListSet
, to test my class' performance with the set operations.
If any of this introduction lost you, please read the three articles mentioned above, then return to finish this one.
Using the code
LargeListDictionary.cs consists of two classes and one struct
:
LargeListDictionary
- a class implementing IDictionary
, ICollection
, and IEnumerable
LargeListDictionaryEnumerator
- a struct
implementing IDictionaryEnumerator
and IEnumerator
IndexedObject
- a simple class containing a key, value and index.
LargeListSet.cs just contains one class: LargeListSet
inheriting Iesi.Collections.DictionarySet
.
For LargeListDictionary
, the key concept is that it uses two Hashtable
members. Both store IndexedObject
objects, but one stores by the original key, and one by an internally generated int
index.
public class LargeListDictionary : IDictionary, ICollection, IEnumerable
{
private int _maxIndex;
private Hashtable _itemsByKey;
private Hashtable _itemsByIndex;
Whenever an item is added, an IndexedObject
is created with its given key and value, and the next index is generated for it. The _maxIndex
keeps track of the highest index given so far.
public void Add(object key, object value)
{
_maxIndex++;
int newIndex = _maxIndex;
IndexedObject io = new IndexedObject(newIndex, key, value);
_itemsByKey.Add(io.Key, io);
_itemsByIndex.Add(io.Index, io);
}
Here, you can see that the expense of the Add
is the overhead of creating the IndexedObject
, and adding it to two Hashtable
members. I am willing to accept this extra expense, because the real benefit comes when I check for objects or remove objects.
public bool Contains(object key)
{
return _itemsByKey.ContainsKey(key);
}
public void Remove(object key)
{
if (_itemsByKey.ContainsKey(key))
{
IndexedObject io = (IndexedObject)_itemsByKey[key];
_itemsByKey.Remove(key);
_itemsByIndex.Remove(io.Index);
}
}
Here, you can see that checking for an item by key gives us straight-up HashTable
performance, while removing an item gives us performance based on removing from two HashTable
s. Both of these operations are a huge improvement over a ListDictionary
when the list gets large.
Why couldn't I just use a single Hashtable
??? Using a wrapper object and two HashTable
s seems overly expensive. What gives?
Remembering the requirements from above, the list has to be enumerable in the order that the items are added. If I add 100, 90, 80, 70, etc. then enumerating them should return 100, 90, 80, 70. If there is one thing we all rudely learn at some point, it's that this is not the case with a HashTable
. That is why I used the second Hashtable
that is keyed by an internal index.
The Secret Sauce: LargeListDictionaryEnumerator
This is where the fun began. At first, I tried to implement something with one HashTable
and one ArrayList
where I would adjust the indexes after something was removed- BAD PLAN. I won't go into anymore detail on that one. With the two HashTable
implementations, I could remove easily, but there remained one problem. Let's say, I added the following objects to the LargeListDictionary
:
Key |
Value |
555-1212 |
J. Smith |
555-4567 |
M. Rose |
555-3333 |
B. Jones |
The items would be wrapped with an IndexedObject
, and the HashTable
s would contain the following:
Key |
Index |
Value |
555-1212 |
0 |
J. Smith |
555-4567 |
1 |
M. Rose |
555-3333 |
2 |
B. Jones |
Looking at the result, it is easy to see how I could enumerate. Just use a counter and look up the IndexedObject
from the HashTable
keyed by index. WAIT! It's not that simple once someone starts mucking with the list. After removing an item and adding a new one, the HashTable
s would contain the following:
Key |
Index |
Value |
555-1212 |
0 |
J. Smith |
555-3333 |
2 |
B. Jones |
555-2727 |
3 |
New Guy |
In the above example, I removed M. Rose from the list and added New Guy. Because the LargeListDictionary
only increments its index generator, I end up with a gap where M. Rose used to be. This brings us to the implementation of LargeListDictionaryEnumerator
.
public struct LargeListDictionaryEnumerator :
IDictionaryEnumerator, IEnumerator
{
private int _currentIndex;
private Hashtable _itemsByIndex;
private int _maxIndex;
public LargeListDictionaryEnumerator(Hashtable itemsByIndex, int maxIndex)
{
_itemsByIndex = itemsByIndex;
_currentIndex = -1;
_maxIndex = maxIndex;
}
public bool MoveNext()
{
while (_currentIndex <= _maxIndex)
{
_currentIndex++;
if (_itemsByIndex.ContainsKey(_currentIndex))
return true;
}
return false;
}
}
As you can see, the secret is in the MoveNext
method of the enumerator. In the constructor, I pass in the HashTable
that is keyed by our given index and the max index that LargeListDictionary
has assigned so far. When the enumerator moves next, we run through a while
loop to get to the next item that exists. Voila! Enumeration returns the items in the order they were added.
Other Performance Considerations
If you perform many remove and add operations on a LargeListDictionary
, you will get more 'gaps' in the indexes. This will slow down enumeration. To address this, you could easily add a method to 'reindex' that would remove the gaps.
This kind of reindexing, or some other kind of counter may also be needed if you wanted to implement IList
. True indexed (not the internal index) access like this[int index]
or RemoveAt(int index)
was not one of my requirements. Once you have gaps in the internal index, item[2]
is not guaranteed to be IndexedObject.Index = 2
.
Bonus: LargeListSet
The LargeListSet
class is just a bonus. I is a class that inherits Iesi.Collections.DictionarySet
and uses LargeListDictionary
as the IDictionary
underneath.
The TestApp
The test app provided is just a little GUI I put together to compare relative performance between sets based on different underlying classes that implement IDictionary
. It also has a test button to verify ordered enumeration. It includes the source code from Jason Smith's article on sets as well as the source from Mike McPhail's article on HashList
.
Conclusion
In summation, I hope that someone else finds LargeListDictionary
useful where a large dictionary is needed but the sequence of items added is important to preserve.
History
March 5th, 2004 : Initial article created and submitted.