Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

AddCollection and FragmentableQueue

4.94/5 (13 votes)
5 Mar 2014CPOL24 min read 29.5K   186  
This article presents two collections optimized for good memory consumption and for inserts at the end, being always O(1). The AddCollection can also create immutable views without creating copies of the data.

Purpose

Before presenting how things work, I want to explain the "why".

The classes presented in this article were created to be really fast and to avoid problems found in other collections, which are related to memory consumption and fragmentation. They aren't generic purpose collections, yet they are really optimized to what they do. So, if you need one of the following, use these classes:

  • Store a random number of items in memory and do a single foreach, as fast as possible;
  • Store a random number of items in memory, do a single foreach and allow already used items to be collected as soon as possible;
  • Store a random number of items in memory and create an immutable version of the data as fast as possible, avoiding new memory allocations or copies.

Introduction

The .NET Framework has many generic purpose collection classes, which actually do much more than what's required for most situations and maybe because of doing too much they aren't optimized for the most common uses.

To me, the biggest example is the List<T> class. In many situations if we simply want to load a variable number of data in memory before doing a foreach (this happens if we read a file or get data from the database and we want to close such resource before the foreach) we will use a list to put such data in memory. The List<T> is actually the simplest class in the .NET Base Class Library that can be used to do this kind of job, so it seems to be the ideal one... but actually it is possible to do better and this article is all about it.

The Big O Notation

In this article I will use some O(n) and O(1) notations. Such notation is used to express how the methods of a collection work considering the actual number of elements in such collection.

O(n) means that the specific action is directly affected by the actual number of items in the collection. For example, copying the contents of one collection to another is an O(n) operation because all items must be accessed to be copied.

O(1) means that the specific action has a constant complexity. It is not important if the collection is empty or if it has millions of items. Usually getting the number of items in a collection is an O(1) because most collections store the actual number of items in a specific field, so it is not necessary to iterate the entire collection to know how many items are there.

It is important to note that two O(1) operations can have different speeds. The O(1) only means the action is not becoming slower by the number of items in the collection, but it is not saying the action is fast or that it can't have "different decisions" from time to time.

Problems of the List<T> class

Before presenting the solution, let's understand the problems of using the List<T> class for such a buffer. The List<T> uses a single array to put all its items, this gives an O(1) complexity for the getter and the setter, but not to the Add() method. Considering the default configuration for .NET applications, such single array to put all items means that:

  • From time to time a bigger array must be allocated and all items must be copied from one array to the other;
  • We can't have lists that use more than 2GB of memory, even if we have many GB of free memory in the computer (note that this is the size of the array in bytes, the actual array length may be much smaller as, for example, an array of int will occupy 4 times its length in memory bytes);
  • We need a single block of free memory of the requested size. It is possible that our computer has the right amount of free memory, but in many fragmented blocks;
  • For some time we will have the bigger and the smaller arrays in memory, as the garbage collector takes some time to run.

We can actually configure the .NET applications to use arrays bigger than 2GB, but as such configuration changes all the array allocations it is considered relatively unsafe (as those arrays may be passed to unmanaged code) and, even then, the array Length will be limited to 2 billion of items (31 bit lengths) and we will still have the other problems.

Performance

It is said that the List<T>.Add() has best case complexity of O(1) and a worst case of O(n). This is even used to compare the performance of the List<T> with the ImmutableList<T>, for example (you can see the table at http://blogs.msdn.com/b/andrewarnottms/archive/2013/01/07/immutable-collection-algorithmic-complexity.aspx and in the previous article inside the Performance characteristics topic).

This O(n) looks worse than it really is. It is true that if we add items at random moments one of the actions may become an O(n), as at some moment the internal array used to store items in the list may need to be resized. But considering this doesn't happen so often, we can say that the list class has an average complexity of O(2).

To exemplify this, let's understand how the List<T> class works.

By default, it starts with an array of 4 items.

So, adding the first, second, third and fourth items is an O(1) per operation, as the item is simply put in the inner array. So the total for the 4 actions is an O(4), as we have four Add() calls.

Then, to add the fifth item a new array needs to be allocated (which has the double length of the previous array, so in this case it is 8), then the 4 original items need to be copied (an O(4) operation) and finally the new item is added (an O(1) operation). This means the 5th insertion alone was an O(5), giving a total of O(9) to insert 5 items (note that the array allocation is not considered for the O() calculation, but it should be considered for performance).

So, if we simply wanted to insert 5 items at once (not 5 items at random moments) we can say that each action was near O(2) (1,8 to be very precise). Considering it is always the same pattern, the new resizes will arrive when adding the 9th item, the 17th item, the 33rd item and so on, and all other operations between the resizes will be O(1). This makes the average result become closer to O(2) when we have more items.

So, if you only want to use a list to insert a random number of items in a single method and return it, we can really consider a list to be an O(2) for Add()s, so to insert 1000 items we can say it is near O(2000) in total complexity.

This is pretty good. It could be better (always O(1) for Add()s) but considering how CPU caches work, the copy of items from a smaller array to a bigger one is really fast, which means that for adds that happen together we will not see a list taking twice the time it needs compared to an O(1) collection, it is probably only taking 10% more time than needed.

So, the List<T> with a worse case of O(n) is actually pretty fast because that worst case is going to become less common the more items we have and because the CPU caches help in that worst case. For methods that insert all items at once, we can really say that it has an average of O(2) per action but, as this is a constant, we should say that it is an O(1), even if that 1 is a little slower than it should.

Doing better

If our purpose is to put some amount of data in memory to then do a foreach we can do better.

The idea is that instead of copying the items of an array to a bigger one each time we need to "resize", we allocate a new array but we simply "link" one array to the other (we will have nodes with arrays). That is, the old array is kept for the first items and the new items are put directly in the new array. This makes the Add() calls become O(1) as there's never a copy of the previous items.

Then, to do a foreach, every time we finish reading all the items from one array we go to the next one.

The fact of "linking" the arrays instead of copying them also means that we will not have some arrays lost in memory waiting for collection after a resize and this will naturally support a memory that's a little fragmented.

The Little Details

We must be aware that those O(1) insertions may lie about the real performance. If I put each item in its own node (like a linked list) I will have an O(1) operation that's pretty slow (yeah, the LinkedList<T> is almost never a solution, even if it is an O(1) list for inserts at the end). That's why I decided to use arrays, so for some time I can put the items directly into the array avoiding new allocations.

That's also why I use the rule of increasing the size of the new arrays (which is pretty similar to what List<T> does, but without the copies). And as I don't need everything to be in a single large array, I also apply a limit on how big these arrays can be. The default limit (which is configurable) is 1 million items. For a collection of ints this represents that those arrays are limited to 4 MB in size (for reference types this will also be 4 MB in size for 32-bit processors and 8 MB in size for 64-bit processors).

Considering such size limit we can have a great performance and also a collection that can use fragmented memory, without actually becoming a source of fragmentation (as happens if we allocate a new node per add). Also, considering that we can allocate as many arrays as needed, we don't have a 2 GB limit. Except for the Count property (which is merely informative), we can really say that we can support collections of any size as long as the computer supports it.

The code to do such O(1) Add() method is this:

C#
public void Add(T item)
{
  if (_countInActualArray == _actualArray.Length)
  {
    long count = _count;
    int maximumSize = _getMaximumBlockSize();

    if (maximumSize < 1)
      throw new InvalidOperationException("The GetMaximumBlockSize delegate returned an invalid value.");

    if (count > maximumSize)
      count = maximumSize;

    var newNode = new _Node(count);
    _actualNode._nextNode = newNode;
    _actualNode = newNode;
    _actualArray = newNode._array;
    _countInActualArray = 0;
  }

  _actualArray[_countInActualArray] = item;
  _count++;
  _countInActualArray++;
}

You should note that I always store the actual array (the last one) so for the new Add()s I don't need to navigate until the last one to add an item.

Immutability: The ForEachCollection

When talking about performance I used a link to the immutable collections. There's a new "boom" of solutions related to immutability as such immutability gives some automatic guarantees, like supporting parallel accesses without any kind of lock and also guaranteeing that you can pass the collection to any method knowing that such method will not "destroy" the contents of your collection. Also, for the method receiving an immutable collection it has the guarantee that the contents aren't going to be changed by the creator of the collection at some random time, so it doesn't need to copy the contents if it wants to be sure the contents will not change.

Considering that the actual class will never replace any item in its inner arrays (it can only add new items), we can create an immutable view of it with ease by copying the reference to the first node and the number of items to another class. This will be very fast (an O(1) operation) as we will not need to copy the contents of the arrays and it will have a full immutability guarantee. Even if we keep adding items to the original AddCollection, the immutable view will know how many items it can read, so the new added items will not be seen by the immutable view.

I named this collection ForEachCollection as this is the only action that it is optimized to do (putting an Immutable before the AddCollection seems wrong as it is not going to do adds any more).

The actual code for the ForEachCollection copies a little more information (like the last array and the number of items in that last array) but this is done only to improve performance and doesn't affect the immutability trait.

Here is its code:

C#
public sealed class ForEachCollection<T>
{
  private readonly AddCollection<T>._Node _firstNode;
  private readonly T[] _lastArray;
  private readonly int _countInLastArray;
  private readonly long _count;

  internal ForEachCollection(AddCollection<T>._Node firstNode, T[] lastArray, int countInLastArray, long count)
  {
    _firstNode = firstNode;
    _lastArray = lastArray;
    _countInLastArray = countInLastArray;
    _count = count;
  }

  public long Count
  {
    get
    {
      return _count;
    }
  }

  public Enumerator GetEnumerator()
  {
    return new Enumerator(_firstNode, _lastArray, _countInLastArray);
  }
  
  // Note: The real code has some more methods and most of the work
  // actually happens in the Enumerator. But you can already see that
  // this class is immutable, as all its fields are readonly.
}

Note that these collections are faster than both the List<T> and the ImmutableList<T> to add new items (and if you really want to compare, to the LinkedList<T> too). To enumerate items these collections should be almost as fast as the List<T> (I don't know why, but enumerations and the ToArray() of these collections were faster than doing the same with the List<T> in all my tests, even if I expected the List<T> to be a little faster). Obviously these collections are much less powerful than both the List<T> and the ImmutableList<T> as they aren't expected to be used as lists that can search for items, remove items etc.

If your purpose is to put some data in memory and then iterate it (making the object immutable or not), then these classes are perfect for the job.

The Clear() method

If you see the AddCollection you will see that it has a Clear() method. So does this break the ForEachCollection immutability?

The answer is no. The Clear() method doesn't change the contents of the existing nodes and arrays (that may be in use by a ForEachCollection). It simply allocates a new first array and node and sets the other fields to say that it is at the start of such node. So, don't worry, even if there's a method that clears the AddCollection, it will not affect the immutability of the ForEachCollection.

IEnumerable<T>, IReadOnlyCollection<T>, IReadOnlyList<T> and LINQ

When I first created these classes I implemented the IReadOnlyList<T> interface, which consequently implemented the IEnumerable<T> and the IReadOnlyCollection<T> interfaces.

I thought this was OK at first but later I changed my mind. This was bad. Really bad.

By being IEnumerable<T> the LINQ methods were available. But the LINQ methods like Count(), ToArray() and ElementAt() actually only look for the non-readonly interfaces, even if those are read-only methods.

This means that if someone received the AddCollection or the ForEachCollection seen as an IEnumerable<T>, doing a Count(), a ToArray() or an ElementAt() call was pretty slow, as the LINQ methods search for the mutable interfaces and, if they aren't found, use a default algorithm that iterates all items (and in the particular case of the ToArray() also creates a lot of intermediate copies, exactly like the List<T> class does).

I could also implement the mutable interfaces, but the AddCollection doesn't implement methods like Remove() or the indexer set. Implementing those methods is not possible or it will break the guarantee required by its immutable view. Worse than that, the ForEachCollection, that is immutable, will look like code-smell implementing mutable interfaces and being read-only.

Finally, as our types support 64-bit Counts, not only 32-bit ones, I decided that it was better to avoid any interfaces and, if you really need it, you can easily create an adapter that implements the interfaces you need (be it only the read-only ones or the mutable ones to be able to use the LINQ methods with full performance). Creating an adapter for the entire collection is also an O(1) operation, so it should not be that bad.

You can look at the sample download, as I am already giving two extension classes that already allow you to create the adapters that implement those interfaces if you need.

Can it be even better?

Actually the AddCollection can be considered better than the List<T> class if our need is to put a large amount of items in memory and iterate through those items one or even more times, having or not the need for an immutable collection.

But a common situation I see very often is people loading all the database records into a memory collection so they can close the connection as soon as possible (I consider this a bad practice but some DBAs insist that this is how data should be read). Then the code does a foreach over the memory collection doing some transactional work that can take hours.

The main point here is: All the data needs to be loaded at once (usually very fast, taking some seconds when we have millions of records to load) but it can be collected as soon as it is processed. But normal lists (or the AddCollection) will not allow the already processed items to be removed from memory until the entire collection can be garbage collected.

So, can we improve this particular situation?

Answer: Yes and No

We can't change the AddCollection class to remove items that were already read as this will break the ForEachCollection immutability guarantee, but we can create a class similar to the AddCollection optimized for this scenario.

I named this class FragmentableQueue. Its Enqueue() operation is pretty similar to the AddCollection.Add() method, but when Dequeuing (or iterating using an enumerator or foreach) this class always removes the item that was just read (puts the default(T) value in the array) and, when it arrives the end of an array, it allows that entire array/node to be collected.

If this collection is used for fast foreachs it is probably slower than the AddCollection/ForEachCollection as it has that extra logic to clear read items. But if it is used on very large (and time consuming) batches, it has the advantage of allowing the memory of already processed items to be collected.

It also has another trait that may be useful in some particular situations, as you can actually Enqueue() new items while iterating it, and those items will be seen by the running foreach. This can be useful if some items can actually generate sub-items of the same type.

Compared to the .NET Queue<T> type it has the following important differences:

  • It doesn't has the Peek(), Contains(), CopyTo(), ToArray() or the TrimExcess() methods;
  • It has a 64-bit Count;
  • It actually removes items while doing a foreach;
  • It allows new items to be enqueued while doing the foreach;
  • If you keep enqueuing and then dequeuing a specific amount of items it will allocate new arrays while the .NET one will be able to reutilise the same array (after the initial resizes that may be needed).

The comparison to the ImmutableQueue<T> is the hardest one to do. In that performance/complexity table the Enqueue() is an O(1) operation, but there's no information on how the Dequeue() works. Actually it seems that the first Dequeue() is an O(N) and then the others are O(1). I am not really sure because I did not see the last implementation, but I already saw some documents explaining how to create an immutable queue based on two immutable stacks, one that must be reversed on the first use, and I saw that the ImmutableQueue<T> uses such immutable stacks.

This also means that the ImmutableQueue<T> uses more memory than the other solutions and it is probably going to be slower if we keep enqueuing new items while dequeuing (I didn't test it, but I believe that after each Enqueue() the next Dequeue() will become an O(N) again).

So, considering the amount of memory used to store all items, the O(1) for Enqueue() and Dequeue() and the fact that dequeued items and arrays can be collected, I can't see how the FragmentableQueue can't be the winner. Also, it is important to note that even if the ImmutableQueue<T> is thread-safe to be accessed, an action like queue = queue.Enqueue(value); is not, so if you need two threads to queue/dequeue items in parallel you will need to use a lock or an alternative technique instead of simply setting the queue variable, as there's a race condition between creating the new queue with the added/removed item and setting such a variable.

Memory Consumption

When comparing normal mutable collections to the new immutable collections it is not only the performance or the immutability guarantee that counts, it is also important to understand the memory impacts, which can be seen as memory consumption and garbage collections.

It was somewhat said in the article, but not explicitly, so I will try to sum it here:

  • The List<T> and the Queue<T> use a single array as their repository, so they generate garbage only when the inner array needs to be resized. Also, to avoid too many resizes the array length is doubled. This means that if we stop adding items just after the resize we will have lots of unused space;
  • The ImmutableList<T> and the ImmutableQueue<T> use a new node per item. This means that each item uses more memory than the normal mutable lists, but there's never an array with empty places allocated. There are some situations in which it can use less memory, but in most of my tests those collections used more memory than the List<T>. Independently on how much memory they use, by using nodes as independent objects (and by creating new nodes to do the mutations) the immutable collections generate lots of garbage that needs to be collected;
  • The classes presented here use both nodes and arrays. But the nodes aren't per item, they are per array, which starts at size 32 and can grow up to 1 million items (both values are configurable). When new items are added, garbage is never generated. If you put more than 1 million items, the biggest wasted space (if you stop putting items at that terrible moment) is an array with 999,999 unused items. Yet it is going to use less memory than the List<T> most of the time (and when it do uses more, it will be only some bytes of difference, caused by the nodes) while it is going to use much less memory than the immutable collections in most cases (the overhead of one node per item used by the ImmutableList<T> is so big that for 1.2 million ints it was using 18 MB while the List<T> used 8 MB and this solution used 7 MB).

And for the particular case of queues, we must know that:

  • A list used as a queue (enumerated once) will not allow any item to be collected until the list itself can be collected;
  • A mutable queue will allow all dequeued items to be collected as soon as possible, but the inner array used to store the items will not be collected until the queue itself can be collected;
  • The FragmentableQueue will allow items to be collected as soon as they are dequeued and will allow the inner arrays (the nodes) to be collected individually when all their items were read. This is really an "in the middle" situation, as the arrays will not be collected immediately, but we don't need to finish using the entire queue to free some arrays;
  • The ImmutableQueue<T> allows every node to be collected when it is dequeued, but actually it allocates memory even when dequeuing, which can cause exceptions in some extreme situations and seems wrong as we are simply "dequeuing".

In any case, we must remember that for big queues the data is either being allocated in the Large Object Heap or it is probably going to live until generation 2, so allowing to free memory immediately is not going to be a big advantage (and we must remember again that in most cases the immutable collections do use more memory).

Millions of items

I can be a little extremist sometimes (most of the time?). In my examples I always talk about millions of items. But what if we only need to use some items, like 10 or 30?

The answer is that all the collections are optimized to start small and when they are small I think there's nothing to worry about. Lists, queues, immutable lists, immutable queues or the solutions presented in this article will all work and you will probably never find a problem related to those small collections.

I focus on the extreme situations, like having millions of items, because that's where we are going to see if there are problems, be it a large number of garbage collections (which also means slowdowns), be it exceptions caused because we are out-of-memory (even if we actually have enough memory, but a fragmented one) be it a general performance loss even if we don't have garbage collections.

I am not saying that we should put millions of items in memory (in most cases we can find better solutions) but if we need to do it, we must be ready to deal with it.

Conclusion

I hope that these classes are useful to you as they are to me. I really believe they are the best alternative to load some unknown amount of data and either enumerate it only once (the FragmentableQueue) or make it immutable so it can be given to any amount of methods that need an immutable collection and, the best of all, being much faster than using an ImmutableList<T> (even with a builder) and using less memory.

The Sample

The sample application does some speed comparison and tries to show how much memory was used. I consider it useful to see that the classes work and that they are fast, yet we should not focus only on numbers.

Independently on which collection is faster or slower, the difference is insignificant for 10 million items and in real situations the CPU caches are probably not going to be used so frequently (as instead of only doing an empty enumeration we will actually do something with the data before continuing the enumeration). So, it is better to judge by the sum of memory consumption, fragmentation, performance and expected uses than to judge only from performance or memory consumption.

Here is a sample result from the application:

This application only uses some methods of the classes AddCollection,
ForEachCollection and FragmentableQueue. It is only useful if you actually
see its code as the tests done here only prove the collections are working.

If you use this sample to see the performance comparison, you should compile it
in Release mode and you should execute it outside Visual Studio to see how well
it performs.


Testing AddCollection<string>
Adding 10000000 items: 00:00:03.0040895
ToArray: 00:00:00.0378765
Iterating the immutable version: 00:00:01.6909224
Memory used in MB: 304

Testing .NET List<string>
Adding 10000000 items: 00:00:03.0394706
ToArray: 00:00:00.0399380
Iterating: 00:00:01.7178831
Memory used in MB: 330


Testing FragmentableQueue<string>
Enqueuing 10000000 items: 00:00:03.1558260
Memory used with all items in MB: 304
Dequeuing half the number of items: 00:00:00.8744547
Memory used now in MB: 156

Testing .NET Queue<string>
Enqueuing 10000000 items: 00:00:03.2705367
Memory used with all items in MB: 330
Dequeuing half the number of items: 00:00:00.8774198
Memory used now in MB: 197


Testing FragmentableQueue<int>
Enqueuing 100000000 items: 00:00:00.8012018
Memory used with all items in MB: 381
Dequeuing half the number of items: 00:00:00.5150972
Memory used now in MB: 194

Testing .NET Queue<int>
Enqueuing 100000000 items: 00:00:01.3980741
Memory used with all items in MB: 512
Dequeuing half the number of items: 00:00:00.3947550
Memory used now in MB: 512

Tests finished. Press [ENTER] to quit.

To give a brief analysis of the result, we have:

  • The AddCollection<string> was a little faster and used a little less memory than the .NET List<string>;
  • The FragmentableQueue<string> uses exactly the same amount of memory as the AddCollection<string> when all the items are loaded. The same rule applies to the .NET List<string> compared to the Queue<string>;
  • After dequeuing some items, the FragmentableQueue<string> is able to free 148 MB (48%) while the Queue<string> only frees 133 MB (40%);
  • When dealing with ints the FragmentableQueue was still capable of freeing memory (from 381 MB to 194 MB, so a reduction of 49% of memory consumption) while the .NET Queue kept using the same amount of memory. This happens because the .NET queue removes the unused items (allowing strings to be collected, but ints have nothing to collect) but it never reduces the size of its inner array;
  • The FragmentableQueue<int> was slower dequeuing items compared to the .NET class. This was the only situation in which it was slower but considering that it allowed memory to be collected, this could be a consequence of a collection that happened in the middle. Maybe rerunning the tests again I can get different results.

I didn't put the results of the ImmutableList<T> and the ImmutableQueue<T>, but the immutable versions used much more memory for theses tests and where slower (the ImmutableList<T> was almost 9x slower even using a builder). If you want, download the sample, add the Microsoft.Bcl.Immutable and uncomment the #define IMMUTABLE_TEST so those tests are run.

What most impressed me was the ImmutableQueue<int> test. While the normal .NET Queue<int> used 512 MB for 100 million items, the immutable version used 1525 MB. And, when dequeuing, instead of freeing memory, it crashed. This is because new objects are allocated even when dequeing. It really allows the old nodes to be collected, but it uses so much memory that it didn't help.

The source files

The most important source files are:

  • AddCollection.cs: This is actually the code for the AddCollection and for the ForEachCollection. Both classes are in the same file because I think it is easier if you simply want to copy a file and have the classes working and, well, those classes work together;
  • FragmentableQueue.cs: This file contains only the FragmentableQueue class and it can be used independently of any other files in the sample probject, so you are free to copy it to your projects if you want.

And there are two Extensions files. They actually add extension methods to create adapters for the AddCollection and for the ForEachCollection so you can see them either by the IReadOnly* interfaces or by the mutable interfaces. They don't actually fully implement the mutable interfaces but it may be useful to see those objects by the mutable interfaces to be able to use some LINQ methods.

These extensions are in separate files because I understand that some users may not want to have support for such interfaces if they don't need or don't want to.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)