- Green:
LinkedArray(of Integer)
- Yellow:
LinkedArray
- Red:
List(of Integer)
- Blue:
ArrayList
The bars show the duration of the tests (shorter is better).
Introduction
In the .NET Framework, we were first introduced to the ubiquitous ArrayList
, then since .NET 2.0 we can enjoy the easy to use and type safe generic lists.
At the back of my mind, I always wondered what the cost of using generics was.
As my server was down yesterday and I had nothing better to do, I decided to investigate.
It surprised me to discover that generics are doing pretty well. They are surprisingly faster than the ArrayList
. Unfortunately some operations, like insertions, are painfully slow for both generic lists and ArrayList
s.
I then tried to find some alternatives to ArrayList
s and propose a new LinkedArray
collection.
Performance Measurement
My article is about Lists. By list, I mean a collection of items, which can be fetched by index and where you can add or delete items at the end or insert at any position.
In order to compare the overall performance of the lists, I decided to take different measures:
- Append 100,000 items at the end
- Remove last item 100,000 times
- Insert 100,000 items at the beginning
- Remove first object 100,000 times
- Insert 100,000 items at a random position
- Get 100,000 items by index
ForEach
enumerator - Remove 100,000 items randomly
I wrote a Test
function which can be used on any collection as long as it implements IList
:
Sub Test(ByVal TestName As String, ByVal l As IList)
Console.WriteLine("*** Test speed of " & TestName & " ****")
Append(l)
RemoveLast(l)
InsertFirst(l)
RemoveFirst(l)
InsertRandom(l)
FetchItems(l)
RunEnumerator(l)
RemoveRandom(l)
...
End Sub
This function is then called on different collections.
Test("ArrayList", New ArrayList())
Test("List(Of Integer)", New List(Of Integer))
Test("LinkedArray", New LinkedArray())
Test("LinkedArray(Of Integer)", New Generic.LinkedArray(Of Integer))
Each test is very simple. Let's see the Append
test:
Sub Append(ByVal l As IList)
l.Clear()
Console.Write("Append {0} items at the end", NB_ITEMS.ToString("#,##0"))
Dim t As DateTime = Now
For cpt As Integer = 1 To NB_ITEMS
l.Add(cpt)
Next
WriteResult(Now.Subtract(t))
End Sub
The write result displays the elapsed time during the test
function itself.
The ArrayList Results
First, I run the test on an ArrayList
:
*** Test speed of ArrayList ****
Append 100,000 items at the end..............: 0.016 sec
Remove last item 100,000 times...............: 0.002 sec
Insert 100,000 items at the beginning........: 11.630 sec
Remove first object 100,000 times............: 11.919 sec
Insert 100,000 items at a random position....: 5.669 sec
Get 100,000 items............................: 0.002 sec
ForEach enumerator...........................: 0.002 sec
Remove items randomly........................: 5.924 sec
Total........................................: 35.188 sec
So the ArrayList
is pretty good for adding and removing items at the end. It is also extremely efficient for fetching a given record. It is however pretty bad at inserting at a random position and worse at the beginning.
I digged a bit into the .NET Framework, to explain the numbers. I will spare you the boring details and try to just emphasize the bottlenecks.
ArrayList : Insert at the Beginning
Many of you already know why it behaves that badly. This is because the ArrayList
is only a wrapper around an Array
. The array uses a single contiguous block of memory to store all of its items.
Each time you insert an element at the beginning, the Framework has to shift all the items of the array using a command like this one:
Array.Copy(ItemsArray, 0, ItemsArray, 1, Me._size - 1)
This is done for every single insert and takes up a lot of processor time.
ArrayList : Insert at Random Position
Inserting in the middle is pretty bad too, the test completed in 5.669 seconds.
Here again, the culprit is the same function. Each time you insert an item in the middle, the Framework has to shift about half the items in the array
using the same Array.Copy
function.
Array.Copy(ItemsArray, index, ItemsArray, 1, Me._size - index)
Appending at the end is pretty good and the Framework does a very good job here.
You have to be aware however that there is still a substantial issue here.
ArrayList : Memory Usage
As I said, the ArrayList
uses an Array
internally.
In .NET, the arrays are fixed size, you cannot resize them. If you want an array to become larger, you have to allocate a new array and copy the original values in the new array, using the ubiquitous Array.Copy
.
In order to avoid having to copy the complete Array
each time you add an item, the .NET Framework allocates a larger array than necessary.
The default size is 4. The size doubles each time the limit is reached.
It means that if you store 32769 items in an arraylist
, it will in fact use the space for 65536 items. This is not very good especially when you start to hit big arrays containing a few millions items.
The List(of Integer) Results
I must admit I did not expect the generic List(of Integer
) to behave faster than the ArrayList
.
*** Test speed of List(Of Integer) ****
Append 100,000 items at the end..............: 0.018 sec
Remove last item 100,000 times...............: 0.003 sec
Insert 100,000 items at the beginning........: 4.106 sec
Remove first object 100,000 times............: 4.008 sec
Insert 100,000 items at a random position....: 2.067 sec
Get 100,000 items............................: 0.003 sec
ForEach enumerator...........................: 0.005 sec
Remove items randomly........................: 2.081 sec
Total........................................: 12.303 sec
I was wrong.
The List(of Integer)
is more than twice as fast as ArrayList
.
The code of the List(of Integer)
is virtually the same as the ArrayList
, and it took me some time to understand why it is nearly 3 times faster.
Why is the Generic List Faster Than the ArrayList?
There are two reasons:
- The
TypeSafe
array stores the items in itself rather than pointers to boxed objects - In a 64 bit OS (like mine), a pointer happens to be 64 bits rather than the 32 bits integer
I won't really expand on this, the things I will remember are:
- Generics are typesafe and fast, there is no need for
ArrayList
- Don't use
ArrayList
on structures or value types (Integers
, Float
...)
Apart from that, the Insertion of 100,000 items still takes 4 seconds, this is not glorious.
How Could We Make This Faster
It really depends on what you need; you will hardly find a single Collection which will be faster than all others on all the test.
If you use your list to mainly add at the end and access by Index, the List(of Integer)
is probably a good choice.
On the contrary, if you require random insertion and can avoid accessing items by index then perhaps a LinkedList(Of Integer)
is the way to go.
However, if you need...
- Quick insertion at any position
- Being able to access item by index
- Large lists
... then you might be interested in the little class I wrote for this article.
The LinkedArray Class
The concept is relatively simple.
I noticed that the lists start to be slow when they get a bit too long.
So why not have a list of lists.
This way when one becomes too long, I can just split it in two.
The class is not rocket science, internally it uses a simple list(of List(of Integer))
. The results are dramatically improved.
On an ArrayList
replacement, we get these results:
*** Test speed of LinkedArray ****
Append 100,000 items at the end..............: 0.014 sec
Remove last item 100,000 times...............: 0.013 sec
Insert 100,000 items at the beginning........: 0.181 sec
Remove first object 100,000 times............: 0.077 sec
Insert 100,000 items at a random position....: 0.165 sec
Get 100,000 items............................: 0.042 sec
ForEach enumerator...........................: 0.003 sec
Remove items randomly........................: 0.106 sec
Total........................................: 0.638 sec
On a List(of Integer)
replacement, we get these results:
*** Test speed of LinkedArray(Of Integer) ****
Append 100,000 items at the end..............: 0.011 sec
Remove last item 100,000 times...............: 0.019 sec
Insert 100,000 items at the beginning........: 0.155 sec
Remove first object 100,000 times............: 0.064 sec
Insert 100,000 items at a random position....: 0.129 sec
Get 100,000 items............................: 0.043 sec
ForEach enumerator...........................: 0.006 sec
Remove items randomly........................: 0.109 sec
Total........................................: 0.559 sec
The speed of insertion and random deletion is nearly hundred times faster than an arraylist
.
The main drawback of this class is "Get Items", we are 10 to 15 times slower than a normal list. This is due to the fact that getting an item by index requires more computation than on a straight array.
It is interesting to see, however, that ForEach
enumerator is as quick as the native list.
The sample shows milliseconds, however as said in the forum below, this is only a measured estimation. In effect, I could probably round a 1/10th of a second, but I did not want to show entries with 0.0 sec.
What you can read however in those results is that using the LinkedArray
:
- Inserting is much faster
- Getting items by index is much slower but still very fast
If you change the number of items, the numbers change completely but this simple fact is:
*** Test speed of List(Of Integer) ****
Append 200,000 items at the end..............: < 0.1 sec
Remove last item 200,000 times...............: < 0.1 sec
Insert 200,000 items at the beginning........: 16.7 sec
Remove first object 200,000 times............: 17.3 sec
Insert 200,000 items at a random position....: 8.5 sec
Get 200,000 items............................: < 0.1 sec
ForEach enumerator...........................: < 0.1 sec
Remove items randomly........................: 8.7 sec
Total........................................: 51.3 sec
*** Test speed of LinkedArray(Of Integer) ****
Append 200,000 items at the end..............: < 0.1 sec
Remove last item 200,000 times...............: < 0.1 sec
Insert 200,000 items at the beginning........: 0.3 sec
Remove first object 200,000 times............: 0.1 sec
Insert 200,000 items at a random position....: 0.4 sec
Get 200,000 items............................: 0.1 sec
ForEach enumerator...........................: < 0.1 sec
Remove items randomly........................: 0.3 sec
Total........................................: 1.3 sec
Using the Code
In the demo program, I include a class called LinkedArray
.
The code has not been extensively tested. This article is more about having an open discussion on .NET collections.
However feel free to test it on your own project and let us know if you get improved performances.
Points of Interest
I will expand on the splitting / merging operation here (probably by the end of the week).
History
- 11th December, 2007: First post
- 12th December, 2007: Few amendments to explain the numbers