Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB

ArrayLists, Generic Lists, Is There A Way To Insert Items Faster?

4.83/5 (37 votes)
26 Nov 2014CPOL6 min read 2   590  
In this article, I try to highlight some issues in the .NET Framework generic list and how to circumvent them

Screenshot - snapshot2.jpg

  • Green: LinkedArray(of Integer)
  • Yellow: LinkedArray
  • Red: List(of Integer)
  • Blue: ArrayList

The bars show the duration of the tests (shorter is better).

Introduction

In the .NET Framework, we were first introduced to the ubiquitous ArrayList, then since .NET 2.0 we can enjoy the easy to use and type safe generic lists.
At the back of my mind, I always wondered what the cost of using generics was.

As my server was down yesterday and I had nothing better to do, I decided to investigate.

It surprised me to discover that generics are doing pretty well. They are surprisingly faster than the ArrayList. Unfortunately some operations, like insertions, are painfully slow for both generic lists and ArrayLists.

I then tried to find some alternatives to ArrayLists and propose a new LinkedArray collection.

Performance Measurement

My article is about Lists. By list, I mean a collection of items, which can be fetched by index and where you can add or delete items at the end or insert at any position.

In order to compare the overall performance of the lists, I decided to take different measures:

  • Append 100,000 items at the end
  • Remove last item 100,000 times
  • Insert 100,000 items at the beginning
  • Remove first object 100,000 times
  • Insert 100,000 items at a random position
  • Get 100,000 items by index
  • ForEach enumerator
  • Remove 100,000 items randomly

I wrote a Test function which can be used on any collection as long as it implements IList:

VB.NET
Sub Test(ByVal TestName As String, ByVal l As IList)
  Console.WriteLine("*** Test speed of " & TestName & " ****")
  Append(l)
  RemoveLast(l)
  InsertFirst(l)
  RemoveFirst(l)
  InsertRandom(l)
  FetchItems(l)
  RunEnumerator(l)
  RemoveRandom(l)
  ...
End Sub

This function is then called on different collections.

VB.NET
Test("ArrayList", New ArrayList())
Test("List(Of Integer)", New List(Of Integer))
Test("LinkedArray", New LinkedArray())
Test("LinkedArray(Of Integer)", New Generic.LinkedArray(Of Integer))

Each test is very simple. Let's see the Append test:

VB.NET
Sub Append(ByVal l As IList)
  l.Clear()
  Console.Write("Append {0} items at the end", NB_ITEMS.ToString("#,##0"))
  Dim t As DateTime = Now
  For cpt As Integer = 1 To NB_ITEMS
    l.Add(cpt)
  Next
  WriteResult(Now.Subtract(t))
End Sub

The write result displays the elapsed time during the test function itself.

The ArrayList Results

First, I run the test on an ArrayList:

 *** Test speed of ArrayList ****
Append 100,000 items at the end..............:   0.016 sec
Remove last item 100,000 times...............:   0.002 sec
Insert 100,000 items at the beginning........:  11.630 sec
Remove first object 100,000 times............:  11.919 sec
Insert 100,000 items at a random position....:   5.669 sec
Get 100,000 items............................:   0.002 sec
ForEach enumerator...........................:   0.002 sec
Remove items randomly........................:   5.924 sec
Total........................................:  35.188 sec

So the ArrayList is pretty good for adding and removing items at the end. It is also extremely efficient for fetching a given record. It is however pretty bad at inserting at a random position and worse at the beginning.

I digged a bit into the .NET Framework, to explain the numbers. I will spare you the boring details and try to just emphasize the bottlenecks.

ArrayList : Insert at the Beginning

Many of you already know why it behaves that badly. This is because the ArrayList is only a wrapper around an Array. The array uses a single contiguous block of memory to store all of its items.

Each time you insert an element at the beginning, the Framework has to shift all the items of the array using a command like this one:

VB.NET
Array.Copy(ItemsArray, 0, ItemsArray, 1, Me._size - 1)

This is done for every single insert and takes up a lot of processor time.

ArrayList : Insert at Random Position

Inserting in the middle is pretty bad too, the test completed in 5.669 seconds.

Here again, the culprit is the same function. Each time you insert an item in the middle, the Framework has to shift about half the items in the array using the same Array.Copy function.

VB.NET
Array.Copy(ItemsArray, index, ItemsArray, 1, Me._size - index)

Appending at the end is pretty good and the Framework does a very good job here.

You have to be aware however that there is still a substantial issue here.

ArrayList : Memory Usage

As I said, the ArrayList uses an Array internally.

In .NET, the arrays are fixed size, you cannot resize them. If you want an array to become larger, you have to allocate a new array and copy the original values in the new array, using the ubiquitous Array.Copy.

In order to avoid having to copy the complete Array each time you add an item, the .NET Framework allocates a larger array than necessary.

The default size is 4. The size doubles each time the limit is reached.
It means that if you store 32769 items in an arraylist, it will in fact use the space for 65536 items. This is not very good especially when you start to hit big arrays containing a few millions items.

The List(of Integer) Results

I must admit I did not expect the generic List(of Integer) to behave faster than the ArrayList.

*** Test speed of List(Of Integer) **** 
Append 100,000 items at the end..............:   0.018 sec 
Remove last item 100,000 times...............:   0.003 sec 
Insert 100,000 items at the beginning........:   4.106 sec 
Remove first object 100,000 times............:   4.008 sec 
Insert 100,000 items at a random position....:   2.067 sec
Get 100,000 items............................:   0.003 sec 
ForEach enumerator...........................:   0.005 sec 
Remove items randomly........................:   2.081 sec 
Total........................................:  12.303 sec 

I was wrong.

The List(of Integer) is more than twice as fast as ArrayList.
The code of the List(of Integer) is virtually the same as the ArrayList, and it took me some time to understand why it is nearly 3 times faster.

Why is the Generic List Faster Than the ArrayList?

There are two reasons:

  • The TypeSafe array stores the items in itself rather than pointers to boxed objects
  • In a 64 bit OS (like mine), a pointer happens to be 64 bits rather than the 32 bits integer

I won't really expand on this, the things I will remember are:

  • Generics are typesafe and fast, there is no need for ArrayList
  • Don't use ArrayList on structures or value types (Integers, Float...)

Apart from that, the Insertion of 100,000 items still takes 4 seconds, this is not glorious.

How Could We Make This Faster

It really depends on what you need; you will hardly find a single Collection which will be faster than all others on all the test.

If you use your list to mainly add at the end and access by Index, the List(of Integer) is probably a good choice.

On the contrary, if you require random insertion and can avoid accessing items by index then perhaps a LinkedList(Of Integer) is the way to go.

However, if you need...

  • Quick insertion at any position
  • Being able to access item by index
  • Large lists

... then you might be interested in the little class I wrote for this article.

The LinkedArray Class

The concept is relatively simple.

I noticed that the lists start to be slow when they get a bit too long.
So why not have a list of lists.

This way when one becomes too long, I can just split it in two.

Screenshot - Graph.png

The class is not rocket science, internally it uses a simple list(of List(of Integer)). The results are dramatically improved.

On an ArrayList replacement, we get these results:

*** Test speed of LinkedArray ****
Append 100,000 items at the end..............:   0.014 sec 
Remove last item 100,000 times...............:   0.013 sec 
Insert 100,000 items at the beginning........:   0.181 sec
Remove first object 100,000 times............:   0.077 sec 
Insert 100,000 items at a random position....:   0.165 sec
Get 100,000 items............................:   0.042 sec 
ForEach enumerator...........................:   0.003 sec 
Remove items randomly........................:   0.106 sec
Total........................................:   0.638 sec 

On a List(of Integer) replacement, we get these results:

*** Test speed of LinkedArray(Of Integer) ****
Append 100,000 items at the end..............:   0.011 sec
Remove last item 100,000 times...............:   0.019 sec
Insert 100,000 items at the beginning........:   0.155 sec
Remove first object 100,000 times............:   0.064 sec
Insert 100,000 items at a random position....:   0.129 sec
Get 100,000 items............................:   0.043 sec
ForEach enumerator...........................:   0.006 sec
Remove items randomly........................:   0.109 sec
Total........................................:   0.559 sec

The speed of insertion and random deletion is nearly hundred times faster than an arraylist.

The main drawback of this class is "Get Items", we are 10 to 15 times slower than a normal list. This is due to the fact that getting an item by index requires more computation than on a straight array.

It is interesting to see, however, that ForEach enumerator is as quick as the native list.

The sample shows milliseconds, however as said in the forum below, this is only a measured estimation. In effect, I could probably round a 1/10th of a second, but I did not want to show entries with 0.0 sec.

What you can read however in those results is that using the LinkedArray:

  • Inserting is much faster
  • Getting items by index is much slower but still very fast

If you change the number of items, the numbers change completely but this simple fact is:

*** Test speed of List(Of Integer) ****
Append 200,000 items at the end..............:   < 0.1 sec
Remove last item 200,000 times...............:   < 0.1 sec
Insert 200,000 items at the beginning........:    16.7 sec
Remove first object 200,000 times............:    17.3 sec
Insert 200,000 items at a random position....:     8.5 sec
Get 200,000 items............................:   < 0.1 sec
ForEach enumerator...........................:   < 0.1 sec
Remove items randomly........................:     8.7 sec
Total........................................:    51.3 sec
*** Test speed of LinkedArray(Of Integer) ****
Append 200,000 items at the end..............:   < 0.1 sec
Remove last item 200,000 times...............:   < 0.1 sec
Insert 200,000 items at the beginning........:     0.3 sec
Remove first object 200,000 times............:     0.1 sec
Insert 200,000 items at a random position....:     0.4 sec
Get 200,000 items............................:     0.1 sec
ForEach enumerator...........................:   < 0.1 sec
Remove items randomly........................:     0.3 sec
Total........................................:     1.3 sec

Using the Code

In the demo program, I include a class called LinkedArray.

The code has not been extensively tested. This article is more about having an open discussion on .NET collections.

However feel free to test it on your own project and let us know if you get improved performances.

Points of Interest

I will expand on the splitting / merging operation here (probably by the end of the week).

History

  • 11th December, 2007: First post
  • 12th December, 2007: Few amendments to explain the numbers

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)