Introduction
What is a List<T>? Well, according to MSDN[^]:
List<T> Class
Represents a strongly typed list of objects that can be accessed by index. Provides methods to search, sort, and manipulate lists.
So, a List<T> is a list of items that you can treat as an array. Excellent. We’re done here.
Well…no. Not really. MSDN is not exactly telling the whole truth here.
Let's look a little more closely
So, a List<T> is a List of elements of type T that you can treat as an Array. Simple – we’re done.
Hold your horses there! Are you sure?
Microsoft released the Reference Sources for .NET and they can make interesting reading. One in particular: http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,cf7f4095e4de7646[^]
That’s the source code for the List<T> class. And let’s have a quick look at the fields:
public class List<t> : IList<t>, System.Collections.IList, IReadOnlyList<t>
{
private const int _defaultCapacity = 4;
private T[] _items;
[ContractPublicPropertyName("Count")]
private int _size;</t></t></t>
Hang on a moment…
private T[] _items;
Does that mean what I think it does?
static readonly T[] _emptyArray = new T[0];
#if !FEATURE_CORECLR
[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
#endif
public List() {
_items = _emptyArray;
}
Yes. A List<T> is an Array that you can treat as a List, not the other way round.
That's a big difference: it implies some operations do not work as the class name might imply:
- A List is an Array, so when you try to add an element and there is no room, and bigger array is allocated and all existing elements are copied into it (If you are familiar with a StringBuilder, that is exactly what it does as well).
- When you insert an element, the data for all elements with a higher index has to be copied into a new location.
- When you delete an element, the data for all elements with a higher index has to be copied to a new location.
- If you add an element and the internal array is full, a new, larger internal array is allocated and all existing data is copied to the new list before the new element can be added at the end.
And that's important, because it defines what the various operations you can do on a List<T> can cost in terms of time.
The first thing to note is that as an array, a List<T> is very efficient to access individual elements via the index.
i = myList[1234567];
is a trivial operation: get the start of the array in memory from the myList reference, multiply the index value by the number of bytes in the type T, add it to the start and fetch the element value. In performance terms, it's (almost) as fast as a "straight" array - not quite, as there is some additional checking code added which makes it slightly slower.
But the other operations which make a List<T> more useful than an Array all come with a cost - and that can be quite a large cost.
Adding items
When you us the Add method to add an item to a List<T>, it checks to see if there is room in the existing array, and if not it:
- Creates a new array, twice the size of the old one.
- Copies each entry from the old array to the new one,
- Sets the "last" entry in the new array to the new value, and moves the "last" indicator on by one.
Think about what this means, just in terms of memory allocation and usage:
If you want to add 2 million elements to a List:
- Adding the first creates an Array containing 4 elements.
- Adding the fifth creates an Array containing 8 elements, and copies the existing 4
- Adding the ninth creates an Array containing 16 elements, and copies the existing 8
- Adding one more creates an Array containing 32 elements, and copies the existing 16
- Adding one more creates an Array containing 64 elements, and copies the existing 32
- Adding one more creates an Array containing 128 elements, and copies the existing 64
- Adding one more creates an Array containing 256 elements, and copies the existing 128
- Adding one more creates an Array containing 512 elements, and copies the existing 256
- Adding one more creates an Array containing 1024 elements, and copies the existing 512
- Adding one more creates an Array containing 2048 elements, and copies the existing 1024
- Adding one more creates an Array containing 4096 elements, and copies the existing 2048
- Adding one more creates an Array containing 8192 elements, and copies the existing 4096
- Adding one more creates an Array containing 16384 elements, and copies the existing 8192
- Adding one more creates an Array containing 32768 elements, and copies the existing 16384
- Adding one more creates an Array containing 65536 elements, and copies the existing 32768
- Adding one more creates an Array containing 131072 elements, and copies the existing 65536
- Adding one more creates an Array containing 262144 elements, and copies the existing 131072
- Adding one more creates an Array containing 524288 elements, and copies the existing 262144
- Adding one more creates an Array containing 1048576 elements, and copies the existing 524288
- Adding one more creates an Array containing 2097152 elements, and copies the existing 1048576
So to get sufficient space to hold your data, .NET will do 20 separate memory allocations (using both the small and large object heaps), and allocate and copy 2,097,148 elements that you will never use.
OK, that’s not a lot of memory these days – if the elements are reference types it’s 8 bytes per element for a 64bit system - but that means nearly twice the memory you need will be allocated unnecessarily, and the copy operations will not be trivial in time terms. It may even kick in the Garbage Collector a couple of times if your element count / size combination gets big enough.
Inserting items
When you use the Insert method, it's worse. A lot worse.
Start by doing the Add checks, and allocate and copy if there isn't enough room.
Then, move every element of the array with an index greater than or equal to the value of the insert point index one place to the right.
So if you insert a single element at the begining of a List<T> you potentially will copy the entire array twice in effect: once to allocate enough space, and then again to make room at the front.
Deleting items
Delete is better than Insert, becasue it doesn't need the check-the-size-and-copy operations - but it still requires the processor to move every single element of the array with an index greater than or equal to the value of the delete point index one place to the left.
Cost of moving memory
Moving and copying memory is slow: there are hardware instructions in most processors to speed the operation up (mostly by removing the setup costs at the start of each instruction) but it still requires the processor to fetch the data from memory (or it's cache(s)) and write the value to a new location (which will normally be write-through on the cache rather than "held" internally).
If you are copying or moving a lot of information, it is quite possible that you will exceed the processor caches, and that means that the data caches will end up containing nothing except your data! Bulk data copies are slow, and should be avoided at all costs - they can slow your whole system!
For example, you might be trying to add elements of an existing array in reverse order, so instead of using Add, you use Insert with a zero index:
int elements = 50000;
int[] data = Enumerable.Range(1, elements).ToArray();
long addSum = 0;
long insertSum = 0;
for (int i = 0; i < 20; i++)
{
Stopwatch s1 = new Stopwatch();
s1.Start();
List<int> lAdd = new List<int>();
for (int index = 0; index < elements; index++)
{
lAdd.Add(data[index]);
}
s1.Stop();
Stopwatch s2 = new Stopwatch();
s2.Start();
List<int> lInsert = new List<int>();
for (int index = 0; index < elements; index++)
{
lInsert.Insert(0,data[index]);
}
s2.Stop();
addSum += s1.ElapsedMilliseconds;
insertSum += s2.ElapsedMilliseconds;
Console.WriteLine("{0}:{1}", s1.ElapsedMilliseconds, s2.ElapsedMilliseconds);
}
Console.WriteLine("-----");
Console.WriteLine("Add: Total = {0}\n Average = {1}", addSum, addSum / 20);
Console.WriteLine("Insert: Total = {0}\n Average = {1}", insertSum, insertSum / 20);
And the results we get:
1:694
0:632
0:649
0:636
0:635
0:632
0:633
0:623
0:627
0:635
0:631
0:632
0:708
0:654
0:642
0:629
0:639
0:636
0:625
0:629
-----
Add: Total = 1
Average = 0
Insert: Total = 12821
Average = 641
Speak for themselves!
Over 600 times slower to do inserts than to add items. And bear in mind, these are only 50,000 integers we are moving here! (I initially tried with 500,000 to get a "reasonable" figure for the Add operations - but the Inserts crashed my debugger :omg: )
Summary
I’m not saying that you shouldn't use List<T> - no way, it’s one of the most useful classes in .NET – but you should be aware that using it can have a serious performance impact if you do the wrong thing, or you don’t think carefully about exactly how you are using it.
- If you know (even roughly) how many elements you will be adding to your list in advance, use the overloaded constructor to allocate a suitable sized array to start with: That will save both memory allocations and time in copying elements form one array to another.
- If you want a one dimensional array, you are probably better off using a List<T> instead. You will use the same amount of memory, access time will be about the same, and you will gain flexibility in your code.
- Avoid inserting elements near the front of a list: each time you do, you will move all subsequent elements in memory. Where ever possible, use Add instead to avoid this.
- Use InsertRange and RemoveRange whenever possible instead of individual Insert and Remove operations.
- When deleting elements, think very carefully. If you can delete from the tail forward instead of head back, do – it saves moving elements you are going to delete anyway.
- If you are going to insert or delete more than a couple of elements that do not have sequential indexes, think carefully: it may be more efficient to construct a new List<T> and do the copying yourself. But this is a difficult call: Array.Copy (used internally by the insert and Remove methods) is a lot faster than individual element copying as it can use “bulk move” codes. It may be worth timing your code to work this out - but it''s going to depend on too many variables to provide a generalized "this is faster" recommendation.
- Always use List<T>.Clear or create a new List<T> instead of deleting the elements of an existing List<T> one-by-one.
- You should also be aware that it means that there is a maximum size for a List<T> - because it is an Array internally, that means it is subject to the same size restrictions. And that means that a List<T> can never be bigger than 2GB - the largest any single object in .NET can be. So for a 64 bit system, a List<AnyReferenceType> can only hold 268,435,456 elements (twice that for 32 bit applications). The number of elements you can get in a List<AnyValueType> will depend on the size of the value type and any "packing" .NET does or doesn't do.
History
2019-03-18 Typos.
2015-01-27 Original Version.
2014-03-05 V1.0
The first version got some strange comments from a very small number of readers: 13 out of 12,009 views were negative - and it seemed that having explanations of what an Array and a List were at the top meant that some people just didn't read beyond that, and immediately voted this a one. That's not a problem in itself - everyone is allowed an opinion - but it implies that they didn't get to the "meat" of the article and missed why the use of an Array for something called a List can cause problems if you don't know what is going on behind the scenes.
So, the initial intro was ripped out, and replaced with a more detailed explanation of what actually happens when you use a List<T> - and particularly when you misuse a List<T>