Introduction
This article presents an interface (IArray
) for working with immutable arrays that guarantees O(1)
performance when accessing
elements by index or retrieving the length of the array. Many of the same extension methods available for IEnumerable
are available
for IArray
allowing LINQ query syntax to be used. The performance of IArray
is significantly better than using IEnumerable
extension methods and converting the results to a List
.
Background
In some problem domains such as 3D graphics array processing is very common. Traditional imperative programming techniques
for working with arrays can be error prone and verbose. LINQ requires significantly less code
to be written and can be less error prone. Unfortunately LINQ is not optimized for arrays data types. When constant time random element access is required on the
result of an IEnumerable
generated from a LINQ query the standard approach is to convert the result to a list or array.
This article proposes an a new interface (IArray
) and a set of extension methods that provides LINQ style methods and syntax like IEnumerable
but that provides the performance characteristics of an array.
LINQ
LINQ (Language Integrated Natural Query) is a set of technologies that allows C# code to contain SQL style queries in code. LINQ is designed
specifically for easy and efficient querying of data sets.
There are two syntactic forms for LINQ: query syntax and method syntax. C# automatically transforms code in query syntax into method syntax:
Query Syntax:
from x in xs where x % 2 == 0 select x * 2
Method Syntax:
xs.Where(x => x % 2 == 0).Select(x => x * 2)
This transformation happens to any data source regardless of whether it implements IEnumerable
or not. In other words, if xs
supports a method Where()
and returns a that supports the Select
method, then the query syntax can be used. This is fact is leveraged in the IArray
implementation, allowing it to be used with LINQ query syntax expressions.
IEnumerable
The IEnumerable
interface represents a sequence of items. IEnumerable
provides many operations for creating new sequences
with O(1)
complexity, such as Where()
, Select()
, Take()
, Skip()
, and so on.
The IEnumerable
interface has operations for element access (ElementAt()
) and querying the number of items in the sequence (Count()
)
but except in very specific cases they have O(N)
complexity.
In cases where fast random element access is desired IEnumerable
supports casting to Lists (ToList()
) and Arrays (ToArray()
).
Introducing IArray
The IArray
interface represents an immutable array and is intended for contexts where access of individual elements in constant time is important.
IArray
has two public read-only properties, the indexer property and the count property, and a CopyTo()
method.
public interface IArray<T>
{
int Count { get; }
T this[int n] { get; }
T[] CopyTo(int srcIndex, T[] dest, int destIndex, int len);
}
The requirement of the implementing implementation is that both properties have O(1)
performance, and that the collection never changes during its lifetime.
Constructing IArray Instances
The static class ImmutableArray
provides the following static methods for creating an array.
- IArray<T> Create<T>(IList<T> xs)
- IArray<T> Create<T>(params T[] args)
- IArray<T> Create<T>(IEnumerable<T> xs)
- IArray<T> Nil<T>()
- IArray<T> Unit<T>(T x)
- IArray<int> Range<T>(int count)
- IArray<int> Range<T>(int from, int count)
- IArray<T> Repeat<T>(T x, int n)
- IArray<T> Repeat<T>(Func<T> f, int n)
- IArray<T> Generate<T>(T first, Func<T, bool> invariant, Func<T, T> next)
The following extension methods have been added for lists, arrays, and IEnumerable
instances to create immutable arrays.
- IArray<T> ToIArray<T>(this T[] xs)
- IArray<T> ToIArray<T>(this List<T> xs)
- IArray<T> ToIArray<T>(this IEnumerable<T> xs)
When creating an immutable array from an array or list, a copy is created of the passed collection.
Introducing ISortedArray
A special interface is returned from IArray.OrderBy()
called ISortedArray
.
This interface has optimized methods for finding the index of an element in the array (IndexOf()
and
checking whether an item is in the array (Contains()
). These methods use a binary search algorithm,
and have O(LogN)
complexity. The ISortedArray
interface also has O(1)
implementation of IsOrdered()
and OrderBy()
methods.
Comparing Performance of IArray and IEnumerable
The following table contains the complexity analysis for common sequence operations on IArray
and IEnumerable
:
Method | IEnumerable | IArray | ISortedArray |
ElementAt | O(N) | O(1) | O(1) |
Count | O(N) | O(1) | O(1) |
Select / Where | O(1) | O(N) | O(N) |
Aggregate | O(N) | O(N) | O(N) |
Take / Skip | O(1) | O(N) | O(N) |
First | O(1) | O(1) | O(1) |
Last | O(N) | O(1) | O(1) |
All / Any | O(N) | O(N) | O(N) |
ToArray / ToList | O(N) | O(N) | O(N) |
Min / Max / Sum / Average | O(N) | O(N) | O(1) |
OrderBy | O(1) | O(NLogN) | O(NLogN) |
IndexOf | O(N) | O(LogN) | O(LogN) |
Contains | O(N) | O(LogN) | O(LogN) |
When O(1) is effectively O(N)
Certain methods that return a sequence have a better complexity analysis for IEnumerable
than IArray
. Some examples are
Select()
, Where()
, Take()
, and Skip()
.
If a programmer requires fast random element access afterwards, then the resulting IEnumerable
has to be converted to an array or list using ToArray()
or ToList()
. Since these two operations are O(N)
complexity this cost dominates any benchmark.
Benchmarking Methodology
In order to fairly compare the performance of IEnumerable
and IArray
I profile a number of test cases that include the cost of executing the
both executing the operation and converting the result to a List.
The following time comparison are made by running a test repeatedly for different sizes of input sequences.
Each iteration of the test includes a comparison of the results (not included in timings) to assure that the two versions are computing the same
result. An entire benchmark for a given sequence is limited to a maximum number of mseconds.
Higher numbers represent more execution time spent executing the test. Below is the section of code responsible for computing the time.
while ((DateTime.Now - start).TotalMilliseconds < MinTimeTrial)
{
var inputA = gen(count);
var inputB = inputA.ToIArray();
var outputA = TimeFunction(() => f(inputA));
var outputB = TimeFunction(() => g(inputB));
tr.Result &= cmp(outputA.Item2, outputB.Item2);
if (!tr.Result)
throw new Exception("test failed");
tr.TimeA += outputA.Item1;
tr.TimeB += outputB.Item1;
}
Benchmark Results
Middle
The middle test returns the item halfway between the first and last items. The complexity
is O(1)
for IArray
compared to O(N) for IEnumerable
.
IEnumerable xs; xs.ElementAt(xs.Count() / 2);
IArray xs; xs[xs.Count / 2];
Size | IEnumerable | IArray |
10 | 112.6231 | 11.2247 |
100 | 209.174 | 3.2049 |
1000 | 248.1513 | 0.424 |
10000 | 251.5417 | 0.0772 |
100000 | 241.6906 | 0.0289 |
Select
The select test compares the Select() method of IArray with the same
method on IEnumerable followed by a conversion to list using ToList.
In general IArray outperforms by about 5x.
IEnumerable xs; xs.Select(x => x * 10).ToList();
IArray xs; xs.Select(x => x * 10);
Size | IEnumerable | IArray |
10 | 130.4198 | 22.8207 |
100 | 203.7963 | 33.9568 |
1000 | 219.7549 | 41.9019 |
10000 | 222.4864 | 42.0895 |
100000 | 215.9494 | 42.8385 |
Where
The where test is similar to the Select test. The gains in performance are in the vicinity of 2.5x.
IEnumerable xs; xs.Where(x => x & 10 == 0).ToList();
IArray xs; xs.Where(x => x & 10 == 0);
Size | IEnumerable | IArray |
10 | 87.7143 | 30.7792 |
100 | 169.0739 | 71.1118 |
1000 | 196.1686 | 83.7951 |
10000 | 197.5586 | 83.4241 |
100000 | 195.6574 | 82.8913 |
OrderBy
The
IArray
implementation of
OrderBy
returns
an
ISortedArray
interface.
IEnumerable xs; xs.OrderBy(x => x).ToList();
IArray xs; xs.Where(x => x & 10 == 0);
Size | IEnumerable | IArray |
10 | 222.2376 | 30.1089 |
100 | 335.6944 | 38.6848 |
1000 | 355.91 | 77.7343 |
10000 | 356.3396 | 93.0001 |
100000 | 400.4608 | 97.2454 |
More Benchmarks
In the attached project there are a large number of benchmark tests that demonstrate performance benefits of using IArray
in a large number of scenarios. All of the source code for IArray
is also included.
Note: don't forget to compile in release mode, and to run without the debugger
attached.
Summary
When a programmer wants to use IEnumerable
but needs to convert
the result to an array or a list in order to have fast indexing then IArray
can
be a good high-performance alternative.
Please let me know if you find any faults with the code or the article, or if you find the code useful, thanks!