Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

LINQ for Immutable Arrays

0.00/5 (No votes)
28 Dec 2012 1  
A library of extension methods similar to IEnumerable for working with immutable arrays.

Introduction 

This article presents an interface (IArray) for working with immutable arrays that guarantees O(1) performance when accessing elements by index or retrieving the length of the array. Many of the same extension methods available for IEnumerable are available for IArray allowing LINQ query syntax to be used. The performance of IArray is significantly better than using IEnumerable extension methods and converting the results to a List.

Background  

In some problem domains such as 3D graphics array processing is very common. Traditional imperative programming techniques for working with arrays can be error prone and verbose. LINQ requires significantly less code to be written and can be less error prone. Unfortunately LINQ is not optimized for arrays data types. When constant time random element access is required on the result of an IEnumerable generated from a LINQ query the standard approach is to convert the result to a list or array.

This article proposes an a new interface (IArray) and a set of extension methods that provides LINQ style methods and syntax like IEnumerable but that provides the performance characteristics of an array.  

LINQ 

LINQ (Language Integrated Natural Query) is a set of technologies that allows C# code to contain SQL style queries in code. LINQ is designed specifically for easy and efficient querying of data sets.   

There are two syntactic forms for LINQ: query syntax and method syntax. C# automatically transforms code in query syntax into method syntax:

Query Syntax: 

    from x in xs where x % 2 == 0 select x * 2 

Method Syntax:   

    xs.Where(x => x % 2 == 0).Select(x => x * 2) 

This transformation happens to any data source regardless of whether it implements IEnumerable or not. In other words, if xs supports a method Where() and returns a that supports the Select method, then the query syntax can be used. This is fact is leveraged in the IArray implementation, allowing it to be used with LINQ query syntax expressions.  

IEnumerable  

The IEnumerable interface represents a sequence of items. IEnumerable provides many operations for creating new sequences with O(1) complexity, such as Where(), Select(), Take(), Skip(), and so on.  

The IEnumerable interface has operations for element access (ElementAt()) and querying the number of items in the sequence (Count()) but except in very specific cases they have O(N) complexity. In cases where fast random element access is desired IEnumerable supports casting to Lists (ToList()) and Arrays (ToArray()).

Introducing IArray

The IArray interface represents an immutable array and is intended for contexts where access of individual elements in constant time is important. IArray has two public read-only properties, the indexer property and the count property, and a CopyTo() method.

public interface IArray<T>
{
    int Count { get; }
    T this[int n] { get; }
    T[] CopyTo(int srcIndex, T[] dest, int destIndex, int len);
}

The requirement of the implementing implementation is that both properties have O(1) performance, and that the collection never changes during its lifetime.

Constructing IArray Instances

The static class ImmutableArray provides the following static methods for creating an array.

  • IArray<T> Create<T>(IList<T> xs)
  • IArray<T> Create<T>(params T[] args)
  • IArray<T> Create<T>(IEnumerable<T> xs)
  • IArray<T> Nil<T>()
  • IArray<T> Unit<T>(T x)
  • IArray<int> Range<T>(int count)
  • IArray<int> Range<T>(int from, int count)
  • IArray<T> Repeat<T>(T x, int n)
  • IArray<T> Repeat<T>(Func<T> f, int n)
  • IArray<T> Generate<T>(T first, Func<T, bool> invariant, Func<T, T> next)

The following extension methods have been added for lists, arrays, and IEnumerable instances to create immutable arrays. 

  • IArray<T> ToIArray<T>(this T[] xs) 
  • IArray<T> ToIArray<T>(this List<T> xs)
  • IArray<T> ToIArray<T>(this IEnumerable<T> xs) 

When creating an immutable array from an array or list, a copy is created of the passed collection.

Introducing ISortedArray 

A special interface is returned from IArray.OrderBy() called ISortedArray. This interface has optimized methods for finding the index of an element in the array (IndexOf()and checking whether an item is in the array (Contains()). These methods use a binary search algorithm, and have O(LogN) complexity. The ISortedArray interface also has O(1) implementation of IsOrdered() and OrderBy() methods. 

Comparing Performance of IArray and IEnumerable 

The following table contains the complexity analysis for common sequence operations on IArray and IEnumerable

Method  IEnumerableIArray ISortedArray 
ElementAt   O(N)O(1)O(1)
Count O(N)O(1)O(1)
Select / Where O(1)O(N)O(N)
Aggregate  O(N)O(N)O(N)
Take / Skip O(1)O(N)O(N)
First  O(1)O(1)O(1)
Last O(N)O(1)O(1)
All / Any O(N)O(N)O(N)
ToArray / ToListO(N)O(N)O(N)
Min / Max / Sum / AverageO(N)O(N)O(1)
OrderBy O(1)O(NLogN)O(NLogN)
IndexOf O(N)O(LogN)O(LogN)
Contains O(N) O(LogN) O(LogN) 

When O(1) is effectively O(N)

Certain methods that return a sequence have a better complexity analysis for IEnumerable than IArray. Some examples are Select(), Where(), Take(), and Skip(). If a programmer requires fast random element access afterwards, then the resulting IEnumerable has to be converted to an array or list using ToArray() or ToList(). Since these two operations are O(N) complexity this cost dominates any benchmark.

Benchmarking Methodology 

In order to fairly compare the performance of IEnumerable and IArray I profile a number of test cases that include the cost of executing the both executing the operation and converting the result to a List.

The following time comparison are made by running a test repeatedly for different sizes of input sequences. Each iteration of the test includes a comparison of the results (not included in timings) to assure that the two versions are computing the same result. An entire benchmark for a given sequence is limited to a maximum number of mseconds. Higher numbers represent more execution time spent executing the test. Below is the section of code responsible for computing the time.

while ((DateTime.Now - start).TotalMilliseconds < MinTimeTrial)
{
    var inputA = gen(count);
    var inputB = inputA.ToIArray();
    var outputA = TimeFunction(() => f(inputA));
    var outputB = TimeFunction(() => g(inputB));
    tr.Result &= cmp(outputA.Item2, outputB.Item2);
    if (!tr.Result)
        throw new Exception("test failed");
    tr.TimeA += outputA.Item1;
    tr.TimeB += outputB.Item1;
} 

Benchmark Results 

Middle

The middle test returns the item halfway between the first and last items. The complexity is O(1) for IArray compared to O(N) for IEnumerable.

IEnumerable xs; xs.ElementAt(xs.Count() / 2);
IArray xs; xs[xs.Count / 2];
SizeIEnumerableIArray
10 112.6231 11.2247
100 209.174 3.2049
1000  248.1513 0.424 
10000 251.5417 0.0772
100000 241.6906 0.0289 

Select

The select test compares the Select() method of IArray with the same method on IEnumerable followed by a conversion to list using ToList. In general IArray outperforms by about 5x.
IEnumerable xs; xs.Select(x => x * 10).ToList();
IArray xs; xs.Select(x => x * 10);
SizeIEnumerableIArray
10 130.4198 22.8207
100 203.7963 33.9568
1000 219.7549 41.9019
10000 222.4864 42.0895
100000 215.9494 42.8385

Where

The where test is similar to the Select test. The gains in performance are in the vicinity of 2.5x.
IEnumerable xs; xs.Where(x => x & 10 == 0).ToList();
IArray xs; xs.Where(x => x & 10 == 0);
SizeIEnumerableIArray
10 87.7143 30.7792
100 169.0739 71.1118
1000 196.1686 83.7951
10000 197.5586 83.4241
100000 195.6574 82.8913

OrderBy

The IArray implementation of OrderBy returns an ISortedArray interface.
IEnumerable xs; xs.OrderBy(x => x).ToList();
IArray xs; xs.Where(x => x & 10 == 0);
SizeIEnumerableIArray
10 222.2376 30.1089
100 335.6944 38.6848
1000 355.91 77.7343
10000 356.3396 93.0001
100000 400.4608 97.2454

More Benchmarks 

In the attached project there are a large number of benchmark tests that demonstrate performance benefits of using IArray in a large number of scenarios.  All of the source code for IArray is also included.

Note: don't forget to compile in release mode, and to run without the debugger attached.

Summary 

When a programmer wants to use IEnumerable but needs to convert the result to an array or a list in order to have fast indexing then IArray can be a good high-performance alternative.

Please let me know if you find any faults with the code or the article, or if you find the code useful, thanks!  

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here