Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Considerations on Efficient use of LINQ Extension Methods

4.98/5 (35 votes)
2 Jun 2014CPOL12 min read 32.3K   178  
Unless you're careful, LINQ extension methods on IEnumerable can lead to inefficient implementations

Introduction

The extension methods on IEnumerable<T> provided by LINQ can make some operations on sequences or collections fairly simple to implement. However, if you're not careful the implementation can be very inefficient. I've seen some tips and articles implementing various operations which appear quite clever but hide significant inefficiencies.

Although I use C# for this article, the issues and considerations apply to all .NET languages.

Background

The class System.Linq.Enumerable provides many extension methods that operate on enumerations that implement IEnumerable<T> to support LINQ to Objects The extension methods can be used directly as well. Many of these methods return new enumerations.

For example, the Where() methods filter the elements of a source enumeration into a new enumeration.

C#
int[] numbers = new int[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
IEnumerable<int> odds = numbers.Where(n => (n & 1) == 1);
foreach (int o in odds)
  Console.WriteLine(o.ToString());

At this point odds is an enumerator which provides:

1
3
5
7
9

However, if we change a value in the numbers array and print the elements of the enumeration again:

C#
int[] numbers = new int[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
IEnumerable<int> odds = numbers.Where(n => (n & 1) == 1);
foreach (int o in odds)
  Console.WriteLine(o.ToString());
numbers[0] = 21;
foreach (int o in odds)
  Console.WriteLine(o.ToString());

The odds enumerator, the second time through, provides:

21
1
3
5
7
9

The Where() method does not return an enumeration whose values are based on the contents of the numbers array at the time the Where() method is executed. The enumeration's action is lazy in that it doesn't do the filtering. Instead, it returns an instance of a class that will do the filtering as the program actually iterates over the enumeration. This lazy evaluation is where it is easy to get fooled into a significant inefficiency.

Simplicity leads us astray

So now I'll switch to the simple problem that lead me to think about this whole issue. The problem is to partition (chunkify) an enumeration into a sequence of smaller enumerations (chunks) of some maximum length. The straightforward implementation of this (as a generic extension method itself) is:

C#
public static IEnumerable<IEnumerable<T>> Chunkify0<T>(this IEnumerable<T> source,
                                                       int chunkSize)
{
  while (source.Any())
  {
    yield return source.Take(chunkSize);
    source = source.Skip(chunkSize);
  }
}

This just loops checking if the source enumeration has any elements. If so, yield return the enumeration that is the first up-to-chunkSize elements of source, then replace source with the enumeration that skips the first chunkSize elements of itself. Then it goes to the check if the new source enumeration has any elements.

So how does this behave? If we have an enumeration with 12 elements and we call this with a chunkSize of 3 and we then iterate once over each of the 4 returned IEnumerable<T> of length 3, then we'd expect that each element would be touched (invocation of .MoveNext()) twice and the total count of touches would be about 24. However, the total count of touches is 65. And if the starting enumeration has 13 elements, then instead of about 26 touches the total is 93!

Aside: How to Count MoveNext()?

I decided I needed to count how many times these enumerations would actually invoke MoveNext() on the "full" enumeration, as this seemed to be the best estimation to the complexity of the chunkifying implementations. So I wrote an enumeration method that used a static field to count the number of times the enumeration returned a value, and also count the end of enumeration condition:

C#
// These are in the Program class of a Console Application
// so they are static in order to be accessible from the static Main
static int SECount = 0;
static IEnumerable<char> SE(int len)
{
  char c = 'A';
  for (int i = 1; i <= len; ++i)
  {
    ++SECount;
    Console.WriteLine("SE: {0} Count: {1}", c, SECount);
    yield return c++;
  }
  ++SECount;
  Console.WriteLine("SE: (past end) Count: {0}", SECount);
}

(One thing that is a little deceiving is that the once it is in the "past end" state, additional calls to MoveNext() are not counted. I didn't consider this a big drawback.)
I also wanted to be able to put some tracing output in the while conditional, but Console.WriteLine is type void so I hacked a simple method that calls Console.WriteLine and then returns true.

C#
private static bool TrueWriteLine(string format, params object[] args)
{
  Console.WriteLine(format, args);
  return true;
}

You can see this in use below.

What's happening?

As with the Where() example above, the Take() and more importantly the Skip() methods don't actually build new collections to iterate over, they merely take the enumeration they're given and return class instances implementing IEnumerable<T> that Take or Skip the specified number of elements.

I'm going to revise the implementation to collect the individual chunk IEnumerable<T> in a List. It will make the explanation a bit simpler and it doesn't change the result. (The attached code includes both implementations so you can verify that the behavior is comparable.)

C#
public static IEnumerable<IEnumerable<T>> Chunkify0L<T>(this IEnumerable<T> initSource,
                                                        int chunkSize)
{
  IEnumerable<T> source = initSource; // this is to help clarify the explanation below.
  List<IEnumerable<T>> result = new List<IEnumerable<T>>();
  while (TrueWriteLine("while source.Any()") && source.Any())
  {
    Console.WriteLine("Take = {0}", chunkSize);
    IEnumerable<T> chunk = source.Take(chunkSize);
    result.Add(chunk);
    Console.WriteLine("skipping = {0}", chunkSize);
    source = source.Skip(chunkSize);
  }
  return result;
}

(This is going to get verbose quickly, so I'll use a smaller starting enumeration.)
With 7 elements in initSource the while loop executes 3 times, returning 2 enumerations of 3 elements and one enumeration of 1 element.
The first iteration looks like this:

  • source.Any() executes a MoveNext() on source.GetEnumerator() which is initSource.GetEnumerator().MoveNext()
  • chunk = source.Take(chunkSize) assigns to chunk an instance of a .NET-internal class that captures the chunkSize and source and whose GetEnumerator() returns an IEnumerator<char> that returns the first chunkSize elements of source (really initSource here)
  • This chunk is added to the result
  • source = source.Skip(chunkSize) assigns to source an instance of an internal class that captures the chunkSize and source and whose GetEnumerator() returns an IEnumerator<char> that skips the first chunkSize elements of source (really initSource here)

The second iteration looks like this:

  • source.Any() executes a MoveNext() on source.GetEnumerator(). However <code>source is an enumeration that will skip the first 3 (chunkSize) elements of initSource so this causes MoveNext() to be invoked 4 times on initSource.GetEnumerator() (3 because of the skipping and 1 for the actual check if the enumeration (after skipping) is empty)
  • chunk = source.Take(chunkSize) assigns to chunk an instance of a .NET-internal class that captures the chunkSize and source and whose GetEnumerator() returns an IEnumerator<char> that returns the first chunkSize elements of source. However, as above, source is an enumeration that will skip the first 3 elements of initSource. So this is equivalent to
    chunk = initSource.Skip(chunkSize).Take(chunkSize)
  • This chunk is added to the result
  • source = source.Skip(chunkSize) assigns to source an instance of an internal class that captures the chunkSize and source and whose GetEnumerator() returns an IEnumerator<char> that skips the first chunkSize elements of source. Again, source is an enumeration that will skip the first 3 elements of initSource. So this is equivalent to
    source = initSource.Skip(chunkSize).Skip(chunkSize)

Is it messy enough yet? There's still the third iteration:

  • source.Any() executes a MoveNext() on source.GetEnumerator(). However source is an enumeration that will skip the first 3 elements of an enumeration that will skip the first 3 elements of initSource so this causes MoveNext() to be invoked 7 times on initSource.GetEnumerator() (6 because of the skipping and 1 for the actual check if the enumeration (after skipping) is empty)
  • chunk = source.Take(chunkSize) assigns to chunk an instance of a .NET-internal class that captures the chunkSize and source and whose GetEnumerator() returns an IEnumerator<char> that returns the first chunkSize elements of source. However, as above, source is an enumeration that will skip the first 3 elements of an enumeration that will skip the first 3 elements of initSource. So this is equivalent to chunk = initSource.Skip(chunkSize).Skip(chunkSize).Take(chunkSize)
  • This chunk is added to the result
  • source = source.Skip(chunkSize) assigns to source an instance of an internal class that captures the chunkSize and source and whose GetEnumerator() returns an IEnumerator<char> that skips the first chunkSize elements of source. Again, source is an enumeration that will skip the first 3 elements of an enumeration that will skip the first 3 elements of initSource. So this is equivalent to
    source = initSource.Skip(chunkSize).Skip(chunkSize).Skip(chunkSize)

The fourth iteration fails because the source.Any() returns false since there are no elements in source. The source.Any() has caused MoveNext() to be invoked 8 times on initSource.GetEnumerator() (7 because of the skipping of all of the elements of initSource and 1 which actually determines that the enumeration (after skipping) is empty).
Executing this with:

C#
static void Main(string[] args)
{
  Console.WriteLine("Before chunkifying");
  var theChunks = SE(7).Chunkify0L(3);
}

Gives this output:

Before chunkifying
while source.Any()
SE: A Count: 1
Take = 3
skipping = 3
while source.Any()
SE: A Count: 2
SE: B Count: 3
SE: C Count: 4
SE: D Count: 5
Take = 3
skipping = 3
while source.Any()
SE: A Count: 6
SE: B Count: 7
SE: C Count: 8
SE: D Count: 9
SE: E Count: 10
SE: F Count: 11
SE: G Count: 12
Take = 3
skipping = 3
while source.Any()
SE: A Count: 13
SE: B Count: 14
SE: C Count: 15
SE: D Count: 16
SE: E Count: 17
SE: F Count: 18
SE: G Count: 19
SE: (past end) Count: 20

So at this point MoveNext() has been invoked 20 times!
Notice that the "Take" and "skipping" actions don't cause invocations of MoveNext() at those points in the execution.
Also, now, result contains 3 enumerations (that have not been iterated over yet):

  1. initSource.Take(3)
  2. initSource.Skip(3).Take(3)
  3. initSource.Skip(3).Skip(3).Take(3)

You probably can already see where the lazy evaluation has lead.

Iterating Over the Contents of the Chunks

One simple way to iterate over the contents of each of the chunks is to collect the elements into a List<char> using the .ToList() extension method provided by LINQ. So I'll extend the Program.Main() to do that:

C#
static void Main(string[] args)
{
  Console.WriteLine("Before chunkifying");
  var theChunks = SE(7).Chunkify0L(3);
  Console.WriteLine("Before iterating over the chunks");
  foreach (var chunk in theChunks)
  {
    Console.WriteLine("  Before chunk.ToList()");
    var chunkList = chunk.ToList();
    Console.WriteLine("  After chunk.ToList()");
    Console.WriteLine("  Iterating over the chunkList of length: {0}", chunkList.Count);
    foreach (var item in chunkList)
    {
      Console.WriteLine("    chunkList: {0}", item);
    }
  }
  Console.WriteLine("Total SE Count = {0}", SECount);
  Console.WriteLine("Enter to exit");
  Console.ReadLine();
}

This then outputs the following. (I'll omit the output shown above and start with the "Before iterating over the chunks" line.

Before iterating over the chunks
  Before chunk.ToList()
SE: A Count: 21
SE: B Count: 22
SE: C Count: 23
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: A
    chunkList: B
    chunkList: C
  Before chunk.ToList()
SE: A Count: 24
SE: B Count: 25
SE: C Count: 26
SE: D Count: 27
SE: E Count: 28
SE: F Count: 29
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: D
    chunkList: E
    chunkList: F
  Before chunk.ToList()
SE: A Count: 30
SE: B Count: 31
SE: C Count: 32
SE: D Count: 33
SE: E Count: 34
SE: F Count: 35
SE: G Count: 36
SE: (past end) Count: 37
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
Total SE Count = 37
Enter to exit
  • The first chunk.ToList() behaves as expected. It invokes MoveNext() 3 times, taking 3 elements from the original enumeration.
  • The second chunk.ToList() restarts at the beginning of the original enumeration, invoking MoveNext() 6 times! Remember that the second chunk has the deferred (lazy) Skip(3) still pending, so this first invokes MoveNext() 3 times to do the Skip() and then 3 more times to take the 3 elements.
  • Likewise, the third chunk.ToList() restarts at the beginning of the original enumeration, invoking MoveNext() 8 times, due to two consecutive deferred Skip(3) operations. So there's MoveNext() 6 times for the Skip()s, 1 to take the seventh element of the enumeration and 1 to detect the enumeration has ended.

The computational complexity grows as O(n²) (where n is the length of the original enumeration). The biggest increases in the number of invocations of MoveNext() occur at the n mod chunkSize == 1 lengths. (See below for the complexity analysis.) So for short length source enumerations (initSource), this isn't a big deal. However, as the source enumeration gets longer this gets ugly. Fast!

So What Can Be Done?

The next step seems to be making this more efficient. So getting rid of the concatenated Skip()s seemed to be the key.

The First New Implementation

I thought if I walk through the IEnumerator<T> of the source enumeration explicitly, then construct the chunks while updating the IEnumerator<T> it should work. I still wanted to avoid building a new collection in memory for each of the chunks. So this was the next iteration:

C#
public static IEnumerable<IEnumerable<T>> Chunkify1<T>(this IEnumerable<T> source,
                                                       int chunkSize)
{
  IEnumerator<T> e = source.GetEnumerator();
  Func<bool> mover = () => { Console.WriteLine("Chunkify MoveNext"); return e.MoveNext(); };
  while (mover())
  {
    yield return ChunkIterator(e, chunkSize);
  }
}

private static IEnumerable<T> ChunkIterator<T>(IEnumerator<T> e,
                                                  int chunkSize)
{
  Console.WriteLine("ChunkIterator = {0}", chunkSize);
  do
  {
    Console.WriteLine("CI[{1}]: {0}", e.Current, chunkSize);
    yield return e.Current;
  } while (--chunkSize > 0 && e.MoveNext());
}

This seemed to work well, changing the Chunkify0L(3) to Chunkify1(3) in the Program.Main() shown above I got:

Before chunkifying
Before iterating over the chunks
Chunkify MoveNext
SE: A Count: 1
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: A
SE: B Count: 2
CI[2]: B
SE: C Count: 3
CI[1]: C
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: A
    chunkList: B
    chunkList: C
Chunkify MoveNext
SE: D Count: 4
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: D
SE: E Count: 5
CI[2]: E
SE: F Count: 6
CI[1]: F
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: D
    chunkList: E
    chunkList: F
Chunkify MoveNext
SE: G Count: 7
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
SE: (past end) Count: 8
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
Chunkify MoveNext
Total SE Count = 8
Enter to exit

This looks great, the count of MoveNext() invocations is 8. (One for each of the 7 elements of the original enumeration and 1 to determine it is done.)

But what happens if the resulting chunks are not iterated over in the order they occur in theChunks? I tried iterating over the chunks in reverse order.

Before chunkifying
Before iterating over the chunks
Chunkify MoveNext
SE: A Count: 1
Chunkify MoveNext
SE: B Count: 2
Chunkify MoveNext
SE: C Count: 3
Chunkify MoveNext
SE: D Count: 4
Chunkify MoveNext
SE: E Count: 5
Chunkify MoveNext
SE: F Count: 6
Chunkify MoveNext
SE: G Count: 7
Chunkify MoveNext
SE: (past end) Count: 8
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
ChunkIterator = 3
CI[3]: G
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
Total SE Count = 8
Enter to exit

Wow! What happened? The count is still 8 but instead of two chunks of length 3 and one of length 1, now there are seven chunks of length 1 and they all contain the same seventh element of the original source enumeration ('G').

Again, it is still the lazy enumeration of the chunks and chunk contents that is the problem. They're all sharing the same IEnumerator<T> of the source enumeration, so if it isn't used in the sequence we expect we get nonsense results.

In case you were thinking "Why not just clone the IEnumerator<T> before passing it to the ChunkIterator()?" That will not work for two reasons:

  1. There's no way to clone the IEnumerator<T>. It is an interface. The object behind it can be anything that implements the interface.
  2. Even if you could make a copy of the IEnumerator<T> it wouldn't help because this actually depends on the shared state to be updated in the ChunkIterator() so it is ready for the next chunk in Chunkify1()(or detecting the end of the source).

So this works with the caveat that the chunks must be completely iterated over in the original sequence order.

The Next New Implementation

Although I had wanted to avoid building a new collection in memory for each of the chunks, I thought that doing the opposite and building a List-of-Lists would certainly prevent the odd behavior I was seeing in the non-sequential processing case above. So:

C#
public static IEnumerable<IEnumerable<T>> Chunkify2<T>(this IEnumerable<T> source,
                                                       int chunkSize)
{
  List<IEnumerable<T>> allChunks = new List<IEnumerable<T>>();
  int count = 0;
  List<T> chunk = null;
  foreach (var item in source)
  {
    if (chunk == null)
    {
      Console.WriteLine("  New chunk");
      chunk = new List<T>(chunkSize);
      count = 0;
      allChunks.Add(chunk);
    }
    Console.WriteLine("  chunk[{0}]: {1}", count + 1, item);
    chunk.Add(item);
    if (++count == chunkSize)
    {
      chunk = null;
    }
  }
  return allChunks;
}

This worked, changing to Chunkify2(3) in the Program.Main() gives the same count of 8 and correct forward and reverse behavior:

Forward
Before chunkifying
SE: A Count: 1
  New chunk
  chunk[1]: A
SE: B Count: 2
  chunk[2]: B
SE: C Count: 3
  chunk[3]: C
SE: D Count: 4
  New chunk
  chunk[1]: D
SE: E Count: 5
  chunk[2]: E
SE: F Count: 6
  chunk[3]: F
SE: G Count: 7
  New chunk
  chunk[1]: G
SE: (past end) Count: 8
Before iterating over the chunks
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: A
    chunkList: B
    chunkList: C
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: D
    chunkList: E
    chunkList: F
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
Total SE Count = 8
Enter to exit

Reverse
Before chunkifying
SE: A Count: 1
  New chunk
  chunk[1]: A
SE: B Count: 2
  chunk[2]: B
SE: C Count: 3
  chunk[3]: C
SE: D Count: 4
  New chunk
  chunk[1]: D
SE: E Count: 5
  chunk[2]: E
SE: F Count: 6
  chunk[3]: F
SE: G Count: 7
  New chunk
  chunk[1]: G
SE: (past end) Count: 8
Before iterating over the chunks
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: D
    chunkList: E
    chunkList: F
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: A
    chunkList: B
    chunkList: C
Total SE Count = 8
Enter to exit

This is as expected. Everything is assembled into an in-memory collection of chunks, where each chunk is itself assembled into an in-memory collection.

But what if we had a source enumeration where getting the next element is comparatively slow, so getting all of the source at once would be very slow; but just getting a chunk's worth would be reasonable. Or, if the source enumeration was very large but the chunks were reasonably sized? Could we still be sort of lazy?

The Final New Implementation

This implementation builds each chunk as a List<T> but provides them each in a lazy (on-demand) manner:

C#
public static IEnumerable<IEnumerable<T>> Chunkify<T>(this IEnumerable<T> source,
                                                      int chunkSize)
{
  IEnumerator<T> e = source.GetEnumerator();
  Func<bool> mover = () => { Console.WriteLine("Chunkify MoveNext"); return e.MoveNext(); };
  int count = 0;
  while (mover())
  {
    Console.WriteLine("  New chunk");
    List<T> chunk = new List<T>(chunkSize);
    do
    {
      Console.WriteLine("  chunk[{1}]: {0}", e.Current, count);
      chunk.Add(e.Current);
    } while (++count < chunkSize && e.MoveNext());
    yield return chunk;
    count = 0;
  }
}

The results for this are:

Forward
Before chunkifying
Before iterating over the chunks
Chunkify MoveNext
SE: A Count: 1
  New chunk
  chunk[0]: A
SE: B Count: 2
  chunk[1]: B
SE: C Count: 3
  chunk[2]: C
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: A
    chunkList: B
    chunkList: C
Chunkify MoveNext
SE: D Count: 4
  New chunk
  chunk[0]: D
SE: E Count: 5
  chunk[1]: E
SE: F Count: 6
  chunk[2]: F
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: D
    chunkList: E
    chunkList: F
Chunkify MoveNext
SE: G Count: 7
  New chunk
  chunk[0]: G
SE: (past end) Count: 8
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
Chunkify MoveNext
Total SE Count = 8
Enter to exit

Reverse
Before chunkifying
Before iterating over the chunks
Chunkify MoveNext
SE: A Count: 1
  New chunk
  chunk[0]: A
SE: B Count: 2
  chunk[1]: B
SE: C Count: 3
  chunk[2]: C
Chunkify MoveNext
SE: D Count: 4
  New chunk
  chunk[0]: D
SE: E Count: 5
  chunk[1]: E
SE: F Count: 6
  chunk[2]: F
Chunkify MoveNext
SE: G Count: 7
  New chunk
  chunk[0]: G
SE: (past end) Count: 8
Chunkify MoveNext
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 1
    chunkList: G
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: D
    chunkList: E
    chunkList: F
  Before chunk.ToList()
  After chunk.ToList()
  Iterating over the chunkList of length: 3
    chunkList: A
    chunkList: B
    chunkList: C
Total SE Count = 8
Enter to exit

Reversing theChunks forces all of the individual chunks to be created, so this is no real advantage over the List-of-Lists if theChunks will be iterated over before the individual chunks are processed.

The attached code

I've attached all of the code from this article. It includes versions of the extension method implementations that don't perform the Console.WriteLine() output. The implementations there also include some basic argument checking that I've omitted in the body of the article. The Visual Studio project file in the attached code is from VS 2012 and targets .NET 4, but the code should be usable in earlier versions of C#/.NET supporting LINQ.

Parting Thoughts

Using the LINQ extension methods can be very powerful and make implementing some operations easy and very readable/maintainable. However, it is important to remember the lazy behavior of those methods (and most enumerators, in general) and think through how things are really going to work.

Complexity Analysis of Chunkify0

To analyze the computational complexity of the Chunkify0, I generated the MoveNext() count for chunkSize of 2 to 6 with sequence lengths in multiples of the chunkSize, using the lengths where length mod chunkSize == 1 because those lengths dominate the complexity growth. I used the following code to generate data that I could copy and paste into Octave (free Matlab equivalent) and then used that to fit the data to polynomials:

C#
for (int chunkSize = 2; chunkSize <= 6; chunkSize++)
{
  string delim = string.Empty;
  Debug.Write(string.Format("data{0} = [", chunkSize));
  for (int sequenceLengthFactor = 1; sequenceLengthFactor <= 5; sequenceLengthFactor++)
  {
    SECount = 0;
    var theChunks = SE(1 + sequenceLengthFactor * chunkSize).Chunkify0(chunkSize);
    foreach (var chunk in theChunks)
    {
      var chunklist = chunk.ToList();
    }
    Debug.Write(string.Format("{0}{1}, {2}", delim, 1 + sequenceLengthFactor * chunkSize, SECount));
    delim = "; ";
  }
  Debug.WriteLine("];");
}

This fit very nicely to a set of quadratic equations with coefficients that depended on the chunkSize, and could be reduced to:

${1 \over chunkSize} \times n^2 + (2 + {chunkSize - 1 \over chunkSize}) \times n + 2$

where n is the sequence length.

History

  • June 2, 2014 - Initial version.
  • June 5, 2014 - Added complexity analysis.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)