LINQ Part 1: A Deep Dive into IEnumerable

28 Mar 2018
A deep dive into the IEnumerable interface, how the C# language supports it, how to avoid some of its pitfalls, and an introduction to some basic LINQ concepts.

Introduction

LINQ comes in two flavors, based on the associated interface that it extends: either IEnumerable or IQueryable. We’ll start by limiting our discussion to extension methods for the IEnumerable interface that are found in the System.Linq.Enumerable class. If you’re unfamiliar with extension methods, there is a summary of this topic at the end of this article.

To understand LINQ well, it is necessary to understand IEnumerable well. This conceptually simple interface is the workhorse of LINQ, yet it can demonstrate some surprisingly complex behavior.

Before dismissing this topic, and moving on to the next one, try to correctly answer these five simple questions. If successful, you might understand it well enough. If not, it might be worth taking a few extra moments to read this article.

  1. What is the one and only method required by this interface?
  2. How would you implement a foreach loop without using the foreach keyword?
  3. What are the pitfalls of premature materialization and how can you avoid them?
  4. How would you return an IEnumerable from a method without resorting to the yield keyword?
  5. Is there a maximum number of items that can exist in an IEnumerable?

To check your answers, skip to the end of the article, but no cheating please.

By the end of this article, you will fully understand all of these concepts and begin to understand how they relate to LINQ.

Background

This is the first in a series of articles on LINQ.

Many Things

The way to know life is to love many things. – Vincent Van Gogh

Anytime we deal with many things, it is highly likely that an IEnumerable is involved. At its core, IEnumerable is simply a sequence of zero or more items. This is the interface that is implemented by every collection in the .NET framework: arrays, lists, dictionaries, hash sets, and more.

Also, a great deal of the .NET framework employs this interface to return long (or costly) sequences of items without requiring a collection. A few quick examples include DirectoryInfo.EnumerateFiles, DirectoryInfo.EnumerateDirectories, and File.ReadLines.
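As a rough illustration of why such methods return a sequence rather than a collection, the sketch below searches a file with File.ReadLines, which reads lines on demand, so the search can stop as soon as a match is found. The class name, the helper FirstLineContaining, and the temporary sample file are all made up for this example.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazyFileDemo
{
  // Returns the first line containing the given text, or null if none does.
  // File.ReadLines yields lines one at a time, so reading stops at the first match.
  public static string FirstLineContaining(string path, string text) =>
    File.ReadLines(path)
      .FirstOrDefault(line => line.Contains(text));

  public static void Main()
  {
    // Hypothetical sample file, created only for this demonstration.
    string path = Path.GetTempFileName();
    File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

    Console.WriteLine(FirstLineContaining(path, "et"));  // prints "beta"

    File.Delete(path);
  }
}
```

By contrast, File.ReadAllLines would load every line into an array before the search could even begin.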

This interface has one, and only one, method: GetEnumerator. If you’ve ever used the foreach keyword, you’ve used this interface. So, let’s consider a simple foreach loop:

List<char> letters = new List<char> { 'a', 'b', 'c', 'd', 'e' };

foreach (char letter in letters)
  Console.Write(letter + " ");
Console.WriteLine();

Under the covers, C# is performing the equivalent of the following on your behalf:

using (IEnumerator<char> enumerator = letters.GetEnumerator())
  while(enumerator.MoveNext())
  {
    char letter = enumerator.Current;
    Console.Write(letter + " ");
  }
Console.WriteLine();

Almost There

Greatness is more than potential. It is the execution of that potential. - Eric Burns

The beauty of an IEnumerable is that it sits there and does almost nothing. Until you call GetEnumerator, it remains immaterial and full of potential. You can copy its value between variables, pass it as a parameter, and wrap another IEnumerable around it. What happens? Almost nothing.

Let’s consider a very simple IEnumerable, based on the sequence of Fibonacci numbers. This sequence continues infinitely. Every item in the sequence, except the first two, is simply the sum of the previous two. This sequence proceeds as follows: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55…

Let’s consider the following simple method that returns the numbers in this sequence:

public static IEnumerable<long> EnumerateFibonacci()
{
  long current = 1;
  long previous = 0;

  while (true)
  {
    yield return current;

    long temp = previous;
    previous = current;
    current += temp;
  }
}

Even though this sequence is infinite, we can still make use of it in an application. Note: Because C# arithmetic is unchecked by default, the addition silently overflows once current + temp exceeds long.MaxValue, so the sequence returns nonsense from the 93rd item onward.

IEnumerable<long> mySequence = EnumerateFibonacci();

The above assignment simply obtains the IEnumerable. It does not execute a single line of code within the EnumerateFibonacci method. A common mistake I’ve seen people make is to simply tack a ToList call onto the end of every IEnumerable, as follows:

mySequence = mySequence.ToList();

In order to populate the list, every item in the sequence must be evaluated. In this case, since the sequence is infinite, the operation would never complete. Eventually, the operation would fail, after all of the memory was exhausted.

While this is an extreme example of the pitfalls of premature materialization, there are many real-world examples. With most, the application simply freezes until all of the items have been evaluated. It can only begin processing items after the list has been built.
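To make the cost visible, here is a small sketch that counts how many items a sequence actually produces. CostlyNumbers and the Evaluated counter are hypothetical stand-ins for an expensive source; they are not part of the article's sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializationDemo
{
  // Number of items the source has produced so far.
  public static int Evaluated;

  // A finite stand-in for a costly sequence (hypothetical helper).
  public static IEnumerable<int> CostlyNumbers(int count)
  {
    for (int number = 1; number <= count; number++)
    {
      Evaluated++;
      yield return number;
    }
  }

  public static void Main()
  {
    Evaluated = 0;
    IEnumerable<int> sequence = CostlyNumbers(1000);
    Console.WriteLine(Evaluated);  // 0: obtaining the sequence runs none of its code

    sequence.First();
    Console.WriteLine(Evaluated);  // 1: only the first item was produced

    sequence.ToList();
    Console.WriteLine(Evaluated);  // 1001: ToList restarted the source and produced all 1000 items
  }
}
```

Note that each enumeration starts the source over from the beginning, which is why the final count is 1001 rather than 1000.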

Taming the Infinite

Hold infinity in the palm of your hand, and eternity in an hour. – William Blake

Using Fibonacci as a model again, let’s assume we want to display the first 10 items. There is no need for a list. The following simple loop will suffice:

IEnumerable<long> mySequence = EnumerateFibonacci();
int index = 0;

foreach (long number in mySequence)
{
  Console.Write(number + " ");
  if (++index >= 10)
    break;
}
Console.WriteLine();

Or, to make it easier, you could use LINQ to prune the sequence:

mySequence = EnumerateFibonacci()
  .Take(10);

Here, LINQ simply wraps another IEnumerable, which stops after the tenth item, around mySequence. Again, not a single line of code in the EnumerateFibonacci method is executed by this assignment.

So, putting it all together, the following makes it easy to tame the infinite:

foreach (long number in EnumerateFibonacci().Take(10))
  Console.Write(number + " ");
Console.WriteLine();

The beauty of LINQ is that most methods employ the decorator pattern, simply wrapping a new IEnumerable around the sequence. By adding a series of method calls, you can form a production line (or pipeline). For example, to obtain an IEnumerable for the second 10 numbers in the sequence using LINQ, the following assignment suffices:

mySequence = EnumerateFibonacci()
  .Skip(10)
  .Take(10);

Again, not a single line of code in EnumerateFibonacci is executed as a result of this assignment.
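For completeness, the sketch below consumes two such pipelines so the results can be seen. It repeats the EnumerateFibonacci method from earlier to stay self-contained; the class and the helper names SecondTen and FirstFiveEven are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipelineDemo
{
  // Same generator as shown earlier, repeated so this example stands alone.
  public static IEnumerable<long> EnumerateFibonacci()
  {
    long current = 1;
    long previous = 0;

    while (true)
    {
      yield return current;

      long temp = previous;
      previous = current;
      current += temp;
    }
  }

  // Items 11 through 20 of the sequence, joined into one string.
  public static string SecondTen() =>
    string.Join(" ", EnumerateFibonacci().Skip(10).Take(10));

  // A three-stage pipeline: generate, filter, then stop after five matches.
  public static string FirstFiveEven() =>
    string.Join(" ", EnumerateFibonacci().Where(number => number % 2 == 0).Take(5));

  public static void Main()
  {
    Console.WriteLine(SecondTen());      // 89 144 233 377 610 987 1597 2584 4181 6765
    Console.WriteLine(FirstFiveEven());  // 2 8 34 144 610
  }
}
```

Only string.Join actually pulls items through the pipeline; each stage asks the stage before it for one item at a time.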

Under the Hood

Everything has beauty, but not everyone sees it. – Confucius

Even trivial LINQ can seem almost magical in its ability to fashion a pipeline from a series of simple method calls. However, once you understand what the simplest LINQ methods do, it becomes much easier to understand.

At this level, all we really have is a group of extension methods that operate on IEnumerable. Let’s consider how we might implement our own version of the Take method.

public static class MyEnumerable
{
  public static IEnumerable<TItem> MyTake<TItem>(this IEnumerable<TItem> items, int count)
  {
    using (IEnumerator<TItem> enumerator = items.GetEnumerator())
      for (int index = 0; index < count && enumerator.MoveNext(); index++)
        yield return enumerator.Current;
  }
}

With this method available, our code looks very much like LINQ:

mySequence = EnumerateFibonacci()
  .MyTake(10);
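The same approach extends to other operators. As a further sketch, a filtering method in the spirit of LINQ's Where might look like the following. MyWhere and MyEnumerable3 are hypothetical names, and this is not how the framework implements Where internally.

```csharp
using System;
using System.Collections.Generic;

public static class MyEnumerable3
{
  // Hypothetical counterpart to Where: yields only the items that satisfy the predicate.
  public static IEnumerable<TItem> MyWhere<TItem>(this IEnumerable<TItem> items, Func<TItem, bool> predicate)
  {
    foreach (TItem item in items)
      if (predicate(item))
        yield return item;
  }
}

public static class MyWhereDemo
{
  public static void Main()
  {
    int[] numbers = { 1, 2, 3, 4, 5, 6 };
    Console.WriteLine(string.Join(" ", numbers.MyWhere(number => number % 2 == 0)));  // 2 4 6
  }
}
```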

Heavy Lifting

Heavy lifting doesn’t need to be heavy spending if we do the job right. – Buzz Aldrin

The C# language does a remarkable amount of heavy lifting, on your behalf, when you use the yield keyword. While we won’t exactly replicate the code that is produced, it is still helpful to consider how we might write the same code without using the yield keyword.

First, consider the code below. Don’t worry; we’ll explain it right afterwards.

using System.Collections;
using System.Collections.Generic;

public static class MyEnumerable2
{
  public static IEnumerable<TItem> MyTake2<TItem>(this IEnumerable<TItem> items, int count) =>
    new MyTakeEnumerable<TItem>(items, count);

  private class MyTakeEnumerable<TItem> : IEnumerable<TItem>
  {
    public MyTakeEnumerable(IEnumerable<TItem> items, int count)
    {
      this.items = items;
      this.count = count;
    }

    private IEnumerable<TItem> items;
    private int count;

    public IEnumerator<TItem> GetEnumerator() =>
      new MyTakeEnumerator(items.GetEnumerator(), count);

    IEnumerator IEnumerable.GetEnumerator() =>
      GetEnumerator();

    private class MyTakeEnumerator : IEnumerator<TItem>
    {
      public MyTakeEnumerator(IEnumerator<TItem> enumerator, int count)
      {
        this.enumerator = enumerator;
        this.count = count;
      }

      private IEnumerator<TItem> enumerator;
      private int count;
      private int index;

      public bool MoveNext() =>
        index++ >= count ? false : enumerator.MoveNext();

      public TItem Current => enumerator.Current;

      object IEnumerator.Current => enumerator.Current;

      public void Dispose() => enumerator.Dispose();

      public void Reset() => enumerator.Reset();
    }
  }
}

I’ll bet you are now very happy that the yield keyword does all this work for you. Now, let’s try to unwind this somewhat complicated code.

Basically, we have three classes that we’ve implemented here:

Class Name | Description
MyEnumerable2 | This class simply provides the MyTake2 extension method and one inner class.
MyTakeEnumerable | This inner class simply wraps the original sequence. It provides a new GetEnumerator method that gets an enumerator for the original sequence and wraps it in an instance of MyTakeEnumerator.
MyTakeEnumerator | This inner class simply wraps an enumerator from the original sequence and limits it to the requested number of items.

MyEnumerable2 Class

The important method here is fairly easy to understand. We simply create an instance of MyTakeEnumerable to wrap the original sequence.

public static IEnumerable<TItem> MyTake2<TItem>(this IEnumerable<TItem> items, int count) =>
  new MyTakeEnumerable<TItem>(items, count);

MyTakeEnumerable Class

Here again, we have a relatively straightforward method. We simply create an instance of MyTakeEnumerator to wrap the enumerator from the original sequence.

public IEnumerator<TItem> GetEnumerator() =>
  new MyTakeEnumerator(items.GetEnumerator(), count);

MyTakeEnumerator Class

Here, things seem more complicated, but are truly not. This is simply an implementation of IEnumerator that wraps the enumerator from the original sequence. Most of the methods do nothing more than call the equivalent methods from the underlying enumerator.

There is only one exception: the MoveNext method. This one we change slightly. If we have returned fewer than the requested number of items, we simply call the underlying MoveNext method. Otherwise, we return false.

public bool MoveNext() =>
  index++ >= count ? false : enumerator.MoveNext();

Summing It Up

To summarize the summary of the summary: people are a problem. – Douglas Adams

When C# encounters a yield keyword, within a method, it performs a little bit of magic. Your method is changed so that it simply creates an IEnumerable/IEnumerator class on your behalf. It then returns an instance of this new class.

So where did your code go? It is (essentially) moved into the MoveNext method of the IEnumerator.

This is why none of your code is actually executed when you call the original method. Your actual code now resides in MoveNext. It is not executed until you call GetEnumerator and then MoveNext.
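One practical consequence is that argument checks written inside an iterator method are also deferred until the first MoveNext. The sketch below contrasts that with a common workaround, which keeps the check in an ordinary wrapper method and moves the yield into a private helper. All names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredValidationDemo
{
  // The null check sits inside the iterator, so it only runs on the first MoveNext.
  public static IEnumerable<int> DoubleLazyCheck(IEnumerable<int> items)
  {
    if (items == null)
      throw new ArgumentNullException(nameof(items));
    foreach (int item in items)
      yield return item * 2;
  }

  // This wrapper contains no yield, so the null check runs at call time.
  public static IEnumerable<int> DoubleEagerCheck(IEnumerable<int> items)
  {
    if (items == null)
      throw new ArgumentNullException(nameof(items));
    return DoubleIterator(items);
  }

  private static IEnumerable<int> DoubleIterator(IEnumerable<int> items)
  {
    foreach (int item in items)
      yield return item * 2;
  }

  // Reports whether merely calling the method with null throws.
  public static bool ThrowsAtCallTime(Func<IEnumerable<int>, IEnumerable<int>> method)
  {
    try
    {
      method(null);
      return false;
    }
    catch (ArgumentNullException)
    {
      return true;
    }
  }

  public static void Main()
  {
    Console.WriteLine(ThrowsAtCallTime(DoubleLazyCheck));   // False
    Console.WriteLine(ThrowsAtCallTime(DoubleEagerCheck));  // True
  }
}
```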

Standard LINQ Methods

The standard LINQ methods are implemented as extension methods to the IEnumerable interface. They reside in the System.Linq.Enumerable class.

These methods are divided into three basic flavors: those that return a sequence in the original order, those that return a sequence in a different order, and those that return a singleton value.

Sequence in Original Order

Some methods return a new sequence where all or a portion of the original sequence is included, in its original order. These methods do not require materialization of the source sequence to begin returning items. These methods include: Append, AsEnumerable, Cast, Concat, Empty, Except, OfType, Prepend, Range, Repeat, Select, SelectMany, Skip, SkipWhile, Take, TakeWhile, Where, and Zip. One caveat: Except streams the first sequence, but it must read the second sequence in full before it returns its first item.

Sequence in New Order

Some methods return a new sequence where all or a portion of the original sequence is included, but in a different order. While execution is still deferred until you begin consuming the sequence, this can be deceptive. In order to return the initial item in the sequence, these methods must first evaluate (and materialize) either a portion or all of the sequence. These methods include: Distinct, GroupBy, GroupJoin, Intersect, Join, OrderBy, OrderByDescending, Reverse, ThenBy, ThenByDescending, and Union. The degree varies: the ordering methods, Reverse, and GroupBy must read the entire source; Join, GroupJoin, and Intersect read one of their two inputs in full; Distinct and Union only keep track of the items they have already returned.
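The difference can be observed by counting how many source items are produced before a query returns its first result. The following is a minimal sketch; the Source method, its sample values, and the Evaluated counter are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
  // Number of items the source has produced so far.
  public static int Evaluated;

  // A small source that counts each item it produces (hypothetical sample data).
  public static IEnumerable<int> Source()
  {
    foreach (int number in new[] { 5, 3, 8, 1, 9, 2 })
    {
      Evaluated++;
      yield return number;
    }
  }

  // Returns how many source items were produced to obtain the query's first result.
  public static int ItemsReadForFirst(Func<IEnumerable<int>, IEnumerable<int>> query)
  {
    Evaluated = 0;
    query(Source()).First();
    return Evaluated;
  }

  public static void Main()
  {
    // Where streams: the very first source item (5) already matches.
    Console.WriteLine(ItemsReadForFirst(source => source.Where(number => number > 4)));  // 1

    // OrderBy cannot know the smallest item until it has seen every item.
    Console.WriteLine(ItemsReadForFirst(source => source.OrderBy(number => number)));    // 6
  }
}
```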

Singleton

The singleton methods force immediate materialization of at least a portion of the sequence. These methods include: Aggregate, All, Any, Average, Contains, Count, ElementAt, ElementAtOrDefault, First, FirstOrDefault, Last, LastOrDefault, LongCount, Max, Min, SequenceEqual, Single, SingleOrDefault, Sum, ToArray, ToDictionary, ToList, and ToLookup. Note that DefaultIfEmpty, despite the similarity of its name to FirstOrDefault, returns a deferred sequence in the original order, so it belongs with the first group rather than this one.

Avoiding Materialization

If you stick to standard LINQ methods that return a new altered sequence in the same order as the original sequence, you will avoid materialization. While this is not always possible, it is possible far more often than some of our colleagues may realize.

One common complaint that leads to forced materialization is “I need the count, sum, minimum, maximum, or average of the items”. This may be true, but when do you need this information? If you are simply displaying it at the end of a report, then it can be trivially calculated during processing (without forcing premature materialization).
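A minimal sketch of that idea follows: each item is handled as it arrives while the totals are updated in the same pass, so nothing is stored. The Totals class, the Process method, and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RunningTotalsDemo
{
  // Totals accumulated during a single pass. Min and Max keep their
  // sentinel values if the sequence turns out to be empty.
  public sealed class Totals
  {
    public int Count;
    public long Sum;
    public long Min = long.MaxValue;
    public long Max = long.MinValue;
  }

  // Handles every item as it arrives and updates the totals in the same loop.
  public static Totals Process(IEnumerable<long> items, Action<long> handle)
  {
    var totals = new Totals();
    foreach (long item in items)
    {
      handle(item);
      totals.Count++;
      totals.Sum += item;
      totals.Min = Math.Min(totals.Min, item);
      totals.Max = Math.Max(totals.Max, item);
    }
    return totals;
  }

  public static void Main()
  {
    Totals totals = Process(new long[] { 4, 8, 15, 16, 23, 42 }, item => Console.WriteLine(item));

    // Prints: count=6, sum=108, min=4, max=42
    Console.WriteLine($"count={totals.Count}, sum={totals.Sum}, min={totals.Min}, max={totals.Max}");
  }
}
```

The average, if needed, is simply Sum divided by Count once the loop has finished.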

Another common complaint is “I first need to know if the sequence is empty”. This is another problem that is solved quite easily. In the demonstration project, you’ll find the extension method PeekableEnumerable.NullIfEmpty. It can be used as follows:

mySequence = mySequence.NullIfEmpty();
if (mySequence == null)
{
  Console.WriteLine("empty");
  return;
}
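The demonstration project's implementation is not reproduced in this article. Purely as an illustration of how such a method could work, here is one possible sketch under a different name (NullIfEmptySketch). It is an assumption, not the author's code, and it has two limitations noted in the comments: the returned sequence can be enumerated only once, and the underlying enumerator is disposed only if the caller enumerates the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullIfEmptySketchExtensions
{
  // Hypothetical sketch: returns null for an empty sequence, otherwise a
  // single-use sequence that resumes from the item already fetched.
  public static IEnumerable<TItem> NullIfEmptySketch<TItem>(this IEnumerable<TItem> items)
  {
    IEnumerator<TItem> enumerator = items.GetEnumerator();
    if (!enumerator.MoveNext())
    {
      enumerator.Dispose();
      return null;
    }
    return Resume(enumerator);
  }

  // Yields the current item and then the rest. The enumerator is disposed when
  // enumeration finishes; if the result is never enumerated, it is not disposed.
  private static IEnumerable<TItem> Resume<TItem>(IEnumerator<TItem> enumerator)
  {
    using (enumerator)
    {
      do
      {
        yield return enumerator.Current;
      }
      while (enumerator.MoveNext());
    }
  }
}

public static class NullIfEmptySketchDemo
{
  public static void Main()
  {
    IEnumerable<int> empty = Enumerable.Empty<int>().NullIfEmptySketch();
    Console.WriteLine(empty == null);  // True

    IEnumerable<int> some = new[] { 1, 2, 3 }.NullIfEmptySketch();
    Console.WriteLine(string.Join(" ", some));  // 1 2 3
  }
}
```

Only the first item is evaluated to answer the question, so the approach also works on an infinite sequence.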

Sometimes, you may encounter an instance where you need to look ahead in the sequence. This is familiar to consumers of System.IO.StreamReader, which provides a Peek method specifically for this purpose. This is yet another problem that is solved quite easily. In the demonstration project, you’ll find the extension method PeekableEnumerable.AsPeekableEnumerable. It can be used as follows:

using (var peekable = mySequence.AsPeekableEnumerable())
{
  foreach(var current in peekable)
  {
    Console.Write($"current={current}");
    if (peekable.Peek(out long next))
      Console.Write($", next={next}");
    Console.WriteLine();
  }
}

Bottom line, lack of ingenuity is probably the most common cause of premature materialization. That said, there are a few times when materialization is necessary and unavoidable. Take some extra time to consider your circumstances before slapping a ToList onto your sequence.

Extension Methods

Extension methods were first introduced in C# 3.0 (along with LINQ). From a syntactic standpoint, these methods appear to extend the behavior of pre-existing classes or interfaces.

So, let’s first consider a normal method, for the sake of comparison:

public static string TrimMyStringNormal(string value) =>
  value.Trim();

This method is invoked as follows:

string value = TrimMyStringNormal(" Trim It! ");

Extension methods are declared by prefixing the first parameter of a method with the this keyword. An extension method for the same logic might look like the following:

public static class SimpleExtension
{
  public static string TrimMyStringExtended(this string value) =>
    value.Trim();
}

When we call this extension method, syntactically it appears to have extended the string class:

string value = " Trim It! ".TrimMyStringExtended();

In truth, behind the scenes, the compiler simply translates this call into the following call:

string value = SimpleExtension.TrimMyStringExtended(" Trim It! ");

Answers

In the introduction of this article, we presented some simple questions. Below are the answers, all of which are explained (in detail) within the article:

  1. What is the one and only method required by this interface?

    The GetEnumerator method is the one and only method required by this interface. As a follow on, if you’ve never written an IEnumerator implementation, you should do so.

  2. How would you implement a foreach loop without using the foreach keyword?

    You invoke GetEnumerator to get an enumerator, use MoveNext/Current to iterate through the sequence, and then invoke Dispose to free the resources (if any) associated with the enumerator. Generally, the disposal is accomplished via a using statement.

  3. What are the pitfalls of premature materialization and how can you avoid them?

    Materialization occurs when you force an IEnumerable to provide members in its sequence. Methods such as LINQ’s Count and ToList force materialization of the entire sequence. This can delay processing of individual members until after all of them have been evaluated. It can also result in unnecessary memory consumption. Finally, in the case of infinite sequences (e.g. the Fibonacci sequence), it can create a scenario where the operation never completes.

  4. How would you return an IEnumerable from a method without resorting to the yield keyword?

    You would need to do what yield does on your behalf. Wrap the code in an IEnumerator implementation, wrap the IEnumerator in an IEnumerable instance (via GetEnumerator), and then return the IEnumerable instance. The yield keyword sure is friendly.

  5. Is there a maximum number of items that can exist in an IEnumerable?

    There is no limit. An IEnumerable can literally return an infinite number of elements in a sequence. A practical example of this behavior is provided in the sample code of this article, where an IEnumerable for Fibonacci numbers is provided.

Additional Reading

Below is a collection of links to Microsoft reference materials on some of the concepts covered in this article:

Enumerable Class
https://msdn.microsoft.com/en-us/library/system.linq.enumerable.aspx

Extension Methods (C# Programming Guide)
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/extension-methods

Getting Started with LINQ in C#
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/getting-started-with-linq

Standard Query Operators Overview (C#)
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/standard-query-operators-overview

yield (C# Reference)
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/yield

History

  • 3/28/2018 - The original version was uploaded
  • 3/28/2018 - Error in original upload...fixed and re-uploaded
  • 4/20/2018 - Added link to second article in series
  • 4/21/2018 - Added link to third article in series
  • 4/25/2018 - Added link to fourth article in series

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
