IEnumerable and State Machines

Vlad Neculai Vizitiu


14 Nov 2018, CPOL, 5 min read



Introduction

In a previous post, we looked at enumerators and how .NET works with the foreach loop. We saw that enumerators are objects that transition from one state to another through the MoveNext method and the Current property.

If we want to write a custom enumerator, we need to implement either the IEnumerator interface or its generic counterpart, IEnumerator<T>, and this is where states and state machines come into play. Because the Current property is typed as object (or as T in the generic version), it can hold anything, so we can use enumerators to implement all manner of algorithms: computations, enumerations, or even whole workflows. All the “magic” happens in the MoveNext method, where we perform whatever work is needed to transition from one state to the next.
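To make the idea of MoveNext as a state transition concrete, here is a minimal sketch of a hand-written enumerator. The CountdownEnumerator class and its three states are hypothetical, chosen only for illustration: it counts down from a starting value to 1, and all of its state lives in fields.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical example: an enumerator that counts down from a start value to 1.
// All of its state lives in fields, and MoveNext moves it from one state to the next.
public sealed class CountdownEnumerator : IEnumerator<int>
{
    private readonly int _start;
    private int _state;    // 0 = not started, 1 = running, 2 = finished
    private int _current;

    public CountdownEnumerator(int start) => _start = start;

    public int Current => _current;
    object IEnumerator.Current => Current;   // the non-generic version boxes the value

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:                          // first call: enter the "running" state
                if (_start < 1) { _state = 2; return false; }
                _current = _start;
                _state = 1;
                return true;
            case 1:                          // later calls: compute the next value
                if (_current > 1) { _current--; return true; }
                _state = 2;
                return false;
            default:                         // finished: stay finished
                return false;
        }
    }

    public void Reset() { _state = 0; _current = 0; }
    public void Dispose() { }
}
```

Driving it by hand with `while (e.MoveNext())` and reading `e.Current` on a `new CountdownEnumerator(3)` gives 3, 2, 1, after which MoveNext keeps returning false.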

Understanding this is essential if we ever want to write our own collections by implementing either IEnumerable or its generic version, IEnumerable<T>, because we have to tell the collection how it should be traversed. That means returning an enumerator from GetEnumerator: either we implement an enumerator ourselves (including its MoveNext method), or, if our implementation wraps an inner collection, we simply return that collection's enumerator.
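The second option, passing along an inner collection's enumerator, might look like the sketch below. The Playlist<T> class is a hypothetical wrapper around a List<T>; it writes no enumerator of its own and just hands out the list's.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical example: a collection that wraps an inner List<T>.
// Instead of writing its own enumerator, it returns the inner list's enumerator.
public sealed class Playlist<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();

    public void Add(T item) => _items.Add(item);

    // foreach calls this to get the object it will drive with MoveNext/Current.
    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();

    // The non-generic IEnumerable version simply forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the class implements IEnumerable and has an Add method, it also supports collection initializers, e.g. `new Playlist<string> { "intro", "verse", "chorus" }`, and can be used directly in a foreach.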

In most scenarios, however, the generic collections provided by .NET are more than enough, unless we want to create a very specialized iterator or data structure, such as a graph or a binary tree.
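As an illustration of such a specialized structure, here is a minimal sketch of a binary search tree (no balancing) whose enumerator visits values in sorted order. The BinaryTree class is hypothetical, and its GetEnumerator already uses yield return, the keyword the rest of this article explains; an explicit stack replaces recursion so the traversal state survives between MoveNext calls.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical minimal binary search tree (no balancing) with an in-order enumerator.
public sealed class BinaryTree : IEnumerable<int>
{
    private sealed class Node
    {
        public int Value;
        public Node Left, Right;
        public Node(int value) => Value = value;
    }

    private Node _root;

    public void Add(int value)
    {
        if (_root == null) { _root = new Node(value); return; }
        Node current = _root;
        while (true)
        {
            if (value < current.Value)
            {
                if (current.Left == null) { current.Left = new Node(value); return; }
                current = current.Left;
            }
            else
            {
                if (current.Right == null) { current.Right = new Node(value); return; }
                current = current.Right;
            }
        }
    }

    // In-order traversal: left subtree, node, right subtree.
    public IEnumerator<int> GetEnumerator()
    {
        var stack = new Stack<Node>();
        Node current = _root;
        while (current != null || stack.Count > 0)
        {
            while (current != null)          // walk as far left as possible
            {
                stack.Push(current);
                current = current.Left;
            }
            current = stack.Pop();           // visit the leftmost unvisited node
            yield return current.Value;
            current = current.Right;         // then traverse its right subtree
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With `new BinaryTree { 5, 3, 8, 1, 4 }`, a foreach visits 1, 3, 4, 5, 8.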

That brings us to the main topic: what the C# compiler does behind the scenes to make our lives easier. I know we took the long way around, but it helps to understand how everything fits together.

Now enter our guest, the yield keyword. I prepared an example to illustrate one of its uses.

public IEnumerable<int> Fibs (int fibCount)
{
  for (int i = 0, prevFib = 1, curFib = 1; i < fibCount; i++)
  {
    yield return prevFib;
    int newFib = prevFib+curFib;
    prevFib = curFib;
    curFib = newFib;
  }
}

What we have here is a method that returns the first fibCount Fibonacci numbers; note the yield keyword. When the C# compiler encounters yield, it looks at the method's return type (in this case, IEnumerable<int>) and generates, behind the scenes, a class that implements both IEnumerable<int> and IEnumerator<int>, so calling Fibs actually creates an object that enumerates integers. The value of each yield return becomes the Current property of that enumerator, and the code that runs from one yield return to the next is what executes inside MoveNext. Like a manually implemented enumerator, it retains its state (the loop counter and the local variables) between calls and resumes exactly where it left off. None of the method body runs when Fibs is called; it starts only on the first call to MoveNext. In this case, the enumerator produces 1 on the first iteration, 1 on the second, 2 on the third, and so on, until the loop has produced fibCount values and the method ends; Fibs(6) yields 1, 1, 2, 3, 5, 8.
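To see roughly what the compiler produces, here is a hand-written, simplified sketch of an iterator class equivalent to Fibs. It is not the actual compiler output: the generated class has a compiler-reserved name, a different state encoding, and extra bookkeeping. FibsIterator and its field names are our own. The method's local variables become fields, and a _state field records where to resume.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified sketch of the kind of class the compiler generates for Fibs.
public sealed class FibsIterator : IEnumerable<int>, IEnumerator<int>
{
    private readonly int _fibCount;   // the captured method parameter
    private int _state;               // 0 = not started, 1 = paused after "yield return", 2 = finished
    private int _current;
    // The method's locals are hoisted into fields so they survive between MoveNext calls.
    private int _i, _prevFib, _curFib;

    public FibsIterator(int fibCount) => _fibCount = fibCount;

    public int Current => _current;
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                // First call: run the for-loop initializer.
                _i = 0; _prevFib = 1; _curFib = 1;
                break;
            case 1:
                // Resume right after "yield return prevFib": finish the loop body...
                int newFib = _prevFib + _curFib;
                _prevFib = _curFib;
                _curFib = newFib;
                _i++;                  // ...and run the for-loop increment
                break;
            default:
                return false;
        }

        // Loop condition: either produce the next value or finish.
        if (_i < _fibCount)
        {
            _current = _prevFib;       // "yield return prevFib" sets Current...
            _state = 1;                // ...remembers where to resume...
            return true;               // ...and pauses.
        }
        _state = 2;
        return false;
    }

    // Compiler-generated iterators also throw here.
    public void Reset() => throw new NotSupportedException();
    public void Dispose() => _state = 2;

    // Simplification: always hand out a fresh enumerator. The compiler's version
    // may reuse the same object for the first enumeration.
    public IEnumerator<int> GetEnumerator() => new FibsIterator(_fibCount);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Enumerating `new FibsIterator(6)` with foreach should produce 1, 1, 2, 3, 5, 8, just like Fibs(6).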

A method can contain as many yield return statements as you like, and they do not have to be inside a loop. Each time the enumerator advances, execution continues right after the last yield return it executed. Let's see an example:

public IEnumerable<int> GetSomeIntegers()
{
  yield return 1;
  yield return 2;
  yield return 3;
}

Enumerating this method produces 1, then 2 on the next iteration, and 3 on the one after that.
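The pause-and-resume behaviour is easier to see if the method records what runs and when. This sketch is a hypothetical variant of GetSomeIntegers that writes to a log list (the YieldTrace class and the log parameter are our own), plus a Run method that drives the enumerator by hand.

```csharp
using System.Collections.Generic;

// Hypothetical instrumented variant of GetSomeIntegers.
public static class YieldTrace
{
    public static IEnumerable<int> GetSomeIntegers(List<string> log)
    {
        log.Add("start");
        yield return 1;
        log.Add("after 1");
        yield return 2;
        log.Add("after 2");
        yield return 3;
        log.Add("end");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        IEnumerable<int> sequence = GetSomeIntegers(log);
        log.Add("created");                   // nothing from the method body has run yet

        using (IEnumerator<int> e = sequence.GetEnumerator())
        {
            while (e.MoveNext())              // each MoveNext runs the body up to the next yield
                log.Add("got " + e.Current);
        }
        return log;
    }
}
```

Run() returns "created", "start", "got 1", "after 1", "got 2", "after 2", "got 3", "end": the body only starts on the first MoveNext, and each later call resumes right after the previous yield return.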

The yield construct has one more form, yield break, which tells the enumerator that the sequence has ended: MoveNext returns false and any code after the yield break is skipped. Here's how that looks:

IEnumerable<string> Foo (bool breakEarly)
{
  yield return "One";
  yield return "Two";

  if (breakEarly)
    yield break;

  yield return "Three";
}

When breakEarly is true, this method returns only “One” and “Two” and never reaches “Three”; when it is false, all three strings are returned.
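A more practical use of yield break is stopping a sequence as soon as a condition fails. The TakeUntilFailure helper below is hypothetical and similar in spirit to LINQ's TakeWhile; it is a sketch, not the LINQ implementation.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakExample
{
    // Hypothetical helper: returns items until the first one that fails the predicate.
    public static IEnumerable<T> TakeUntilFailure<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (!predicate(item))
                yield break;          // end the sequence; MoveNext returns false from now on
            yield return item;
        }
    }
}
```

For example, `TakeUntilFailure(new[] { 2, 4, 5, 6 }, x => x % 2 == 0)` produces 2 and 4; the 6 is never reached because the sequence ends at 5.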

So, using yield return and yield break, we can design fairly complex workflows, driven by the method's parameters, without implementing any enumerators of our own.
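As a small workflow-style sketch, consider a traffic light whose sequence of states depends on its parameters. The TrafficLight class, the Cycle method, and its cycles and nightMode parameters are hypothetical; the point is that the "current state" is simply where the method is paused.

```csharp
using System.Collections.Generic;

// Hypothetical workflow: a traffic light cycling through its states.
public static class TrafficLight
{
    public static IEnumerable<string> Cycle(int cycles, bool nightMode)
    {
        if (nightMode)
        {
            // At night the light only blinks yellow, then the workflow ends.
            for (int i = 0; i < cycles; i++)
                yield return "Flashing yellow";
            yield break;
        }

        for (int i = 0; i < cycles; i++)
        {
            yield return "Red";
            yield return "Green";
            yield return "Yellow";
        }
    }
}
```

`Cycle(1, false)` produces "Red", "Green", "Yellow", while `Cycle(2, true)` produces "Flashing yellow" twice.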

Next, I will show you an example that defies the normal execution flow, and shows a sneak peek at how LINQ works behind the scenes with its extension methods and how we can compose enumerators.

using System;
using System.Collections.Generic;

class Program
{
  static void Main()
  {
    foreach (int fib in EvenNumbersOnly(Fibs(6)))
    {
      Console.WriteLine(fib);
    }
  }

  static IEnumerable<int> Fibs(int fibCount)
  {
    for (int i = 0, prevFib = 1, curFib = 1; i < fibCount; i++)
    {
      yield return prevFib;
      int newFib = prevFib + curFib;
      prevFib = curFib;
      curFib = newFib;
    }
  }

  static IEnumerable<int> EvenNumbersOnly(IEnumerable<int> sequence)
  {
    foreach (int x in sequence)
      if ((x % 2) == 0)
        yield return x;
  }
}

Here, we have an example of enumerator composition. At first glance, we would expect Fibs to run to completion first, but this is where the usual flow is turned around. Fibs(6) is indeed called first, because it is an argument, but that call only creates the enumerator; none of its body runs yet. When the foreach in Main asks for the first element, the program enters EvenNumbersOnly, and only when its inner foreach requests a value does execution enter Fibs. Fibs runs until it yields a value, EvenNumbersOnly checks it, and the inner loop keeps pulling values until one is even; that value is returned to Main and written to the console. The process then resumes where it left off, with both the EvenNumbersOnly and Fibs enumerators keeping their state, until Fibs finishes, at which point EvenNumbersOnly finishes as well. With Fibs(6), the program prints 2 and 8.
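The interleaving can be made visible by logging each step. This sketch uses hypothetical instrumented copies of Fibs and EvenNumbersOnly that take an extra log parameter; the CompositionTrace class is our own.

```csharp
using System.Collections.Generic;

// Hypothetical instrumented copies of Fibs and EvenNumbersOnly.
public static class CompositionTrace
{
    static IEnumerable<int> Fibs(int fibCount, List<string> log)
    {
        log.Add("Fibs: start");
        for (int i = 0, prevFib = 1, curFib = 1; i < fibCount; i++)
        {
            log.Add("Fibs: yield " + prevFib);
            yield return prevFib;
            int newFib = prevFib + curFib;
            prevFib = curFib;
            curFib = newFib;
        }
    }

    static IEnumerable<int> EvenNumbersOnly(IEnumerable<int> sequence, List<string> log)
    {
        log.Add("EvenNumbersOnly: start");
        foreach (int x in sequence)
        {
            if (x % 2 == 0)
            {
                log.Add("EvenNumbersOnly: yield " + x);
                yield return x;
            }
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        foreach (int fib in EvenNumbersOnly(Fibs(6, log), log))
            log.Add("Main: print " + fib);
        return log;
    }
}
```

The log starts with "EvenNumbersOnly: start" followed by "Fibs: start", and "Main: print 2" appears before Fibs has produced 3, 5, or 8, confirming that values flow through the chain one at a time.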

This is how LINQ lets us chain several operations and work with a large dataset in “real time”: each element flows through the whole chain one at a time, instead of the entire collection being processed at every step. Using the same technique, we could also consume paginated data from a web service, fetching each page only when it is needed instead of making all the calls up front and storing the results in memory.
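A minimal sketch of lazy pagination, assuming a hypothetical fetchPage delegate that stands in for the web-service call: it receives a 0-based page number and returns that page's items, or an empty list past the last page. No real web API is implied.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // fetchPage is a hypothetical stand-in for a web-service call.
    public static IEnumerable<T> ReadAllPages<T>(Func<int, IReadOnlyList<T>> fetchPage)
    {
        for (int page = 0; ; page++)
        {
            IReadOnlyList<T> items = fetchPage(page);   // called only when the consumer needs more data
            if (items.Count == 0)
                yield break;                            // no more pages
            foreach (T item in items)
                yield return item;
        }
    }
}
```

If the consumer stops after the first few items, later pages are never requested; with two items per page, reading three items triggers only two calls to fetchPage.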

Even though it’s a very nice and useful feature, we have to keep in mind that the yield construct has a few restrictions:

The yield keyword can only be used in a method (or property getter) whose return type is IEnumerable, IEnumerable<T>, IEnumerator, or IEnumerator<T>.
A yield return statement cannot be placed inside a try block that has a catch clause, nor inside a catch or finally block, but it can be used inside the try block of a try-finally. (A yield break may appear in a try or catch block, but not in a finally block.)
The method cannot have ref, in, or out parameters.
It cannot be used in unsafe blocks.
It cannot be used in anonymous methods, including lambda expressions.
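The try-finally case deserves a quick sketch, because the finally block also runs when the consumer stops early: foreach disposes the enumerator, and disposing an iterator runs any pending finally blocks. The FinallyExample class and its log parameter are hypothetical, used only to record the order of events.

```csharp
using System.Collections.Generic;

public static class FinallyExample
{
    // yield return is allowed inside the try block of a try-finally.
    // (A try block with a catch clause around "yield return" would not compile.)
    public static IEnumerable<int> Numbers(List<string> log)
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            log.Add("cleanup");
        }
    }

    public static List<string> StopEarly()
    {
        var log = new List<string>();
        foreach (int n in Numbers(log))
        {
            log.Add("got " + n);
            if (n == 2) break;        // leaving the loop early disposes the enumerator
        }
        log.Add("after loop");
        return log;
    }
}
```

StopEarly() returns "got 1", "got 2", "cleanup", "after loop": the cleanup runs as soon as the loop is exited, even though 3 was never produced.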

In conclusion, we saw how to implement custom enumerators without the hassle of writing our own custom types, and how to leverage foreach to do more than just iterate over a collection of items.

Thank you and see you next time.


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)