Introduction
This article takes an in-depth look at how the C# yield keyword works under the hood.
If you have never used the yield keyword before, check out my post on Iterators in C# on my original blog or on CodeProject.
Using iterators is easy, but it's always good to know how they work under the hood, right?
To understand this, let's start with a simple example: a C# method that returns a sequence of values.
Here is the code:
using System.Collections;

public class InDepth
{
    static IEnumerator DoSomething()
    {
        yield return "start";
        for (int i = 1; i < 3; i++)
        {
            yield return i.ToString();
        }
        yield return "end";
    }
}
It's pretty simple, isn't it? Let's have a look at the compiled code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace YieldDemo
{
    public class InDepth
    {
        public InDepth()
        {
            base..ctor();
        }

        private static IEnumerator DoSomething()
        {
            InDepth.<DoSomething>d__0 doSomethingD0 = new InDepth.<DoSomething>d__0(0);
            return (IEnumerator) doSomethingD0;
        }

        [CompilerGenerated]
        private sealed class <DoSomething>d__0 : IEnumerator<object>, IEnumerator, IDisposable
        {
            private object <>2__current;
            private int <>1__state;
            public int <i>5__1;

            object IEnumerator<object>.Current
            {
                [DebuggerHidden] get
                {
                    return this.<>2__current;
                }
            }

            object IEnumerator.Current
            {
                [DebuggerHidden] get
                {
                    return this.<>2__current;
                }
            }

            [DebuggerHidden]
            public <DoSomething>d__0(int <>1__state)
            {
                base..ctor();
                this.<>1__state = param0;
            }

            bool IEnumerator.MoveNext()
            {
                switch (this.<>1__state)
                {
                    case 0:
                        this.<>1__state = -1;
                        this.<>2__current = (object) "start";
                        this.<>1__state = 1;
                        return true;
                    case 1:
                        this.<>1__state = -1;
                        this.<i>5__1 = 1;
                        break;
                    case 2:
                        this.<>1__state = -1;
                        ++this.<i>5__1;
                        break;
                    case 3:
                        this.<>1__state = -1;
                        goto default;
                    default:
                        return false;
                }
                if (this.<i>5__1 < 3)
                {
                    this.<>2__current = (object) this.<i>5__1.ToString();
                    this.<>1__state = 2;
                    return true;
                }
                else
                {
                    this.<>2__current = (object) "end";
                    this.<>1__state = 3;
                    return true;
                }
            }

            [DebuggerHidden]
            void IEnumerator.Reset()
            {
                throw new NotSupportedException();
            }

            void IDisposable.Dispose()
            {
            }
        }
    }
}
Shocked? I wrote barely 10 lines of code (LOC), but the compiler generated many more. That's because the compiler creates a state machine to implement the yield functionality. Let's examine the compiled code.
Overall Observation
- The code shown is not valid C#: Names such as <DoSomething>d__0 and <>1__state are deliberately illegal as C# identifiers. We write our programs with valid C# names, so if the compiler also used valid names for what it generates, they could conflict with our own method and variable declarations during compilation.
- The generated class is decorated with the [CompilerGenerated] attribute, and several of its members with the [DebuggerHidden] attribute. The CompilerGenerated attribute distinguishes a compiler-generated element from a user-written one, while the DebuggerHidden attribute keeps the debugger from stopping in or stepping into the member.
- <DoSomething>d__0 implements three interfaces, IEnumerator<object>, IEnumerator and IDisposable, although we used only one interface as the return type. The compiler implemented the generic form of IEnumerator even though we asked for the non-generic form. IEnumerator<object> implies the other two interfaces, because it inherits from both of them.
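If you want to check these observations yourself, the following minimal sketch inspects the object that the iterator method returns. The class name InterfaceCheck and its Main method are my own additions for illustration. The name of the generated type and the presence of the attribute are compiler implementation details, so treat what gets printed for those as something that may differ between compiler versions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical helper class, not part of the original article.
public static class InterfaceCheck
{
    // Same iterator block as in the article.
    public static IEnumerator DoSomething()
    {
        yield return "start";
        for (int i = 1; i < 3; i++)
        {
            yield return i.ToString();
        }
        yield return "end";
    }

    public static void Main()
    {
        IEnumerator enumerator = DoSomething();
        Type generated = enumerator.GetType();

        // Name of the compiler-generated class (implementation detail).
        Console.WriteLine(generated.Name);

        // Interfaces implemented besides the declared IEnumerator.
        Console.WriteLine(enumerator is IEnumerator<object>);
        Console.WriteLine(enumerator is IDisposable);

        // Whether the class carries [CompilerGenerated] (implementation detail).
        Console.WriteLine(Attribute.IsDefined(generated, typeof(CompilerGeneratedAttribute)));
    }
}
```

Nothing in the iterator block runs here; the sketch only looks at the type of the object that was created.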
There's a whole lot of magic happening in <DoSomething>d__0. Let's have a closer look at it.
- Three fields are declared in the class, namely <>1__state, <>2__current and <i>5__1. <>1__state keeps track of where the code has reached. <>2__current holds the current value returned by the iterator. <i>5__1 is just the loop counter i.
- The state and current fields are declared private, while the counter is declared public. If the iterator block takes any parameters, they are also stored as public fields.
- There is an important thing to note here. The DoSomething() method creates an instance of <DoSomething>d__0 and always passes 0 to its constructor. This value depends on the return type used for the iterator block. For example, if we use IEnumerable<int> as the return type, the initial value passed is -2 instead of 0.
- There are two versions of the Current property. They both return <>2__current. MoveNext(), Reset() and Dispose() are the methods implemented.
- The Reset() method always throws a NotSupportedException. This is in line with the C# specification, which says that enumerators produced by iterator blocks do not support Reset().
- Whatever code you write in the iterator block goes into the MoveNext() method, which is always built around a switch statement. The current value, the state and the counter are all modified in this method. You can see that the switch expression is the current state. Based on the current state, the fields are updated and the method returns.
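The iterator doesn't just run on its own. When the iterator method is called, the state machine object is only created; none of the code in the iterator block runs yet. The actual work starts when MoveNext() is called. Each call runs the block until the next yield return, where it stores the value for Current and returns true, or until a yield break or the end of the method is reached, where it returns false. The consumer calls MoveNext() repeatedly until it returns false.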
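The lazy behaviour described above is easy to observe with a log. In this minimal sketch the class LazyDemo, the method Numbers and the log parameter are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original article.
public static class LazyDemo
{
    // Records in the log when each part of the iterator block actually runs.
    public static IEnumerator<int> Numbers(List<string> log)
    {
        log.Add("body started");
        yield return 1;
        log.Add("between yields");
        yield return 2;
        log.Add("body finished");
    }

    public static void Main()
    {
        var log = new List<string>();

        IEnumerator<int> e = Numbers(log);
        Console.WriteLine(log.Count);     // 0: calling the method ran nothing

        Console.WriteLine(e.MoveNext());  // True: ran up to the first yield return
        Console.WriteLine(e.Current);     // 1
        Console.WriteLine(log.Count);     // 1

        Console.WriteLine(e.MoveNext());  // True: ran up to the second yield return
        Console.WriteLine(e.Current);     // 2

        Console.WriteLine(e.MoveNext());  // False: reached the end of the method
        Console.WriteLine(log.Count);     // 3
    }
}
```

Note that the log stays empty until the first MoveNext() call, even though Numbers(log) has already returned an object.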
An important restriction on iterators is that you cannot yield return from a try block that has a catch block associated with it, whether or not it also has a finally block. You can, however, yield return from a try block that has only a finally block and no catch.
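Here is a minimal sketch of the permitted form, a try block with only a finally block. The names TryFinallyDemo, ReadValues and log are hypothetical. In this case the generated Dispose() is no longer empty as in the listing above: disposing the enumerator while it is suspended inside the try block runs the finally block.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original article.
public static class TryFinallyDemo
{
    public static IEnumerable<int> ReadValues(List<string> log)
    {
        // Allowed: try with finally only.
        // Adding a catch clause to this try would be rejected by the compiler.
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            log.Add("cleanup");
        }
    }

    public static void Main()
    {
        var log = new List<string>();

        foreach (int value in ReadValues(log))
        {
            Console.WriteLine(value); // 1
            break;                    // leaving foreach disposes the enumerator
        }

        Console.WriteLine(log.Count); // 1: the finally block ran on Dispose
    }
}
```

This is why foreach always calls Dispose() on the enumerator: it gives a suspended iterator the chance to run its cleanup code.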
Until now, we've been returning IEnumerator from the iterator block, and the non-generic version at that. Let's replace IEnumerator with IEnumerable, this time in its generic form, and implement the iterator block once again. Here is the code after the modification:
static IEnumerable<string> DoSomething()
{
    yield return "start";
    for (int i = 1; i < 3; i++)
    {
        yield return i.ToString();
    }
    yield return "end";
}
Let's also look at the compiled code and check what's new in the IEnumerable implementation. Here is the code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace YieldDemo
{
    public class InDepth
    {
        public InDepth()
        {
            base..ctor();
        }

        private static IEnumerable<string> DoSomething()
        {
            InDepth.<DoSomething>d__0 doSomethingD0 = new InDepth.<DoSomething>d__0(-2);
            return (IEnumerable<string>) doSomethingD0;
        }

        [CompilerGenerated]
        private sealed class <DoSomething>d__0 : IEnumerable<string>,
            IEnumerable, IEnumerator<string>, IEnumerator, IDisposable
        {
            private string <>2__current;
            private int <>1__state;
            private int <>l__initialThreadId;
            public int <i>5__1;

            string IEnumerator<string>.Current
            {
                [DebuggerHidden] get
                {
                    return this.<>2__current;
                }
            }

            object IEnumerator.Current
            {
                [DebuggerHidden] get
                {
                    return (object) this.<>2__current;
                }
            }

            [DebuggerHidden]
            public <DoSomething>d__0(int <>1__state)
            {
                base..ctor();
                this.<>1__state = param0;
                this.<>l__initialThreadId = Environment.CurrentManagedThreadId;
            }

            [DebuggerHidden]
            IEnumerator<string> IEnumerable<string>.GetEnumerator()
            {
                InDepth.<DoSomething>d__0 doSomethingD0;
                if (Environment.CurrentManagedThreadId == this.<>l__initialThreadId && this.<>1__state == -2)
                {
                    this.<>1__state = 0;
                    doSomethingD0 = this;
                }
                else
                    doSomethingD0 = new InDepth.<DoSomething>d__0(0);
                return (IEnumerator<string>) doSomethingD0;
            }

            [DebuggerHidden]
            IEnumerator IEnumerable.GetEnumerator()
            {
                return (IEnumerator) this.System.Collections.Generic.IEnumerable<System.String>.GetEnumerator();
            }

            bool IEnumerator.MoveNext()
            {
                switch (this.<>1__state)
                {
                    case 0:
                        this.<>1__state = -1;
                        this.<>2__current = "start";
                        this.<>1__state = 1;
                        return true;
                    case 1:
                        this.<>1__state = -1;
                        this.<i>5__1 = 1;
                        break;
                    case 2:
                        this.<>1__state = -1;
                        ++this.<i>5__1;
                        break;
                    case 3:
                        this.<>1__state = -1;
                        goto default;
                    default:
                        return false;
                }
                if (this.<i>5__1 < 3)
                {
                    this.<>2__current = this.<i>5__1.ToString();
                    this.<>1__state = 2;
                    return true;
                }
                else
                {
                    this.<>2__current = "end";
                    this.<>1__state = 3;
                    return true;
                }
            }

            [DebuggerHidden]
            void IEnumerator.Reset()
            {
                throw new NotSupportedException();
            }

            void IDisposable.Dispose()
            {
            }
        }
    }
}
Observations
- First, the return type of the DoSomething() method has changed to IEnumerable<string>.
- The argument passed to the <DoSomething>d__0 constructor has noticeably changed from 0 to -2.
- The compiler-generated <DoSomething>d__0 class now implements IEnumerable<string> and IEnumerable along with IEnumerator<string> and the others.
- The implementation of IEnumerator<string> in the sealed class is almost the same as that of IEnumerator before. The Current property just returns the current value, Reset() throws the same exception and MoveNext() has the same logic.
- A private field, <>l__initialThreadId, is added. It is set in the constructor to the ID of the current thread.
Well, what happened? Calling DoSomething() only creates the IEnumerable<string> instance. When a consumer such as foreach then calls its GetEnumerator() method, it gets back an IEnumerator<string>, and from there the IEnumerator methods take over. The consumer can only read the values through this interface, not modify them. It's the MoveNext() method that is called over and over again to return the values lazily.
Why did the value passed to the <DoSomething>d__0 constructor change from 0 to -2? These values are codes that tell the state machine what state it is in. Here are the states that the state machine operates on:
- 0: indicates that the work is yet to start (Before).
- -1: indicates that the work is in progress (Running) or that the work is completed (After).
- -2: specific to IEnumerable. This is the initial state of the IEnumerable before the call to GetEnumerator() is made.
- Greater than 0: indicates a suspended state from which execution resumes.
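Note that the -2 state is specific to IEnumerable; the other states belong to IEnumerator. When GetEnumerator() is called on the IEnumerable for the first time, on the thread that created it, the state changes from -2 to 0 and the same object is returned as the IEnumerator. In any other case (a different thread, or a state other than -2), GetEnumerator() returns a new instance created with state 0, so each enumeration gets its own state.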
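The effect of the -2 state can be observed from the outside with the following minimal sketch. The class name ReuseDemo is hypothetical. Whether the first enumerator is the very same object as the sequence is a compiler implementation detail, so I deliberately give no expected output for the two reference comparisons. The independence of the two enumerators, on the other hand, is something you can rely on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original article.
public static class ReuseDemo
{
    // Same iterator block as in the article.
    public static IEnumerable<string> DoSomething()
    {
        yield return "start";
        for (int i = 1; i < 3; i++)
        {
            yield return i.ToString();
        }
        yield return "end";
    }

    public static void Main()
    {
        IEnumerable<string> sequence = DoSomething();
        IEnumerator<string> first = sequence.GetEnumerator();
        IEnumerator<string> second = sequence.GetEnumerator();

        // Implementation detail: is the first enumerator the sequence object itself?
        Console.WriteLine(ReferenceEquals(sequence, first));
        // Implementation detail: did the second call create a separate object?
        Console.WriteLine(ReferenceEquals(first, second));

        // Each enumerator keeps its own position.
        first.MoveNext();
        first.MoveNext();
        second.MoveNext();
        Console.WriteLine(first.Current);  // 1
        Console.WriteLine(second.Current); // start
    }
}
```

Reusing the original object for the first GetEnumerator() call saves an allocation in the common case where a sequence is enumerated once, on the thread that created it.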
That's it! At first glance it looks freaky, but once we work through it slowly, it turns out to be a lot easier than we expected.
Please share your thoughts and reviews on this post! Thanks!