Introduction
The focus of this article is on a hack developed to implement C# 2.0 Iterators with version 1.1 of the .NET Framework. It is simply the result of a fascination with the compiler feature and of mere curiosity to see it work with version 1.1 of the Framework. The hack is based on the Iterators implementation described in chapter 22 of the C# Specification.
Beforehand, however, it should be absolutely clear to the reader that the hack is of no practical value. It is certainly not expected or recommended that anyone actually use it for anything, certainly not with production code, not necessarily because it will not get the job done or for major performance issues, but rather because it is not wise to expose code, especially production code, to a tool that modifies the PE file yet has not been fully tested, as is the case with this tool, even less so when this tool is nothing more than a hack! Moreover, C# has this feature straight out of the box with the next major version, 2.0, of the compiler, so why even bother. The reader has been warned! This article is meant to amuse the reader by showing how C# 2.0 Iterators can be used with version 1.1 of the .NET Framework. It is not meant to provide the reader with anything that can be used for production purposes.
First, a very brief and weak introduction to C# 2.0 Iterators will be given for those of you who are not already familiar with the feature. This article does assume that the reader is very comfortable with the Iterator design pattern and how it is implemented within .NET. Specifically, you should completely understand the Iterator type interface IEnumerator
as well as the Iterator Factory type interface IEnumerable
, and how both of these are used by the foreach
statement to enable generic iteration over an aggregate.
This introduction is then followed by a brief outline of the steps the hack takes to implement Iterators with PE files that target version 1.1 of the .NET Framework, and that can be disassembled to IL code using the ildasm.exe tool that ships with the .NET 1.1 platform SDK. Note, however, that extreme caution should be taken when using PE files generated with compilers other than the C# and VB compilers that ship with Visual Studio .NET 2003.
C# 2.0 Iterators
Go read chapter 22 of the C# Specification! Just kidding, although it is highly recommended. Furthermore, this introduction to C# 2.0 Iterators provides only the main characteristics of the feature and implementation. It omits various other facts that are very important, for example, those dealing with exception handling. Also, the reader should not be surprised to discover that this introduction to Iterators simply paraphrases specific content of chapter 22 of the C# Specification. Finally, with all that said, the reader is better off reading and understanding chapter 22 of the C# Specification and then skipping this section.
C# 2.0 Iterators is simply a compiler feature that greatly facilitates implementation of the Iterator design pattern. Specifically, the language is given the capability to define Iterator types through ordinary functions. These Iterator functions hold iteration logic and, at compile time, are enclosed within nested IEnumerable
/IEnumerator
types. At runtime, instances of these nested types are used to carry out the iteration logic expressed by the corresponding Iterator functions. It is never the case that an Iterator function executes directly within the type in which it is defined. Instead, it runs inside the MoveNext
method of the IEnumerator
type that the compiler creates and nests within the type containing the Iterator function.
An Iterator function is any function that satisfies the following main criteria, although other criteria do exist and are fully documented in chapter 22 of the C# Specification:
- The function has a return type of either
IEnumerator
or IEnumerable
. As such, the function can be used with the foreach
construct.
- Within the function�s code block, the word, not keyword,
yield
is used in context with the keyword return
to yield a value to the caller.
- Within the function�s code block, the word, not keyword,
yield
is used in context with the keyword break
to terminate the iteration.
An example of an Iterator function that yields the values 1 through 5 is:
System.Collections.IEnumerable yieldOneToFive()
{
yield return 1;
yield return 2;
yield return 3;
yield return 4;
yield return 5;
}
The code block of an Iterator function can be referred to as a yield
block. When the compiler encounters an Iterator function, it does the following, among other things, at compile time:
- Creates a
IEnumerable
/IEnumerator
type nested within the type that contains the Iterator function. This nested Iterator type acts out the Iterator function through its IEnumerator.MoveNext
method.
- Replaces the Iterator function body with code that returns an instance of the nested Iterator type. The Iterator function never actually executes in original form, rather always returns an instance of a class that encloses the original code.
The compiler simply does the work that otherwise the programmer would need to do when implementing the Iterator design pattern in traditional fashion. In addition, however, Iterator functions can be used as coroutines. With that said, C# 2.0 Iterators not only facilitate implementation of the Iterator design pattern but also the implementation of other design patterns that benefit from being implemented using coroutines (Wesner Moise, .NET Undocumented - Iterators, Not Just for Iteration). Fortunately for the learning experience, making note of the behavior of Iterator functions will make this evident for those of you who know what coroutines are. The behavior can be described as follows, although this is certainly not an exhaustive description; for that see chapter 22 of the C# specification:
- Whenever a
yield
return
statement is encountered, the value of the statement�s expression is returned to the caller. What ends up happening is that points at which yield
return
statements occur will end up as points within the nested Iterator�s MoveNext
method that set the value of the Iterator�s Current
property to the value of the yield
return
statement�s expression. When this happens, MoveNext
will return the value true
to inform the caller that a next element in the iteration is available.
- The first time an Iterator function is invoked, execution starts at the beginning of the code block. With subsequent invocations, execution begins immediately after the last
yield
return
statement encountered, not at the beginning of the code block. All local variables and parameters maintain state across all invocations of the yield
block, that is, they don�t go out of scope. Here, coroutines come to mind. Like a co-routine, an Iterator function exposes a series of cooperating subroutines, each one picking up execution where the last one left off with all local state maintained, and each one capable of returning a value to the caller. However, it should be clear that at a low level, the CLR does not allow locals to maintain state across method invocations, since locals are stored on the stack and a method's stack is torn down upon exit, nor can one invocation begin execution where the other left off, per se. Nonetheless, Iterators overcome these obstacles by transforming Iterator functions into nested IEnumerator
types that perform as state machines, each state resulting in a different branch of code being executed within the nested Iterator�s MoveNext
method. In addition, these nested types have fields that correspond one to one with the locals of the Iterator functions they enclose, thereby, giving the illusion that local state persists across method invocations. To the programmer�s eye, however, Iterator functions behave like coroutines, regardless of how they are implemented.
- Whenever a
yield
break
statement is encountered, the Iterator block terminates as would an ordinary function via return
. The value false
is returned by MoveNext
to indicate the iteration is over.
With all the above said, the Iterator function shown in the example above will result in a nested Iterator type similar but not identical to:
class _IEnumerable : System.Collections.IEnumerable, System.Collections.IEnumerator
{
bool System.Collections.IEnumerator.MoveNext()
{
switch(_state)
{
case 0: break;
case 1: goto state_1;
case 2: goto state_2;
case 3: goto state_3;
case 4: goto state_4;
default: return false;
}
_current = 1;
_state++;
return true;
state_1:
_current = 2;
_state++;
return true;
state_2:
_current = 3;
_state++;
return true;
state_3:
_current = 4;
_state++;
return true;
state_4:
_current = 5;
_state++;
return true;
}
}
In addition, the Iterator function itself will be rewritten to return an instance of the nested Iterator type above. Perhaps something like:
System.Collections.IEnumerable yieldOneToFive()
{
return new _IEnumerable();
}
C# 2.0 Iterators is a feature that greatly simplifies implementation of the Iterator design pattern. It alleviates the need to explicitly create types that implement the IEnumerator
interface. Furthermore, given they behave as coroutines, Iterator functions can be used to implement other design patterns unrelated to iteration, sometimes solving problems in which the yielded values are of little or no interest, rather what is important is being able to have methods cooperate with one another, that is, talk back to each other and not lose local state as they do so.
This concludes the brief, practically non existent, introduction to C# 2.0 Iterators. Once again, it is highly recommended that the reader understand chapter 22 of the C# Specification. What follows now is a description of a hack that enables seeing this feature in action with version 1.1 of the .NET Framework.
IteratorsHack
To see C# 2.0 Iterators in action with version 1.1 of the Framework, the following steps must be taken, at least when dealing with the hack described herein:
- Add a reference to an assembly called
Iterators
that exposes two types, Iterators.IteratorFunctionAttribute
and Iterators.Yield
. The source code that accompanies this article provides such an assembly.
- Create functions that have a return type of
IEnumerable
. Although the C# Specification states that Iterator functions can have a return type of either IEnumerable
or IEnumerator
, this hack requires that Iterator functions have a return type of IEnumerable
, which is just a factory of IEnumerator
.
- Decorate each Iterator function defined in step 2 above with an
IteratorFunctionAttribute
, this attribute is exposed by the assembly referenced in step 1 above.
- Within each Iterator function defined in step 2 above, make a call to the static method
Yield.Return
at each point at which a value is to be yielded to the caller, this value being the argument supplied to the method. In effect, the C# 2.0 yield
return
statement is simulated by this method, which is defined within the Yield
type exposed by the assembly referenced in step 1 above.
- Within each Iterator function, use the
return
statement with a null
expression to terminate the iteration. In other words, the normal return
statement is used to simulate the C# 2.0 yield
break
statement.
- Compile the assembly in which the Iterator functions have been defined.
- Run the
IteratorsHack
assembly, providing arguments for its two command line parameters. The first parameter holds the path to the source PE file that contains the Iterator functions to be processed, that is, the result of step 6 above. The second parameter holds the path to the destination PE file, that is, the PE file that the hack creates, which encloses the Iterator functions within the nested Iterator types in semi-full accordance to the Iterators implementation section of chapter 22 of the C# Specification.
- Run the PE file the hack creates in step 7 above and cross your fingers it works!
An example of an Iterator function the hack will process:
[Iterators.IteratorFunction]
System.Collections.IEnumerable yieldOneToFive()
{
Iterators.Yield.Return(1);
Iterators.Yield.Return(2);
Iterators.Yield.Return(3);
Iterators.Yield.Return(4);
Iterators.Yield.Return(5);
return null;
}
At a high level, the hack performs the following steps:
- Disassembles the source PE file into an IL text file by running the ildasm.exe tool.
- Loads the content of the IL text file produced in step 1 above into a
StringCollection
. Hereon, this StringCollection
will be referred to as the instruction stream.
- Inspects the instruction stream loaded in step 2 above making note of all Iterator functions.
- Processes the header and definition sections of the instruction stream, adding the necessary nested
IEnumerable
/IEnumerator
types that enclose the Iterator functions captured in step 3 above.
- Creates an IL text file that contains the result of step 4 above.
- Assembles the IL text file created in step 5 above into a PE file using the ilasm.exe tool.
- Runs the PE file created in step 6 above through the peverify.exe tool to ensure that the PE file meets type safety standards and does not have code that will result in an invalid stack state. If the peverify tool determines that something is wrong with the PE file, the hack throws an exception.
Be forewarned that this hack is by no means whatsoever a tokenizer, rather just a dirty parsing routine that relies solely on native string
functions, a few elementary Regular Expressions, and the IL formatting provided by the ildasm.exe to get the job done. Furthermore, the reason this hack operates at the IL level is to simply avoid having to develop such a tokenizer, since the primitive nature of IL instructions makes parsing IL trivial when compared to what is involved when interpreting higher level languages like C# and VB. Without a decent tokenizer, converting an Iterator function into a state machine is much easier to do at the IL level than at the C#/VB level.
What follows now is a brief description of some, but not all, of the processing the hack performs. Specifically, a brief rundown is given of some of the IL modifications made to the Iterator functions in the course of converting them into corresponding nested IEnumerator
types, the main focus being the transition from the Iterator function to the MoveNext
method of the nested Iterator that encloses the function. Given this discussion is IL oriented, it is best if the reader is familiar with the IL stack.
Instance Member Access within Iterator Functions
If the Iterator function is an instance method, the instruction stream is updated so that each occurrence of the ldarg
instruction with an argument of zero (ldarg.0
) is followed by a ldfld
instruction. Why is the ldarg.0
instruction relevant? Well, it is relevant only if the Iterator function in question is not a static method but rather an instance method, in which case the ldarg.0
instruction pushes the current instance pointer onto the stack. The Iterator function will eventually execute inside the MoveNext
method of the nested IEnumerator
type, not within the class in which the Iterator function is defined. This inner class will in turn have a field that points to the instance of the outer class. The ldfld
operation added to the instruction stream will push this pointer onto the stack so that member access to the outer type is uninterrupted, since subsequent instructions will operate on this latter pointer. For example, the following is code defined within an Iterator function that accesses instance members:
ldarg.0
ldfld int32 Namespace1.Class1::_i
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
ldarg.0
ldfld class Namespace1.Class1/Enumerable1::_this
ldfld int32 Namespace1.Class1::_i
The actual code that performs this processing is:
private void processMemberAccess(StringCollection sc,
IteratorMethodInfo imi, string cls)
{
if(imi.IsStatic)
return;
int n = -1;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
if(ndx != -1)
s = s.Substring(ndx + 1).Trim();
if(s.StartsWith("ldarg.0"))
sc.Insert(++n, "ldfld class " + cls + " " + cls + "/" +
imi.ShortEnumerableName + "::" + THIS_FIELD);
}
}
Read/Write Access to Local Variables and Parameters of Iterator Functions
All instructions that read/write the value of local variables and parameters are processed. All local variables and parameters will eventually be elevated to field status within the nested type that represents the Iterator function. Therefore, the instruction stream is updated so that instructions that read the values of local variables and parameters are translated to instructions that read the values of corresponding fields. Specifically, the instructions we are looking to process are:
ldloc
: this instruction pushes onto the stack the value of a local variable that is identified by the argument provided to the instruction.
ldloca
: this instruction pushes onto the stack the address of a local variable that is identified by the argument provided to the instruction.
ldarg
: this instruction pushes onto the stack the value of an argument that was supplied when the method was invoked, with the argument identified by the argument supplied to the instruction. As was already noted, if this instruction is provided an argument of zero within an instance method, the pointer to the current instance is pushed onto the stack, not the argument supplied to the first parameter of the method.
ldarga
: this instruction pushes onto the stack the address of an argument that was supplied when the method was invoked, with the argument identified by the argument supplied to the instruction.
Basically, whenever one of these instructions is encountered, the corresponding field is located based on the argument supplied to the instruction in question. Then the instruction is replaced with either a ldfld
or ldflda
instruction, respectively depending on whether it is necessary to load the value or address of the field. Finally, the field read instruction is preceded by a ldarg.0
instruction, since instance member instructions require that a pointer to the instance be placed onto the stack prior to being invoked. For example:
ldloc v
ldarg 1
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
ldarg.0
ldfld int32 Namespace1.Class1/Enumerable1::_v
ldarg.0
ldfld int32 Namespace1.Class1/Enumerable1::_p
The actual code that performs this processing is:
private void processLocalRead(StringCollection sc, IteratorMethodInfo imi)
{
int n = -1;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
string label = string.Empty;
if(ndx != -1)
{
label = s.Substring(0, ndx + 1);
s = s.Substring(ndx + 1).Trim();
}
FieldInfo fi = null;
bool loadAddress = false;
if(Regex.IsMatch(s, @"(?:^ldloc(?:\.| ))"))
fi = getFieldInfo(s, imi, false);
else if(s.StartsWith("ldloca"))
{
fi = getFieldInfo(s, imi, false);
loadAddress = true;
}
else if(Regex.IsMatch(s, @"(?:^ldarg(?:\.| ))") &&
(imi.IsStatic || s.IndexOf("ldarg.0") == -1))
fi = getFieldInfo(s, imi, true);
else if(s.StartsWith("ldarga"))
{
fi = getFieldInfo(s, imi, true);
loadAddress = true;
}
if(fi != null)
{
sc.Insert(n++, label + " ldarg.0");
sc[n] = sc[n].Replace(s, "ldfld" +
(loadAddress ? "a" : string.Empty) + " " + fi.Type + " "
+ imi.EnumerableName + "::" + fi.Name).Replace(label,
string.Empty).Trim();
}
}
}
Once read instructions of local variables and parameters have been processed, all instructions that write to local variables and parameters are also processed, although this latter process is not as straightforward as the former. It is necessary to make sure that write instructions that operate on local variables and parameters are reflected in the corresponding fields of the nested type. In other words, if a local variable or parameter has its value set within the Iterator function, the field of the nested type that corresponds to this local variable or parameter also needs to have its value set when the Iterator function is enclosed. Specifically, the write instructions we are looking to process are:
stloc
: this instruction pops the topmost value off the stack and stores it in the local variable identified by the argument supplied to the instruction.
starg
: this instruction pops the topmost value off the stack and stores it in the parameter identified by the argument supplied to the instruction.
As was the case when reading local variables or parameters, whenever one of these instructions is encountered, the corresponding field is located based on the argument supplied to the instruction in question. However, this is where the difference comes in. Read instructions are always replaced; however, write instructions are never replaced, rather are followed by additional instructions that have the effect of assigning the value of the local variable or parameter to the corresponding field of the nested type via the stfld
instruction. To illustrate:
stloc v
starg 1
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
stloc v
ldarg.0
ldloc v
stfld int32 Namespace1.Class1/Enumerable1::_v
stloc p
ldarg.0
ldloc p
stfld int32 Namespace1.Class1/Enumerable1::_p
So what is going on here? Why is that instructions that write to local variables and parameters are not simply replaced as is the case with instructions that read local variables or parameters? Why is it that the local variables of the Iterator function must also be available within the MoveNext
method of the nested type that encapsulates the Iterator function, despite the fact that this nested type will have fields that correspond to these local variables? Why is it that the parameters of the Iterator function must be converted into local variables of the MoveNext
method of the nested Iterator, despite the fact that this Iterator will have fields that correspond to these parameters? Why these inefficiencies?
There is one simple answer to all of these questions, and that is, what has been described so far is nothing more than a hack, and as such, it takes the easy way out of a problem that would otherwise require a much more sophisticated solution. As was mentioned earlier, instance member access, whether read or write, requires that the instance pointer be pushed onto the stack beforehand. Field read operations expect the topmost value on the stack to be the instance pointer, and that is all they expect to see. On the other hand, the field write instruction stfld
expects the topmost value on the stack to be the value that the field is going to be set to, underneath which must lie the instance pointer. In other words, the stfld
instruction pops two values off the stack, the topmost value being that which will be assigned to the field in question, the second being the instance pointer.
So the problem is where to place the ldarg.0
instruction, which pushes onto the stack the instance pointer, within the instruction stream. Without much thought, one may say to place the instruction immediately above the stfld
instruction, as is the case when processing field read instructions. Well, obviously, this is not going to work, because doing so would result in the instance pointer being he topmost stack value when in fact it should be the second to topmost stack value. What about placing the ldarg.0
instruction above the instruction that precedes the stfld
instruction? Well, this is even worst, since there is absolutely no guarantee that the instruction that precedes the stfld
instruction does not itself expect and consume values from the stack. For example, if the add
instruction precedes the stfld
instruction, placing the ldarg.0
instruction immediately above the add
instruction will result in the add
instruction popping off the stack the instance pointer and the value beneath the instance pointer and adding the two values together, which is clearly not the intention. Furthermore, it may be the case that the stfld
instruction is the target of a branch instruction, in which case the instruction that immediately precedes it will most likely be completely unrelated.
To completely replace local write instructions with stfld
requires an algorithm that not only keeps track of the stack state, which means that the algorithm has to know how each and every IL instruction affects the stack, but that also keeps track of all code paths. Certainly, the hack that has been described so far is light years away from such sophistication. Therefore, it simply makes sure that all local variables and parameters of the Iterator function are present as local variables of the MoveNext
method of the nested type that encloses the Iterator function. Local variable write instructions remain in tact, parameter write instructions are replaced with local variable write instructions, and finally, local variables that have been assigned a value are immediately assigned to their corresponding fields.
The actual code that performs this processing is:
private void processLocalWrite(StringCollection sc, IteratorMethodInfo imi)
{
int n = -1;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
if(ndx != -1)
s = s.Substring(ndx + 1).Trim();
FieldInfo fi = null;
bool starg = false;
if(s.StartsWith("stloc"))
fi = getFieldInfo(s, imi, false);
else if(s.StartsWith("starg"))
{
fi = getFieldInfo(s, imi, true);
starg = true;
}
if(fi == null)
continue;
string local = fi.LocalName;
if(starg)
sc[n] = sc[n].Replace(s, "stloc " + local);
sc.Insert(++n, "ldarg.0");
sc.Insert(++n, "ldloc " + local);
sc.Insert(++n, "stfld " + fi.Type + " " +
imi.EnumerableName + "::" + fi.Name);
}
}
Yielding Values From Iterator Functions
Now the heart of the matter has been reached, that is, the point at which the instruction stream is updated in response to all yield
return
statements. Whenever the method Yield.Return
, which simulates the C# 2.0 yield
return
construct, is encountered within the Iterator function, the argument supplied will be returned to the caller through the IEnumerator.Current
property. Specifically:
- The
call
instruction that invokes the Yield.Return
method is replaced with a stloc
instruction that stores the value of the argument into a local variable.
- The value of this local variable is assigned to the
_current
field of the Iterator, this field holding the value of the Iterator�s Current
property of course.
- The value
true
is stored in the local variable that holds the result of the MoveNext
method.
- The value of the Iterator�s
_state
field is incremented by one so that next time MoveNext
is invoked, it can query this field to determine where within the code block to resume execution.
- Execution is unconditionally transferred to the instruction labeled
_EXIT_ITERATOR_BLOCK
, which simply begins the process of exiting the MoveNext
method.
- A dummy instruction is added and labeled the target of execution for the state reached in step 4 above. Therefore, next time
MoveNext
is invoked, it will begin execution at this instruction.
- The necessary instructions are added to the beginning of the
MoveNext
code block so that the state reached in step 4 above is taken into account and branched into, next time MoveNext
is invoked. In other words, the state machine is updated accordingly.
To illustrate:
ldc.i4.1
box
call void [Iterators]Iterators.Yield::Return(object)
ldc.i4.1
box
call void [Iterators]Iterators.Yield::Return(object)
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
ldc.i4 1
ldarg.0
ldfld int32 Namespace1.Class1/Enumerable1::_state
beq _STATE_1
ldc.i4.1
box
stloc current
ldarg.0
ldloc current
stfld object Namespace1.Class1/Enumerable1::_current
ldc.i4.1
stloc result
ldarg.0
ldc.i4 1
stfld int32 Namespace1.Class1/Enumerable1::_state
br _EXIT_ITERATOR_BLOCK
STATE_1: nop
The actual code, which is in need of refactoring, that performs this processing is:
private int processMethodYields(StringCollection sc, int index,
ref int tryBlockIndex, IteratorMethodInfo imi,
ref int state, int instrIndex)
{
bool tryBlock = sc[index == 0 ? 0 : index - 1].Trim().StartsWith(".try");
bool finallyBlock =
sc[index == 0 ? 0 : index - 1].Trim().StartsWith("finally");
bool validYldBlock = index == 0 || tryBlock;
string tryBlockLabel = string.Empty;
int tryInstrIndex = 0;
if(tryBlock)
{
tryBlockLabel = TRY_BLOCK_LABEL + (tryBlockIndex++).ToString();
sc.Insert(++index, tryBlockLabel + ": nop");
tryInstrIndex = index + 1;
}
else if(finallyBlock)
{
int i = index;
while(!sc[++i].EndsWith(" endfinally"));
string endFinallyLabel = sc[i].Substring(0, sc[i].IndexOf(":")).Trim();
sc.Insert(++index, "ldc.i4.1");
sc.Insert(++index, "ldloc " + imi.MoveNextResultLocal);
sc.Insert(++index, "beq " + endFinallyLabel);
}
while(true)
{
string s = sc[++index].Trim();
if(s == "{")
index = processMethodYields(sc, index,
ref tryBlockIndex, imi, ref state, instrIndex);
else if(s.StartsWith("}"))
return index;
else if(validYldBlock && Regex.IsMatch(s,
@"(?:call +void +\[Iterators\]Iterators\.Yield::Return\(object\))"))
{
sc[index] = s.Substring(0, s.IndexOf("call ")) +
"stloc " + imi.CurrentLocal;
sc.Insert(++index, "ldarg.0");
sc.Insert(++index, "ldloc " + imi.CurrentLocal);
sc.Insert(++index, "stfld object " +
imi.EnumerableName + "::" + CURRENT_FIELD);
sc.Insert(++index, "ldc.i4.1");
sc.Insert(++index, "stloc " + imi.MoveNextResultLocal);
sc.Insert(++index, "ldarg.0");
sc.Insert(++index, "ldc.i4 " + (++state).ToString());
sc.Insert(++index, "stfld int32 " +
imi.EnumerableName + "::" + STATE_FIELD);
sc.Insert(++index, (tryBlock ? "leave" : "br") +
" " + EXIT_ITERATOR_BLOCK_LABEL);
string stateLabel = STATE_LABEL + state.ToString();
sc.Insert(++index, stateLabel + ": nop");
sc.Insert(instrIndex++, "ldc.i4 " + (state).ToString());
index++;
tryInstrIndex++;
sc.Insert(instrIndex++, "ldarg.0");
index++;
tryInstrIndex++;
sc.Insert(instrIndex++, "ldfld int32 " +
imi.EnumerableName + "::" + STATE_FIELD);
index++;
tryInstrIndex++;
sc.Insert(instrIndex++,
"beq " + (tryBlock ? tryBlockLabel : stateLabel));
index++;
tryInstrIndex++;
if(!tryBlock)
continue;
sc.Insert(tryInstrIndex++, "ldc.i4 " + (state).ToString());
index++;
sc.Insert(tryInstrIndex++, "ldarg.0");
index++;
sc.Insert(tryInstrIndex++, "ldfld int32 " +
imi.EnumerableName + "::" + STATE_FIELD);
index++;
sc.Insert(tryInstrIndex++, "beq " + stateLabel);
index++;
}
}
}
Terminating Execution of Iterator Functions
The instruction stream is updated so that all ret
(return
) instructions are replaced by a series of instructions that do the following:
- Pop the topmost value off the stack. Since the Iterator function returns a value of type
IEnumerable
, at any point at which the ret
instruction is encountered, it is guaranteed that one, and only one, value is on the stack, and, furthermore, this value is either null
or of type IEnumerable
. However, since the Iterator function is eventually converted into the MoveNext
method of the nested Iterator that encapsulates the Iterator function, and, furthermore, since the MoveNext
method returns a value of type Boolean
, whatever value that is on the stack is popped off, since it is not legal to return a value of type IEnumerable
from a function that returns a Boolean
.
- Store the value
false
in the local variable that holds the result of the MoveNext
method. Recall that exiting an Iterator function by any means other than a yield
return
statement signifies that the Iterator function has no more values to yield, that is, the end of the iteration has been reached and, therefore, the MoveNext
method will return false
to inform the client that such is the case.
- The value -1 is stored in the
_state
field of the Iterator so that any subsequent calls to the MoveNext
method will not do anything other than exit immediately.
- Unconditionally branch to the instruction labeled
_EXIT_ITERATOR_BLOCK
, which simply begins the process of exiting the MoveNext
method.
To illustrate:
ret
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
pop
ldc.i4.0
stloc result
ldarg.0
ldc.i4 -1
stfld int32 Namespace1.Class1/Enumerable1::_state
br _EXIT_ITERATOR_BLOCK
The actual code that performs this processing is:
private void processMethodReturns(StringCollection sc, IteratorMethodInfo imi)
{
int n = -1;
string result = imi.MoveNextResultLocal;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
string label = string.Empty;
if(ndx != -1)
{
label = s.Substring(0, ndx + 1);
s = s.Substring(ndx + 1).Trim();
}
if(s != "ret")
continue;
sc.Insert(n++, (label == string.Empty ? label : label + " ") + "pop");
sc.Insert(n++, "ldc.i4.0");
sc.Insert(n++, "stloc " + result);
sc.Insert(n++, "ldarg.0");
sc.Insert(n++, "ldc.i4 -1");
sc.Insert(n++, "stfld int32 " + imi.EnumerableName + "::" + STATE_FIELD);
sc[n] = "br " + EXIT_ITERATOR_BLOCK_LABEL;
}
}
Here we conclude the discussion of some of the IL hacking performed by this hack.
Demos
Two versions of the same demo are provided with the source code of this article, one written in C#, the other in VB. The demo is a simple Windows application that demonstrates the use of a recursive Iterator by populating a TreeView
control with the contents of a chosen directory. Be careful how deep the directory you choose is; there is no Cancel button! Also, it goes without saying that the demo is trivial and works only if it is a PE file that has been run through the IteratorsHack
process. For convenience, each version of the demo has such a PE file located directly in the root folder of the demo project in question.
The C# version of the recursive Iterator function is:
[IteratorFunction]
private IEnumerable getDirectories(DirectoryInfo dir)
{
foreach(DirectoryInfo dir1 in dir.GetDirectories())
{
Yield.Return(dir1);
foreach(DirectoryInfo dir2 in getDirectories(dir1))
{
Yield.Return(dir2);
}
}
return null;
}
The VB version of the recursive Iterator function is:
<IteratorFunction()> _
Private Function getDirectories(ByVal dir As DirectoryInfo) As IEnumerable
For Each dir1 As DirectoryInfo In dir.GetDirectories()
Yield.Return(dir1)
For Each dir2 As DirectoryInfo In getDirectories(dir1)
Yield.Return(dir2)
Next
Next
End Function
Final Concluding Thoughts
C# 2.0 Iterators is a powerful programming feature that facilitates implementation of the Iterator design pattern, although there are many other uses for Iterators, ones that have nothing to do with iteration per se. The IteratorsHack
is a less than suboptimal hack that enables seeing the Iterators feature in action with version 1.1 of the Framework. This is possible only because Iterators is a feature of the C# 2.0 compiler, not a feature of the .NET 2.0 Framework (CLR/BCL), unlike, for example, Generics. Unfortunately, the next version of VB, which targets version 2.0 of the .NET Framework, will not have the Iterators compiler feature, only C# will. Indeed, VB programmers will have to wait, despite the universal, unquestionable, and plain obvious fact that VB is Int32.MaxValue
times the language C# will ever be! My sincere apologies, but I just had to throw that cheap shot in there.
Thanks for taking the time to read the article and I hope you enjoyed it as much as I enjoyed writing it. If there are any inaccuracies, a definite possibility since I am no expert of any field, please disclose. Peace out homies and until next time, God willing of course!
References