Introduction
The Microsoft .NET technology in C# version 2.0 introduced a new language feature - the "yield return
" keyword used in iterating across collections (and not only collections). This article shows how to create a similar iterating technique in native C++. Unlike most (recent) C++ libraries, it does not require highly complicated code to be used by the end-developer.
Background
The "yield return
" iterator is a language feature that was created for one reason: simplicity. It is generally much easier to iterate across whole collecting, storing all context needed in local variables, rather than crafting a complicated, custom iterator object that stores its state across subsequent retrieval operations.
A sample code that uses yield return
iterator is presented below:
class Program
{
static void Main(string[] args)
{
foreach (int i in GetRandom())
Console.WriteLine(i);
}
public static IEnumerable<int /> GetRandom()
{
for (int i = 0; i < 5; i++)
yield return Random.Get();
}
}
This program implements a collection of five random numbers. However, the iterating function seems never to return, or maybe to return five times!
Going Native
C++ of course has no foreach
keyword, but consider the following code:
SomeCollection C;
Enumerator <Somecollection, int> E(C);
for (int i; E.Get(i); )
cout << i << endl;
It does not look any more complicated than the C# one.
The only prerequisite for the SomeCollection
is to have an Enumerate
method returning, in this case, an integer. Furthermore, we can use the following form of the Enumerate
function:
int SomeCollection::Enumerate()
{
for (int i=0; i<4; i++)
yield_return(rand());
return rand();
}
Simple return indicates that we are finished with iterating (in fact, in C# it should be so too) whilst yield_return
sets an intermediate result.
The sample provided may not illustrate the whole power of the tool, but try it on iterating a hashtable or some sophisticated tree structure. Note that you can even use recursion - a thing that makes iterating tree structures a piece of cake.
Using the Code
The code contained in this article will allow you to write a program exactly as you saw above. No extra bureaucracy is needed, no long template names nor any other inconveniences. Well, there is one inconvenience of a different nature - you code will inevitably include <windows.h>
unless some wrapper is applied and system-dependent code moved to a dedicated compilation unit.
Fibers - The Power Unknown
One of the highly powerful yet quite underestimated features in Windows are fibers (probably because no such feature is to be found anywhere in UNIX-base systems).
A fiber is just a context for thread. It includes a separate stack, fiber-local-storage and a place to store register values upon switch. Fiber switches occur in user mode, so they have very low overhead. And last, but definitely not least, fibers are scheduled by the user! That means, that fiber can be used as a very powerful tool for scheduling deferred procedure calls, which is exactly what we need.
Creating a fiber requires nothing but to call CreateFiber
with some context information. It does not return a handle, but a user pointer to the fiber to which our thread can explicitly switch. Every thread is initially not a fiber, so we must call ConvertThreadToFiber
to create the first fiber and set fiber custom data (arbitrary pointer value provided by the caller of ConvertThreadToFiber
).
Switching is trivial and it is just a call to SwitchToFiber(PVOID fiber)
.
So, let's start the fun!
Implementation of the Iterator
Basically, to build the program that works as presented we have to implement two things: the yield_return
function and the Get()
function of the Enumerator
class. The class header for the Enumerator
is as follows:
template <class T>
class Enumerator
{
public:
Enumerator(T &owner) : Owner(&owner), Fiber(0), ToolFiber(0) {}
Enumerator(T *owner) : Owner(owner), Fiber(0), ToolFiber(0) {}
~Enumerator();
bool Get(R &result);
private:
static void Return(const R &value);
template <class R>
friend void yield_return(const R &result);
virtual void *ValidatePointer(void *_ptr) { return *(void **)_ptr; }
static void Enumerate(Enumerator *This);
void Prepare();
R *pResult; T *Owner; void *Fiber, *ToolFiber; };
Let's take a look at a rather boring preparation step:
template <class T, class R>
void Enumerator<T, R>::Prepare()
{
if (!Fiber)
{
Fiber=GetCurrentFiber();
__try
{
ValidatePointer(Fiber);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
Fiber=ConvertThreadToFiber(this);
}
}
if (!ToolFiber) ToolFiber=CreateFiber(4096, (LPFIBER_START_ROUTINE)Enumerate, this);
}
Enumeration function:
template <class T, class R>
void Enumerator<T, R>::Enumerate(Enumerator *This)
{
void *Fiber=This->Fiber; *This->pResult=This->Owner->Enumerate();
This->Owner=NULL; SwitchToFiber(Fiber); }
This function is called from the ToolFiber
context, so SwitchToFiber(Fiber)
is what should be there; if it did not switch back before return, the thread would terminate!
The almighty Get:
template <class T, class R>
bool Enumerator<T, R>::Get(R &result)
{
if (!Owner) return false;
pResult=&result; Prepare(); if (ToolFiber) {
SwitchToFiber(ToolFiber); if (!Owner) {
DeleteFiber(ToolFiber); ToolFiber=Fiber=0; }
}
return true; }
Setting the result:
template <class T, class R>
void Enumerator<T, R>::Return(const R &value)
{
Enumerator *E=(Enumerator *)GetFiberData(); if (E->Fiber!=GetCurrentFiber()) {
*E->pResult=value; SwitchToFiber(E->Fiber); }
}
Now we have our mock-return - it is handled by the SwitchToFiber(E->Fiber)
in Return
function - it transfers control back to the original fiber, that can read the results using pResult
.
After everything went ok, we are clean - the fiber has ended and everything is beautiful. But what if we abandon our iterator?
There is no other solution but to set control back to the ToolFiber
and iterate until the collection is over possibly explicitly telling the iterating procedure that its results will be disregarded anyway, so it can just return anything. In my solution, the enumerator is just told to iterate. However, the Fiber
field is set to ToolFiber
, so no fiber switching takes place and the result is not copied. That can improve the performance of these final iterations.
template <class T, class R>>
Enumerator<T, R>::~Enumerator()
{
if (ToolFiber)
{
Fiber=ToolFiber; SwitchToFiber(ToolFiber); DeleteFiber(ToolFiber);
ToolFiber=Fiber=0;
}
}
Now the only thing left is to civilize the way of returning the intermediate results. Calling Enumerator<Type, Type>::Return(value)
is a whole lot more complicated than yield_result(value)
. So let's make things as simple as they are:
template <class R>
inline void yield_return(const R &result)
{
Enumerator<int, R>::Return(result); }
Final Note
Wherever you can find use for pseudoiterating objects or the iterator itself is complicated, this method fits. It has some overhead (compared to classic iterators), but its true power is in its simplicity of use. If you apply this code in your software, please let me know - I'll be glad to have helped someone.
Post Scriptum - A Recent Idea
I came up with an idea, that EVERY value should be returned by yield return
and while the iterating function itself would return void
. This would in fact be the only way to return an empty collection, would make the code more homogenous and simplify detection of the end of collection.
A new Enumerator::Enumerate
function would then look like this:
template <class T, class R>
void Enumerator<T, R>::Enumerate(Enumerator *This)
{
void *Fiber=This->Fiber; This->Owner->Enumerate();
This->Owner=NULL; SwitchToFiber(Fiber); }
A minor change to Enumerator::Get
would also be introduced:
SwitchToFiber(ToolFiber); if (!Owner) {
DeleteFiber(ToolFiber); ToolFiber=Fiber=0; return false;
}
- Michal Zientkiewicz aka Mike65536