Yield Return Iterator for Native C++ Using Fibers

Mike65536

4.82/5 (19 votes)

4 Sep 2007CPOL5 min read

303

Yield Return Iterator for Native C++ Using Fibers

Download source - 991 B

Introduction

The Microsoft .NET technology in C# version 2.0 introduced a new language feature - the "yield return" keyword used in iterating across collections (and not only collections). This article shows how to create a similar iterating technique in native C++. Unlike most (recent) C++ libraries, it does not require highly complicated code to be used by the end-developer.

Background

The "yield return" iterator is a language feature that was created for one reason: simplicity. It is generally much easier to iterate across whole collecting, storing all context needed in local variables, rather than crafting a complicated, custom iterator object that stores its state across subsequent retrieval operations.

A sample code that uses yield return iterator is presented below:

C++

class Program
{
        static void Main(string[] args)
        {
            foreach (int i in GetRandom())
                Console.WriteLine(i);
        }

        public static IEnumerable<int /> GetRandom()
        {
            for (int i = 0; i < 5; i++)
                yield return Random.Get();
        }
}

This program implements a collection of five random numbers. However, the iterating function seems never to return, or maybe to return five times!

Going Native

C++ of course has no foreach keyword, but consider the following code:

C++

SomeCollection C;
Enumerator <Somecollection, int> E(C);
for (int i; E.Get(i); )
    cout << i << endl;

It does not look any more complicated than the C# one.
The only prerequisite for the SomeCollection is to have an Enumerate method returning, in this case, an integer. Furthermore, we can use the following form of the Enumerate function:

C++

int SomeCollection::Enumerate()
{
    for (int i=0; i<4; i++)
        yield_return(rand());
    return rand();
}

Simple return indicates that we are finished with iterating (in fact, in C# it should be so too) whilst yield_return sets an intermediate result.

The sample provided may not illustrate the whole power of the tool, but try it on iterating a hashtable or some sophisticated tree structure. Note that you can even use recursion - a thing that makes iterating tree structures a piece of cake.

Using the Code

The code contained in this article will allow you to write a program exactly as you saw above. No extra bureaucracy is needed, no long template names nor any other inconveniences. Well, there is one inconvenience of a different nature - you code will inevitably include <windows.h> unless some wrapper is applied and system-dependent code moved to a dedicated compilation unit.

Fibers - The Power Unknown

One of the highly powerful yet quite underestimated features in Windows are fibers (probably because no such feature is to be found anywhere in UNIX-base systems).
A fiber is just a context for thread. It includes a separate stack, fiber-local-storage and a place to store register values upon switch. Fiber switches occur in user mode, so they have very low overhead. And last, but definitely not least, fibers are scheduled by the user! That means, that fiber can be used as a very powerful tool for scheduling deferred procedure calls, which is exactly what we need.

Creating a fiber requires nothing but to call CreateFiber with some context information. It does not return a handle, but a user pointer to the fiber to which our thread can explicitly switch. Every thread is initially not a fiber, so we must call ConvertThreadToFiber to create the first fiber and set fiber custom data (arbitrary pointer value provided by the caller of ConvertThreadToFiber).

Switching is trivial and it is just a call to SwitchToFiber(PVOID fiber).

So, let's start the fun!

Implementation of the Iterator

Basically, to build the program that works as presented we have to implement two things: the yield_return function and the Get() function of the Enumerator class. The class header for the Enumerator is as follows:

C++

template <class T>
class Enumerator
{
public:
    Enumerator(T &owner) : Owner(&owner), Fiber(0), ToolFiber(0) {}
    Enumerator(T *owner) : Owner(owner), Fiber(0), ToolFiber(0) {}
    ~Enumerator();
    bool Get(R &result); // the core function

private:
    static void Return(const R &value); // a function that stores result
                                        // from yield_return

    template <class R>
    friend void yield_return(const R &result); // the yield_return itself
    
    virtual void *ValidatePointer(void *_ptr) // virtual just to force evaluation
    { return *(void **)_ptr; }

    static void Enumerate(Enumerator *This); // pointers to members in C++
                                             // are nasty, so let's make it a 
                                             // little bit more comfortable

    void Prepare(); // preparation step

    R *pResult; // result, set to &result in Get
    T *Owner;   // object we iterate across
    void *Fiber, *ToolFiber; // fibers
};

Let's take a look at a rather boring preparation step:

C++

template <class T, class R>
void Enumerator<T, R>::Prepare()
{
    if (!Fiber)
    {
        Fiber=GetCurrentFiber();
        __try
        {
            ValidatePointer(Fiber);
        }
        __except(EXCEPTION_EXECUTE_HANDLER)
        {
            Fiber=ConvertThreadToFiber(this);
        }
    }
    if (!ToolFiber) // and our enumeration fiber goes here!
        ToolFiber=CreateFiber(4096, (LPFIBER_START_ROUTINE)Enumerate, this);
}

Enumeration function:

C++

template <class T, class R>
void Enumerator<T, R>::Enumerate(Enumerator *This)
{
    void *Fiber=This->Fiber; // store the fiber - it may change
    *This->pResult=This->Owner->Enumerate();
    This->Owner=NULL; // indicate end of collection
    SwitchToFiber(Fiber); // return to the original fiber
}

This function is called from the ToolFiber context, so SwitchToFiber(Fiber) is what should be there; if it did not switch back before return, the thread would terminate!

The almighty Get:

C++

template <class T, class R>
bool Enumerator<T, R>::Get(R &result)
{
    if (!Owner) 		// no object to iterate - there never was one
                		// or iteration has ended
        return false;

    pResult=&result; 	// set result pointer
    Prepare(); 		// prepare
    if (ToolFiber) 	// we are ready
    {
        SwitchToFiber(ToolFiber); 	// and let's start (or continue)
                                  	// the enumeration!
        if (!Owner) 		// the way of the fiber to indicate end of iteration
        {
            DeleteFiber(ToolFiber); 	// delete the tool fiber
            ToolFiber=Fiber=0; 	// cleanup
        }
    }

    return true; // success!
}

Setting the result:

C++

template <class T, class R>
void Enumerator<T, R>::Return(const R &value)
{
    Enumerator *E=(Enumerator *)GetFiberData(); 	// get the context, 
						//stored in fiber data
    if (E->Fiber!=GetCurrentFiber()) 	// check, if it is the same fiber - 
                                        	// Fiber field is spoofed at cleanup
    {
        *E->pResult=value; 			// set the result
        SwitchToFiber(E->Fiber); 		// return to main fiber
    }
}

Now we have our mock-return - it is handled by the SwitchToFiber(E->Fiber) in Return function - it transfers control back to the original fiber, that can read the results using pResult.

After everything went ok, we are clean - the fiber has ended and everything is beautiful. But what if we abandon our iterator?
There is no other solution but to set control back to the ToolFiber and iterate until the collection is over possibly explicitly telling the iterating procedure that its results will be disregarded anyway, so it can just return anything. In my solution, the enumerator is just told to iterate. However, the Fiber field is set to ToolFiber, so no fiber switching takes place and the result is not copied. That can improve the performance of these final iterations.

C++

template <class T, class R>>
Enumerator<T, R>::~Enumerator()
{
    if (ToolFiber)
    {
        Fiber=ToolFiber; // spoofing the fiber pointer
        SwitchToFiber(ToolFiber); // switch to enumerator
        DeleteFiber(ToolFiber); //  delete the enumerator

        ToolFiber=Fiber=0;
    }
}

Now the only thing left is to civilize the way of returning the intermediate results. Calling Enumerator<Type, Type>::Return(value) is a whole lot more complicated than yield_result(value). So let's make things as simple as they are:

C++

template <class R>
inline void yield_return(const R &result)
{
    Enumerator<int, R>::Return(result); // collection type is not
                                        // needed here, so we put 'int'
}

Final Note

Wherever you can find use for pseudoiterating objects or the iterator itself is complicated, this method fits. It has some overhead (compared to classic iterators), but its true power is in its simplicity of use. If you apply this code in your software, please let me know - I'll be glad to have helped someone.

Post Scriptum - A Recent Idea

I came up with an idea, that EVERY value should be returned by yield return and while the iterating function itself would return void. This would in fact be the only way to return an empty collection, would make the code more homogenous and simplify detection of the end of collection.

A new Enumerator::Enumerate function would then look like this:

C++

template <class T, class R>
void Enumerator<T, R>::Enumerate(Enumerator *This)
{
    void *Fiber=This->Fiber; 	// store the fiber - it may change
    This->Owner->Enumerate();
    This->Owner=NULL; 		// indicate end of collection
    SwitchToFiber(Fiber); 		// return to the original fiber
}

A minor change to Enumerator::Get would also be introduced:

C++

SwitchToFiber(ToolFiber); 	// and let's start (or continue)
                          	// the enumeration!
if (!Owner) 		// the way of the fiber to indicate end of iteration
{
    DeleteFiber(ToolFiber);	// delete the tool fiber
    ToolFiber=Fiber=0; 	// cleanup
    return false;
}

- Michal Zientkiewicz aka Mike65536

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)