Introduction
This project came about when I encountered a client that needed a fiber scheduler. The problem arose because of a need to convert non-preemptive “threads” from an existing programming domain to Windows. [This was part of my teaching, not part of a contract, and I presented this solution in class; therefore, I was not actually paid to write it, and it remains mine. Just so you know that this wasn’t proprietary work.]
The classes I designed were intended to illustrate a round-robin fiber scheduler. You can feel free to adapt these classes to any application domain you want, using any discipline you want. For example, you could create a priority-based scheduler if you needed one (more on this later).
This uses a subset of the capabilities of fibers to implement a solution. Fibers are a good deal more general than used in this solution, and I'll talk about some of these issues later.
So what's a fiber?
A fiber is a non-preemptive multiplexing mechanism. In fact, fibers are more like an implementation of coroutines, a technology dating back to the 1960s. A fiber consists of a register set and a stack, so it is much like a thread, but it is not a thread. A fiber has no identity unless it runs inside a thread. When a fiber is suspended, it transfers control to another fiber by explicitly selecting and naming (by supplying the address of) the target fiber. The operation that performs this is called SwitchToFiber. The target fiber will then continue to run in that thread until another SwitchToFiber is executed.
A fiber, unlike a thread, never exits. If a fiber actually exits, by returning from its top-level fiber function, the thread that is executing it will implicitly call ExitThread and terminate. Note that this call on ExitThread means the thread does not itself exit via its top-level function, and consequently any cleanup that must be done there will not be performed. Therefore, you should consider the actual exit of a fiber to be a serious programming error.
The MSDN lists a set of fiber-related functions, but the documentation is incomplete, confusing, sometimes misleading, and in one case, I believe overtly wrong. This essay is going to attempt to clarify these issues, although I don't have enough information to resolve some of the ambiguities or fill in the holes (although I have passed this off to my MVP contact in the hopes he can get some answers for me).
The fiber functions
The fiber functions are listed under the topic "Process and Thread functions [base]". Apparently, "fibers" is not considered worthy of mention in the title.
ConvertFiberToThread | This undoes the action of the ConvertThreadToFiber(Ex) operations and frees up resources that have been used in the original conversion. |
ConvertThreadToFiber | Converts the current thread into a fiber. |
ConvertThreadToFiberEx | Converts the current thread into a fiber, with some additional options. |
CreateFiber | Allocates a fiber object, assigns it a stack, and sets up the EIP to be at the start address specified. A user-specified PVOID parameter will be passed to the fiber function. |
CreateFiberEx | The same as CreateFiber , but provides additional options. |
DeleteFiber | Deletes a fiber created by CreateFiber(Ex) . This is the way to get rid of a fiber and its stack, but it has certain limitations that must be considered. |
FiberProc | The typedef for this is typedef void (CALLBACK * LPFIBER_START_ROUTINE)(PVOID parameter) |
FlsAlloc | Allocates a fiber-local-storage (FLS) index; to delete it, use FlsFree . |
FlsCallback | A function established by FlsAlloc that is called when the thread running a fiber terminates, or a fiber is deallocated. |
FlsFree | Frees a fiber-local-storage (FLS) index allocated by FlsAlloc . |
FlsGetValue | Retrieves the current FLS value stored at a specific index. |
FlsSetValue | Sets the current FLS value stored at a specific index. |
GetCurrentFiber | Returns the address of the current fiber. |
SwitchToFiber | Switches control from one fiber to another. |
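The FLS functions are not used anywhere in the code in this essay, but a minimal sketch of how they fit together may be useful. This is my own illustration, not code from the article's project: the callback name and the CSomeContext type are invented, error handling is abbreviated, and the Fls* calls require Windows Server 2003 / Vista or later.
class CSomeContext { /* hypothetical per-fiber context */ };

void CALLBACK OnFlsFree(PVOID data)          // any function matching PFLS_CALLBACK_FUNCTION
    {
     delete (CSomeContext *)data;            // runs when a fiber is deleted or its thread exits
    }

void demoFls()
    {
     DWORD flsIndex = ::FlsAlloc(OnFlsFree); // returns FLS_OUT_OF_INDEXES on failure
     if(flsIndex == FLS_OUT_OF_INDEXES)
        return;
     ::FlsSetValue(flsIndex, new CSomeContext);  // the value is per-fiber
     CSomeContext * ctx = (CSomeContext *)::FlsGetValue(flsIndex);
     ASSERT(ctx != NULL);
     // ... use ctx ...
     ::FlsFree(flsIndex);                    // release the index when no fiber needs it
    }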
There are several steps involved in creating a running fiber environment.
First, some fibers have to be created. When a thread is created, it is implicitly scheduled to run. When a fiber is created, there is no mechanism that makes it spontaneously able to run. Instead, the fiber remains "suspended" until there is an explicit SwitchToFiber to transfer control from a currently-running fiber to one of the other fibers.
Since the threads are what are actually running, only a running thread can do a SwitchToFiber. But for this to work correctly, it must be from an existing fiber, to an existing fiber. A thread is not a fiber, so although it could create a bunch of fibers, it cannot cause them to run.
This is solved by using either the ConvertThreadToFiber or ConvertThreadToFiberEx call. These calls create a "fiber work area" in the current thread, thus imbuing the thread with fiberness, or to say it another way, "the thread has the fiber nature". Now, the underlying thread scheduler will schedule threads, including the thread which now has the fiber nature. Essentially, upon return from the ConvertThreadToFiber(Ex) call, you find yourself running in a fiber, which is running inside the original thread. This fiber is now free to do a SwitchToFiber to any other fiber.
At some point, when a fiber is no longer in use, you must call DeleteFiber on the fiber object. There are some restrictions on how this can be done, which will be discussed later. But ultimately, every fiber you create must be deleted.
The ConvertThreadToFiber(Ex) calls, too, return a fiber object reference. When you call SwitchToFiber on this fiber, you are "back in the thread", and you can then call ConvertFiberToThread to release the fiber-specific resources that had been allocated to the thread.
That's basically it. It isn't everything, and what I've described here represents a subset of the full capabilities of the fiber universe, but I'll elaborate on these a bit later.
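To make the sequence concrete, here is a minimal sketch of that lifecycle using the raw API calls; the names (worker, runFibers) are mine, and all error handling is omitted.
LPVOID mainFiber;                            // the fiber the original thread becomes
LPVOID workerFiber;                          // a fiber created explicitly

void CALLBACK worker(LPVOID p)
    {
     // ... do part of the work ...
     ::SwitchToFiber(mainFiber);             // hand control back; we resume here next time
     // ... do more work ...
     for(;;)                                 // a fiber function must never return
         ::SwitchToFiber(mainFiber);
    }

void runFibers()
    {
     mainFiber = ::ConvertThreadToFiber(NULL);     // the thread acquires "the fiber nature"
     workerFiber = ::CreateFiber(0, worker, NULL); // created suspended
     ::SwitchToFiber(workerFiber);           // run the worker until it switches back
     ::SwitchToFiber(workerFiber);           // resume it where it left off
     ::DeleteFiber(workerFiber);             // safe: the worker is not the running fiber
     ::ConvertFiberToThread();               // release the thread's fiber resources
    }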
Why use fibers?
Speed. Simplicity. Elimination of the need for synchronization.
Speed
Fibers are lightweight. The operation to switch from one fiber to another is 32 instructions, involving no kernel call at all. On a 2GHz machine, this means that you are looking at a swap as fast as 8ns (in the worst case, if none of the data is in the cache, it could be several tens of nanoseconds).
In case you're wondering where those numbers come from, 32 instructions would take, on a 2GHz machine, 16ns, except that the Pentium 4 is a "pipelined superscalar" architecture and can issue two integer instructions per clock cycle, coming up with the 8ns value. Instruction pipelining and pre-fetching tend to mask instruction fetch times. However, in the worst case when nothing is in the cache, the speed depends upon the pattern of cache hits and replacements, and therefore, is limited by the memory cycle speed, which is platform-specific.
This means that if you have a lot of little things to do, fibers may be a better choice than threads, because threading involves kernel calls and invocation of the scheduler. Fibers execute within the time slice of the thread in which they are running, so swapping among many short-computation fibers does not involve any potential blocking in the kernel. The elimination of the need for synchronization in many cases also eliminates the need for the scheduler to come into play.
Simplicity
The simplicity you get from fibers resembles the simplicity of threading in many ways: instead of having to write complex interlaced code that tries to do several things at once, you write simple code that does only one thing, and use a lot of different fibers to do the different things. This allows you to concentrate on doing one thing well, rather than lots of things in a somewhat complex, and potentially fragile, way.
Elimination of the need for synchronization
In a preemptive multithreading environment, if two threads can access the same data, the data must be protected by some form of synchronization. This could be as simple as using one of the Interlocked... operations that lock at the memory-bus level, or it may require the use of a CRITICAL_SECTION or Mutex for synchronization. The downside of the latter two is that if access cannot be granted, the kernel is called to de-schedule the thread, which is queued up on the synchronization object for later execution when the synchronization object is released. The CRITICAL_SECTION is distinguished by the fact that if the synchronization object can be acquired (the most common situation in most cases), no kernel call is required, whereas the Mutex always requires a kernel call.
Synchronization between threads is where threads rub together. Synchronization is friction. As in mechanical systems, friction generates heat and wastes energy. Perhaps the best solution is to avoid it.
Fibers are a way to avoid this. ::SwitchToFiber is a "positive handoff". Until a fiber switches control, it cannot be preempted by any other fiber that would be running in that thread. Thus, once a fiber starts running, provided the state it is modifying is shared only with other fibers that would execute within that thread, there is no need to synchronize the state at all. The synchronization is implicit in the fiber scheduling, which is explicitly under the control of the programmer.
This is not a perfect solution. A fiber can be scheduled to run in multiple threads; if the shared state could now be accessed by fibers running in separate threads, the fiberness is irrelevant; you have a multithread synchronization problem, and you must use a CRITICAL_SECTION or Mutex.
When you have fibers, you do not have concurrency. If a fiber blocks, such as on an I/O call, it is actually the thread running the fiber that blocks. No additional fibers can be scheduled to run in that thread, because the thread itself is blocked. You do not get any concurrency between fibers running in the same thread (although you can get concurrency between fibers running in other threads). But, if you don't have blocking calls, fibers are particularly useful for multiplexing simple compute-bound tasks.
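To make that concrete with a tiny example of my own (not part of the scheduler classes below): two fibers running in one thread can update shared state with plain loads and stores, where preemptive threads would need an Interlocked operation or a lock.
int sharedCount = 0;                  // shared only by fibers running in one thread
LPVOID otherFiber;                    // set up elsewhere, as in the lifecycle sketch above

void CALLBACK counterFiber(LPVOID p)
    {
     for(;;)
        {
         sharedCount++;               // a plain increment; preemptive threads would need
                                      // InterlockedIncrement or a CRITICAL_SECTION here
         ::SwitchToFiber(otherFiber); // nothing else can touch sharedCount between the
        }                             // increment and this explicit handoff
    }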
The CFiber class
The first class is a wrapper around the basic Fiber object:
class CFiber {
public:
     CFiber(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
        { fiber = ::CreateFiber(stack, rtn, this); }
     CFiber() { fiber = NULL; }
     virtual ~CFiber() { if(fiber != NULL) ::DeleteFiber(fiber); }
public:
     BOOL Create(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
        {
         ASSERT(fiber == NULL);
         if(fiber != NULL)
            return FALSE;
         fiber = ::CreateFiber(stack, rtn, this);
         return fiber != NULL;
        }
     BOOL ConvertThreadToFiber() {
         ASSERT(fiber == NULL);
         if(fiber != NULL)
            return FALSE;
         fiber = ::ConvertThreadToFiber(this);
         return fiber != NULL;
        }
     void Attach(LPVOID p) { ASSERT(fiber == NULL); fiber = p; }
     LPVOID Detach() { ASSERT(fiber != NULL);
                       LPVOID result = fiber;
                       fiber = NULL;
                       return result;
                     }
     LPVOID GetFiber() { return fiber; }
public:
     void run() { ASSERT(fiber != NULL); ::SwitchToFiber(fiber); }
protected:
     LPVOID fiber;
};
The CFiber methods
CFiber | Constructors |
~CFiber | Destructor |
Create | Creates a new fiber in an existing CFiber object |
ConvertThreadToFiber | Converts the current thread to a fiber for purposes of fiber scheduling |
Attach | Attaches an existing fiber to a CFiber object |
Detach | Detaches a fiber from a CFiber object |
GetFiber | Returns the fiber associated with the CFiber object |
run | Switches control to the fiber associated with the CFiber object |
Constructors/Creation
There are several constructors. The key here is the assumption that the "fiber parameter" seen by the fiber will actually be the CFiber object, or an instance of a subclass derived from it.
CFiber::CFiber(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0);
This constructor will create a fiber that will execute the specified routine rtn. It has an optional stack size parameter which defaults to the process default stack size.
Parameters:
| rtn | The fiber routine: void CALLBACK rtn(LPVOID p) |
| stack | The desired stack size; if omitted, 0 (use the default stack size) is assumed. |
See also the implementation of CFiber::CFiber.
CFiber::CFiber();
This constructor creates a CFiber object which is not connected to any fiber. This is typically used when the user wishes to have the fiber parameter be something other than the CFiber object.
CFiber::~CFiber()
The destructor deletes the underlying fiber object, if one is attached.
Notes: The delete operation must be done carefully on a fiber, because deleting the currently executing fiber will terminate the thread that the fiber is running in. A fiber must not delete itself.
BOOL CFiber::Create(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
Given a CFiber object which is not associated with a fiber, this will create a new fiber and attach it to the CFiber object.
Parameters:
| rtn | The fiber routine: void CALLBACK rtn(LPVOID p) |
| stack | The desired stack size; if omitted, 0 (use the default stack size) is assumed. |
Notes:
The following are equivalent:
(1)
CFiber * fiber = new CFiber(function);
(2)
CFiber * fiber = new CFiber;
LPVOID f = ::CreateFiber(0, function, fiber);
fiber->Attach(f);
BOOL CFiber::ConvertThreadToFiber()
Remarks
Converts the current thread to a schedulable fiber (that is, a legal target of ::SwitchToFiber). The current CFiber object becomes the fiber parameter.
Result: TRUE if the underlying ::ConvertThreadToFiber call succeeded; FALSE if it failed (use GetLastError to determine what went wrong).
Notes: After finishing with the fibers and having deleted them all, ::ConvertFiberToThread should be called to release the fiber-specific data structures which have been allocated to the thread.
void CFiber::Attach(LPVOID p)
This attaches a fiber to a CFiber object. The existing CFiber object must not already have a fiber attached.
Parameters:
| p | The fiber to be attached to this CFiber object. |
Notes: Unlike MFC, this does not maintain a map to see if a given fiber is bound to more than one CFiber object. It would be erroneous for a program to do so, but no additional checks are made.
When created via the constructors, the fiber parameter (returned from ::GetFiberData) is the CFiber object. If the application would like to have a different object associated as the fiber parameter, then the fiber can be created and attached.
LPVOID CFiber::Detach()
This breaks the association between the CFiber object and the underlying fiber.
Result: The fiber that was formerly associated with the CFiber object.
Notes: The fiber retains the fiber parameter with which it was created; consequently, if a fiber is Detached from one CFiber object and Attached to another, and its fiber parameter is the original CFiber object, that parameter is now an erroneous reference. Attach and Detach should not be used when the expected behavior is that the CFiber object is the fiber parameter.
LPVOID CFiber::GetFiber();
Result: The fiber associated with the CFiber object.
void CFiber::run();
Switches control to the fiber using ::SwitchToFiber.
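Putting these methods together, a minimal use of CFiber by itself (before the queue classes are introduced) might look like the following. This is a sketch of my own, not part of the downloadable code; the fiber function hello is invented, and error checking is omitted.
CFiber mainFiber;                        // will wrap the converted main thread
CFiber * worker;                         // a fiber created via the CFiber constructor

void CALLBACK hello(LPVOID p)
    {
     // p is the CFiber * that created this fiber (the constructor passes 'this')
     // ... do one unit of work ...
     for(;;)                             // a fiber function must never return
         mainFiber.run();                // hand control back to the converted thread
    }

void demo()
    {
     mainFiber.ConvertThreadToFiber();   // the current thread becomes a fiber
     worker = new CFiber(hello);         // created suspended
     worker->run();                      // run hello() until it switches back
     delete worker;                      // safe: worker is not the running fiber
     mainFiber.Detach();                 // the converted fiber must not be DeleteFiber'd
     ::ConvertFiberToThread();           // release the thread's fiber resources
    }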
The Queue Entry class
The goal here is to have a "time-shared" set of fibers, where the fibers would simply yield control when appropriate and some other fiber would run. In the normal mode of operation, a fiber yields control by calling ::SwitchToFiber, specifying the next fiber to run. However, this requires actually knowing what the next fiber should be. In some applications, this is part of the specification. In the system I was writing, all I knew was that I wanted to yield control to some other fiber. The choice was to do a round-robin fiber scheduler, which implies a queue. This required a queue-entry class to represent the fibers in the queue. I derived this from the CFiber class.
class QE : public CFiber {
public:
QE(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
: CFiber(rtn, stack) { next = NULL; }
QE() { next = NULL; }
virtual ~QE() { }
public:
virtual void Display(LPCTSTR s) { }
public:
QE * next;
};
typedef QE * PQE;
The QE methods and member variables
QE | Constructors |
~QE | Destructor |
Display | Virtual method for debugging output |
next | The link used to insert elements in the queue |
QE::QE(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0);
This constructor will create a queue entry, and a new fiber associated with that queue entry, which will execute the specified routine rtn. It has an optional stack size parameter which defaults to the process default stack size.
Parameters:
| rtn | The fiber routine: void CALLBACK rtn(LPVOID p) |
| stack | The desired stack size; if omitted, 0 (use the default stack size) is assumed. |
QE::QE();
This constructor will create a queue entry which is not associated with any fiber. The CFiber::Attach call can be used to attach a fiber. See the caveats about the fiber data.
Notes: This code will not function properly if the ::GetFiberData call returns a value other than the QE reference.
virtual QE::~QE();
The destructor will delete the QE object and its associated fiber. See the caveats about deleting the running fiber. The delete operator should never be called on the QE representing the running fiber.
virtual void QE::Display(LPCTSTR s);
The intent is that subclasses of QE will implement this virtual method for purposes of debugging; this class does nothing in its implementation of the method. While in principle it could have been a pure virtual method, there may be reasons to create raw QE objects, and that would not be possible if the method were pure virtual.
QE * QE::next;
This provides the queue structure managed by the Queue class.
Notes: C++ has a mechanism called friend which allows another class access to private variables. However, this is poorly designed, because it means that the QE class has to know the name of the class that will use it. This reduces the generality of the classes; so rather than mark this member as protected and require a known class name, I chose to make it public and thus allow other implementations of queues.
The Queue class
I wanted to implement a FIFO queue to do round-robin scheduling. I would normally use the MFC CList class, but this code had to work in a non-MFC environment.
class Queue {
public:
Queue() { queueHead = queueTail = emptyFiber = killFiber = killTarget = NULL; }
public:
void appendToQueue(PQE qe);
PQE removeFromQueue();
public:
void next();
void yield();
void kill();
public:
void SetEmptyFiber(PQE qe) { emptyFiber = qe; }
protected:
PQE queueHead;
PQE queueTail;
PQE emptyFiber;
protected:
PQE killFiber;
PQE killTarget;
static void CALLBACK killer(LPVOID p);
};
This is intended to support the queuing of fibers. The yield method puts the current queue entry at the tail of the queue. The next method dispatches the fiber at the head of the queue. The kill method does a delete on the currently running element, but it does so very carefully, by using a separate fiber for this purpose; it then initiates the next fiber in the queue.
This code implements simple FIFO queuing. Other classes can be created that implement more sophisticated kinds of queuing.
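As one example of a more sophisticated discipline, here is a sketch (mine, not part of the downloadable code) of how a priority-based scheduler could be layered on these classes: give each entry a priority and insert in priority order rather than FIFO. Note that appendToQueue is not virtual in Queue, so yield would also have to be overridden (or appendToQueue made virtual) for the new insertion to be used everywhere; the sketch also assumes every entry placed in the queue is a PrioQE.
class PrioQE : public QE {
public:
    PrioQE(LPFIBER_START_ROUTINE rtn, int pri, SIZE_T stack = 0)
        : QE(rtn, stack) { priority = pri; }
    int priority;                            // larger value runs sooner
};

class PriorityQueue : public Queue {
public:
    void appendToQueue(PQE qe)               // sorted insertion instead of FIFO
       {
        PrioQE * p = (PrioQE *)qe;
        if(queueHead == NULL || ((PrioQE *)queueHead)->priority < p->priority)
           { // becomes the new head
            p->next = queueHead;
            queueHead = p;
            if(queueTail == NULL)
               queueTail = p;
            return;
           } // becomes the new head
        PQE scan = queueHead;                // find the last entry of equal or higher priority
        while(scan->next != NULL && ((PrioQE *)scan->next)->priority >= p->priority)
            scan = scan->next;
        p->next = scan->next;
        scan->next = p;
        if(p->next == NULL)
            queueTail = p;
       }
};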
The Queue methods
Queue management methods |
Queue | Constructor |
appendToQueue | Adds an element to the tail of the queue |
removeFromQueue | Removes an element from the head of the queue |
Scheduling methods |
next | Dequeues the head of the queue and switches to that fiber; if the queue is empty, switches to the empty-queue fiber |
yield | Places the current fiber at the end of the queue and runs the fiber at the head of the queue |
kill | Deletes the current fiber and schedules the next fiber |
SetEmptyFiber | Specifies the fiber to be switched to if the queue is empty |
Queue management methods
Queue::Queue();
Constructs an empty queue.
void Queue::appendToQueue(PQE qe);
Places the QE pointed to by its parameter at the end of the queue.
PQE Queue::removeFromQueue();
Removes the queue element from the front of the queue.
Result: A pointer to the QE entry which has been dequeued, or NULL if the queue is empty.
Queue scheduling methods
void Queue::next();
Dequeues the queue element at the head of the queue and switches control to the fiber it represents. If there is no element in the queue, it will switch to the fiber represented by the "empty queue" element (see SetEmptyFiber).
void Queue::yield();
When a fiber wishes to yield control, it calls this method, and control is transferred to the next fiber in the queue. The current fiber is placed at the end of the queue. Execution will resume on the line following the yield call when the fiber is re-scheduled.
void Queue::kill();
When a fiber is done computing, it calls this method. Control will be transferred to the next fiber in the queue. The current fiber is not placed at the end of the queue, and therefore will no longer be scheduled.
Notes: To “suspend” a fiber so it can be “restarted”, a more elaborate mechanism will need to be constructed.
void Queue::SetEmptyFiber(PQE qe);
Parameters:
| qe | A QE reference to the fiber to be scheduled if the queue is empty. |
Using the queue
The queue is first populated by using appendToQueue to add a QE object to the queue. Once the fibers start running, they will normally execute in round-robin FIFO order. The current mechanism does not have a way to "suspend" a fiber; that generalization is left as an Exercise For The Reader.
To start the queue, the main thread must create a QE which represents the CFiber::ConvertThreadToFiber call. This is typically set as the default fiber to resume when the queue is empty, so the SetEmptyFiber method is used to establish this.
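In outline (the sample program in the next section does exactly this), the startup looks like the following sketch; task and nTasks are placeholders of mine, and error handling is omitted.
void startScheduling(LPFIBER_START_ROUTINE task, int nTasks)
    {
     Queue queue;
     PQE mainFiber = new QE();               // will wrap the converted main thread
     mainFiber->ConvertThreadToFiber();      // the thread acquires the fiber nature
     for(int i = 0; i < nTasks; i++)
         queue.appendToQueue(new QE(task));  // each task fiber is created suspended
     queue.SetEmptyFiber(mainFiber);         // come back here when the queue drains
     queue.next();                           // start round-robin scheduling; control
                                             // returns here once the queue is empty
     mainFiber->Detach();                    // the converted fiber is released below, not deleted
     delete mainFiber;
     ::ConvertFiberToThread();               // release the thread's fiber resources
    }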
The Code
The following sections illustrate the code in this example.
The CFiber class
All of the methods of the CFiber class are defined in the header file. The CFiber class wraps the primitive fiber representation, which is an LPVOID:
protected:
LPVOID fiber;
Constructors/Destructors
CFiber(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
{
fiber = ::CreateFiber(stack, rtn, this);
}
Note that this constructor creates the fiber using ::CreateFiber, passing this as the fiber parameter.
CFiber() { fiber = NULL; }
This creates a CFiber object which is not bound to a particular fiber; the Attach method can be used later to bind a fiber to this CFiber object. Note that this has both power and risk: it allows the fiber parameter to be something other than the CFiber object, but some of the later classes are not designed to work under these conditions.
virtual ~CFiber() { if(fiber != NULL) ::DeleteFiber(fiber); }
This deletes the fiber, if one is attached (a default-constructed or Detached CFiber has nothing to delete). However, this method cannot be executed by the running fiber on itself, or the thread which is running the fiber will be terminated.
Creation/Attachment
BOOL Create(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
{
ASSERT(fiber == NULL);
if(fiber != NULL)
return FALSE;
fiber = ::CreateFiber(stack, rtn, this);
return fiber != NULL;
}
In debug mode, the ASSERT verifies that this CFiber is not already bound to a fiber. In release mode, the ASSERT disappears, but the method will return FALSE if the fiber is already bound.
BOOL ConvertThreadToFiber() {
ASSERT(fiber == NULL);
if(fiber != NULL)
return FALSE;
fiber = ::ConvertThreadToFiber(this);
return fiber != NULL;
}
The thread which is going to start scheduling fibers needs to do a ConvertThreadToFiber. This invokes the API call to map the current thread to a fiber, passing the current CFiber object as the fiber parameter.
void Attach(LPVOID p) { ASSERT(fiber == NULL); fiber = p; }
This should only be called for CFiber objects which are not bound to fibers. It is nominally erroneous to attempt to bind a CFiber to another fiber without doing a Detach first.
LPVOID Detach() { ASSERT(fiber != NULL);
LPVOID result = fiber;
fiber = NULL;
return result;
}
This breaks the association between the CFiber and its underlying fiber, and returns the fiber reference. If no fiber has been bound, the value returned is NULL. Nominally, it is erroneous to Detach if there is no association, and the ASSERT will cause a failure in debug mode if this situation arises. Note that there is no mechanism for changing the fiber parameter on an Attach of a formerly Detached fiber, and the other subclasses described here will not function properly if the fiber parameter is not a reference to the CFiber object.
void run() { ASSERT(fiber != NULL);
::SwitchToFiber(fiber);
}
This switches control from the currently executing fiber to the fiber on which the run method is invoked.
The Queue Entry class
This class is derived from the CFiber class, and consequently inherits the constructor and other methods. The QE constructor is:
QE(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
: CFiber(rtn, stack) { next = NULL; }
so it is just the construction of a CFiber, with the addition of initializing the link used to maintain the queue.
The Display method is used to produce debugging output, but it is the responsibility of the subclass to implement the actual output.
The Queue class
The Queue class is essentially a singly-linked queue manager combined with the fiber-scheduling logic. The singly-linked queue manager is fairly straightforward. What I'll show here are the fiber-scheduling functions.
The yield method
Note that to simplify the coding, I chose to assume that the fiber parameter is the actual QE object (or an instance of a subclass of QE). This means that this code would not operate correctly if Attach/Detach had been used in a way that made the fiber parameter inconsistent, since there is no way to reset the fiber parameter (while there is a ::GetFiberData API, there is no corresponding ::SetFiberData!).
All this method does is queue the currently-running fiber onto the tail of the queue and activate the next fiber in the queue (using the next method).
void Queue::yield()
{
PQE qe = (PQE) (::GetFiberData());
qe->Display( _T("yield: returning to queue - ") );
appendToQueue(qe);
next();
}
The kill method
This is called when a fiber has finished its work and must be deleted.
void Queue::kill()
    {
     PQE qe = (PQE) (::GetFiberData());
#ifdef _DEBUG
     qe->Display(_T("kill"));
#endif
     killTarget = qe;
     if(killFiber == NULL)
        {
         killFiber = new QE();
         LPVOID f = ::CreateFiber(0, killer, this);
         killFiber->Attach(f);
        }
     killFiber->run();
    }
The problem here is that a fiber cannot delete itself. If ::DeleteFiber is called on the currently running fiber, the entire thread in which the fiber is running will exit, because this will call ExitThread. In order to kill the fiber, kill switches control to a different fiber, the killFiber. That fiber is able to do a delete operation on the QE object (which, remember, is a subclass of CFiber, and may itself be an instance of some further subclass of QE), which is implemented by a call on ::DeleteFiber, because at that point, the fiber being deleted is not the running fiber.
Note that this appears to be inconsistent with the documentation of DeleteFiber, which states:
If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another fiber, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed.
This appears to make no sense at all. Due to the nature of fibers, a "running fiber" cannot be deleted by another fiber (in the same thread), since at the time it is being deleted it is, by definition, not the "running" fiber! However, I suspect the paragraph makes sense if it is replaced by one which says (with my suggested changes italicized):
If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another thread, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed. If a fiber has been deleted, any attempt to use SwitchToFiber to return control to it is likely to cause its thread to terminate abnormally because the fiber stack has been freed.
I am currently researching this with Microsoft. The current documentation paragraph, as written, would make it impossible to ever call DeleteFiber!
The question then arises: since fibers appear in many ways to be coroutines, how is the parameter passed to the killer? In this case, a member variable, killTarget, was added to the Queue class. Because we are working with non-preemptive fibers, and we are assuming that there is no concurrent access to the Queue object from another thread (this is a basic assumption in this code), there is no need to worry about any form of synchronization; furthermore, no fiber other than the one being switched to can get control, so there is no problem in the use of such a variable (this would be a fatal design error in a preemptive multithreaded system, but it is a perfectly valid approach in a non-preemptive fiber system).
To avoid any need for explicit initialization or finalization, I create the kill fiber only if it does not already exist, and will show later where it is deleted. Why do it this way instead of in the constructor and destructor?
Mostly, it was an issue of lifetime. I felt that the fiber should not exist any longer than it needs to. There also seem to be some serious problems with determining exactly when it would be possible to delete the fiber, particularly since the destructor for the queue might run after the program has done a ::ConvertFiberToThread call to release the fiber resources, and the effects of doing a ::DeleteFiber under these conditions appear to be undefined.
This does lead to the question of when the kill fiber can be deleted. I chose to delete it when the queue goes empty. Note that this causes a reversion of scheduling to the "main fiber", actually the empty-queue fiber. Since the queue is empty, there will be no more need to kill fibers, and the killFiber can be safely deleted. Note that the program may, as a consequence of having done all the processing, choose to create more queue entries; in this case, the killFiber is automatically re-created, "on demand".
The killer fiber is quite simple; the key is that since it is a separate fiber, it can safely ::DeleteFiber the fiber which has just completed its operation. Two details matter here: because killer is a static member, it reaches the Queue through the fiber parameter (the this pointer that kill passed to ::CreateFiber), and it loops, because the kill fiber is re-used: each time killFiber->run() switches back to it, it resumes inside its previous call to next() and must come back around to delete the next victim.
void CALLBACK Queue::killer(LPVOID p)
    {
     Queue * queue = (Queue *)p;     // kill() passed 'this' as the fiber parameter
     while(TRUE)
        {
         delete queue->killTarget;   // safe: the target is no longer the running fiber
         queue->killTarget = NULL;
         queue->next();              // we resume here the next time kill() runs us
        }
    }
The next method
The next method dequeues the next element from the queue and schedules it to run by calling ::SwitchToFiber on its fiber. If there are no more fibers queued up, it reverts to the fiber established by SetEmptyFiber. Note that a failure to call SetEmptyFiber is considered a fatal error.
void Queue::next()
{
PQE qe = (PQE)removeFromQueue();
if(qe != NULL)
{
qe->Display(_T("Dequeued QE"));
qe->run();
}
else
{
delete killFiber;
killFiber = NULL;
TRACE( (_T("Queue empty: switch to EmptyFiber\n")) );
ASSERT(emptyFiber != NULL);
emptyFiber->run();
}
}
Some other issues
Why did I use a QE structure for the killFiber and the emptyFiber? Since these are not actually scheduled, why not use a raw CFiber object for them instead?
In the case of the emptyFiber, it is possible that someone might choose, instead of the approach I took, to create a priority-scheduled queue, and make the empty fiber simply the lowest-priority fiber. I felt it would be easier to see how to do this if the emptyFiber appeared to be a schedulable fiber. Since the caller has to create the fiber, it also meant the underlying implementation could be changed without changing the user interface.
The choice for the killFiber is harder to justify; in this case, it became more an issue of consistency than one of functionality, particularly because it never escapes to the user-interface level. However, doing it this way allows me to decide to do "lazy killing", in which scheduling the kill fiber becomes simply one more task to do, inserted either at the head of the queue or later in the queue. Note, however, that if I were to insert it at other than the head of the queue, the use of the single killTarget member variable would no longer be viable: killTarget is explicitly set for each fiber to be killed, and if there could be more than one pending kill request, one variable is insufficient. In that case, I would probably create the kill fibers on demand, and use the fiber parameter, which would have to be a QE object; I would have to create a class Killer : public QE to maintain the context I need.
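A sketch of what that Killer class might look like (mine; it is not implemented in the sample code):
class Killer : public QE {
protected:
    static void CALLBACK killOne(LPVOID p);
public:
    Killer(PQE victim, Queue * q) : QE(killOne) { target = victim; queue = q; }
    PQE target;                          // the fiber this request will delete
    Queue * queue;
};

void CALLBACK Killer::killOne(LPVOID p)
    {
     Killer * self = (Killer *)p;        // the QE constructor passed 'this' as the fiber parameter
     delete self->target;                // safe: the target is no longer the running fiber
     self->queue->next();                // schedule whatever comes next
     // Deleting 'self' here would recreate the self-deletion problem, which is part of
     // why the sample code keeps a single, reusable kill fiber instead.
    }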
The problem domain
The goal of the sample program is to take a set of file names on the command line, and print out the files, one file per fiber. To make it appear as a concurrent program, the program is defined to print out only a few lines at a time, then the fiber is descheduled and another fiber runs. When end-of-file is reached, the fiber is no longer scheduled. Upon completion of the program, all the fibers are deleted.
To solve this, a class is derived from the QE class:
class CReaderFiber : public QE {
public:
CReaderFiber(LPCTSTR f, int c, Queue * q) : QE(reader) {
name = f;
count = c;
queue = q;
file = NULL;
}
virtual ~CReaderFiber() { if(file != NULL) fclose(file); }
public:
LPCTSTR name;
int count;
Queue * queue;
public:
virtual void Display(LPCTSTR s);
public:
FILE * file;
TCHAR buffer[MAX_LINE];   // TCHAR so _fgetts and _tprintf work in both ANSI and Unicode builds
protected:
static void CALLBACK reader(LPVOID p);
};
This derived class holds all the problem-specific information, such as the number of lines to write during each execution of the fiber, the name of the file, the buffer holding the input, and so on. The fiber function itself is a static class member.
void CALLBACK CReaderFiber::reader(LPVOID p)
{
CReaderFiber * rf = (CReaderFiber *)p;
TRACE( (_T("reader: called for %s\n"), rf->name) );
rf->file = _tfopen(rf->name, _T("r"));
if(rf->file == NULL)
{
DWORD err = ::GetLastError();
reportError(err, rf->name);
TRACE( (_T("reader: could not open %s\n"), rf->name) );
rf->queue->kill();
return;
}
Once the file is opened, we simply go into an infinite loop in the fiber, reading and printing lines. Note that this code assumes that a Unicode version of the program will be reading Unicode files and an ANSI version of the program will be reading ANSI (8-bit native code page) files. The generalization where either version can read or write either type of file is left as an Exercise For The Reader. Note that within the fiber loop, after printing count lines, the fiber yields; therefore another fiber will run.
while(TRUE)
{
for(int i = 0; i < rf->count; i++)
{
if(_fgetts(rf->buffer, MAX_LINE, rf->file) == NULL)
{
TRACE( (_T("reader: done with %s\n"), rf->name) );
rf->queue->kill();   // kill() never returns; the ASSERT below documents that
ASSERT(FALSE);
}
_tprintf(_T("%s"), rf->buffer);
}
TRACE( (_T("reader: yielding for %s after %d lines\n"),
rf->name, rf->count) );
rf->queue->yield();
}
}
The main program
The command line has the syntax:
argv[0] is the program name
argv[1] is the count of the number of lines to display on each iteration
argv[2..argc-1] are the names of the files
int _tmain(int argc, TCHAR * argv[])
{
Queue queue;
if(argc < 3)
{
usage();
return 1;
}
int lines = _ttoi(argv[1]);
if(lines <= 0)
{
_tprintf(_T("Illegal buffer count (%d)\n"), lines);
usage();
return 1;
}
Once the boring details of validating the arguments are finished, the real work begins. First, an array is created to hold all the CFiber-derived objects. To simplify the code and eliminate the need to worry about offsets, I just created an array the same size as the argv array and ignore the first two entries in it.
CReaderFiber ** fibers = new CReaderFiber*[argc];
In order to handle the "main fiber", which will be this thread converted to a fiber, I will need a QE object to allow it to be scheduled. The ConvertThreadToFiber method will accomplish this.
PQE mainFiber = new QE();
if(!mainFiber->ConvertThreadToFiber())
{
DWORD err = ::GetLastError();
reportError(err, _T("Converting thread to fiber"));
return 1;
}
Next, I scan the list of file names. For each file on the command line, I create a new CReaderFiber object and put it in the array of fibers. As each fiber is created, I place it in the queue.
for(int i = 2; i < argc; i++)
{
fibers[i] = new CReaderFiber(argv[i], lines, &queue);
if(fibers[i] == NULL)
{
DWORD err = ::GetLastError();
reportError(err, _T("Creating fiber"));
return 1;
}
queue.appendToQueue(fibers[i]);
}
When all the fibers have finished, control will return to this thread, the "main fiber". To accomplish this, we have to set the mainFiber (the result of ConvertThreadToFiber) as the fiber that receives control when the queue is empty, that is, when all the contents of all the files have been printed.
queue.SetEmptyFiber(mainFiber);
At this point, we start scheduling the fibers. The fibers will now run until all the contents of the files have been printed.
queue.next();
When all the contents have been printed, the ::SwitchToFiber call will switch execution back to this fiber, and the code below will execute.
TRACE( (_T("Finished\n")) );
mainFiber->Detach();   // do not let ~CFiber call ::DeleteFiber on the running (converted) fiber
delete mainFiber;
::ConvertFiberToThread();
return 0;
}
Example of output
The command line arguments supplied were:
5 a.txt b.txt c.txt
where the contents of the files were of the form "File <filename> <line number>". The output is shown below; the grouping-of-5 is based on the first command-line parameter. Note the transitions marked in the output: file a.txt has 200 lines, b.txt has 250 lines, and c.txt has 300 lines. The files were created by a little program called "datagen", by the following command lines:
datagen 200 "File A" > a.txt
datagen 250 "File B" > b.txt
datagen 300 "File C" > c.txt
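(datagen is not included with this article; a minimal equivalent of my own, matching the "File <filename> <line number>" format above, would be:)
// datagen <count> <prefix>  --  writes "<prefix> 1" .. "<prefix> <count>" to stdout
int _tmain(int argc, TCHAR * argv[])
    {
     if(argc != 3)
        return 1;
     int count = _ttoi(argv[1]);
     for(int i = 1; i <= count; i++)
         _tprintf(_T("%s %d\n"), argv[2], i);
     return 0;
    }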
The output:
File A 1
File A 2
File A 3
File A 4
File A 5
File B 1
File B 2
File B 3
File B 4
File B 5
File C 1
File C 2
File C 3
File C 4
File C 5
File A 6
File A 7
File A 8
File A 9
File A 10
File B 6
File B 7
File B 8
File B 9
File B 10
File C 6
File C 7
File C 8
File C 9
File C 10
File A 11
File A 12
File A 13
File A 14
File A 15
…
File C 191
File C 192
File C 193
File C 194
File C 195
File A 196
File A 197
File A 198
File A 199
File A 200
File B 196
File B 197
File B 198
File B 199
File B 200
File C 196
File C 197
File C 198
File C 199
File C 200
File B 201 <- Note that file A, with 200 lines is no longer in the mix after this point
File B 202
File B 203
File B 204
File B 205
File C 201
File C 202
File C 203
File C 204
File C 205
File B 206
File B 207
File B 208
File B 209
File B 210
File C 206
File C 207
File C 208
File C 209
File C 210
File B 211
File B 212
File B 213
File B 214
File B 215
…
File B 246
File B 247
File B 248
File B 249
File B 250
File C 246
File C 247
File C 248
File C 249
File C 250
File C 251 <- Note that file B, with 250 lines, is no longer in the mix after this point
File C 252
File C 253
File C 254
File C 255
File C 256
File C 257
File C 258
File C 259
File C 260
Some additional fiber issues
This code makes a number of simplifying assumptions which should not be seen as limiting in the general fiber world. For example:
- All the fibers will be running in just one thread, the same thread in which they were created
- The fiber, once created, will remain attached to the same CFiber object
These are both implementation choices. While it is possible to have a fiber be executed by one thread, and then another (::SwitchToFiber doesn't seem to care which thread the fiber was created in), there are certain risks involved in doing so. If you assume your fibers have no synchronization issues because they are fibers, and then suddenly start executing them in separate threads, you have to be exceedingly cautious that you have not introduced a synchronization problem, and, under maintenance, be careful that no future synchronization problem can arise.
Note that if you choose to run fibers in different threads (that is, a given fiber might be running in different threads; not the situation where multiple threads are each running fibers only within themselves), be sure you read about the /GT compiler option, "Support fiber-safe thread local storage", before proceeding. This has an impact on the use of __declspec(thread) variables in fibers that might later be scheduled in different threads.
My choice of using ::GetFiberData to obtain the fiber data was a simplifying assumption, and need not be considered the definitive approach. It simplified the code for the one case I was implementing.
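One alternative (a sketch of my own, not used in the sample code): instead of requiring the fiber parameter to be the CFiber/QE object itself, keep a map keyed by the raw fiber pointer and look the object up with ::GetCurrentFiber. That leaves the fiber parameter free for other uses, at the cost of a lookup on every access.
#include <map>

std::map<LPVOID, CFiber *> fiberMap;        // single-threaded use only: no locking here

void RegisterFiber(CFiber * f)
    {
     fiberMap[f->GetFiber()] = f;           // key by the raw fiber pointer
    }

CFiber * CurrentFiberObject()
    {
     return fiberMap[::GetCurrentFiber()];  // NULL if the current fiber was never registered
    }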
The usual miscellany
As in any real program, there are a few things done that are not part of the original goal of the program, but need to be done for either functionality, elegance, convenience, or some other good reason. This program is no exception, although some of these components had been developed independently and simply used here.
ASSERT
The MFC ASSERT macro is well-designed. It merely reports a problem; it does not abort the program. This is a valuable feature: the programmer can use the ASSERT macro to aid in debugging, but has the option, upon return, of doing "graceful recovery", anything from hand-tweaking values in the debugger to making source changes and using edit-and-continue to re-execute the offending lines. Overall, an intelligent and elegant design.
The assert function of the C library, on the other hand, is poorly designed, using what I call the "PDP-11 mindset", referring to the minicomputer on which the C language was popularized. The philosophy then was that programs were small, and it was OK to terminate a program on an internal error. In the world of GUI design, this is always a bad design decision; only the user is permitted to terminate the program.
So, I needed an ASSERT macro for non-MFC applications. In addition, I wanted this code to fit easily inside an MFC app (note that my C++ classes in this article are all standalone, rather than being based on MFC classes like CObject).
The first trick here was the header file, assert.h:
#pragma once
#ifndef ASSERT
#ifdef _DEBUG
#define ASSERT(x) if(!(x)) \
     fail(_T("Assertion Failure %hs(%d)\n"), __FILE__, __LINE__)
#define VERIFY(x) ASSERT(x)
extern void fail(LPCTSTR fmt, ...);
#else
#define ASSERT(x)
#define VERIFY(x) x
#endif // _DEBUG
#endif // ASSERT
The fail function is straightforward:
void fail(LPCTSTR fmt, ...)
{
va_list args;
va_start(args, fmt);
TCHAR buffer[1024];
_vstprintf(buffer, fmt, args);
va_end(args);
OutputDebugString(buffer);
}
However, note how dangerous this is! What if the data were to exceed 1024 characters? The answer is simple: this would be a fatal buffer-overflow situation!
Because I am using only a filename (at most MAX_PATH, nominally 260, characters) and a line number, this is safe. But it is not safe in general!
Unfortunately, nothing better was available in Visual Studio 6. In Visual Studio .NET, after the numerous security holes caused by buffer overflows that Microsoft has been victimized by, the appropriate primitives were added. The important new calls needed here are _vscprintf/_vscwprintf, or, as defined "bimodally" in tchar.h, _vsctprintf, which compiles as appropriate for ANSI or Unicode apps. These return the number of characters required to hold the formatted string (not including the NULL terminator). Therefore, a properly-written function for VS.NET would be as shown below:
void fail(LPCTSTR fmt, ...)
{
va_list args;
va_start(args, fmt);
int n = _vsctprintf(fmt, args);
LPTSTR buffer = new TCHAR[n + 1];
_vstprintf(buffer, fmt, args);
va_end(args);
OutputDebugString(buffer);
delete [] buffer;
}
In MFC, this would be easily accomplished by using the CString::FormatV method, which implements a similar algorithm to _vsctprintf, but the premise in this code is that we are not using MFC.
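For completeness, here is what the MFC flavor might look like (a sketch; CString::FormatV does the length computation internally):
void fail(LPCTSTR fmt, ...)
    {
     va_list args;
     va_start(args, fmt);
     CString s;
     s.FormatV(fmt, args);               // sizes and formats the buffer in one step
     va_end(args);
     OutputDebugString(s);               // CString converts to LPCTSTR implicitly
    }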
Note that none of these versions pops up a message box or otherwise interacts with the user; the program is allowed to continue. I have often just set a breakpoint at the end of the function if I need it to stop; or you can create something more elaborate using DebugBreak.
The TRACE macro
In the file debug.cpp, I have an implementation of simple output tracing. Note that it looks remarkably like the fail function described above, including the buffer-overflow hazard (and the same solution applies if VS.NET is being used). However, we have an additional problem with TRACE that we did not have with ASSERT: a variable number of arguments.
The macro system in C represents the state of the art for macro systems of the 1950s, and is nowhere near as elaborate as what many other programming languages have had. A good macro system is actually Turing-equivalent, in that you can write, in the macros, programs that write your code. The C macro system is, well, to call it "primitive" is a bit unfair; in many cases, a stone ax was far more sophisticated. Nonetheless, it is what we have to work with, so we have to deal with it as best we can.
The trick here is taken from the KdPrint macro of the Microsoft DDK. To get a variable number of arguments into a macro call, you have to wrap them in an extra set of parentheses, so that syntactically they look like a single argument to the macro. The definition of the TRACE macro, and its accompanying function, is therefore given as (from debug.h):
void CDECL trace(LPCTSTR fmt, ...);
#ifdef _DEBUG
#define TRACE(x) trace x
#else
#define TRACE(x)
#endif
A call on the macro might be done as:
TRACE( (_T("The result for 0x%p is %d (%s)\n"),
object, object->value, object->name) );
Note my tendency to use whitespace between the parameter parentheses of the TRACE macro and the parameter parentheses of the argument list.
Note also the use of %p to get an address to print. It is very important to develop this habit instead of using 0x%08x or some other non-portable method. This would not work correctly if the program were ported to Win64 (the format would have to be hand-edited to be 0x%16I64x; note the first argument after the % is the number 16, and the next sequence is the qualifier I64 to indicate the incoming argument is a 64-bit value, and finally the x to specify the desired formatting).
Reporting errors
Generally, I use a much more general implementation wrapping ::FormatMessage, but for this simple example, a console-based app with no GUI and no use of MFC, a simpler implementation suffices.
In keeping with my philosophy of "no two places in my code ever issue the same error message", I allow the caller to pass in a unique identification string. For this toy example, I have not bothered to provide for localization via a STRINGTABLE.
void reportError(DWORD err, LPCTSTR reason)
{
LPTSTR msg;
if(FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
NULL,
err,
0,
(LPTSTR)&msg,
0,
NULL) == 0)
{
_tprintf(_T("Error %d (0x%08x)\n"), err);
return;
}
_tprintf(_T("Error %s\n%s\n"), reason, msg);
LocalFree(msg);
}