Introduction
This project came about when I encountered a client that needed a fiber scheduler. The problem arose because of a need to convert non-preemptive “threads” from an existing programming domain to Windows. [This was part of my teaching, not part of a contract, and I presented this solution in class; therefore, I was not actually paid to write it, and it remains mine. Just so you know that this wasn’t proprietary work.]
The classes I designed were intended to illustrate a round-robin fiber scheduler. You can feel free to adapt these classes to any application domain you want, using any discipline you want. For example, you could create a priority-based scheduler if you needed one (more on this later).
This uses a subset of the capabilities of fibers to implement a solution. Fibers are a good deal more general than used in this solution, and I'll talk about some of these issues later.
So what's a fiber?
A fiber is a non-preemptive multiplexing mechanism. In fact, fibers are more like an implementation of coroutines, a technology dating back to the 1960s. A fiber consists of a register set and a stack, so it is much like a thread, but it is not a thread. A fiber has no identity unless it runs inside a thread. When a fiber is suspended, it transfers control to another fiber by explicitly selecting and naming (by supplying the address of) the target fiber. The operation that performs this is called SwitchToFiber. The target fiber will then continue to run in that thread until another SwitchToFiber is executed.
A fiber, unlike a thread, never exits. If a fiber actually exits, by returning from its top-level fiber function, the thread that is executing it will implicitly call ExitThread and terminate. Note that this call on ExitThread means the thread does not itself exit via its top-level function, and consequently any cleanup that must be done there will not be performed. Therefore, you should consider the actual exit of a fiber to be a serious programming error.
The MSDN lists a set of fiber-related functions, but the documentation is incomplete, confusing, sometimes misleading, and in one case, I believe overtly wrong. This essay is going to attempt to clarify these issues, although I don't have enough information to resolve some of the ambiguities or fill in the holes (although I have passed this off to my MVP contact in the hopes he can get some answers for me).
The fiber functions
The fiber functions are listed under the topic "Process and Thread functions [base]". Apparently, "fibers" is not considered worthy of mention in the title.
ConvertFiberToThread | This undoes the action of the ConvertThreadToFiber(Ex) operations and frees up resources that have been used in the original conversion. |
ConvertThreadToFiber | Converts the current thread into a fiber. |
ConvertThreadToFiberEx | Converts the current thread into a fiber, with some additional options. |
CreateFiber | Allocates a fiber object, assigns it a stack, and sets up the EIP to be at the start address specified. A user-specified PVOID parameter will be passed to the fiber function. |
CreateFiberEx | The same as CreateFiber , but provides additional options. |
DeleteFiber | Deletes a fiber created by CreateFiber(Ex) . This is the way to get rid of a fiber and its stack, but it has certain limitations that must be considered. |
FiberProc | The typedef for this is typedef void (CALLBACK * LPFIBER_START_ROUTINE)(PVOID parameter) |
FlsAlloc | Allocates a fiber-local-storage (FLS) index; to delete it, use FlsFree . |
FlsCallback | A function established by FlsAlloc that is called when the thread running a fiber terminates, or a fiber is deallocated. |
FlsFree | Frees a fiber-local-storage (FLS) index allocated by FlsAlloc . |
FlsGetValue | Retrieves the current FLS value stored at a specific index. |
FlsSetValue | Sets the current FLS value stored at a specific index. |
GetCurrentFiber | Returns the address of the current fiber. |
SwitchToFiber | Switches control from one fiber to another. |
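The FLS functions are not used anywhere in the code in this essay, but a minimal sketch of how they fit together may be useful. This is my own illustration, not code from the article's project: the callback name and the CSomeContext type are invented, error handling is abbreviated, and the Fls* calls require Windows Server 2003 / Vista or later.
class CSomeContext { /* hypothetical per-fiber context */ };

void CALLBACK OnFlsFree(PVOID data)          // any function matching PFLS_CALLBACK_FUNCTION
    {
     delete (CSomeContext *)data;            // runs when a fiber is deleted or its thread exits
    }

void demoFls()
    {
     DWORD flsIndex = ::FlsAlloc(OnFlsFree); // returns FLS_OUT_OF_INDEXES on failure
     if(flsIndex == FLS_OUT_OF_INDEXES)
        return;
     ::FlsSetValue(flsIndex, new CSomeContext);  // the value is per-fiber
     CSomeContext * ctx = (CSomeContext *)::FlsGetValue(flsIndex);
     ASSERT(ctx != NULL);
     // ... use ctx ...
     ::FlsFree(flsIndex);                    // release the index when no fiber needs it
    }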
There are several steps involved in creating a running fiber environment.
First, some fibers have to be created. When a thread is created, it is implicitly scheduled to run. When a fiber is created, there is no mechanism that makes it spontaneously able to run. Instead, the fiber remains "suspended" until there is an explicit SwitchToFiber to transfer control from a currently-running fiber to one of the other fibers.
Since the threads are what are actually running, only a running thread can do a SwitchToFiber. But for this to work correctly, it must be from an existing fiber, to an existing fiber. A thread is not a fiber, so although it could create a bunch of fibers, it cannot cause them to run.
This is solved by using either the ConvertThreadToFiber or ConvertThreadToFiberEx call. These calls create a "fiber work area" in the current thread, thus imbuing the thread with fiberness, or to say it another way, "the thread has the fiber nature". Now, the underlying thread scheduler will schedule threads, including the thread which now has the fiber nature. Essentially, upon return from the ConvertThreadToFiber(Ex) call, you find yourself running in a fiber, which is running inside the original thread. This fiber is now free to do a SwitchToFiber to any other fiber.
At some point, when a fiber is no longer in use, you must call DeleteFiber on the fiber object. There are some restrictions on how this can be done, which will be discussed later. But ultimately, every fiber you create must be deleted.
The ConvertThreadToFiber(Ex) calls, too, return a fiber object reference. When you call SwitchToFiber on this fiber, you are "back in the thread", and you can then call ConvertFiberToThread to release the fiber-specific resources that had been allocated to the thread.
That's basically it. It isn't everything, and what I've described here represents a subset of the full capabilities of the fiber universe, but I'll elaborate on these a bit later.
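To make the sequence concrete, here is a minimal sketch of that lifecycle using the raw API calls; the names (worker, runFibers) are mine, and all error handling is omitted.
LPVOID mainFiber;                            // the fiber the original thread becomes
LPVOID workerFiber;                          // a fiber created explicitly

void CALLBACK worker(LPVOID p)
    {
     // ... do part of the work ...
     ::SwitchToFiber(mainFiber);             // hand control back; we resume here next time
     // ... do more work ...
     for(;;)                                 // a fiber function must never return
         ::SwitchToFiber(mainFiber);
    }

void runFibers()
    {
     mainFiber = ::ConvertThreadToFiber(NULL);     // the thread acquires "the fiber nature"
     workerFiber = ::CreateFiber(0, worker, NULL); // created suspended
     ::SwitchToFiber(workerFiber);           // run the worker until it switches back
     ::SwitchToFiber(workerFiber);           // resume it where it left off
     ::DeleteFiber(workerFiber);             // safe: the worker is not the running fiber
     ::ConvertFiberToThread();               // release the thread's fiber resources
    }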
Why use fibers?
Speed. Simplicity. Elimination of the need for synchronization.
Speed
Fibers are lightweight. The operation to switch from one fiber to another is 32 instructions, involving no kernel call at all. On a 2GHz machine, this means that you are looking at a swap as fast as 8ns (in the worst case, if none of the data is in the cache, it could be several tens of nanoseconds).
In case you're wondering where those numbers come from, 32 instructions would take, on a 2GHz machine, 16ns, except that the Pentium 4 is a "pipelined superscalar" architecture and can issue two integer instructions per clock cycle, coming up with the 8ns value. Instruction pipelining and pre-fetching tend to mask instruction fetch times. However, in the worst case when nothing is in the cache, the speed depends upon the pattern of cache hits and replacements, and therefore, is limited by the memory cycle speed, which is platform-specific.
This means that if you have a lot of little things to do, fibers may be a better choice than threads, because threading involves kernel calls and invocation of the scheduler. Fibers execute within the time slice of the thread in which they are running, so swapping among many short-computation fibers does not involve any potential blocking in the kernel. The elimination of the need for synchronization in many cases also eliminates the need for the scheduler to come into play.
Simplicity
The simplicity you get from fibers resembles the simplicity of threading in many ways: instead of having to write complex interlaced code that tries to do several things at once, you write simple code that does only one thing, and use a lot of different fibers to do the different things. This allows you to concentrate on doing one thing well, rather than lots of things in a somewhat complex, and potentially fragile, way.
Elimination of the need for synchronization
In a preemptive multithreading environment, if two threads can access the same data, the data must be protected by some form of synchronization. This could be as simple as using one of the Interlocked... operations that lock at the memory-bus level, or it may require the use of a CRITICAL_SECTION or Mutex for synchronization. The downside of the latter two is that if access cannot be granted, the kernel is called to de-schedule the thread, which is queued up on the synchronization object for later execution when the synchronization object is released. The CRITICAL_SECTION is distinguished by the fact that if the synchronization object can be acquired (the most common situation in most cases), no kernel call is required, whereas the Mutex always requires a kernel call.
Synchronization between threads is where threads rub together. Synchronization is friction. As in mechanical systems, friction generates heat and wastes energy. Perhaps the best solution is to avoid it.
Fibers are a way to avoid this. ::SwitchToFiber is a "positive handoff". Until a fiber switches control, it cannot be preempted by any other fiber that would be running in that thread. Thus, once a fiber starts running, provided the state it is modifying is shared only with other fibers that would execute within that thread, there is no need to synchronize the state at all. The synchronization is implicit in the fiber scheduling, which is explicitly under the control of the programmer.
This is not a perfect solution. A fiber can be scheduled to run in multiple threads; if the shared state could now be accessed by fibers running in separate threads, the fiberness is irrelevant; you have a multithread synchronization problem, and you must use a CRITICAL_SECTION or Mutex.
When you have fibers, you do not have concurrency. If a fiber blocks, such as on an I/O call, it is actually the thread running the fiber that blocks. No additional fibers can be scheduled to run in that thread, because the thread itself is blocked. You do not get any concurrency between fibers running in the same thread (although you can get concurrency between fibers running in other threads). But, if you don't have blocking calls, fibers are particularly useful for multiplexing simple compute-bound tasks.
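To make that concrete with a tiny example of my own (not part of the scheduler classes below): two fibers running in one thread can update shared state with plain loads and stores, where preemptive threads would need an Interlocked operation or a lock.
int sharedCount = 0;                  // shared only by fibers running in one thread
LPVOID otherFiber;                    // set up elsewhere, as in the lifecycle sketch above

void CALLBACK counterFiber(LPVOID p)
    {
     for(;;)
        {
         sharedCount++;               // a plain increment; preemptive threads would need
                                      // InterlockedIncrement or a CRITICAL_SECTION here
         ::SwitchToFiber(otherFiber); // nothing else can touch sharedCount between the
        }                             // increment and this explicit handoff
    }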
The CFiber class
The first class is a wrapper around the basic Fiber object:
class CFiber {
public:
     CFiber(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
        { fiber = ::CreateFiber(stack, rtn, this); }
     CFiber() { fiber = NULL; }
     virtual ~CFiber() { if(fiber != NULL) ::DeleteFiber(fiber); }
public:
     BOOL Create(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
        {
         ASSERT(fiber == NULL);
         if(fiber != NULL)
            return FALSE;
         fiber = ::CreateFiber(stack, rtn, this);
         return fiber != NULL;
        }
     BOOL ConvertThreadToFiber() {
         ASSERT(fiber == NULL);
         if(fiber != NULL)
            return FALSE;
         fiber = ::ConvertThreadToFiber(this);
         return fiber != NULL;
        }
     void Attach(LPVOID p) { ASSERT(fiber == NULL); fiber = p; }
     LPVOID Detach() { ASSERT(fiber != NULL);
                       LPVOID result = fiber;
                       fiber = NULL;
                       return result;
                     }
     LPVOID GetFiber() { return fiber; }
public:
     void run() { ASSERT(fiber != NULL); ::SwitchToFiber(fiber); }
protected:
     LPVOID fiber;
};
The CFiber methods
CFiber | Constructors |
~CFiber | Destructor |
Create | Creates a new fiber in an existing CFiber object |
ConvertThreadToFiber | Converts the current thread to a fiber for purposes of fiber scheduling |
Attach | Attaches an existing fiber to a CFiber object |
Detach | Detaches a fiber from a CFiber object |
GetFiber | Returns the fiber associated with the CFiber object |
run | Switches control to the fiber associated with the CFiber object |
Constructors/Creation
There are several constructors. The key here is the assumption that the "fiber parameter" seen by the fiber will actually be the CFiber object, or an instance of a subclass derived from it.
CFiber::CFiber(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0);
This constructor will create a fiber that will execute the specified routine rtn. It has an optional stack size parameter which defaults to the process default stack size.
Parameters:
| rtn | The fiber routine: void CALLBACK rtn(LPVOID p) |
| stack | The desired stack size; if omitted, 0 (use the default stack size) is assumed. |
See also the implementation of CFiber::CFiber.
CFiber::CFiber();
This constructor creates a CFiber object which is not connected to any fiber. This is typically used when the user wishes to have the fiber parameter be something other than the CFiber object.
CFiber::~CFiber()
The destructor deletes the underlying fiber object, if one is attached.
Notes: The delete operation must be done carefully on a fiber, because deleting the currently executing fiber will terminate the thread that the fiber is running in. A fiber must not delete itself.
BOOL CFiber::Create(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
Given a CFiber object which is not associated with a fiber, this will create a new fiber and attach it to the CFiber object.
Parameters:
| rtn | The fiber routine: void CALLBACK rtn(LPVOID p) |
| stack | The desired stack size; if omitted, 0 (use the default stack size) is assumed. |
Notes:
The following are equivalent:
(1)
CFiber * fiber = new CFiber(function);
(2)
CFiber * fiber = new CFiber;
LPVOID f = ::CreateFiber(0, function, fiber);
fiber->Attach(f);
BOOL CFiber::ConvertThreadToFiber()
Remarks
Converts the current thread to a schedulable fiber (that is, a legal target of ::SwitchToFiber). The current CFiber object becomes the fiber parameter.
Result: TRUE if the underlying ::ConvertThreadToFiber call succeeded; FALSE if it failed (use GetLastError to determine what went wrong).
Notes: After finishing with the fibers and having deleted them all, ::ConvertFiberToThread should be called to release the fiber-specific data structures which have been allocated to the thread.
void CFiber::Attach(LPVOID p)
This attaches a fiber to a CFiber object. The existing CFiber object must not already have a fiber attached.
Parameters:
| p | The fiber to be attached to this CFiber object. |
Notes: Unlike MFC, this does not maintain a map to see if a given fiber is bound to more than one CFiber object. It would be erroneous for a program to do so, but no additional checks are made.
When created via the constructors, the fiber parameter (returned from ::GetFiberData) is the CFiber object. If the application would like to have a different object associated as the fiber parameter, then the fiber can be created and attached.
LPVOID CFiber::Detach()
This breaks the association between the CFiber object and the underlying fiber.
Result: The fiber that was formerly associated with the CFiber object.
Notes: The fiber retains the fiber parameter with which it was created; consequently, if a fiber is Detached from one CFiber object and Attached to another, and its fiber parameter is the original CFiber object, that parameter is now an erroneous reference. Attach and Detach should not be used when the expected behavior is that the CFiber object is the fiber parameter.
LPVOID CFiber::GetFiber();
Result: The fiber associated with the CFiber object.
void CFiber::run();
Switches control to the fiber using ::SwitchToFiber.
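Putting these methods together, a minimal use of CFiber by itself (before the queue classes are introduced) might look like the following. This is a sketch of my own, not part of the downloadable code; the fiber function hello is invented, and error checking is omitted.
CFiber mainFiber;                        // will wrap the converted main thread
CFiber * worker;                         // a fiber created via the CFiber constructor

void CALLBACK hello(LPVOID p)
    {
     // p is the CFiber * that created this fiber (the constructor passes 'this')
     // ... do one unit of work ...
     for(;;)                             // a fiber function must never return
         mainFiber.run();                // hand control back to the converted thread
    }

void demo()
    {
     mainFiber.ConvertThreadToFiber();   // the current thread becomes a fiber
     worker = new CFiber(hello);         // created suspended
     worker->run();                      // run hello() until it switches back
     delete worker;                      // safe: worker is not the running fiber
     mainFiber.Detach();                 // the converted fiber must not be DeleteFiber'd
     ::ConvertFiberToThread();           // release the thread's fiber resources
    }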
The Queue Entry class
The goal here is to have a "time-shared" set of fibers, where the fibers would simply yield control when appropriate and some other fiber would run. In the normal mode of operation, a fiber yields control by calling ::SwitchToFiber, specifying the next fiber to run. However, this requires actually knowing what the next fiber should be. In some applications, this is part of the specification. In the system I was writing, all I knew was that I wanted to yield control to some other fiber. The choice was to do a round-robin fiber scheduler, which implies a queue. This required a queue-entry class to represent the fibers in the queue. I derived this from the CFiber class.
class QE : public CFiber {
public:
QE(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
: CFiber(rtn, stack) { next = NULL; }
QE() { next = NULL; }
virtual ~QE() { }
public:
virtual void Display(LPCTSTR s) { }
public:
QE * next;
};
typedef QE * PQE;
The QE methods and member variables
QE | Constructors |
~QE | Destructor |
Display | Virtual method for debugging output |
next | The link used to insert elements in the queue |
QE::QE(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0);
This constructor will create a queue entry, and a new fiber associated with that queue entry, which will execute the specified routine rtn. It has an optional stack size parameter which defaults to the process default stack size.
Parameters:
| rtn | The fiber routine: void CALLBACK rtn(LPVOID p) |
| stack | The desired stack size; if omitted, 0 (use the default stack size) is assumed. |
QE::QE();
This constructor will create a queue entry which is not associated with any fiber. The CFiber::Attach call can be used to attach a fiber. See the caveats about the fiber data.
Notes: This code will not function properly if the ::GetFiberData call returns a value other than the QE reference.
virtual QE::~QE();
The destructor will delete the QE object and its associated fiber. See the caveats about deleting the running fiber. The delete operator should never be called on the QE representing the running fiber.
virtual void QE::Display(LPCTSTR s);
The intent is that subclasses of QE will implement this virtual method for purposes of debugging; this class does nothing in its implementation of the method. While in principle it could have been a pure virtual method, there may be reasons to create raw QE objects, and that would not be possible if the method were pure virtual.
QE * QE::next;
This provides the queue structure managed by the Queue class.
Notes: C++ has a mechanism called friend which allows another class access to private variables. However, this is poorly designed, because it means that the QE class has to know the name of the class that will use it. This reduces the generality of the classes; so rather than mark this member as protected and require a known class name, I chose to make it public and thus allow other implementations of queues.
The Queue class
I wanted to implement a FIFO queue to do round-robin scheduling. I would normally use the MFC CList class, but this code had to work in a non-MFC environment.
class Queue {
public:
Queue() { queueHead = queueTail = emptyFiber = killFiber = killTarget = NULL; }
public:
void appendToQueue(PQE qe);
PQE removeFromQueue();
public:
void next();
void yield();
void kill();
public:
void SetEmptyFiber(PQE qe) { emptyFiber = qe; }
protected:
PQE queueHead;
PQE queueTail;
PQE emptyFiber;
protected:
PQE killFiber;
PQE killTarget;
static void CALLBACK killer(LPVOID p);
};
This is intended to support the queuing of fibers. The yield method puts the current queue entry at the tail of the queue. The next method dispatches the fiber at the head of the queue. The kill method does a delete on the currently running element, but it does so very carefully, by using a separate fiber for this purpose; it then initiates the next fiber in the queue.
This code implements simple FIFO queuing. Other classes can be created that implement more sophisticated kinds of queuing.
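As one example of a more sophisticated discipline, here is a sketch (mine, not part of the downloadable code) of how a priority-based scheduler could be layered on these classes: give each entry a priority and insert in priority order rather than FIFO. Note that appendToQueue is not virtual in Queue, so yield would also have to be overridden (or appendToQueue made virtual) for the new insertion to be used everywhere; the sketch also assumes every entry placed in the queue is a PrioQE.
class PrioQE : public QE {
public:
    PrioQE(LPFIBER_START_ROUTINE rtn, int pri, SIZE_T stack = 0)
        : QE(rtn, stack) { priority = pri; }
    int priority;                            // larger value runs sooner
};

class PriorityQueue : public Queue {
public:
    void appendToQueue(PQE qe)               // sorted insertion instead of FIFO
       {
        PrioQE * p = (PrioQE *)qe;
        if(queueHead == NULL || ((PrioQE *)queueHead)->priority < p->priority)
           { // becomes the new head
            p->next = queueHead;
            queueHead = p;
            if(queueTail == NULL)
               queueTail = p;
            return;
           } // becomes the new head
        PQE scan = queueHead;                // find the last entry of equal or higher priority
        while(scan->next != NULL && ((PrioQE *)scan->next)->priority >= p->priority)
            scan = scan->next;
        p->next = scan->next;
        scan->next = p;
        if(p->next == NULL)
            queueTail = p;
       }
};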
The Queue methods
Queue management methods |
Queue | Constructor |
appendToQueue | Adds an element to the tail of the queue |
removeFromQueue | Removes an element from the head of the queue |
Scheduling methods |
next | Dequeues the head of the queue and switches to that fiber; if the queue is empty, switches to the empty-queue fiber |
yield | Places the current fiber at the end of the queue and runs the fiber at the head of the queue |
kill | Deletes the current fiber and schedules the next fiber |
SetEmptyFiber | Specifies the fiber to be switched to if the queue is empty |
Queue management methods
Queue::Queue();
Constructs an empty queue.
void Queue::appendToQueue(PQE qe);
Places the QE pointed to by its parameter at the end of the queue.
PQE Queue::removeFromQueue();
Removes the queue element from the front of the queue.
Result: A pointer to the QE entry which has been dequeued, or NULL if the queue is empty.
Queue scheduling methods
void Queue::next();
Dequeues the queue element at the head of the queue and switches control to the fiber it represents. If there is no element in the queue, it will switch to the fiber represented by the "empty queue" element (see SetEmptyFiber).
void Queue::yield();
When a fiber wishes to yield control, it calls this method, and control is transferred to the next fiber in the queue. The current fiber is placed at the end of the queue. Execution will resume on the line following the yield call when the fiber is re-scheduled.
void Queue::kill();
When a fiber is done computing, it calls this method. Control will be transferred to the next fiber in the queue. The current fiber is not placed at the end of the queue, and therefore will no longer be scheduled.
Notes: To “suspend” a fiber so it can be “restarted”, a more elaborate mechanism will need to be constructed.
void Queue::SetEmptyFiber(PQE qe);
Parameters:
| qe | A QE reference to the fiber to be scheduled if the queue is empty. |
Using the queue
The queue is first populated by using appendToQueue to add a QE object to the queue. Once the fibers start running, they will normally execute in round-robin FIFO order. The current mechanism does not have a way to "suspend" a fiber; that generalization is left as an Exercise For The Reader.
To start the queue, the main thread must create a QE which represents the CFiber::ConvertThreadToFiber call. This is typically set as the default fiber to resume when the queue is empty, so the SetEmptyFiber method is used to establish this.
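In outline (the sample program in the next section does exactly this), the startup looks like the following sketch; task and nTasks are placeholders of mine, and error handling is omitted.
void startScheduling(LPFIBER_START_ROUTINE task, int nTasks)
    {
     Queue queue;
     PQE mainFiber = new QE();               // will wrap the converted main thread
     mainFiber->ConvertThreadToFiber();      // the thread acquires the fiber nature
     for(int i = 0; i < nTasks; i++)
         queue.appendToQueue(new QE(task));  // each task fiber is created suspended
     queue.SetEmptyFiber(mainFiber);         // come back here when the queue drains
     queue.next();                           // start round-robin scheduling; control
                                             // returns here once the queue is empty
     mainFiber->Detach();                    // the converted fiber is released below, not deleted
     delete mainFiber;
     ::ConvertFiberToThread();               // release the thread's fiber resources
    }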
The Code
The following sections illustrate the code in this example.
The CFiber class
All of the methods of the CFiber class are defined in the header file. The CFiber class wraps the primitive fiber representation, which is an LPVOID:
protected:
LPVOID fiber;
Constructors/Destructors
CFiber(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
{
fiber = ::CreateFiber(stack, rtn, this);
}
Note that this constructor creates the fiber using ::CreateFiber, passing this as the fiber parameter.
CFiber() { fiber = NULL; }
This creates a CFiber object which is not bound to a particular fiber; the Attach method can be used later to bind a fiber to this CFiber object. Note that this has both power and risk: it allows the fiber parameter to be something other than the CFiber object, but some of the later classes are not designed to work under these conditions.
virtual ~CFiber() { if(fiber != NULL) ::DeleteFiber(fiber); }
This deletes the fiber, if one is attached (a default-constructed or Detached CFiber has nothing to delete). However, this method cannot be executed by the running fiber on itself, or the thread which is running the fiber will be terminated.
Creation/Attachment
BOOL Create(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
{
ASSERT(fiber == NULL);
if(fiber != NULL)
return FALSE;
fiber = ::CreateFiber(stack, rtn, this);
return fiber != NULL;
}
In debug mode, the ASSERT verifies that this CFiber is not already bound to a fiber. In release mode, the ASSERT disappears, but the method will return FALSE if the fiber is already bound.
BOOL ConvertThreadToFiber() {
ASSERT(fiber == NULL);
if(fiber != NULL)
return FALSE;
fiber = ::ConvertThreadToFiber(this);
return fiber != NULL;
}
The thread which is going to start scheduling fibers needs to do a ConvertThreadToFiber. This invokes the API call to map the current thread to a fiber, passing the current CFiber object as the fiber parameter.
void Attach(LPVOID p) { ASSERT(fiber == NULL); fiber = p; }
This should only be called for CFiber objects which are not bound to fibers. It is nominally erroneous to attempt to bind a CFiber to another fiber without doing a Detach first.
LPVOID Detach() { ASSERT(fiber != NULL);
LPVOID result = fiber;
fiber = NULL;
return result;
}
This breaks the association between the CFiber and its underlying fiber, and returns the fiber reference. If no fiber has been bound, the value returned is NULL. Nominally, it is erroneous to Detach if there is no association, and the ASSERT will cause a failure in debug mode if this situation arises. Note that there is no mechanism for changing the fiber parameter on an Attach of a formerly Detached fiber, and the other subclasses described here will not function properly if the fiber parameter is not a reference to the CFiber object.
void run() { ASSERT(fiber != NULL);
::SwitchToFiber(fiber);
}
This switches control from the currently executing fiber to the fiber on which the run method is invoked.
The Queue Entry class
This class is derived from the CFiber class, and consequently inherits the constructor and other methods. The QE constructor is:
QE(LPFIBER_START_ROUTINE rtn, SIZE_T stack = 0)
: CFiber(rtn, stack) { next = NULL; }
so it is just the construction of a CFiber, with the addition of initializing the link used to maintain the queue.
The Display method is used to produce debugging output, but it is the responsibility of the subclass to implement the actual output.
The Queue class
The Queue class is essentially a singly-linked queue manager combined with the fiber-scheduling logic. The singly-linked queue manager is fairly straightforward. What I'll show here are the fiber-scheduling functions.
The yield method
Note that to simplify the coding, I chose to assume that the fiber parameter is the actual QE object (or an instance of a subclass of QE). This means that this code would not operate correctly if Attach/Detach had been used in a way that made the fiber parameter inconsistent, since there is no way to reset the fiber parameter (while there is a ::GetFiberData API, there is no corresponding ::SetFiberData!).
All this method does is queue the currently-running fiber onto the tail of the queue and activate the next fiber in the queue (using the next method).
void Queue::yield()
{
PQE qe = (PQE) (::GetFiberData());
qe->Display( _T("yield: returning to queue - ") );
appendToQueue(qe);
next();
}
The kill method
This is called when a fiber has finished its work and must be deleted.
void Queue::kill()
    {
     PQE qe = (PQE) (::GetFiberData());
#ifdef _DEBUG
     qe->Display(_T("kill"));
#endif
     killTarget = qe;
     if(killFiber == NULL)
        {
         killFiber = new QE();
         LPVOID f = ::CreateFiber(0, killer, this);
         killFiber->Attach(f);
        }
     killFiber->run();
    }
The problem here is that a fiber cannot delete itself. If ::DeleteFiber is called on the currently running fiber, the entire thread in which the fiber is running will exit, because this will call ExitThread. In order to kill the fiber, kill switches control to a different fiber, the killFiber. That fiber is able to do a delete operation on the QE object (which, remember, is a subclass of CFiber, and may itself be an instance of some further subclass of QE), which is implemented by a call on ::DeleteFiber, because at that point, the fiber being deleted is not the running fiber.
Note that this appears to be inconsistent with the documentation of DeleteFiber, which states:
If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another fiber, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed.
This appears to make no sense at all. Due to the nature of fibers, a "running fiber" cannot be deleted by another fiber (in the same thread), since at the time it is being deleted it is, by definition, not the "running" fiber! However, I suspect the paragraph makes sense if it is replaced by one which says (with my suggested changes italicized):
If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another thread, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed. If a fiber has been deleted, any attempt to use SwitchToFiber to return control to it is likely to cause its thread to terminate abnormally because the fiber stack has been freed.
I am currently researching this with Microsoft. The current documentation paragraph, as written, would make it impossible to ever call DeleteFiber!
The question then arises: since fibers appear in many ways to be coroutines, how is the parameter passed to the killer? In this case, a member variable, killTarget, was added to the Queue class. Because we are working with non-preemptive fibers, and we are assuming that there is no concurrent access to the Queue object from another thread (this is a basic assumption in this code), there is no need to worry about any form of synchronization; furthermore, no fiber other than the one being switched to can get control, so there is no problem in the use of such a variable (this would be a fatal design error in a preemptive multithreaded system, but it is a perfectly valid approach in a non-preemptive fiber system).
To avoid any need for explicit initialization or finalization, I create the kill fiber only if it does not already exist, and will show later where it is deleted. Why do it this way instead of in the constructor and destructor?
Mostly, it was an issue of lifetime. I felt that the fiber should not exist any longer than it needs to. There also seem to be some serious problems with determining exactly when it would be possible to delete the fiber, particularly since the destructor for the queue might run after the program has done a ::ConvertFiberToThread call to release the fiber resources, and the effects of doing a ::DeleteFiber under these conditions appear to be undefined.
This does lead to the question of when the kill fiber can be deleted. I chose to delete it when the queue goes empty. Note that this causes a reversion of scheduling to the "main fiber", actually the empty-queue fiber. Since the queue is empty, there will be no more need to kill fibers, and the killFiber can be safely deleted. Note that the program may, as a consequence of having done all the processing, choose to create more queue entries; in this case, the killFiber is automatically re-created, "on demand".
The killer fiber is quite simple; the key is that since it is a separate fiber, it can safely ::DeleteFiber the fiber which has just completed its operation. Two details matter here: because killer is a static member, it reaches the Queue through the fiber parameter (the this pointer that kill passed to ::CreateFiber), and it loops, because the kill fiber is re-used: each time killFiber->run() switches back to it, it resumes inside its previous call to next() and must come back around to delete the next victim.
void CALLBACK Queue::killer(LPVOID p)
    {
     Queue * queue = (Queue *)p;     // kill() passed 'this' as the fiber parameter
     while(TRUE)
        {
         delete queue->killTarget;   // safe: the target is no longer the running fiber
         queue->killTarget = NULL;
         queue->next();              // we resume here the next time kill() runs us
        }
    }
The next method
The next method dequeues the next element from the queue and schedules it to run by calling ::SwitchToFiber on its fiber. If there are no more fibers queued up, it reverts to the fiber established by SetEmptyFiber. Note that a failure to call SetEmptyFiber is considered a fatal error.
void Queue::next()
{
PQE qe = (PQE)removeFromQueue();
if(qe != NULL)
{
qe->Display(_T("Dequeued QE"));
qe->run();
}
else
{
delete killFiber;
killFiber = NULL;
TRACE( (_T("Queue empty: switch to EmptyFiber\n")) );
ASSERT(emptyFiber != NULL);
emptyFiber->run();
}
}
Some other issues
Why did I use a QE structure for the killFiber and the emptyFiber? Since these are not actually scheduled, why not use a raw CFiber object for them instead?
In the case of the emptyFiber, it is possible that someone might choose, instead of the approach I took, to create a priority-scheduled queue, and make the empty fiber simply the lowest-priority fiber. I felt it would be easier to see how to do this if the emptyFiber appeared to be a schedulable fiber. Since the caller has to create the fiber, it also meant the underlying implementation could be changed without changing the user interface.
The choice for the killFiber is harder to justify; in this case, it became more an issue of consistency than one of functionality, particularly because it never escapes to the user-interface level. However, doing it this way allows me to decide to do "lazy killing", in which scheduling the kill fiber becomes simply one more task to do, inserted either at the head of the queue or later in the queue. Note, however, that if I were to insert it at other than the head of the queue, the use of the single killTarget member variable would no longer be viable: killTarget is explicitly set for each fiber to be killed, and if there could be more than one pending kill request, one variable is insufficient. In that case, I would probably create the kill fibers on demand, and use the fiber parameter, which would have to be a QE object; I would have to create a class Killer : public QE to maintain the context I need.
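A sketch of what that Killer class might look like (mine; it is not implemented in the sample code):
class Killer : public QE {
protected:
    static void CALLBACK killOne(LPVOID p);
public:
    Killer(PQE victim, Queue * q) : QE(killOne) { target = victim; queue = q; }
    PQE target;                          // the fiber this request will delete
    Queue * queue;
};

void CALLBACK Killer::killOne(LPVOID p)
    {
     Killer * self = (Killer *)p;        // the QE constructor passed 'this' as the fiber parameter
     delete self->target;                // safe: the target is no longer the running fiber
     self->queue->next();                // schedule whatever comes next
     // Deleting 'self' here would recreate the self-deletion problem, which is part of
     // why the sample code keeps a single, reusable kill fiber instead.
    }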
The problem domain
The goal of the sample program is to take a set of file names on the command line, and print out the files, one file per fiber. To make it appear as a concurrent program, the program is defined to print out only a few lines at a time, then the fiber is descheduled and another fiber runs. When end-of-file is reached, the fiber is no longer scheduled. Upon completion of the program, all the fibers are deleted.
To solve this, a class is derived from the QE class:
class CReaderFiber : public QE {
public:
CReaderFiber(LPCTSTR f, int c, Queue * q) : QE(reader) {
name = f;
count = c;
queue = q;
file = NULL;
}
virtual ~CReaderFiber() { if(file != NULL) fclose(file); }
public:
LPCTSTR name;
int count;
Queue * queue;
public:
virtual void Display(LPCTSTR s);
public:
FILE * file;
TCHAR buffer[MAX_LINE];   // TCHAR so _fgetts and _tprintf work in both ANSI and Unicode builds
protected:
static void CALLBACK reader(LPVOID p);
};
This derived class holds all the problem-specific information, such as the number of lines to write during each execution of the fiber, the name of the file, the buffer holding the input, and so on. The fiber function itself is a static class member.
void CALLBACK CReaderFiber::reader(LPVOID p)
{
CReaderFiber * rf = (CReaderFiber *)p;
TRACE( (_T("reader: called for %s\n"), rf->name) );
rf->file = _tfopen(rf->name, _T("r"));
if(rf->file == NULL)
{
DWORD err = ::GetLastError();
reportError(err, rf->name);
TRACE( (_T("reader: could not open %s\n"), rf->name) );
rf->queue->kill();
return;
}
Once the file is opened, we simply go into an infinite loop in the fiber, reading and printing lines. Note that this code assumes that a Unicode version of the program will be reading Unicode files and an ANSI version of the program will be reading ANSI (8-bit native code page) files. The generalization where either version can read or write either type of file is left as an Exercise For The Reader. Note that within the fiber loop, after printing count lines, the fiber yields; therefore another fiber will run.
while(TRUE)
{
for(int i = 0; i < rf->count; i++)
{
if(_fgetts(rf->buffer, MAX_LINE, rf->file) == NULL)
{
TRACE( (_T("reader: done with %s\n"), rf->name) );
rf->queue->kill();   // kill() never returns; the ASSERT below documents that
ASSERT(FALSE);
}
_tprintf(_T("%s"), rf->buffer);
}
TRACE( (_T("reader: yielding for %s after %d lines\n"),
rf->name, rf->count) );
rf->queue->yield();
}
}
The main program
The command line has the syntax:
argv[0] is the program name
argv[1] is the count of the number of lines to display on each iteration
argv[2..argc-1] are the names of the files
int _tmain(int argc, TCHAR * argv[])
{
Queue queue;
if(argc < 3)
{
usage();
return 1;
}
int lines = _ttoi(argv[1]);
if(lines <= 0)
{
_tprintf(_T("Illegal buffer count (%d)\n"), lines);
usage();
return 1;
}
Once the boring details of validating the arguments are finished, the real work begins. First, an array is created to hold all the CFiber-derived objects. To simplify the code and eliminate the need to worry about offsets, I just created an array the same size as the argv array and ignore the first two entries in it.
CReaderFiber ** fibers = new CReaderFiber*[argc];
In order to handle the "main fiber", which will be this thread converted to a fiber, I will need a QE object to allow it to be scheduled. The ConvertThreadToFiber method will accomplish this.
PQE mainFiber = new QE();
if(!mainFiber->ConvertThreadToFiber())
{
DWORD err = ::GetLastError();
reportError(err, _T("Converting thread to fiber"));
return 1;
}
Next, I scan the list of file names. For each file on the command line, I create a new CReaderFiber object and put it in the array of fibers. As each fiber is created, I place it in the queue.
for(int i = 2; i < argc; i++)
{
fibers[i] = new CReaderFiber(argv[i], lines, &queue);
if(fibers[i] == NULL)
{
DWORD err = ::GetLastError();
reportError(err, _T("Creating fiber"));
return 1;
}
queue.appendToQueue(fibers[i]);
}
When all the fibers have finished, control will return to this thread, the "main fiber". To accomplish this, we have to set the mainFiber (the result of ConvertThreadToFiber) as the fiber that receives control when the queue is empty, that is, when all the contents of all the files have been printed.
queue.SetEmptyFiber(mainFiber);
At this point, we start scheduling the fibers. The fibers will now run until all the contents of the files have been printed.
queue.next();
When all the contents have been printed, the ::SwitchToFiber call will switch execution back to this fiber, and the code below will execute.
TRACE( (_T("Finished\n")) );
mainFiber->Detach();   // do not let ~CFiber call ::DeleteFiber on the running (converted) fiber
delete mainFiber;
::ConvertFiberToThread();
return 0;
}
Example of output
The command line arguments supplied were:
5 a.txt b.txt c.txt
where the contents of the files were of the form "File <filename> <line number>". The output is shown below; the grouping-of-5 is based on the first command-line parameter. Note the transitions marked in the output: file a.txt has 200 lines, b.txt has 250 lines, and c.txt has 300 lines. The files were created by a little program called "datagen", by the following command lines:
datagen 200 "File A" > a.txt
datagen 250 "File B" > b.txt
datagen 300 "File C" > c.txt
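(datagen is not included with this article; a minimal equivalent of my own, matching the "File <filename> <line number>" format above, would be:)
// datagen <count> <prefix>  --  writes "<prefix> 1" .. "<prefix> <count>" to stdout
int _tmain(int argc, TCHAR * argv[])
    {
     if(argc != 3)
        return 1;
     int count = _ttoi(argv[1]);
     for(int i = 1; i <= count; i++)
         _tprintf(_T("%s %d\n"), argv[2], i);
     return 0;
    }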
The output:
File A 1
File A 2
File A 3
File A 4
File A 5
File B 1
File B 2
File B 3
File B 4
File B 5
File C 1
File C 2
File C 3
File C 4
File C 5
File A 6
File A 7
File A 8
File A 9
File A 10
File B 6
File B 7
File B 8
File B 9
File B 10
File C 6
File C 7
File C 8
File C 9
File C 10
File A 11
File A 12
File A 13
File A 14
File A 15
…
File C 191
File C 192
File C 193
File C 194
File C 195
File A 196
File A 197
File A 198
File A 199
File A 200
File B 196
File B 197
File B 198
File B 199
File B 200
File C 196
File C 197
File C 198
File C 199
File C 200
File B 201 <- Note that file A, with 200 lines is no longer in the mix after this point
File B 202
File B 203
File B 204
File B 205
File C 201
File C 202
File C 203
File C 204
File C 205
File B 206
File B 207
File B 208
File B 209
File B 210
File C 206
File C 207
File C 208
File C 209
File C 210
File B 211
File B 212
File B 213
File B 214
File B 215
…
File B 246
File B 247
File B 248
File B 249
File B 250
File C 246
File C 247
File C 248
File C 249
File C 250
File C 251 <- Note that file B, with 250 lines, is no longer in the mix after this point
File C 252
File C 253
File C 254
File C 255
File C 256
File C 257
File C 258
File C 259
File C 260
Some additional fiber issues
This code makes a number of simplifying assumptions which should not be seen as limiting in the general fiber world. For example:
- All the fibers will be running in just one thread, the same thread in which they were created
- The fiber, once created, will remain attached to the same CFiber object
These are both implementation choices. While it is possible to have a fiber be executed by one thread, and then another (::SwitchToFiber doesn't seem to care which thread the fiber was created in), there are certain risks involved in doing so. If you assume your fibers have no synchronization issues because they are fibers, and then suddenly start executing them in separate threads, you have to be exceedingly cautious that you have not introduced a synchronization problem, and, under maintenance, be careful that no future synchronization problem can arise.
Note that if you choose to run fibers in different threads (that is, a given fiber might be running in different threads; not the situation where multiple threads are each running fibers only within themselves), be sure you read about the /GT compiler option, "Support fiber-safe thread local storage", before proceeding. This has an impact on the use of __declspec(thread) variables in fibers that might later be scheduled in different threads.
My choice of using ::GetFiberData to obtain the fiber data was a simplifying assumption, and need not be considered the definitive approach. It simplified the code for the one case I was implementing.
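One alternative (a sketch of my own, not used in the sample code): instead of requiring the fiber parameter to be the CFiber/QE object itself, keep a map keyed by the raw fiber pointer and look the object up with ::GetCurrentFiber. That leaves the fiber parameter free for other uses, at the cost of a lookup on every access.
#include <map>

std::map<LPVOID, CFiber *> fiberMap;        // single-threaded use only: no locking here

void RegisterFiber(CFiber * f)
    {
     fiberMap[f->GetFiber()] = f;           // key by the raw fiber pointer
    }

CFiber * CurrentFiberObject()
    {
     return fiberMap[::GetCurrentFiber()];  // NULL if the current fiber was never registered
    }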
The usual miscellany
As in any real program, there are a few things done that are not part of the original goal of the program, but need to be done for either functionality, elegance, convenience, or some other good reason. This program is no exception, although some of these components had been developed independently and simply used here.
ASSERT
The MFC ASSERT macro is well-designed. It merely reports a problem; it does not abort the program. This is a valuable feature: the programmer can use the ASSERT macro to aid in debugging, but has the option, upon return, of doing "graceful recovery", anything from hand-tweaking values in the debugger to making source changes and using edit-and-continue to re-execute the offending lines. Overall, an intelligent and elegant design.
The assert function of the C library, on the other hand, is poorly designed, using what I call the "PDP-11 mindset", referring to the minicomputer on which the C language was popularized. The philosophy then was that programs were small, and it was OK to terminate a program on an internal error. In the world of GUI design, this is always a bad design decision; only the user is permitted to terminate the program.
So, I needed an ASSERT macro for non-MFC applications. In addition, I wanted this code to fit easily inside an MFC app (note that my C++ classes in this article are all standalone, rather than being based on MFC classes like CObject).
The first trick here was the header file, assert.h:
#pragma once
#ifndef ASSERT
#ifdef _DEBUG
#define ASSERT(x) if(!(x)) \
     fail(_T("Assertion Failure %hs(%d)\n"), __FILE__, __LINE__)
#define VERIFY(x) ASSERT(x)
extern void fail(LPCTSTR fmt, ...);
#else
#define ASSERT(x)
#define VERIFY(x) x
#endif // _DEBUG
#endif // ASSERT
The fail function is straightforward:
void fail(LPCTSTR fmt, ...)
{
va_list args;
va_start(args, fmt);
TCHAR buffer[1024];
_vstprintf(buffer, fmt, args);
va_end(args);
OutputDebugString(buffer);
}
However, note how dangerous this is! What if the data were to exceed 1024 characters? The answer is simple: this would be a fatal buffer-overflow situation!
Because I am using only a filename (at most MAX_PATH, nominally 260, characters) and a line number, this is safe. But it is not safe in general!
Unfortunately, nothing better was available in Visual Studio 6. In Visual Studio .NET, after the numerous security holes caused by buffer overflows that Microsoft has been victimized by, the appropriate primitives were added. The important new calls needed here are _vscprintf/_vscwprintf, or, as defined "bimodally" in tchar.h, _vsctprintf, which compiles as appropriate for ANSI or Unicode apps. These return the number of characters required to hold the formatted string (not including the NULL terminator). Therefore, a properly-written function for VS.NET would be as shown below:
void fail(LPCTSTR fmt, ...)
{
va_list args;
va_start(args, fmt);
int n = _vsctprintf(fmt, args);
LPTSTR buffer = new TCHAR[n + 1];
_vstprintf(buffer, fmt, args);
va_end(args);
OutputDebugString(buffer);
delete [] buffer;
}
In MFC, this would be easily accomplished by using the CString::FormatV method, which implements a similar algorithm to _vsctprintf, but the premise in this code is that we are not using MFC.
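For completeness, here is what the MFC flavor might look like (a sketch; CString::FormatV does the length computation internally):
void fail(LPCTSTR fmt, ...)
    {
     va_list args;
     va_start(args, fmt);
     CString s;
     s.FormatV(fmt, args);               // sizes and formats the buffer in one step
     va_end(args);
     OutputDebugString(s);               // CString converts to LPCTSTR implicitly
    }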
Note that none of these versions pops up a message box or otherwise interacts with the user; the program is allowed to continue. I have often just set a breakpoint at the end of the function if I need it to stop; or you can create something more elaborate using DebugBreak.
The TRACE macro
In the file debug.cpp, I have an implementation of simple output tracing. Note that it looks remarkably like the fail function described above, including the buffer-overflow hazard (and the same solution applies if VS.NET is being used). However, we have an additional problem with TRACE that we did not have with ASSERT: a variable number of arguments.
The macro system in C represents the state of the art for macro systems of the 1950s, and is nowhere near as elaborate as what many other programming languages have had. A good macro system is actually Turing-equivalent, in that you can write, in the macros, programs that write your code. The C macro system is, well, to call it "primitive" is a bit unfair; in many cases, a stone ax was far more sophisticated. Nonetheless, it is what we have to work with, so we have to deal with it as best we can.
The trick here is taken from the KdPrint macro of the Microsoft DDK. To get a variable number of arguments into a macro call, you have to wrap them in an extra set of parentheses, so that syntactically they look like a single argument to the macro. The definition of the TRACE macro, and its accompanying function, is therefore given as (from debug.h):
void CDECL trace(LPCTSTR fmt, ...);
#ifdef _DEBUG
#define TRACE(x) trace x
#else
#define TRACE(x)
#endif
A call on the macro might be done as:
TRACE( (_T("The result for 0x%p is %d (%s)\n"),
object, object->value, object->name) );
Note my tendency to use whitespace between the parameter parentheses of the TRACE macro and the parameter parentheses of the argument list.
Note also the use of %p to get an address to print. It is very important to develop this habit instead of using 0x%08x or some other non-portable method. This would not work correctly if the program were ported to Win64 (the format would have to be hand-edited to be 0x%16I64x; note the first argument after the % is the number 16, and the next sequence is the qualifier I64 to indicate the incoming argument is a 64-bit value, and finally the x to specify the desired formatting).
Reporting errors
Generally, I use a much more general implementation wrapping ::FormatMessage, but for this simple example, a console-based app with no GUI and no use of MFC, a simpler implementation suffices.
In keeping with my philosophy of "no two places in my code ever issue the same error message", I allow the caller to pass in a unique identification string. For this toy example, I have not bothered to provide for localization via a STRINGTABLE.
void reportError(DWORD err, LPCTSTR reason)
{
LPTSTR msg;
if(FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
NULL,
err,
0,
(LPTSTR)&msg,
0,
NULL) == 0)
{
_tprintf(_T("Error %d (0x%08x)\n"), err);
return;
}
_tprintf(_T("Error %s\n%s\n"), reason, msg);
LocalFree(msg);
}