Introduction
Multi-threading, often a necessary performance optimization since CPU clock frequencies stopped growing, has also become one of the biggest software design disasters.
Multi-threaded applications are very hard to debug. They are less deterministic, less predictable and more complex than single-threaded applications. Non-determinism here means that, in the presence of a thread synchronization bug, the application may not only do the wrong thing but also do a different wrong thing each time, with program logic depending on the thread scheduler's behavior. This makes software testing far less effective. Just as with uninitialized variables, you would not want a program or a function to depend, even theoretically, on the previous state of RAM. Being careful no longer helps either, because bugs eventually get into code - and these are the least wanted kind of bugs due to their immunity to testing. I am not going to advocate determinism extremes such as Functional Programming in this article. However, the program input - everything that affects what the program does - must always be clear.
Multi-threading is very appealing to many programmers. It may at first seem that converting an existing algorithm to run in a separate thread, in parallel with something else, is a small change. It often comes as a surprise how many objects are shared between different parts of the program and now require proper awareness of the threading policy and not-so-intuitive thread safety properties.
Furthermore, because the relative timing of execution is no longer deterministic, thread safety of an object is nearly impossible to unit-test.
This article describes a design pattern aimed at reducing the number of threads in an application and expresses a point of view on how many threads there should be in a variety of design scenarios.
When to use Multi-threading
There are two valid reasons for using MT:
- An extreme case of performance optimization of a CPU-bound application, when you want to utilize more than one available CPU core. We will argue that the CPU-bound part of the application must be identified and optimized locally.
- You are forced to use a 3rd-party component which does not support a proper asynchronous API. For example, .NET's BeginConnect() accepts a callback parameter which will be called on an arbitrary thread (maybe the caller's thread before the function returns, or maybe a different thread, depending on the IP address). Such 3rd-party API design leaves you no choice.
Another example would be a 3rd-party API call which takes a long time to compute. You cannot call it from the UI thread, because then your user would not be able to cancel it without killing your application, and again you are forced to create a separate thread. If the implementation is under your control, we will demonstrate how to avoid such a software interface.
When not to use multi-threading
- When the application is IO-bound, so spreading work across CPUs makes no sense as a performance optimization.
- When you are developing a mission-critical application that has to be tested for a wide variety of possible combinations of input.
- When you have a choice of 3rd-party components.
Long execution (e.g. algorithms), IO, and communications must all have proper asynchronous API support. A proper asynchronous API may look like this:
ProperAPI.Initialize();
ProperAPI.DoSomethingSmall();
ProperAPI.DoSomethingSmallAgain();
...
So, the invoked code is responsible for maintaining its state and must be able to divide the task into pieces. Control (the instruction pointer register) is like a fireball: you catch it, and you must return it quickly to the caller. An algorithm can never block (steal it) for a long time. It is a shared resource!
Another type of good asynchronous API is a kind of polling (with all due disadvantages of polling designs):
DoSomething();
if(AreYouFinished())...
DoSomethingAndSetThisFlagWhenDone(event);
Such an API implies that the user has one master loop, waiting on a list of events for one of them to fire. During this wait, the process does not consume the CPU.
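To make this concrete, here is a minimal sketch (assuming Win32 and C++) of such a master loop; moduleA and moduleB are hypothetical objects exposing the DoSomethingAndSetThisFlagWhenDone() call from the pseudocode above:
HANDLE events[2];
events[0] = CreateEvent(nullptr, FALSE, FALSE, nullptr); // auto-reset, initially not signalled
events[1] = CreateEvent(nullptr, FALSE, FALSE, nullptr);
moduleA.DoSomethingAndSetThisFlagWhenDone(events[0]);
moduleB.DoSomethingAndSetThisFlagWhenDone(events[1]);
for (;;)
{
    // The process does not consume the CPU while it waits here.
    DWORD which = WaitForMultipleObjects(2, events, FALSE /*wait for any*/, INFINITE);
    if (which == WAIT_OBJECT_0)          { /* moduleA finished; start its next piece */ }
    else if (which == WAIT_OBJECT_0 + 1) { /* moduleB finished; start its next piece */ }
}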
The "Heartbeat" pattern
Now that we are convinced that cooperative multitasking is useful, how do we do it? There is one central loop, called the heartbeat loop. For every object that performs long tasks, the object's heartbeat function is invoked within this loop. This way, each of the participating objects is given an opportunity to do a small piece of work and return control.
Some pseudocode:
class IHeartBeat {
public:
    // Do one small piece of work; return true if there was any work to do.
    virtual bool HeartBeat() = 0;
};

void Something::Run()
{
    std::vector<IHeartBeat *> modules;
    modules.push_back(&newModule); // register each participating object (newModule stands for any of them)
    while (true)
    {
        bool didSomething = false;
        for (IHeartBeat *module : modules)
        {
            StopWatch stopWatch;
            stopWatch.Start();
            didSomething = module->HeartBeat() || didSomething;
            stopWatch.Stop();
            if (stopWanted)
                return;
            // ...
            ProcessSomeUI();
        }
        // Yield the rest of the timeslice only when nobody had work to do.
        Sleep(didSomething ? 0 : 1);
    }
}
How to break operations and turn a loop inside out
When you break a continuous long operation into parts, you inevitably have to create a state machine.
For example:
void DoStuff() {
DoThing1();
DoThing2();
DoThing3();
}
becomes:
void StateManagingAlgorithm::DoOneMoreThing() {
    switch (_currentState) {
        case 1: DoThing1(); _currentState++; break;
        // ...
        case x: DoThingX(); _currentState++; break;
    }
}
And
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            c[i][k] = a[i][j] * b[j][k];
becomes
bool DoNextThing()
{
    if (k >= n) { j++; k = 0; }
    if (j >= n) { i++; j = 0; }
    if (i >= n) return false;
    c[i][k] = a[i][j] * b[j][k];
    k++; // advance to the next step of the innermost loop
    return true;
}
where i, j, k are state variables initialized to zero before the first call.
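Such a state machine plugs naturally into the heartbeat loop. Here is a minimal sketch (not from the original code) in which the state variables become members and HeartBeat() runs a bounded batch of steps per call:
class MatrixWork : public IHeartBeat {
public:
    bool HeartBeat() override {
        // Run a bounded batch of steps so each heartbeat stays short.
        for (int step = 0; step < 1000; ++step)
            if (!DoNextThing())
                return false;   // finished; nothing left to do
        return true;            // more work remains for the next heartbeat
    }
private:
    bool DoNextThing();         // the function from the example above
    int i = 0, j = 0, k = 0;    // loop state, initialized to zero
    // the matrices a, b, c and the size n live here as well
};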
Where does the heartbeat loop run?
Well, this depends on the existing threading model of the application or the framework it is built upon. Here are some examples.
- Dedicated worker thread: sometimes there is just no way around this. You create a dedicated thread and run the main application cycle there. You signal it to stop by changing some shared state variable.
- Windows API application: you are given a UI thread and you are the one running the message loop. This is the most convenient case. The blocking message-wait functions are replaced with peek functions; this is called a PeekMessage loop (a sketch appears below). Life is good: you can get away with only one thread for everything. With MFC, for example, this is not practical, because it implements the Windows message loop all by itself.
- Winforms.NET GUI app: you are not given control of the message loop. You can use a Winforms timer event to do one step of the cycle if your modules do not need frequent access to control [D1], or you will have to actually create a worker (non-UI) thread.
In all cases, a decent UI framework will not force you to create a thread separate from UI.
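Here is a minimal sketch (Win32, C++) of such a PeekMessage loop; RunHeartBeats() is assumed to be a function that makes one pass over the modules, as in the heartbeat loop above:
MSG msg;
for (;;)
{
    // Drain all pending window messages without blocking.
    while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
    {
        if (msg.message == WM_QUIT)
            return;
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    // No messages pending: let every module do a small piece of work.
    bool didSomething = RunHeartBeats();
    // Yield the rest of the timeslice only when nobody had work to do.
    Sleep(didSomething ? 0 : 1);
}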
Ideally, there should be one "wait for interrupt" mechanism, and all overlapped IO would use it to indicate completion. On Windows, for example, writing to sockets, files, serial ports and almost everything else is done with the same API, which supports four types of completion signaling (APCs, events, completion ports, ...), while UI events use a distinct mechanism (window messages) because it was developed by a different team; there is also a distinct sockets API that uses yet another synchronization object and API. Window messages and kernel synchronization objects can be waited for with one function, MsgWaitForMultipleObjects().
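A minimal sketch (Win32, C++) of how a single thread can wait on both, assuming one event handle that overlapped IO signals on completion:
HANDLE ioCompleted = CreateEvent(nullptr, TRUE, FALSE, nullptr); // manual-reset, initially not signalled
for (;;)
{
    DWORD result = MsgWaitForMultipleObjects(
        1, &ioCompleted,   // one kernel object to wait on
        FALSE,             // wake when any one is signalled
        INFINITE,          // no timeout
        QS_ALLINPUT);      // ...or when any window message arrives
    if (result == WAIT_OBJECT_0)
    {
        ResetEvent(ioCompleted);
        // Overlapped IO finished: run the heartbeat of the module that issued it.
    }
    else if (result == WAIT_OBJECT_0 + 1)
    {
        // Window messages are pending: drain them with a PeekMessage loop as above.
    }
}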
High performance applications
Using Sleep(1), or event wait functions that give up the remainder of the scheduler timeslice, may limit the total heartbeat rate.
One may check what HeartBeat() returned and, if it is true, run the cycle again before surrendering control to the system.
Decisions to make
- [D1] Do I need frequent access to control?
  - If you have something to do every small period of time, you will have to use Sleep(1) (actively give up the scheduler tick) instead of waiting for an event.
  - Alternatively, the heartbeat can be timer-invoked, once every period of time. This period can be large if you can get communication events to invoke the heartbeat asynchronously (see below).
- [D2] If the application has a UI, does our UI framework enable doing everything from one thread?
  - Some UI frameworks such as Winforms or MFC implement the UI message loop themselves, so "one thread per one CPU" may not be possible.
  - Luckily, Winforms provides a nice feature for single-threaded programming known as Control.Invoke(), which invokes a callback on the UI thread of a control:
    _listener.BeginAccept(new AsyncCallback((IAsyncResult ar) => { Invoke(processNewConnectionDelegate, ar); }), null);
    and life is good. Even though BeginAccept() is bad and will fire a completion callback on a random thread, this construct routes it safely through the main message loop.
  - Winforms also has a Timer component (System.Windows.Forms.Timer), which fires its timer events in the UI thread. This is a convenient place to run the heartbeat, if not [D1].
- [D3] Can we ever give up control completely until something happens (an interrupt), and if yes, how? It may be an infrequent interrupt.
- [D4] Are we forced to use any synchronous (blocking) APIs? We will have to wrap them into separate threads (a sketch follows below). Synchronous means APIs that may not return for extended periods of time. Forced means we are 100% sure there is no asynchronous (overlapped) alternative to them. Overlapped in this context usually means that the 3rd-party framework will do the operation in its own threads or by some other magic. Alternatively, an asynchronous API would provide a heartbeat function of its own, i.e. a function to call every now and then.
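For [D4], here is a minimal sketch (C++, not from the original code; SomeBlockingThirdPartyCall() and ConsumeResult() are hypothetical) of wrapping a blocking call in its own thread and surfacing completion back to the heartbeat loop through an atomic flag:
#include <atomic>
#include <thread>

class BlockingCallWrapper : public IHeartBeat {
public:
    void Start() {
        _worker = std::thread([this] {
            _result = SomeBlockingThirdPartyCall(); // hypothetical blocking 3rd-party API
            _done = true;                           // publish completion to the heartbeat thread
        });
    }
    // Called from the heartbeat loop; never blocks for long.
    bool HeartBeat() override {
        if (!_done.exchange(false))
            return false;           // not finished yet: nothing to do
        _worker.join();             // the work is done; join returns almost immediately
        ConsumeResult(_result);     // hypothetical: handle the result on the main thread
        return true;
    }
private:
    std::thread _worker;
    std::atomic<bool> _done{false};
    int _result = 0;
};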