The best-laid schemes of threads and objects
Go oft awry,
And leave us nothing but grief and pain,
For promised joy!
(with apologies to R. Burns)
Background - The Promised Land
Multi-threading and object-oriented languages each come with the promise of making life simpler for the creators of complex systems. Both offer methods for cutting those complex systems into manageable pieces with well-defined interactions between them. Multi-threading divides the work into small pieces and assigns each piece to a separate processor, be it physical or virtual. One can then just wait for all the pieces of work to be finished and assemble the final results.
Object-oriented languages, for their part, say that only the important information should be visible to the outside world, leaving implementation details hidden inside those "objects", and also that more complicated objects can be created from simpler ones through inheritance or composition.
Wouldn't it be grand if we could join these two concepts together and have some little thread objects that do their work and hide all unnecessary implementation details? As we will see, it is indeed possible, but it's not that easy.
As an example, we will look at how to find all the prime numbers less than a certain value, and we will stick to good old C++ because it is still considered one of the most efficient languages.
A Simple Program using std::thread
Since 2011, the C++ standard has included std::thread objects. As our first multi-threaded program, we will use this:
#include <iostream>
#include <thread>
#include <vector>

bool is_prime (int n)
{
  for (auto i = n - 1; i > 1; --i)
    if (n % i == 0)
      return false;
  return true;
}

int main ()
{
  std::vector<int> primes;
  int n = 0;
  auto worker = [&]() {
    for (auto i = 2; i < 20; ++i)
    {
      if (is_prime (i))
        primes.at (n++) = i;
    }
  };
  std::thread th (worker);
  th.join ();
  std::cout << "Primes: ";
  for (auto val : primes)
    std::cout << val << ' ';
}
What we have here: a very simple-minded is_prime function is called repeatedly by the worker function, which then puts the primes in a vector. The main function simply creates a thread that runs the worker function and waits until it finishes before printing the results. This is not very multi-threaded as we have only a single thread apart from the main thread, but we hope to improve.
Exception Issues
Surprisingly or not, the program doesn't work. It has a pretty obvious bug: the primes vector is empty, and assigning to a non-existent element:
primes.at (n++) = i;
triggers a std::out_of_range exception.
We could easily fix it by changing the code to:
primes.push_back (i);
but let's see if we can do some exception handling. We will wrap the whole main function in a try...catch block and let it handle the out of range exception. Here is our new main function:
int main ()
{
  std::vector<int> primes;
  int n = 0;
  try {
    auto worker = [&]() {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
          primes.at (n++) = i;
      }
    };
    std::thread th (worker);
    th.join ();
    std::cout << "Primes: ";
    for (auto val : primes)
      std::cout << val << ' ';
  }
  catch (std::exception& x) {
    std::cout << "Exception: " << x.what () << std::endl;
  }
}
The exception handler is not called and we end up with exactly the same error as before.
The explanation has to do with a very important rule about threads:
Each thread has its own stack.
When an exception occurs, the C++ runtime begins a process called stack unwinding in which it goes through the stack frame of each called function looking for an exception handler. Our exception handler, however, is on the stack of the main thread, so it never gets called. Exceptions do not propagate between threads: an exception that escapes a thread's function finds no handler on that thread's stack, and the runtime calls std::terminate.
Before moving to something else, let's first fix our program. We will do it in two steps. First, we move the try...catch block into the thread function:
auto worker = [&]() {
  try {
    for (auto i = 2; i < 20; ++i)
    {
      if (is_prime (i))
        primes.at (n++) = i;
    }
  }
  catch (std::exception& x)
  {
    std::cout << "Exception: " << x.what () << std::endl;
  }
};
This time, it will indeed catch the exception and the program output is:
Exception: invalid vector subscript
Primes:
As a final step, we now fix our little "bug". The finished program is:
int main ()
{
  std::vector<int> primes;
  auto worker = [&]() {
    try {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
          primes.push_back (i);
      }
    }
    catch (std::exception& x)
    {
      std::cout << "Exception: " << x.what () << std::endl;
    }
  };
  std::thread th (worker);
  th.join ();
  std::cout << "Primes: ";
  for (auto val : primes)
    std::cout << val << ' ';
}
And the output is:
Primes: 2 3 5 7 11 13 17 19
Thread Encapsulation
So far, we've seen how to use std::thread objects to do the work, but we still have to figure out how to pack together a thread and its private data in some kind of object.
Let's say that our primality checking thread also needs to keep a count of the primes it has found. Also, we want the vector of results to be passed somehow to the thread.
A solution could be to derive an object prime_finder from std::thread. Something like this:
class prime_finder : public std::thread
{
public:
  prime_finder (std::vector<int>& v)
    : std::thread ([this] {this->worker (); })
    , count (0)
    , primes (v) {}
  int get_count () { return count; }
private:
  int count;
  inline void worker ()
  {
    try {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
        {
          primes.push_back (i);
          count++;
        }
      }
    }
    catch (std::exception& x)
    {
      std::cout << "Exception: " << x.what () << std::endl;
    }
  }
  std::vector<int>& primes;
};

int main ()
{
  std::vector<int> results;
  prime_finder th (results);
  th.join ();
  std::cout << "Found " << th.get_count () << " primes: ";
  for (auto val : results)
    std::cout << val << ' ';
}
And guess what? It even works:
Found 8 primes: 2 3 5 7 11 13 17 19
But if you value a good night's sleep, please don't use code like that! Not unless you want to be woken up at any hour by irate coworkers or customers complaining that your code just crashed, and be driven mad because you cannot reproduce those errors.
To find out what's wrong with this code, let's see what happens when you instantiate the prime_finder object in the main function. Space is set aside for the object, and then the prime_finder constructor first invokes the constructors of any base classes, in this case the std::thread constructor. The reference description of the std::thread constructor reads:
Creates new std::thread object and associates it with a thread of execution. The new thread of execution starts executing /*INVOKE*/(std::move(f_copy), std::move(args_copy)...)
The key here is that the new thread starts executing, potentially before the prime_finder constructor has finished setting up the object. It is now up to the OS scheduler to let the main thread finish the initialization of the prime_finder object (initialize count to 0 and set the address of the primes vector) or switch immediately to the newly created thread. Things can run smoothly for a long time until the OS scheduler wakes up on the wrong side of the bed, our thread starts running too early, and the whole program crashes.
To demonstrate this problem, we can introduce an artificial delay in the prime_finder constructor:
class prime_finder : public std::thread
{
public:
  prime_finder (std::vector<int>& v)
    : std::thread ([this] {this->worker (); })
    , primes (v)
  {
    std::this_thread::sleep_for (std::chrono::milliseconds (10));
    count = 0;
  }
  // ... the rest of the class is unchanged
};
Now the result is:
Found 0 primes: 2 3 5 7 11 13 17 19
The count variable was initialized to 0 long after the worker function had finished.
The important lesson here is:
DO NOT inherit from std::thread.
A Better thread Class
I have to admit, I wasn't particularly impressed with the design of the std::thread class. While the issues related to exception handling are somewhat unavoidable, the idea of running the new thread at construction time seems more like a blunder. Luckily, I didn't have to endure this problem, having designed my own thread class, as part of the mlib library, long before C++11.
Here are the relevant parts:
class thread : public syncbase
{
public:
  thread (std::function<int ()> func);
  virtual ~thread ();
  virtual void start ();
protected:
  thread (const char *name=0, bool inherit=false,
          DWORD stack_size=0, PSECURITY_DESCRIPTOR sd=NULL);
  virtual void init ();
  virtual void run ();
private:
  static unsigned int _stdcall entryProc (thread *ts);
};
The base class, syncbase, is just a wrapper for handles of Windows synchronization objects like semaphores, mutexes or events. The public constructor is very similar to the std::thread constructor. It creates a thread object that will run the function. However, the new thread is not started yet. To start it, users have to call the start function. There is also a protected constructor that can be used by derived objects that need finer control over aspects like thread stack size and security attributes.
On the inside, starting up a new thread is a relatively complicated process that is done in phases:
- The constructor(s) call the Windows _beginthreadex function to create a new thread having entryProc as body. The new thread is created in a suspended state so it is guaranteed not to start running.
- After the _beginthreadex function returns, the constructor resumes the newly created thread and waits for a created semaphore to become signaled.
- The entryProc function can now run. It signals the created semaphore and waits for the started semaphore.
- Because the created semaphore has been signaled, the constructor can now proceed and it returns. If the thread constructor was invoked as part of the constructor for a derived object, the rest of the construction process can continue.
As I said before, to really start the new thread, users have to call the start function. This will signal the started semaphore, and the entryProc function will first invoke a virtual init function that can do any initialization work and then the run function, which is the actual run loop of the thread.
Note that these thread objects are not light-weight. Each object comes with two semaphores attached and there are two context switches to create them. They are safe and powerful but there is a price to pay for that.
Here is our program reworked to use the mlib::thread objects:
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>
#include "mlib/thread.h"

// is_prime () is the same function as in the first program

class prime_finder : public mlib::thread
{
public:
  prime_finder (std::vector<int>& v)
    : primes (v) {
    std::this_thread::sleep_for (std::chrono::milliseconds (10));
    count = 0;
  }
  int get_count () { return count; }
private:
  int count;
  inline void run ()
  {
    for (auto i = 2; i < 20; ++i)
    {
      if (is_prime (i))
      {
        primes.push_back (i);
        count++;
      }
    }
  }
  std::vector<int>& primes;
};

int main ()
{
  std::vector<int> results;
  prime_finder th (results);
  try {
    th.start ();
    th.join ();
  }
  catch (std::exception& x)
  {
    std::cout << "Exception: " << x.what () << std::endl;
  }
  std::cout << "Found " << th.get_count () << " primes: ";
  for (auto val : results)
    std::cout << val << ' ';
}
Throwing Exceptions Across Thread Borders
A sharp-eyed reader will notice that I moved the exception handling code from the worker thread back to the main thread. This is possible because the thread::entryProc function has a try...catch block that catches all exceptions. The exceptions are stored in a std::exception_ptr object inside the thread. When the main thread calls the thread::wait function, the exception, if there was one, is re-thrown in the context of the main thread. To verify, we modify the run function to throw an exception:
inline void run ()
{
  int t = std::vector<int> ().at (1);
  for (auto i = 2; i < 20; ++i)
  {
    if (is_prime (i))
    {
      primes.push_back (i);
      count++;
    }
  }
}
The output is:
Exception: invalid vector subscript
Found 0 primes:
You don't have to move the exception handling code into the main thread. You can still place try...catch blocks in the run function if that's more appropriate to the program's logic but, if you need one centralized error handler, mlib::thread can transfer the errors across thread boundaries. This transfer, however, is "delayed": the exception will be re-thrown only when the join function is invoked.
Parting Thoughts
Encapsulating threads in objects is not so simple but offers definite advantages. It allows you to differentiate between code and data that need to be accessed from other threads, which I call foreign, and the internal data and functions, which I call own. As a general rule, own data and functions should be kept as private or protected members while foreign functions form the public interface. Constructors and destructors are inherently foreign and that's why they require special care. For other foreign functions, I favor a pattern where the caller transmits the request through some command semaphore or event and then waits for results:
class cool_thread : public mlib::thread
{
public:
  stuff do_something_cool () {
    thread_critical_section.enter ();
    command = WHAT_TO_DO;
    command_semaphore.signal ();
    thread_critical_section.leave ();
    results_semaphore.wait ();
    thread_critical_section.enter ();
    stuff s = get_results ();
    thread_critical_section.leave ();
    return s;
  }
private:
  stuff& get_results ();
  // ...
};
Aside from the two issues I discussed, exception handling and construction dangers, there is a third one I'd like to mention without providing any code to demonstrate it. Thread destruction can also be a dangerous time. As a rule, a thread should never be ended simply by invoking the object's destructor because you cannot control the state the thread is in when it gets destroyed. In the sample above, if the thread gets destroyed while a caller waits for results, the caller would deadlock.
History
- 17th June, 2022 - Initial version