This article presents a Thread class that prevents exceptions (such as those caused by bad pointers) from forcing a program to exit. It also describes how to capture information that facilitates debugging when such errors occur in software already released to users.
Introduction
Some programs need to keep running even after nasty things happen, such as using an invalid pointer. Servers, other multi-user systems, and real-time games are a few examples. This article describes how to write robust C++ software that does not exit when the usual behavior is to abort. It also discusses how to capture information that facilitates debugging when nasty things occur in software that has been released to users.
Background
It is assumed that the reader is familiar with C++ exceptions. However, exceptions are not the only thing that robust software needs to deal with. It must also handle POSIX signals, which the operating system raises when something nasty occurs. The header <csignal> defines the following subset of POSIX signals for C/C++:
- SIGINT: interrupt (usually when Ctrl-C is entered)
- SIGILL: illegal instruction (perhaps a stack corruption that affected the instruction pointer)
- SIGFPE: floating point exception (includes dividing by zero)
- SIGSEGV: segment violation (using a bad pointer)
- SIGTERM: forced termination (usually when the kill command is entered)
- SIGABRT: abnormal termination (when abort is invoked by the C++ run-time environment)
Similar to how exceptions are caught by a catch statement, signals are caught by a signal handler. Each thread can register a signal handler against each signal that it wants to handle. A signal is simply an int that is passed to the signal handler as an argument.
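As a minimal sketch of what registering and invoking a signal handler looks like, the following uses only std::signal and std::raise from <csignal>. The names lastSignal, OnSignal and RaiseAndCatchSigterm are invented for this illustration and do not come from RSC.

```cpp
#include <cassert>
#include <csignal>

// Last signal seen by the handler. A handler can only safely write to
// objects of type volatile std::sig_atomic_t (or lock-free atomics).
volatile std::sig_atomic_t lastSignal = 0;

// The handler receives the signal number as its only argument.
// (Hypothetical name; not part of RSC.)
extern "C" void OnSignal(int sig)
{
   lastSignal = sig;
}

// Registers OnSignal against SIGTERM, raises SIGTERM on the calling
// thread, and returns the signal number that the handler recorded.
// Returns -1 if the handler could not be registered.
int RaiseAndCatchSigterm()
{
   if(std::signal(SIGTERM, OnSignal) == SIG_ERR) return -1;

   // raise does not return until the handler has returned.
   std::raise(SIGTERM);
   assert(lastSignal == SIGTERM);

   std::signal(SIGTERM, SIG_DFL);  // restore the default action
   return lastSignal;
}
```

Because the handler only records the signal, the program carries on after raise returns instead of being terminated by the default action for SIGTERM.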
Using the Code
The code in this article is taken from the Robust Services Core (RSC). If this is the first time that you're reading an article about an aspect of RSC, please take a few minutes to read this preface.
Compiler options. The approach described in this article requires the following compiler options. They are set in the download's CMake files, but you need to set them in your own project if you are developing code based on this article:
Overview of the Classes
An application developed using RSC derives from Thread to implement its threads. Everything described in this article then comes for free. This section also describes other classes that collaborate with Thread.
Thread
Software that wants to be continuously available must catch all exceptions. A single-threaded application could do this in main. But RSC supports multi-threading, so it does this in a base Thread class from which all other threads derive. Thread has a loop that invokes the application in a try clause that is followed by a series of catch clauses which handle any exception not caught by the application.
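The shape of such a safety net can be sketched in a few lines. This is a simplified illustration, not RSC's implementation: RunWithSafetyNet, its maxTraps limit and the log vector are assumptions made for the example, and the real Thread::Start (shown later) does considerably more.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Runs WORK inside a safety net. Any exception that escapes WORK is
// logged, after which WORK is re-entered. Gives up once WORK has
// trapped MAXTRAPS times. Returns the number of traps that occurred.
// (Hypothetical function; not part of RSC.)
int RunWithSafetyNet(const std::function<void()>& work, int maxTraps,
   std::vector<std::string>& log)
{
   assert(maxTraps > 0);
   int traps = 0;

   while(true)
   {
      try
      {
         work();
         return traps;  // the application returned normally
      }
      catch(const std::exception& e)
      {
         log.push_back(e.what());
      }
      catch(...)
      {
         log.push_back("unknown exception");
      }

      if(++traps >= maxTraps) return traps;  // too many traps: exit
   }
}
```

The catch(...) clause comes last so that it only handles what the more specific clauses did not.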
SysThread
This is a wrapper for a native thread and is created by Thread's constructor. Some of the implementation is platform-specific.
Daemon
When a Thread is created, it can register a Daemon to recreate the thread after it is forced to exit, which usually occurs when the thread has caused too many exceptions.
Exception
The direct use of <exception> is inappropriate in a system that needs to debug problems in released software. Consequently, RSC defines a virtual Exception class from which all of its exceptions derive. This class's primary responsibility is to capture the running thread's stack when an exception occurs. In this way, the entire chain of function calls that led to the exception will be available to assist in debugging. This is far more useful than the const char* returned by std::exception::what, stating something like "invalid string position", which specifies the problem but not where it arose and maybe not even uniquely where it was detected.
SysStackTrace
SysStackTrace is actually a namespace that wraps a handful of functions. The function of most interest is one that actually captures a thread's stack. Exception's constructor invokes this function, and so does a function (Debug::SwLog) whose purpose is to generate a debug log to record a problem that, although unexpected, did not actually result in an exception. All SysStackTrace functions are platform-specific.
SignalException
When a POSIX signal occurs, RSC throws it in a C++ exception so that it can be handled in the usual way, by unwinding the stack and deleting local objects. SignalException, derived from Exception, is used for this purpose. It simply records the signal that occurred and relies on its base class to capture the stack.
PosixSignal
Each signal supported within RSC must create a PosixSignal instance that includes its name (e.g. "SIGSEGV"), numeric value (11), explanation ("Invalid Memory Reference"), and other attributes. The PosixSignal instances for various signals defined by the POSIX standard, including those in <csignal>, are implemented as private members of the simple class SysSignals. The subset of signals supported on the target platform is then instantiated by SysSignals::CreateNativeSignals.
Throwing a SignalException turns out to be a useful way to recover from serious errors. RSC therefore defines signals for internal use in NbSignals.h. An instance of PosixSignal is also associated with each of these:
constexpr signal_t SIGNIL = 0;
constexpr signal_t SIGWRITE = 121;
constexpr signal_t SIGCLOSE = 122;
constexpr signal_t SIGYIELD = 123;
constexpr signal_t SIGSTACK1 = 124;
constexpr signal_t SIGSTACK2 = 125;
constexpr signal_t SIGPURGE = 126;
constexpr signal_t SIGDELETED = 127;
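To make the idea of a per-signal descriptor more concrete, here is a small sketch of a registry that maps a signal's value to its name, explanation and attributes. It is not RSC's PosixSignal or PosixSignalRegistry: SignalInfo, SignalRegistry, MakeAttrs and MakeRegistry are hypothetical, the attribute names merely echo ones that appear in the code later in this article, and the explanation strings for SIGINT and SIGYIELD are made up for the example.

```cpp
#include <bitset>
#include <cassert>
#include <csignal>
#include <initializer_list>
#include <map>
#include <string>
#include <utility>

using signal_t = int;

// Attribute flags. Their meanings here are assumptions:
// Native = raised by the platform, Break = caused by Ctrl-C or
// Ctrl-Break, Final = always forces the thread to exit.
enum SignalAttr { Native, Break, Final, Attr_N };
using SignalAttrs = std::bitset<Attr_N>;

SignalAttrs MakeAttrs(std::initializer_list<SignalAttr> list)
{
   SignalAttrs attrs;
   for(auto a : list) attrs.set(a);
   return attrs;
}

struct SignalInfo
{
   std::string name;   // e.g. "SIGSEGV"
   std::string expl;   // e.g. "Invalid Memory Reference"
   SignalAttrs attrs;
};

class SignalRegistry
{
public:
   // Registers a signal. Returns false if VALUE is already registered.
   bool Add(signal_t value, SignalInfo info)
   {
      return signals_.emplace(value, std::move(info)).second;
   }

   // Returns the entry for VALUE, or nullptr if it is not registered.
   const SignalInfo* Find(signal_t value) const
   {
      auto it = signals_.find(value);
      return (it != signals_.end() ? &it->second : nullptr);
   }
private:
   std::map<signal_t, SignalInfo> signals_;
};

constexpr signal_t SIGYIELD = 123;  // internal signal (value from above)

// Builds a registry with two native signals and one internal signal.
SignalRegistry MakeRegistry()
{
   SignalRegistry reg;
   reg.Add(SIGSEGV,
      {"SIGSEGV", "Invalid Memory Reference", MakeAttrs({Native})});
   reg.Add(SIGINT,
      {"SIGINT", "Terminal Interrupt", MakeAttrs({Native, Break})});
   reg.Add(SIGYIELD,
      {"SIGYIELD", "Ran Too Long", SignalAttrs()});
   return reg;
}
```

Keeping attributes in a bitset lets code such as a signal handler decide how to treat a signal with a single test call, without a switch on the signal's value.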
Walkthroughs
Creating a Thread
Now for the details. Let's start by creating a Thread. A subclass can add its own thread-specific data (which means that there is no need for thread_local), but we're interested in Thread's constructor:
Thread::Thread(Faction faction, Daemon* daemon) :
daemon_(daemon),
faction_(faction)
{
priv_.reset(new ThreadPriv);
auto prio = FactionToPriority(faction_);
systhrd_.reset(new SysThread(this, prio,
ThreadAdmin::StackUsageLimit() << BYTES_PER_WORD_LOG2));
Singleton<ThreadRegistry>::Instance()->Created(systhrd_.get(), this);
if(daemon_ != nullptr) daemon_->ThreadCreated(this);
}
This constructor creates an instance of SysThread, which in turn creates a native thread. The arguments to SysThread's constructor are the thread's attributes:
- the Thread object being constructed (this)
- its entry function (EnterThread for all Thread subclasses; it receives this as its argument)
- its priority (RSC bases this on a thread's Faction, which is not relevant to this article)
- its stack size, defined by the configuration parameter ThreadAdmin::StackUsageLimit
The new thread is then added to ThreadRegistry, which tracks all active threads.
Here is SysThread's constructor:
SysThread::SysThread(Thread* client, Priority prio, size_t size) :
nid_(NIL_ID),
nthread_(0),
priority_(Priority_N),
signal_(SIGNIL)
{
Create(client, size);
SetPriority(prio);
}
This has invoked two platform-specific functions (see SysThread.win.cpp if you're interested in the details):
- Create creates the native thread. Its platform-specific handle is saved in nthread_, and its thread number is saved in nid_.
- SetPriority sets the thread's priority.
Entering a Thread
EnterThread is the entry function for all Thread subclasses.
static unsigned int EnterThread(void* arg)
{
Debug::ft("NodeBase.EnterThread");
auto thread = static_cast<Thread*>(arg);
return thread->Start();
}
This enters the following, which sets up the safety net before it invokes thread-specific code:
main_t Thread::Start()
{
auto started = false;
while(true)
{
try
{
if(!started)
{
RegisterForSignals();
Ready();
Resume(Thread_Start);
started = true;
}
auto rc = systhrd_->Start();
if(rc != 0) return Exit(rc);
switch(priv_->traps_)
{
case 0:
break;
case 1:
{
priv_->traps_ = 0;
Recover();
break;
}
default:
return Exit(priv_->signal_);
}
Enter();
return Exit(SIGNIL);
}
catch(SignalException& sex)
{
switch(TrapHandler(&sex, &sex, sex.GetSignal(), sex.Stack()))
{
case Continue: continue;
case Release: return Exit(sex.GetSignal());
default: return AbnormalExit(sex.GetSignal());
}
}
catch(Exception& ex)
{
switch(TrapHandler(&ex, &ex, SIGNIL, ex.Stack()))
{
case Continue: continue;
case Release: return Exit(SIGNIL);
default: return AbnormalExit(SIGNIL);
}
}
catch(std::exception& e)
{
switch(TrapHandler(nullptr, &e, SIGNIL, nullptr))
{
case Continue: continue;
case Release: return Exit(SIGNIL);
default: return AbnormalExit(SIGNIL);
}
}
catch(...)
{
switch(TrapHandler(nullptr, nullptr, SIGNIL, nullptr))
{
case Continue: continue;
case Release: return Exit(SIGNIL);
default: return AbnormalExit(SIGNIL);
}
}
}
}
When first entered, this code invoked RegisterForSignals, which registers SignalHandler against each signal that is native to the underlying platform. This is done by invoking signal (in <csignal>), which must be done by every thread, for each signal that it wants to handle, when it is first entered and after each time that it receives a signal. This ensures that the thread will receive POSIX signals so that it can recover instead of allowing the program to abort:
void Thread::RegisterForSignals()
{
auto& signals = Singleton<PosixSignalRegistry>::Instance()->Signals();
for(auto s = signals.First(); s != nullptr; signals.Next(s))
{
if(s->Attrs().test(PosixSignal::Native))
{
signal(s->Value(), SignalHandler);
}
}
}
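The need to register again after each signal can be shown in isolation. The following is a sketch under stated assumptions rather than RSC code: CountingHandler, HandledSignals and RegisterAll are hypothetical names, and the table is limited to two signals that are safe to raise in a demonstration.

```cpp
#include <cassert>
#include <csignal>

// Number of signals that the handler has received.
volatile std::sig_atomic_t signalCount = 0;

// Re-registers itself before doing anything else, because some
// platforms restore the default action when a signal is delivered.
// (Hypothetical handler; not part of RSC.)
extern "C" void CountingHandler(int sig)
{
   std::signal(sig, CountingHandler);
   signalCount = signalCount + 1;
}

// The signals to handle. A real system would include every signal
// that is native to the platform.
const int HandledSignals[] = { SIGINT, SIGTERM };

// Registers CountingHandler against each signal in the table.
// Returns false if any registration fails.
bool RegisterAll()
{
   for(int sig : HandledSignals)
   {
      if(std::signal(sig, CountingHandler) == SIG_ERR) return false;
   }

   return true;
}
```

If the handler did not register itself again, a second SIGTERM could invoke the default action on platforms that reset the handler, which would end the program.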
We will look at SignalHandler later. To complete this section, we need to look at Start, which EnterThread invoked.
Each time through its loop, Start began by invoking SysThread::Start, which allows the native thread to perform any work that is required before it can safely run. This is platform-specific code which looks like this on Windows:
signal_t SysThread::Start()
{
if(status_.test(StackOverflowed))
{
if(_resetstkoflw() == 0)
{
return SIGSTACK2;
}
status_.reset(StackOverflowed);
}
_set_se_translator((_se_translator_function) SE_Handler);
return 0;
}
The first part of this deals with thread stack overflows, which can be particularly nasty. The last part installs a Windows-specific handler. Windows doesn't normally raise POSIX signals, but instead has what it calls "structured exceptions". We therefore provide SE_Handler, which translates a Windows-specific exception into a POSIX signal that can be thrown using our SignalException. The code for this will appear later.
Exiting a Thread
Exit is normally invoked to exit a thread; this occurs when its Enter function returns or if it is forced to exit after an exception. Exit is only bypassed if a Thread somehow gets deleted while it is still running. In that case, TrapHandler returns Return, which causes the thread to exit immediately, given that it no longer has any objects to delete.
When a Thread object is deleted, its Daemon (if any) is notified so that it can recreate the thread. RSC also tracks mutex ownership, so it releases any mutex that the thread owns. Most operating systems do this anyway, but RSC generates a log to highlight that this occurred. Tracking mutex ownership also allows deadlocks to be debugged as long as the CLI thread is not involved in the deadlock.
main_t Thread::Exit(signal_t sig)
{
delete this;
return sig;
}
Thread::~Thread()
{
Suspend();
ReleaseResources();
}
void Thread::ReleaseResources()
{
Singleton<ThreadRegistry>::Extant()->Erase(this);
if(daemon_ != nullptr) daemon_->ThreadDeleted(this);
systhrd_.reset();
}
Receiving a Windows Structured Exception
As previously mentioned, we register SE_Handler to map each Windows exception to a POSIX signal:
void SE_Handler(uint32_t errval, const _EXCEPTION_POINTERS* ex)
{
signal_t sig = 0;
switch(errval) {
case DBG_CONTROL_C: sig = SIGINT;
break;
case DBG_CONTROL_BREAK: sig = SIGBREAK;
break;
case STATUS_ACCESS_VIOLATION: sig = AccessViolationType(ex);
break;
case STATUS_DATATYPE_MISALIGNMENT:
case STATUS_IN_PAGE_ERROR:
case STATUS_INVALID_HANDLE:
case STATUS_NO_MEMORY: sig = SIGSEGV;
break;
case STATUS_ILLEGAL_INSTRUCTION: sig = SIGILL;
break;
case STATUS_NONCONTINUABLE_EXCEPTION: sig = SIGTERM;
break;
case STATUS_INVALID_DISPOSITION:
case STATUS_ARRAY_BOUNDS_EXCEEDED: sig = SIGSEGV;
break;
case STATUS_FLOAT_DENORMAL_OPERAND:
case STATUS_FLOAT_DIVIDE_BY_ZERO:
case STATUS_FLOAT_INEXACT_RESULT:
case STATUS_FLOAT_INVALID_OPERATION:
case STATUS_FLOAT_OVERFLOW:
case STATUS_FLOAT_STACK_CHECK:
case STATUS_FLOAT_UNDERFLOW:
case STATUS_INTEGER_DIVIDE_BY_ZERO:
case STATUS_INTEGER_OVERFLOW: sig = SIGFPE;
_fpreset();
break;
case STATUS_PRIVILEGED_INSTRUCTION: sig = SIGILL;
break;
case STATUS_STACK_OVERFLOW: sig = SIGSTACK1;
break;
default:
sig = SIGTERM;
}
Thread::HandleSignal(sig, errval);
}
Receiving a POSIX Signal
We registered SignalHandler to receive POSIX signals. Even on Windows, with its structured exceptions, this code is reached after invoking raise (in <csignal>):
void Thread::SignalHandler(signal_t sig)
{
RegisterForSignals();
if(HandleSignal(sig, 0)) return;
signal(sig, nullptr);
raise(sig);
}
Converting a POSIX Signal to a SignalException
Now that we have a POSIX signal which was either received by SignalHandler or translated from a Windows structured exception by SE_Handler, we can turn it into a SignalException:
bool Thread::HandleSignal(signal_t sig, uint32_t code)
{
auto thr = RunningThread(std::nothrow);
if(thr != nullptr)
{
throw SignalException(sig, code);
}
auto reg = Singleton<PosixSignalRegistry>::Instance();
if(reg->Attrs(sig).test(PosixSignal::Break))
{
if(!ThreadAdmin::TrapOnRtcTimeout())
{
thr = LockedThread();
if((thr != nullptr) && (SteadyTime::Now() < thr->priv_->currEnd_))
{
thr = nullptr;
}
}
if(thr == nullptr) thr = Singleton<CliThread>::Extant();
if(thr == nullptr) return false;
thr->Raise(sig);
return true;
}
return false;
}
The code after the throw requires some explanation. Break signals (SIGINT, SIGBREAK), which are generated when the user enters Ctrl-C or Ctrl-Break, often arrive on an unknown thread. It is reasonable to assume that the user wants to abort work that is taking too long or, worse, stuck in an infinite loop.
But what work should be aborted? Here, it must be pointed out that RSC strongly encourages the use of cooperative scheduling, where a thread runs unpreemptably ("locked") and yields after completing a logical unit of work. RSC only allows one unpreemptable thread to run at a time, and it also enforces a timeout on such a thread's execution. If the thread does not yield before the timeout, it receives the internal signal SIGYIELD, causing a SignalException to be thrown. During development, it is sometimes useful to disable this timeout. So in trying to identify which thread is performing the work that the user wants to abort, the first candidate is the thread that is running unpreemptably. However, this thread will only be interrupted if the use of SIGYIELD has been disabled and the thread has already run for longer than the timeout.
If interrupting the unpreemptable thread doesn't seem appropriate, the assumption is that CliThread should be interrupted. This thread is the one that parses and executes user commands entered through the console. So unless CliThread doesn't exist for some obscure reason, it will receive the break signal.
If a thread to interrupt has now been identified, Thread::Raise is invoked to deliver the signal to that thread.
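The selection rule described above can be restated as a small pure function. This sketch is an interpretation of the logic in HandleSignal, not RSC code: ThreadInfo, runDeadline and SelectBreakTarget are hypothetical, and the threads are passed in as arguments instead of being looked up.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Minimal stand-in for a thread: only the time by which it should
// have yielded is needed here. (Hypothetical type.)
struct ThreadInfo
{
   Clock::time_point runDeadline;
};

// Returns the thread that should receive a break signal, or nullptr.
// trapOnTimeout: true if a thread that runs too long gets SIGYIELD
// locked:        the thread running unpreemptably (may be nullptr)
// cli:           the thread that executes console commands (may be nullptr)
const ThreadInfo* SelectBreakTarget(bool trapOnTimeout,
   const ThreadInfo* locked, const ThreadInfo* cli, Clock::time_point now)
{
   const ThreadInfo* target = nullptr;

   if(!trapOnTimeout)
   {
      // The timeout is disabled, so the locked thread is a candidate,
      // but only if it has already run past its deadline.
      target = locked;
      if((target != nullptr) && (now < target->runDeadline)) target = nullptr;
   }

   // Otherwise fall back to the CLI thread, if it exists.
   if(target == nullptr) target = cli;
   return target;
}
```

Writing the rule as a function without side effects makes each case (timeout enabled, deadline not yet reached, no CLI thread) easy to check on its own.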
Signaling Another Thread
Sending a signal to another thread is problematic. The raise function in <csignal> only signals the running thread. Nor does Windows appear to expose any function that could be used for the purpose. So what to do?
In RSC, the first thing that most functions do is call Debug::ft to identify the function that is now executing. Most of these calls were removed from the code in this article, but now it is necessary to mention them. The original (and still extant) purpose of Debug::ft is to support a function trace tool, which is why most non-trivial functions invoke it. What this trace tool produces will be seen later. The pervasiveness of Debug::ft also allows it to be co-opted for other purposes. Because a thread is likely to invoke it frequently, it can check if the thread has a signal waiting. If so, boom! It can also check if the thread is at risk of overrunning its stack, in which case boom! (This is better than allowing an overrun to occur. As noted in SE_Handler, Windows no longer even allows a stack overflow exception to be intercepted.)
Here is the code that delivers a signal to another thread:
void Thread::Raise(signal_t sig)
{
Debug::ft(Thread_Raise);
auto reg = Singleton<PosixSignalRegistry>::Instance();
auto ps1 = reg->Find(sig);
auto thr = RunningThread(std::nothrow);
if(thr == this)
{
throw SignalException(sig, 0);
}
if(ps1->Attrs().test(PosixSignal::Exit))
{
if(priv_->action_ == RunThread)
{
priv_->action_ = SleepThread;
Unblock();
priv_->action_ = ExitThread;
}
}
SetSignal(sig);
if(!ps1->Attrs().test(PosixSignal::Delayed)) SetTrap(true);
if(ps1->Attrs().test(PosixSignal::Interrupt)) Interrupt(Signalled);
}
Given that the target thread can throw a SignalException for itself, via a check supported by Debug::ft, Raise does the following:
- invokes SetSignal to record the signal against the thread
- invokes Unblock (a virtual function) to unblock the thread if the signal will force it to exit
- invokes SetTrap if the signal should be delivered as soon as possible instead of waiting until the next time the thread yields (this sets the flag that is checked via Debug::ft)
- invokes Interrupt to wake up the thread if the signal should be delivered now instead of waiting until the thread resumes execution
In the above list, whether to invoke each of the last three functions is determined by various attributes that can be set in the signal's instance of PosixSignal.
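The underlying technique, in which the sender only records the signal and the target thread throws the exception itself at its next checkpoint, can be sketched as follows. This is a simplified model and not RSC's implementation: ThreadState, PendingSignalException and Checkpoint are hypothetical names, with Checkpoint standing in for the check that Debug::ft performs, and the delayed parameter standing in for the Delayed attribute.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <exception>

// Hypothetical exception that carries a signal number.
class PendingSignalException : public std::exception
{
public:
   explicit PendingSignalException(int sig) : signal_(sig) { }
   int GetSignal() const { return signal_; }
   const char* what() const noexcept override { return "signal"; }
private:
   int signal_;
};

// Per-thread state that another thread is allowed to write.
struct ThreadState
{
   std::atomic<int> pending{0};    // signal waiting to be delivered
   std::atomic<bool> trap{false};  // deliver at the next checkpoint
};

// Invoked by the sender. Records SIG against TARGET and, unless the
// signal is DELAYED, arms the trap flag so that it is delivered soon.
void Raise(ThreadState& target, int sig, bool delayed)
{
   assert(sig != 0);
   target.pending.store(sig);
   if(!delayed) target.trap.store(true);
}

// Invoked frequently by the target thread itself. If a signal is
// waiting, throws it on the target's own stack.
void Checkpoint(ThreadState& self)
{
   if(self.trap.exchange(false))
   {
      throw PendingSignalException(self.pending.exchange(0));
   }
}
```

Because the exception is thrown by the target thread from an ordinary function call, the stack unwinds normally and no signal handler is involved.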
Capturing a Thread's Stack When an Exception Occurs
SignalException derives from Exception (which derives from std::exception). Although Exception is a virtual class, all RSC exceptions derive from it because its constructor captures the running thread's stack by invoking SysStackTrace::Display:
Exception::Exception(bool stack, fn_depth depth) : stack_(nullptr)
{
if(stack)
{
stack_.reset(new std::ostringstream);
if(stack_ == nullptr) return;
*stack_ << std::boolalpha << std::nouppercase;
SysStackTrace::Display(*stack_, depth + 1);
}
}
SignalException simply records the signal and a debug code after telling Exception to capture the stack:
SignalException::SignalException(signal_t sig, debug32_t errval) :
Exception(true, 1),
signal_(sig),
errval_(errval)
{
}
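The reason for capturing the stack in the constructor is that the information is gone once the stack has been unwound. The following portable sketch illustrates this. It does not walk the native stack as SysStackTrace::Display does; instead, a thread_local list of function names, maintained by a hypothetical FunctionGuard, stands in for the stack. TracedException, callChain, Inner and Outer are likewise invented for the example.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the native stack: each function pushes its name on
// entry and pops it on exit. (Hypothetical; not how RSC does it.)
thread_local std::vector<const char*> callChain;

struct FunctionGuard
{
   explicit FunctionGuard(const char* name) { callChain.push_back(name); }
   ~FunctionGuard() { callChain.pop_back(); }
};

// An exception that records the call chain in its constructor, before
// the stack is unwound.
class TracedException : public std::exception
{
public:
   explicit TracedException(const char* reason) : reason_(reason)
   {
      assert(reason != nullptr);
      std::ostringstream stream;

      // Most recent function first, as in a traceback.
      for(auto f = callChain.rbegin(); f != callChain.rend(); ++f)
      {
         stream << *f << '\n';
      }

      stack_ = stream.str();
   }

   const char* what() const noexcept override { return reason_; }
   const std::string& Stack() const { return stack_; }
private:
   const char* reason_;
   std::string stack_;
};

void Inner()
{
   FunctionGuard guard("Inner");
   throw TracedException("invalid string position");
}

void Outer()
{
   FunctionGuard guard("Outer");
   Inner();
}
```

By the time a catch clause runs, both guards have been destroyed and callChain is empty, yet the exception still knows that it was thrown from Inner, which was called by Outer.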
Capturing a thread stack is platform-specific. See SysStackTrace.win.cpp for the Windows targets. Here is an example of its output within an RSC log for a Windows structured exception that got mapped to SIGSEGV. The stack trace is the portion after "Function Traceback":
THR902 Jun-27-2022 15:16:16.123 on Reigi {3}
in NodeTools.RecoveryThread (tid=20, nid=0x4eb8): trap number 2
type=Signal
signal : 11 (SIGSEGV: Illegal Memory Access)
errval : 0xc0000005
Function Traceback:
NodeBase.Exception.Exception @ Exception.cpp + 53[28]
NodeBase.SignalException.SignalException @ SignalException.cpp + 38[12]
NodeBase.Thread.HandleSignal @ Thread.cpp + 1892[27]
NodeBase.SE_Handler @ SysThread.win.cpp + 147[0]
_NLG_Return2 @ <unknown file> (err=487)
_NLG_Return2 @ <unknown file> (err=487)
_NLG_Return2 @ <unknown file> (err=487)
_NLG_Return2 @ <unknown file> (err=487)
_CxxFrameHandler4 @ <unknown file> (err=487)
__GSHandlerCheck_EH4 @ gshandlereh4.cpp + 86[0]
_chkstk @ <unknown file> (err=487)
RtlRestoreContext @ <unknown file> (err=487)
KiUserExceptionDispatcher @ <unknown file> (err=487)
NodeBase.Thread.CauseTrap @ Thread.cpp + 1264[5]
NodeTools.RecoveryThread.UseBadPointer @ NtIncrement.cpp + 3405[0]
NodeTools.RecoveryThread.Enter @ NtIncrement.cpp + 3304[0]
NodeBase.Thread.Start @ Thread.cpp + 3124[0]
NodeBase.EnterThread @ SysThread.win.cpp + 159[0]
recalloc @ <unknown file> (err=487)
BaseThreadInitThunk @ <unknown file> (err=487)
RtlUserThreadStart @ <unknown file> (err=487)
In released software, users can collect these logs and send them to you. Better still, your software can include code to automatically send them to you over the internet. Each of these logs highlights a bug that needs to be fixed.
Recovering from an Exception
The above log was produced by TrapHandler, which was mentioned earlier as the function that Thread::Start invokes when it catches an exception:
Thread::TrapAction Thread::TrapHandler(const Exception* ex,
const std::exception* e, signal_t sig, const std::ostringstream* stack)
{
try
{
if(sig == SIGDELETED)
{
return Return;
}
if(Singleton<Threads>::Instance()->GetState() != Constructed)
{
return Return;
}
auto retrapped = false;
switch(++priv_->traps_)
{
case 1:
SetSignal(sig);
break;
case 2:
retrapped = true;
break;
case 3:
return Release;
default:
return Return;
}
if((sig == SIGSTACK1) && (systhrd_ != nullptr))
{
systhrd_->status_.set(SysThread::StackOverflowed);
}
auto exceeded = LogTrap(ex, e, sig, stack);
auto sigAttrs = Singleton<PosixSignalRegistry>::Instance()->Attrs(sig);
if(exceeded || retrapped || sigAttrs.test(PosixSignal::Final))
{
return Release;
}
return Continue;
}
catch(SignalException& sex)
{
switch(TrapHandler(&sex, &sex, sex.GetSignal(), sex.Stack()))
{
case Continue:
case Release:
return Release;
default:
return Return;
}
}
catch(Exception& ex)
{
switch(TrapHandler(&ex, &ex, SIGNIL, ex.Stack()))
{
case Continue:
case Release:
return Release;
default:
return Return;
}
}
catch(std::exception& e)
{
switch(TrapHandler(nullptr, &e, SIGNIL, nullptr))
{
case Continue:
case Release:
return Release;
default:
return Return;
}
}
catch(...)
{
switch(TrapHandler(nullptr, nullptr, SIGNIL, nullptr))
{
case Continue:
case Release:
return Release;
default:
return Return;
}
}
}
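The escalation policy in the try clause above can be summarized as a pure function. This is one reading of that code, offered as a sketch: DecideTrapAction and its parameters are hypothetical, the logging and state changes are omitted, and the meanings attached to the three actions are inferred from how Thread::Start uses them.

```cpp
#include <cassert>

// What the thread's entry loop should do after a trap.
enum TrapAction
{
   Continue,  // re-enter the thread
   Release,   // exit the thread, deleting it in the usual way
   Return     // exit at once without deleting the thread
};

// trapCount: traps, including this one, since the thread last recovered
// deleted:   true if the thread object was deleted while running
// exceeded:  true if the thread has trapped too often overall
// isFinal:   true if the signal always forces the thread to exit
// (Hypothetical function; not part of RSC.)
TrapAction DecideTrapAction(int trapCount, bool deleted, bool exceeded,
   bool isFinal)
{
   assert(trapCount >= 1);
   if(deleted) return Return;

   switch(trapCount)
   {
   case 1:
      // First trap: allow recovery unless a limit or the signal forbids it.
      return (exceeded || isFinal) ? Release : Continue;
   case 2:
   case 3:
      // Trapped again while handling a trap: give up on the thread.
      return Release;
   default:
      // Even releasing the thread keeps trapping: just get out.
      return Return;
   }
}
```

Each additional nested trap therefore results in a more drastic action, which prevents a thread from trapping endlessly during its own recovery.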
Recreating a Thread
If a thread traps too often, it is forced to exit. But if the thread served an important purpose, there needs to be a way to recreate it.
In Creating a Thread, we saw that a thread could register a Daemon when it was created. And in Exiting a Thread, Daemon::ThreadDeleted was invoked when a thread exited. This function isn't virtual; it is the same for every Daemon:
void Daemon::ThreadDeleted(Thread* thread)
{
auto item = Find(thread);
if(item != threads_.end())
{
threads_.erase(item);
if(Restart::GetStage() != Running) return;
Singleton<InitThread>::Instance()->Interrupt(InitThread::Recreate);
}
}
When InitThread runs, it invokes the following when it sees that it was interrupted to recreate threads:
void InitThread::RecreateThreads()
{
auto& daemons = Singleton<DaemonRegistry>::Instance()->Daemons();
for(auto d = daemons.First(); d != nullptr; daemons.Next(d))
{
if(d->Threads().size() < d->TargetSize())
{
d->CreateThreads();
}
}
Reset(Recreate);
}
And the following finally invokes the virtual function Daemon::CreateThread:
void Daemon::CreateThreads()
{
switch(traps_) {
case 0:
break;
case 1:
++traps_;
Recover();
--traps_;
break;
default:
RaiseAlarm(GetAlarmLevel());
return;
}
while(threads_.size() < size_)
{
++traps_;
auto thread = CreateThread();
traps_ = 0;
if(thread == nullptr)
{
RaiseAlarm(GetAlarmLevel());
return;
}
threads_.insert(thread);
ThreadAdmin::Incr(ThreadAdmin::Recreations);
}
RaiseAlarm(NoAlarm);
}
Traces of the Code in Action
RSC has 29 tests that focus on exercising this software. Each of them does something nasty to see if the software can handle it without exiting. During these tests, the function trace tool is enabled so that Debug::ft will record all function calls. For the SIGSEGV test, which is associated with the log shown above, the output of the trace tool looks like this. When the tool is on, code slows down by a factor of about 4x. When the tool is off, calls to Debug::ft incur very little overhead.
A Destructor Uses a Bad Pointer
A recently added test uses a bad pointer in the destructor of a concrete Thread subclass. This test should have been added long ago; it is an especially good one because an exception in a destructor normally causes a program to abort. Although RSC survives if compiled with Microsoft's C++ compiler, what occurs is interesting. The structured exception (Windows' SIGSEGV equivalent) gets intercepted and thrown as a C++ exception. But this exception is not caught immediately. The C++ runtime code handling the deletion catches the exception itself and continues its work of invoking the destructor chain. This is admirable because it allows the base Thread class to release its resources. Only afterwards does the C++ runtime rethrow the exception, which is finally caught by the safety net in Thread::Start. We now have the unusual situation of a member function running after its object has been deleted. Because Thread::TrapHandler is not virtual, it gets invoked successfully. When it notices that the thread has been deleted, it returns and exits the thread.
Points of Interest
It is only forthright to mention that the C++ standard does not support throwing an exception in response to a POSIX signal. In fact, it is undefined behavior for a signal handler to do almost anything in a C++ environment! A list of undefined behaviors appears here; those pertaining to signal handling are numbered 128 through 135. The detailed coding standard available on the same website makes these recommendations about signals:
- SIG31-C. Do not access shared objects in signal handlers
- SIG34-C. Do not call signal() from within interruptible signal handlers
- SIG35-C. Do not return from a computational exception signal handler
Fortunately, much of this is theoretical rather than practical. The main reason that most things related to signal handling are undefined behavior is that different platforms support signals in different ways. Many of the risks that lead to undefined behavior result from race conditions that will rarely occur (see note 1). Regardless, what can you do if your software has to be robust? It's far better to risk undefined behavior than to let your program exit.
The same rationale, of not being able to depend on how the underlying platform does something, does not excuse the standard's adoption of noexcept. If it were possible to throw an exception in response to a signal, any noexcept function would be unable to do so. Even a non-virtual "getter" that simply returns a member's value is now at risk. If such a function is invoked with a bad this pointer, it will add an offset to that pointer and try to read memory. Boom! An ostensibly trivial noexcept function, through no fault of its own, has now caused the invocation of abort when the signal handler throws an exception to recover from the SIGSEGV.
The invocation of abort isn't the end of the world, let alone your program, because your signal handler can turn the SIGABRT into an exception. But now what are we dealing with, abort or an exception? What if the exception isn't "allowed", either because it occurred in a destructor or noexcept function? (Hands up, those of you who have never seen anything nasty happen in a destructor.)
When abort is invoked, the C++ standard says it is implementation dependent whether the stack is unwound in the same way as when an exception is thrown. That is, local objects may not get deleted. So if a function on the stack owns something in a unique_ptr local, it will leak. And if it has wrapped a mutex in a local object whose destructor releases the mutex whenever the function returns, the outcome could be far worse. This is assuming, of course, that your program will be allowed to survive. If it won't, it doesn't really matter.
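What is at stake can be seen in a small sketch of the case where the stack is unwound. When an exception propagates out of the function below, its lock_guard releases the mutex and its unique_ptr deletes the object that it owns; if the stack were not unwound, neither would happen. The names resourceLock, Tracked, DoWork and WorkCleansUp are invented for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>

std::mutex resourceLock;  // hypothetical shared lock
int liveObjects = 0;      // number of Tracked objects that exist

struct Tracked
{
   Tracked() { ++liveObjects; }
   ~Tracked() { --liveObjects; }
};

// Acquires the lock and an object, then fails. Both locals are
// released only because the exception unwinds the stack.
void DoWork()
{
   std::lock_guard<std::mutex> guard(resourceLock);
   auto owned = std::make_unique<Tracked>();
   assert(liveObjects == 1);
   throw std::runtime_error("something nasty");
}

// Returns true if, after DoWork fails, the mutex is free again and
// the object that DoWork owned has been deleted.
bool WorkCleansUp()
{
   try
   {
      DoWork();
   }
   catch(const std::exception&)
   {
   }

   bool unlocked = resourceLock.try_lock();
   if(unlocked) resourceLock.unlock();
   return unlocked && (liveObjects == 0);
}
```

Without unwinding, the mutex would remain locked, so the next thread to acquire it would block forever, which is the "far worse" outcome mentioned above.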
Unless your software is shockingly infallible, it will occasionally cause an abort, and your C++ compiler had better allow this to turn into an exception that unwinds the stack in all circumstances. In the end, both your platform and compiler will make it either possible or virtually impossible to deliver robust C++ software.
To summarize, here are some things that the C++ standard should mandate to get serious about robustness:
- A signal handler must be able to throw an exception when it receives a signal.
- The stack must be unwound if the signal handler throws an exception in response to a SIGABRT.
- std::exception's constructor must provide a way to capture debug information, such as a thread's stack, before the stack is unwound.
The good news is that platform and compiler vendors often make it possible to deliver robust software, despite what the standard fails to mandate.
Notes
1 In UNIX-like environments, signals other than those discussed in this article are sometimes used as a primitive form of inter-thread communication. This greatly increases the risk of these race conditions and is not recommended here.
History
- 3rd September, 2020: Add section on recreating a thread
- 11th August, 2020: Add details about what happens when a thread is exited
- 27th May, 2020: Describe what happens when an exception occurs in a destructor
- 28th August, 2019: Initial version