Introduction
This article shows a way to "abort" a non-cooperating thread. More precisely, it aborts a non-cooperating function that is running in another thread and returns execution to some "friendly" point within that thread. The method causes an exception to be raised in the target thread (similar to Thread.Abort in .NET).
The Method
First let's agree on the exception type that we want to throw. Let's call it ThreadAbort and define it:
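class ThreadAbort
{
    // The hijacked thread is redirected here; never returns.
    __declspec (noreturn) static void Throw();
public:
    // Makes the specified thread raise ThreadAbort.
    static bool RaiseInThread(HANDLE hThread);
    // No-op that keeps the compiler from discarding the try/catch block.
    static void DontOptimize() throw (...);
};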
As you can see, ThreadAbort has no member variables, which means we don't pass any parameters with our exception. Parameters could be added as well, but we won't discuss this here. The static member functions have the following purpose:
- RaiseInThread causes the specified thread to raise the ThreadAbort exception.
- DontOptimize does nothing. However, it should be called inside the appropriate try/catch block of the target thread. At compile time the compiler sees no possibility for our exception to be raised there, so during optimization it may omit the try/catch block, and our exception won't be handled. Another way to solve this is to select the asynchronous exception handling model (more about this later).
The non-cooperating function that we call in that thread should be wrapped in the appropriate try/catch block:
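try {
    ThreadAbort::DontOptimize(); // keeps the compiler from dropping this try/catch
    SomeNonResponsiveFunc();
} catch (ThreadAbort&) {
    // Aborted: execution continues here, at the "friendly" point.
}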
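To see what this wrapper buys us, here is a minimal portable sketch. It is not the injection mechanism itself: the exception is thrown synchronously from inside the callee to simulate the moment the injected exception surfaces. The Resource class, the log vector and the RunGuarded and CheckUnwindOrder functions are hypothetical names made up for this illustration, and this ThreadAbort is only a stand-in for the class defined above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the article's ThreadAbort; here it is thrown synchronously.
struct ThreadAbort {};

// Hypothetical RAII guard that records when it acquires and releases.
class Resource {
public:
    explicit Resource(std::vector<std::string>& log) : log_(log) {
        log_.push_back("acquire");
    }
    ~Resource() { log_.push_back("release"); }
private:
    std::vector<std::string>& log_;
};

// Hypothetical "non-responsive" function. The throw simulates the point
// at which an injected ThreadAbort would appear.
void SomeNonResponsiveFunc(std::vector<std::string>& log) {
    Resource r(log);
    throw ThreadAbort();
}

// Runs the function inside the try/catch wrapper and returns the events
// in the order they happened.
std::vector<std::string> RunGuarded() {
    std::vector<std::string> log;
    try {
        SomeNonResponsiveFunc(log);
        log.push_back("finished");      // skipped when the abort happens
    } catch (ThreadAbort&) {
        log.push_back("aborted");       // the "friendly" point
    }
    return log;
}

// Self-check: the guard is released during unwinding, before the handler runs.
void CheckUnwindOrder() {
    const std::vector<std::string> log = RunGuarded();
    assert(log.size() == 3);
    assert(log[0] == "acquire" && log[1] == "release" && log[2] == "aborted");
}
```

The point of the sketch is the ordering: stack unwinding destroys the guard before control reaches the catch block, which is exactly the cleanup opportunity that the exception-based abort is meant to preserve.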
Now let's dig into the implementation of ThreadAbort:
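__declspec (noreturn) void ThreadAbort::Throw()
{
    throw ThreadAbort();
}

void ThreadAbort::DontOptimize() throw (...)
{
    // The volatile read keeps the compiler from proving Throw() is unreachable.
    volatile int i = 0;
    if (i)
        Throw();
}

bool ThreadAbort::RaiseInThread(HANDLE hThread)
{
    bool ok = false;
    // SuspendThread returns (DWORD) -1 on failure.
    DWORD dwVal = SuspendThread(hThread);
    if ((DWORD) -1 != dwVal)
    {
        CONTEXT ctx;
        ctx.ContextFlags = CONTEXT_CONTROL;
        if (GetThreadContext(hThread, &ctx))
        {
            // Redirect the instruction pointer to Throw.
            // 32-bit x86 only: on x64 the field is called Rip.
            ctx.Eip = (DWORD) (DWORD_PTR) Throw;
            if (SetThreadContext(hThread, &ctx))
                ok = true;
        }
        // ResumeThread also returns (DWORD) -1 on failure.
        VERIFY((DWORD) -1 != ResumeThread(hThread));
    }
    return ok;
}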
As the code above shows, to make a thread raise an exception we suspend it, modify its EIP register (the instruction pointer) to point straight at ThreadAbort::Throw, and then resume it, to go happily into the abyss. This is a brute-force method: we know nothing about what that thread is actually doing, so we may interrupt it at any moment, possibly in the middle of something.
Compared to Other Methods
There are other ways to interrupt a thread in the middle of what it's doing, but they're less flexible than the exception method:
- TerminateThread can be called to terminate a thread immediately. However, it doesn't let us pass control to "friendly" code in that thread. That is, we may not want to terminate the thread at all, only to pass control to other code inside it. In addition, when a thread is terminated by TerminateThread, its stack memory is not released by the OS (according to the TerminateThread documentation this applies to Windows XP and Windows Server 2003; later versions free the stack). So we have a memory leak, in addition to whatever the aborted code itself leaked.
- We may modify EIP to go directly into the "friendly" code, rather than involving the exception handling mechanism, which will (hopefully) finish at our friendly code. The problem here is that the code being aborted gets no chance to execute its cleanup. Hence all the resources it allocated are lost, and we'll probably have resource or memory leaks. With an exception, on the other hand, code that is written in an exception-aware way may clean up gracefully.
That is, the exception method allows a graceful abort. Unfortunately, it can't guarantee that there will be no leaks at all. There are several reasons for this:
- Not every piece of code is written in an exception-aware way (meaning no allocated resource is left "in the air": everything is guarded either by destructors of automatic variables or by __try/__finally SEH blocks). Some code blocks are written with the assumption that an exception cannot occur within them.
- Even if everything is written in an exception-aware way, the compiler is free to optimize the code. It may omit the needed exception handling records if it sees no possibility for an exception to occur. Luckily this can be prevented by selecting the so-called asynchronous exception handling model (the /EHa option in Visual C++; for more information, please read this article).
- Even if everything is written in an exception-aware way and we select the asynchronous exception handling model, we may still have a problem.
  According to C++ rules, the lifetime of an automatic object officially begins when its constructor finishes. For instance, if an exception is thrown during the object's constructor, its destructor won't be called.
  Now, since we blindly raise our exception in the middle of whatever the thread is doing, we may raise it right at the end of an object's constructor: after the object has allocated its resources, but just before it's officially born. In that scenario its destructor won't be called, and we'll have leaks.
  Luckily this also has a workaround: avoid doing allocations in constructors. That is, don't allocate anything in the constructor, just "zero-initialize" your variables. The actual allocations are then done in some other method, which should be called right after the constructor. This is called two-stage object construction.
- And finally, even if we do everything in the most careful way, we may still have a problem, this time with the destructor. During normal program flow, when the lifetime of an object ends, the code generated by the compiler removes the exception handling information for this object and then immediately calls its destructor. But what if we raise our exception right after the exception handling information is removed but before the destructor is called? Or during the destructor, before it has finished its cleanup?
Unfortunately there's no way to guarantee correct cleanup of non-cooperating code in all possible scenarios. This is why our method is a brute-force one. However, compared to the other methods, it's more graceful: you have a good chance of a correct cleanup, and in the worst case (provided everything is written carefully) you'll have no more than one leak.
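The constructor hazard and the two-stage workaround described above can be illustrated with a small portable sketch. It only simulates the abort: a synchronous throw placed at the end of the constructor (or of the second-stage method) stands in for an injected exception arriving at that instant. The OneStage and TwoStage classes, the g_live counter and the helper functions are hypothetical names invented for this example.

```cpp
#include <cassert>

// Stand-in for the article's ThreadAbort; here it is thrown synchronously.
struct ThreadAbort {};

// Number of simulated resources currently held.
int g_live = 0;

// One-stage construction: the constructor itself acquires the resource.
class OneStage {
public:
    explicit OneStage(bool abortAtEnd) {
        ++g_live;                              // acquire
        if (abortAtEnd) throw ThreadAbort();   // simulated abort before "birth"
    }
    ~OneStage() { --g_live; }                  // never runs if the ctor threw
};

// Two-stage construction: the constructor only zero-initializes,
// the acquisition happens in Init().
class TwoStage {
public:
    TwoStage() : held_(false) {}
    void Init(bool abortAtEnd) {
        ++g_live;                              // acquire
        held_ = true;
        if (abortAtEnd) throw ThreadAbort();   // simulated abort inside Init
    }
    ~TwoStage() { if (held_) --g_live; }       // runs: the object is alive
private:
    bool held_;
};

// Each helper returns the number of resources still held after the abort.
int LeaksAfterOneStageAbort() {
    g_live = 0;
    try { OneStage obj(true); } catch (ThreadAbort&) {}
    return g_live;
}

int LeaksAfterTwoStageAbort() {
    g_live = 0;
    try { TwoStage obj; obj.Init(true); } catch (ThreadAbort&) {}
    return g_live;
}

// Self-check: the one-stage object leaks, the two-stage object does not.
void CheckLeaks() {
    assert(LeaksAfterOneStageAbort() == 1);
    assert(LeaksAfterTwoStageAbort() == 0);
}
```

Note that with a truly asynchronous exception the two-stage version still has a small window, between acquiring the resource and recording that it is held, so this narrows the problem rather than eliminating it.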
Kernel-mode Calls
Our method causes an exception in the target thread while this thread runs in user mode. However, if the thread is currently executing a system (kernel-mode) call, it won't be aborted immediately: the abort is deferred until the thread returns from the system call. For a short system call (such as SetEvent or CreateMutex) there's no problem. But a waiting function (such as Sleep or WaitForSingleObject) may take a long time to complete, or may never complete.
AFAIK it's impossible to abort a system call from user-mode code; that could only be achieved by going deep into the OS internals. The only way to get rid of such a call is to kill the whole thread with TerminateThread.
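For contrast, this is roughly what a cooperating wait looks like: instead of an unconditional sleep, the thread waits on something that another thread can signal. The sketch below uses portable C++11 (std::condition_variable) rather than the Win32 API, and the CancellableWait class and CheckCancellableWait function are hypothetical names for this illustration only. On Windows the same idea is commonly expressed by waiting on an additional event handle with WaitForMultipleObjects.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: a timed wait that another thread can cut short.
class CancellableWait {
public:
    // Blocks until Cancel() has been called or the timeout expires.
    // Returns true if cancellation was requested, false on plain timeout.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return cancelled_; });
    }

    // May be called from any thread to wake up a pending WaitFor().
    void Cancel() {
        {
            std::lock_guard<std::mutex> lock(m_);
            cancelled_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool cancelled_ = false;
};

// Self-check: a short wait times out, a cancelled wait returns at once.
void CheckCancellableWait() {
    CancellableWait w;
    assert(!w.WaitFor(std::chrono::milliseconds(10)));  // nobody cancelled
    w.Cancel();
    assert(w.WaitFor(std::chrono::seconds(60)));        // does not block
}
```

Code that waits this way never needs to be aborted from outside, which is why the problem described in this section only arises with non-cooperating code.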
Conclusion
The most important conclusion is that you should not use non-cooperating code. Every piece of code that is potentially time-consuming must provide a conventional way to abort it.
Next, it's impossible to abort unknown code and be sure that everything is cleaned up. However, the injected-exception method gives the best chance of a graceful cleanup.
It's impossible to abort a system call from user-mode code. In some situations, there's no choice but to call the TerminateThread function.
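As an illustration of the first conclusion, here is a minimal sketch of a "conventional way to abort": the long-running code polls a flag between units of work. It is plain C++11 with no Windows API involved, and CooperativeWork, RunAndCancel and CheckCooperativeWork are hypothetical names chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical long-running job that checks a cancel flag between work units.
// Returns the number of units completed before it stopped.
long CooperativeWork(const std::atomic<bool>& cancel, long maxUnits) {
    long done = 0;
    while (done < maxUnits && !cancel.load(std::memory_order_acquire)) {
        ++done;                       // one unit of "work"
    }
    return done;
}

// Starts the job on another thread, requests cancellation, waits for it.
// The result depends on timing: anything from 0 to maxUnits is possible.
long RunAndCancel(long maxUnits) {
    std::atomic<bool> cancel(false);
    long done = -1;
    std::thread worker([&] { done = CooperativeWork(cancel, maxUnits); });
    cancel.store(true, std::memory_order_release);
    worker.join();                    // join makes the write to done visible
    return done;
}

// Self-check for the deterministic, single-threaded cases.
void CheckCooperativeWork() {
    std::atomic<bool> alreadyCancelled(true);
    assert(CooperativeWork(alreadyCancelled, 100) == 0);
    std::atomic<bool> neverCancelled(false);
    assert(CooperativeWork(neverCancelled, 100) == 100);
}
```

Because the job stops only at points it chooses itself, all of its destructors and cleanup code run normally, and none of the leak scenarios discussed earlier can occur.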
I'll appreciate comments. Criticisms and new ideas are welcome.
History
- 8th April, 2010: Initial post