I recently needed to diagnose an interesting problem with one of our Topshelf Windows services. The service got stuck in the StopPending state, and we had to kill it to make it work again. Before killing the service, though, we collected a process dump for further analysis. In this post, I will show you what I read from this dump and how this information might help you better understand your own Topshelf applications.
The aforementioned service is a multithreaded application with each thread processing messages from a queue. I prepared a simplified version of it for the purpose of this post:
using System;
using System.Threading;
using Topshelf;
using Topshelf.Logging;

namespace LowLevelDesign.Samples
{
    class TestWorker : ServiceControl
    {
        private const int ThreadCount = 5;
        private static readonly LogWriter logger = HostLogger.Get<TestWorker>();

        public static bool ShouldStop { get; private set; }

        private WaitHandle[] handles;

        public bool Start(HostControl hostControl)
        {
            logger.Info("Starting test worker...");
            // One event per worker; each worker is meant to signal its event when it finishes.
            handles = new ManualResetEvent[ThreadCount];
            for (int i = 0; i < handles.Length; i++)
            {
                handles[i] = new ManualResetEvent(false);
            }

            logger.Info("Starting worker threads...");
            for (int i = 0; i < ThreadCount; i++)
            {
                ThreadPool.QueueUserWorkItem(Test, handles[i]);
            }
            return true;
        }

        private static void Test(Object state)
        {
            var h = (ManualResetEvent)state;
            try
            {
                logger.InfoFormat("Throwing exception");
                throw new Exception();
            }
            finally
            {
                // Intended to signal the worker's event once the work item is done.
                logger.InfoFormat("Releasing the handle");
                h.Set();
            }
        }

        public bool Stop(HostControl hostControl)
        {
            ShouldStop = true;
            logger.Info("Stopping test worker...");
            // Blocks, without a timeout, until every worker has signaled its event.
            WaitHandle.WaitAll(handles);
            return true;
        }
    }

    class Program
    {
        static void Main()
        {
            HostFactory.Run(hc =>
            {
                hc.UseNLog(); // extension method from the Topshelf.NLog package
                hc.Service<TestWorker>();
                hc.SetServiceName(typeof(TestWorker).Namespace);
                hc.SetDisplayName(typeof(TestWorker).Namespace);
                hc.SetDescription("Test worker");
            });
        }
    }
}
As you can see in the snippet, on service start we queue 5 work items on the ThreadPool and give each of them its own ManualResetEvent. Each work item throws an exception, and because nothing in our code catches it, the service is forced to stop. I expected the service to wait for all threads to finish and then exit. Unfortunately, the reality was surprisingly different: my service became unresponsive after a while.
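Whatever happens inside the workers, TestWorker.Stop can only return once every ManualResetEvent in handles has been signaled. The standalone sketch below is not part of the original service; the helper name WaitWithOneMissing, the five events and the 200 ms timeout are assumptions for illustration. It uses the timed overload of WaitHandle.WaitAll to show that a single unsignaled event is enough to keep the wait from completing:

```csharp
using System;
using System.Threading;

static class WaitAllDemo
{
    // Hypothetical helper: creates `count` events, signals all of them except the one at
    // index `unsignaled` (pass -1 to signal every event), then waits for all of them with
    // a timeout instead of forever as TestWorker.Stop does.
    internal static bool WaitWithOneMissing(int count, int unsignaled, TimeSpan timeout)
    {
        var handles = new WaitHandle[count];
        for (int i = 0; i < count; i++)
        {
            var e = new ManualResetEvent(false);
            if (i != unsignaled)
            {
                e.Set();
            }
            handles[i] = e;
        }

        try
        {
            // True only if every handle is signaled before the timeout elapses.
            return WaitHandle.WaitAll(handles, timeout);
        }
        finally
        {
            foreach (var h in handles)
            {
                h.Dispose();
            }
        }
    }

    static void Main()
    {
        // Every event signaled: the wait completes.
        Console.WriteLine(WaitWithOneMissing(5, -1, TimeSpan.FromMilliseconds(200)));
        // One event never signaled: the wait times out.
        Console.WriteLine(WaitWithOneMissing(5, 4, TimeSpan.FromMilliseconds(200)));
    }
}
```

The untimed WaitAll call in TestWorker.Stop behaves like the second case, except that it waits forever instead of giving up.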
Let’s look at the Topshelf source code to find out what’s going on. When Topshelf starts a service, it subscribes to the AppDomain.UnhandledException event, so it is notified of every unhandled exception thrown during the service execution:
namespace Topshelf.Runtime.Windows
{
    public class WindowsServiceHost :
        ServiceBase,
        Host,
        HostControl
    {
        ...
        public TopshelfExitCode Run()
        {
            Directory.SetCurrentDirectory(AppDomain.CurrentDomain.BaseDirectory);
            AppDomain.CurrentDomain.UnhandledException += CatchUnhandledException;
            ...
            Run(this);
            ...
        }
        ...
    }
}
Then our TestWorker.Start method is called, which queues the 5 work items. For each thrown exception, Topshelf’s WindowsServiceHost.CatchUnhandledException method is called:
namespace Topshelf.Runtime.Windows
{
    public class WindowsServiceHost :
        ServiceBase,
        Host,
        HostControl
    {
        ...
        void CatchUnhandledException(object sender, UnhandledExceptionEventArgs e)
        {
            _log.Error("The service threw an unhandled exception", (Exception)e.ExceptionObject);
            Stop();
#if !NET35
            if (Task.CurrentId.HasValue)
            {
                return;
            }
#endif
            int deadThreadId = Interlocked.Increment(ref _deadThread);
            Thread.CurrentThread.IsBackground = true;
            Thread.CurrentThread.Name = "Unhandled Exception " + deadThreadId.ToString();
            while (true)
                Thread.Sleep(TimeSpan.FromHours(1));
        }
        ...
    }
}
Notice that each unhandled exception triggers a call to the ServiceBase.Stop method (which eventually runs our TestWorker.Stop method) and then parks the faulting thread in an infinite loop. You can probably guess now why our simple service could not be stopped: CatchUnhandledException runs on the faulting thread before its stack is unwound, and because it never returns, the finally {} block in the worker code is never executed. Each thread’s ManualResetEvent therefore stays unsignaled, and WaitHandle.WaitAll in TestWorker.Stop never finishes, which leaves the service in the StopPending state.
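The ordering comes from the two-pass exception handling in .NET: the runtime first searches the stack for a handler and only afterwards unwinds it, running finally blocks on the way. An exception filter (catch ... when) is also evaluated during the search pass, so the self-contained sketch below uses one as a rough stand-in for Topshelf’s unhandled-exception handler. This is an assumption for illustration only; no Topshelf code is involved, and the class and method names are hypothetical. The sketch records whether the worker’s event is signaled when the filter sees the exception, and again once the stack has been unwound:

```csharp
using System;
using System.Threading;

static class FirstPassDemo
{
    // Same shape as TestWorker.Test: the event is signaled only in the finally block.
    internal static void Worker(ManualResetEvent h)
    {
        try
        {
            throw new InvalidOperationException("simulated failure");
        }
        finally
        {
            h.Set();
        }
    }

    // Returns the event state seen by the filter (search pass, before unwinding)
    // and by the catch body (after unwinding, i.e. after Worker's finally block).
    internal static (bool inFilter, bool inCatch) Observe()
    {
        using (var h = new ManualResetEvent(false))
        {
            bool inFilter = false, inCatch = false;
            try
            {
                Worker(h);
            }
            catch (InvalidOperationException) when (Snapshot(h, out inFilter))
            {
                inCatch = h.WaitOne(0);
            }
            return (inFilter, inCatch);
        }
    }

    static bool Snapshot(ManualResetEvent h, out bool signaled)
    {
        signaled = h.WaitOne(0); // non-blocking check of the event state
        return true;             // accept the exception so the demo keeps running
    }

    static void Main()
    {
        var (inFilter, inCatch) = Observe();
        // Prints: signaled in filter: False, signaled in catch: True
        Console.WriteLine($"signaled in filter: {inFilter}, signaled in catch: {inCatch}");
    }
}
```

In the service there is no catch body to reach: Topshelf’s handler sits where the filter is and never returns, so the unwinding, and with it the worker’s finally block, never happens.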
A simple resolution to this deadlock might be to add a catch {} block that signals the ManualResetEvent before handing execution over to WindowsServiceHost.CatchUnhandledException:
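private static void Test(Object state)
{
    var h = (ManualResetEvent)state;
    try
    {
        throw new Exception();
    }
    catch
    {
        // Signal the event before the exception leaves this method,
        // i.e. before Topshelf's unhandled-exception handler takes over.
        logger.InfoFormat("Releasing the handle on exception.");
        h.Set();
        throw;
    }
    finally
    {
        // Signals the event on the normal (non-throwing) path.
        logger.InfoFormat("Releasing the handle");
        h.Set();
    }
}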
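To see why the catch block helps, the self-contained sketch below uses an exception filter (catch ... when), which .NET evaluates during the search pass before any finally block runs, as a rough stand-in for Topshelf’s unhandled-exception handler. That substitution is an assumption for illustration, not Topshelf code, and the class and method names are hypothetical. With the catch/Set/rethrow pattern, the search for a handler stops at the worker’s own catch block first, so the event is already signaled when the rethrown exception reaches the filter:

```csharp
using System;
using System.Threading;

static class CatchRethrowDemo
{
    // Mirrors the fixed Test method: signal in catch, then rethrow the original exception.
    internal static void FixedWorker(ManualResetEvent h)
    {
        try
        {
            throw new InvalidOperationException("simulated failure");
        }
        catch
        {
            h.Set();
            throw; // rethrows the original exception object
        }
        finally
        {
            h.Set(); // signals the event on the normal (non-throwing) path
        }
    }

    // Reports whether the event was already signaled when the search-pass filter saw the exception.
    internal static bool SignaledWhenObserved()
    {
        using (var h = new ManualResetEvent(false))
        {
            bool signaled = false;
            try
            {
                FixedWorker(h);
            }
            catch (InvalidOperationException) when (Snapshot(h, out signaled))
            {
            }
            return signaled;
        }
    }

    static bool Snapshot(ManualResetEvent h, out bool signaled)
    {
        signaled = h.WaitOne(0); // non-blocking check of the event state
        return true;
    }

    static void Main()
    {
        // Prints: signaled when observed: True
        Console.WriteLine($"signaled when observed: {SignaledWhenObserved()}");
    }
}
```

Because a ManualResetEvent stays signaled until it is reset, calling Set() in both the catch and the finally block is harmless.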
If you have any other ideas on how to fix this problem, please share them in the comments. And if you would like to experiment with the code presented in this post, it’s available for download here.
Filed under: CodeProject, Diagnosing Windows Services