Debugging Multithreaded Code in Real Time!

Peter Huber SG

4.97/5 (34 votes)

3 Jul 2014Public Domain6 min read

44.9K

604

Non blocking, multithreading safe tracing with only microseconds delays

Download source code - 12.4 KB

Introduction

Writing a multithreaded application is much more difficult than a single threaded program. Even worse, one needs to debug in real time to find these bugs, which is nearly impossible. But fear not, this article explains how just that can be simply done!

Challenge

We all love Visual Studio, using its breakpoints and single stepping through code to find out why a program behaves differently than expected. Alas, setting a break point or single stepping will completely change the behaviour of a multithreaded application, where it matters which thread executes which instruction in which sequence, measured in microseconds or less. Stop or delay anything in the multithreaded system and it behaves completely differently.

So obviously, we cannot stop a single thread when debugging. Which means we should use tracing, looking something like this:

Console.WriteLine("Thread A requests someLock");
lock (someLock) {
  Console.WriteLine("Thread A got someLock");
  //do some work
  Console.WriteLine("Thread A releases someLock");
}
Console.WriteLine("Thread A released someLock");

Do this wherever you guess the error could happen and you will see in the output window which thread caused it. The only problem is that you cannot use Console.WriteLine(), because:

It is way too slow, taking about 500 microseconds on my PC. The traced program would behave rather differently than the not traced program.
Console.WriteLine locks the threads if several try to write at the same time, a big No No in real time multi-threading debugging.

Solution

Don't use the Console for tracing, but write your trace into memory. Once the problem occurs, check in the memory what had happened. Writing to the memory must be done with as little overhead as possible, something like this:

public const int MaxMessages = 0x100;
string[] messages = new string[MaxMessages];
int messagesIndex = -1;

public void Trace(string message) {
  int thisIndex = Interlocked.Increment(ref messagesIndex);
  messages[thisIndex] = message;
}

Basically, Trace() writes the message in a string array. Notice that Interlocked.Increment() is multithreading safe and non blocking, as opposed to many of the thread safe methods in the .NET Framework which are blocking. This absolute minimum approach needs about 2 microseconds, a delay that should be acceptable. After all, there will always be some time variation, even if the exact same code is executed twice, depending what else the microprocessor is doing, if the code and data is in the processor's cache, etc.

Of course, the above code has also the problem that we get an exception when the array is full, which we can easily solve like this:

const int indexMask = MaxMessages-1;

public void Trace(string message) {
  int thisIndex = Interlocked.Increment(ref messagesIndex) & indexMask;
  messages[thisIndex] = message;
}

This forces the index back to 0, once it has reached the end of the messages array. The buffer can store 0x100 messages, in decimal notion 256. If you need more history, increase MaxMessages. But be careful, 0x1000000 % MaxMessages must be equal to 0. Meaning you can use figures like 0x100, 0x200, 0x400, 0x800, 0x1000 but not 0x300.

What happens when messagesIndex reaches int.Max (0x7FFFFFFF or 2,147,483,647)? Basically, the integer will overflow to int.Min (0x8000000 or -2,147,483,648, but luckily no exception gets thrown. The bitwise "And" operator & cuts out the leading 1s and also the very negative number gets mapped into the range 0 - 0x100.

Tracing More Information

It would be nice to know which thread was writing the message when. We can achieve it like this:

static string[] threadNames = new string[MaxMessages];
static long[] ticks = new long[MaxMessages];
static Stopwatch stopWatch;

static RealTimeTracer() {
  stopWatch = new Stopwatch();
  stopWatch.Start();
}

public static void Trace(string message) {
  int thisIndex = Interlocked.Increment(ref messagesIndex) & indexMask;
  ticks[thisIndex] = stopWatch.ElapsedTicks;
  messages[thisIndex] = message;
  threadNames[thisIndex] = Thread.CurrentThread.Name;
}

This will increase the time needed by Trace() to about 3 microseconds. Of course, you can add any other information you wish, but try to minimise the creation of new objects, because memory management could add some unwanted delay.

Retrieving the Trace

You need to write some code which detects in real time when the problem occurs. At this time, just call RealTimeTracer.GetTrace(), which returns all messages in one string. GetTrace() will stop all tracing for a short time to return a consistent set of messages. You can also just set a breackpoint in Visual Studio when the error occurs and display RealTimeTracer.GetTrace() in the Quick Watch Window. Finally, you can use the property RealTimeTracer.IsStopped to control if tracing should be performed or not.

Another approach is just to let trace running for a while, then stop everything and examine the trace if there is an error.

Interpretation of Times Measured

I measured the time the Tracer needs like this:

string[] testResults = new string[maxTestCycles];
string testString = "abcdefgh";
for (int testIndex = 0; testIndex < maxTestCycles; testIndex++) {
  Stopwatch stopWatch = new Stopwatch();
  stopWatch.Start();
  for (int i = 0; i < RealTimeTracer.MaxMessages/10; i++) {
    RealTimeTracer.Trace(testString);
  }
  stopWatch.Stop();
  Console.WriteLine("Test " + testIndex + ": " + 
                    stopWatch.Elapsed.TotalMilliseconds/RealTimeTracer.MaxMessages*1000 + 
                    " microseconds.");
}

When I run the very shortest Tracer version, I got the following measurement results:

Test 0: 1.52734375 microseconds.
Test 1: 0.005859375 microseconds
Test 2: 0.00390625 microseconds.
Test 3: 0.00390625 microseconds.
Test 4: 0.00390625 microseconds.

Why does the first test take so much longer? It might be that .NET needs to prepare some stuff, but for sure the processor needs to load everything into cache before execution. In the next cycle, everything is already in the cache and executes therefore faster.

Which time should you use? One could argue that loading the cache is normal and that will be the time in reality. However, if you try to make your code faster and want to compare different versions, I would recommend taking the fastest times ONLY. The reason is that the results are repeatable, meaning you get the same numbers if you repeat the test.

Be careful about advice like this:

Quote:

As the general rule, when testing the extremely fast algorithms, it's recommended to implement the loop running 10,000...100,000 iterations in order to increase the accuracy of the measurement.

This is actually not true! Because the longer you let a code run, the higher is the chance that random events of the operating system occur. Imagine if during one test, you suddenly have a memory garbage collection running. That time will have nothing to do with your code. Therefore, to measure execution time of code, keep the time short and eliminate all slow results. If your test runs 10 times in 100 microseconds, but takes once 10 milliseconds, it is not your code which is the problem, but something which is not under your control kept the microprocessor busy.

Test Approach for Multithreaded Programs

Even the RealTimeTracer has only few lines of code, it must be thoroughly tested. I recommend testing first sequentially, using only 1 thread. Once the logic works fine for 1 thread, then run another test using several threads.

The solution comes with the console project TestConsole. Program.Main() can run 2 tests:

testSingleThread() writes a running number into the trace using only 1 thread. Then the test verifies that each number is written properly and in the correct sequence into the trace.
testMultiThread() starts writers on several threads writing running numbers. After some time, the main thread stops the writer threads and verifies in the trace that each thread wrote its numbers in sequence and that no number is missing.

My PC has 8 processor cores, so I use 1 main thread, 6 test threads and leave 1 core for the operating system. Just for fun, I run it also with 100 threads, but then the test runs much slower because of all the thread switching.

Using the Code

RealTimeTracer is a class in my TracerLib.dll. I made RealTimeTracer static so that you can start tracing right from the first line of code without the need to setup anything:

RealTimeTracer.Trace("Some Trace Message");

If you need more history, change the constant RealTimeTracer.MaxMessages.

Calling RealTimeTracer.GetTrace() returns the trace as a string. This is convenient for looking at the trace in Visual Studio. It ensures that no new information is traced while the trace is constructed.

With RealTimeTracer.IsStopped, you can control if tracing is done or not.

If you are interested in general tracing for any .NET technology, read my article:

http://www.codeproject.com/Articles/1142178/ACoreLib-Tracer-Tracing-done-right

History

Initial version

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication