Introduction
First, a warning: this is a difficult article that goes really deep inside the .NET machinery, so if you don’t get it the first time (or even the second or third time…), don’t worry and come back later.
For a training session I taught at the end of last year, I wanted to demonstrate some subtleties of multi-threading, and more specifically some memory visibility issues that should cause a program to hang.
So I developed a small sample that I expected to show the issue, but instead of hanging as expected, the program completed!
After manipulating the program further, I obtained the behavior I wanted: the program was hanging. But that still didn’t explain why my original version managed to complete.
<SPOILER>
I suspected some JITter optimizations, and indeed it was the case, but I needed more information to completely explain this strange behavior.
As often, the StackOverflow platform was of great help; if you’re curious, you can have a look at the original SO thread.
</SPOILER>
In this article, I’ll “build” and explain the issue step by step, trying to make it more understandable than the SO thread which is indeed quite dry.
A No-brainer…
Say you are a naive developer who loves simplicity.
You’re asked to synchronize two threads, so you ask yourself this question: what’s the simplest way of synchronizing two threads?
Easy peasy: a spin loop.
So 2 minutes later, you’re done with a simple, but you think brilliant, implementation:
using System;
using System.Threading;

namespace Tests
{
    public class AwesomeSpin
    {
        // Flag shared between the main thread and the spinning thread.
        bool ok = false;

        void Spin()
        {
            // Busy-wait until the flag is set.
            while (!ok) ;
        }

        void Run()
        {
            Thread thread = new Thread(Spin);
            thread.Start();
            Console.Write("Press enter to notify thread...");
            Console.ReadLine();
            ok = true; // notify the spinning thread
            Console.WriteLine("Thread notified.");
        }

        static void Main()
        {
            new AwesomeSpin().Run();
        }
    }
}
So the main thread starts another thread which should spin until we press enter to notify it that it has spun enough for today.
You compile your work:
> csc /optimize+ AwesomeSpin.cs
And you run it:
> AwesomeSpin.exe
Press enter to notify thread...<You press enter>
Thread notified.
>
The second > indicates that the program has terminated correctly and that you’re back at the shell, which is waiting for more commands to execute.
Perfect! It works just as expected.
You’re the boss!
…Well, Almost
You commit and push your code and as you’ve done a pretty good job, you have the right to recover from this long and exhausting coding session with a well-deserved coffee.
But before you’ve finished your coffee, you receive a message from the testing team:
Hello,
your new component has been running for a few minutes now without any output and it seems stuck!
The testing timeouts have been hit!
Could you please check that all is OK?
Regards
Like any developer in this situation, your first thought is “WTF?”.
Oops!
Then you decide to take a closer look at the situation and check how the testers ran your code, and you realize it has been compiled and run with many different combinations of platform and host.
The testing team has sent you a report like the following:
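Platform | Host | Result |
---|---|---|
AnyCPU | x86 | OK |
AnyCPU | x64 | KO |
x86 | x86 | OK |
x86 | x64 | OK |
x64 | x86 | X |
x64 | x64 | KO |
(OK: the program terminates; KO: the program hangs; X: the binary cannot run on this host.)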
Hum! Seems like there is an issue with 64-bit…
Note that by default, CSC flags the resulting assembly as supporting platform “AnyCPU” meaning it will run in a 64-bit CLR if one is available on the host and in a 32-bit CLR otherwise.
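To check which flavor of the CLR a given build actually ends up running in, a small probe like the following can help. It is only a sketch: the BitnessProbe class and its Describe helper are names made up for this illustration, while Environment.Is64BitProcess, Environment.Is64BitOperatingSystem and IntPtr.Size are standard .NET members (the first two assume .NET 4.0 or later).

```csharp
using System;

public static class BitnessProbe
{
    // Hypothetical helper: summarizes the bitness of the current process and OS.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        int pointerBits = IntPtr.Size * 8;
        return string.Format(
            "Process: {0}-bit (Is64BitProcess={1}), OS is 64-bit: {2}",
            pointerBits,
            Environment.Is64BitProcess,
            Environment.Is64BitOperatingSystem);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Compiling this probe with the same /platform switch as the program under test and running it on the same host should tell you which row of the testers' table you are in.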
As you’re a conscientious employee and/or a curious geek, you try to reproduce the issue yourself.
You set up a 64-bit machine and update your code.
First, you force your .NET binary to be run only by the 32-bit CLR:
> csc /platform:x86 /optimize+ AwesomeSpin.cs
And you rerun it:
> AwesomeSpin.exe
Press enter to notify thread...
Thread notified.
>
So far so good.
Then, you try in 64-bit mode:
> csc /platform:x64 /optimize+ AwesomeSpin.cs
And you rerun it again:
> AwesomeSpin.exe
Press enter to notify thread...
Thread notified.
^C
Oops, indeed it’s stuck and you have to CTRL-C to stop the program. :/
But this is a good thing: a bug that you can reproduce can be considered half fixed.
Note that you could also have used CorFlags.exe to change the assembly’s CorFlags so that it runs under a different CLR, but recompiling better illustrates how you would do it with Visual Studio.
When Things Become Crazy
The code is quite light, and the only idea you have to confirm that the issue is in the Spin method is to use your best debugging wizardry … console output:
void Spin()
{
    Console.WriteLine("\nBefore spin loop.");
    while (!ok) ;
    Console.WriteLine("After spin loop.");
}
And here we go again:
Compile:
> csc /platform:x86 /optimize+ AwesomeSpinDebug.cs
and run:
> AwesomeSpinDebug.exe
Press enter to notify thread...
Before spin loop.
Thread notified.
^C
Ok it’s still stuck.
But … wait … I’m in x86 mode!
Just to check, you comment out the two Console.WriteLine lines:
void Spin()
{
    while (!ok) ;
}
One more compilation:
> csc /platform:x86 /optimize+ AwesomeSpinDebug.cs
And one more run:
> AwesomeSpinDebug.exe
Press enter to notify thread...
Thread notified.
>
And it works again!
…
As developers, we all know these moments, when you feel you’ve lost control over things and the machine does what it wants.
This time, you really think software development is not the job for you and that it’ll drive you crazy, and you start asking Google whether there is an open position at the closest fast-food restaurant.
WTF?
But in a last fit of pride, you decide to investigate more and you decompile your executables with your favorite IL disassembler.
(I often use ILSpy but for simple cases like this one, ILDasm does the job too.)
With platform x86 without debugging output (the original version), you get:
.method private hidebysig instance void Spin() cil managed
{
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldfld bool Tests.MemoryVisibility::ok
IL_0006: brfalse.s IL_0000
IL_0008: ret
}
With x64 platform, still without debugging output:
.method private hidebysig instance void Spin() cil managed
{
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldfld bool Tests.MemoryVisibility::ok
IL_0006: brfalse.s IL_0000
IL_0008: ret
}
And finally with platform x86 with debugging output:
.method private hidebysig instance void Spin() cil managed
{
.maxstack 8
IL_0000: ldstr "\nBefore spin loop."
IL_0005: call void [mscorlib]System.Console::WriteLine(string)
IL_000a: ldarg.0
IL_000b: ldfld bool Tests.MemoryVisibility::ok
IL_0010: brfalse.s IL_000a
IL_0012: ldstr "After spin loop."
IL_0017: call void [mscorlib]System.Console::WriteLine(string)
IL_001c: ret
}
As you’re probably not a CIL guru (pardon if you are), let me give you a little insight.
The important part, i.e. the spinning, is in these 3 lines of code:
IL_0000: ldarg.0
IL_0001: ldfld bool Tests.MemoryVisibility::ok
IL_0006: brfalse.s IL_0000
It means:
- Push the first method argument, i.e. the implicit “this” reference, at the top of the thread stack
- Pop the object reference which is at the top of the stack and push the value of the object’s “ok” field
- Check the boolean value at the top of the stack: if false, go to the instruction 2 lines above, else continue to the next instruction
Conclusion: The spinning part is exactly the same (except of course the labels’ offsets) for the three programs.
And you suddenly remember that the platform (x86 or x64) only instructs the C# compiler to generate metadata that will determine which CLR will run the code, without impacting the way the C# compiler generates IL code.
And this is a good thing, only a native compiler should take care of the x86/x64 dichotomy issues.
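If you want to see that platform metadata from within .NET itself, reflection exposes it. The following is a minimal sketch, assuming the standard Module.GetPEKind method from System.Reflection; the PlatformMetadataProbe class and its Describe helper are hypothetical names introduced for this illustration. The values printed depend on how the assembly was compiled and on the runtime used, so no particular output is claimed here.

```csharp
using System;
using System.Reflection;

public static class PlatformMetadataProbe
{
    // Hypothetical helper: reads the platform information stored in the
    // PE header of the assembly that contains the given type.
    public static string Describe(Type type)
    {
        PortableExecutableKinds peKind;
        ImageFileMachine machine;
        type.Module.GetPEKind(out peKind, out machine);
        return string.Format("PE kind: {0}, machine: {1}", peKind, machine);
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(PlatformMetadataProbe)));
    }
}
```

Building this with different /platform values and comparing the printed flags is one way to convince yourself that only the header changes, not the IL of your methods.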
So the issue is not at the IL level and you know what that means: you’ll have to go deeper, where no .NET developer should have to go (and where 99.42% of them will never go, and this is a good thing): in the native assembly Mordor!
But as a seasoned former C/C++ programmer, you don’t fear it!
Inside Mount Doom
In a last effort to preserve your mental sanity, you run your programs again but this time you attach Visual Studio to check the resulting native assembly code of the Spin method.
With platform x86 and with output, you get:
00000000 push ebp
00000001 mov ebp,esp
00000003 push esi
00000004 mov esi,ecx
00000006 call 5BE97904
0000000b mov ecx,eax
0000000d mov edx,dword ptr ds:[03352178h]
00000013 mov eax,dword ptr [ecx]
00000015 mov eax,dword ptr [eax+3Ch]
00000018 call dword ptr [eax+10h]
0000001b movzx eax,byte ptr [esi+4]
0000001f test eax,eax
00000021 je 0000001F
00000023 call 5BE97904
00000028 mov ecx,eax
0000002a mov edx,dword ptr ds:[0335217Ch]
00000030 mov eax,dword ptr [ecx]
00000032 mov eax,dword ptr [eax+3Ch]
00000035 call dword ptr [eax+10h]
00000038 pop esi
00000039 pop ebp
0000003a ret
The spinning part being:
0000001f test eax,eax
00000021 je 0000001F
With platform x86 but no output:
00000000 push ebp
00000001 mov ebp,esp
00000003 cmp byte ptr [ecx+4],0
00000007 je 00000003
00000009 pop ebp
0000000a ret
Spinning part:
00000003 cmp byte ptr [ecx+4],0
00000007 je 00000003
And with platform x64 without output:
00000000 mov al,byte ptr [rcx+8]
00000003 movzx ecx,al
00000006 test ecx,ecx
00000008 je 0000000000000006
0000000a rep ret
Spinning part:
00000006 test ecx,ecx
00000008 je 0000000000000006
Again, all this bunch of cryptic code deserves some explanations:
- The cmp instruction compares its two operands and sets some CPU flags depending on the result: the zero flag, a.k.a. ZF, is set (1) if they are equal, unset (0) otherwise
- The test instruction does a bitwise AND between its two operands and sets some flags depending on the result: the ZF flag is set (1) if the result is 0, unset (0) otherwise
- The je instruction jumps to the instruction at the given label if the ZF flag is set (1)
So the loops run while a zero (false) value is provided either as the first operand of cmp or as the first and second operands of test.
But the most important thing to notice is that:
- Sometimes the .NET JITter directly compares the “ok” flag “from memory”, in a place shared by the main thread and the spin thread (at address [ecx+4])
- Sometimes it caches the value in a CPU register (eax or ecx), where it is only accessible from the spin thread
In the latter case, the spin thread can’t see the new flag value because it only looks in a register: this is the memory visibility issue I wanted to demonstrate at first.
So you get the answer to the initial question: the behavior varies depending on how the JITters (the one of the x86 CLR and the one of the x64 CLR) optimize the code when they compile the IL into native code and, as the x86 version with debugging output shows, the same JITter can make a different choice when the code around the loop changes.
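To make the difference between the two native loops more tangible, here is a C# model of what each one does. This is only a sketch under assumptions: the class and method names are invented, the loops are bounded so that the program terminates, and the callback stands in for the other thread setting the flag. Volatile.Read and Volatile.Write (available from .NET 4.5) are used solely to keep the sketch deterministic.

```csharp
using System;
using System.Threading;

public class HoistedReadSketch
{
    private bool ok;

    // Models "cmp byte ptr [ecx+4],0": the field is re-read on every iteration.
    public bool SpinOnField(int maxIterations, Action betweenChecks)
    {
        for (int i = 0; i < maxIterations; i++)
        {
            if (Volatile.Read(ref ok)) return true;
            betweenChecks();
        }
        return false;
    }

    // Models "movzx ecx,al / test ecx,ecx": the field is read once into a
    // local (the "register") and only that copy is tested afterwards.
    public bool SpinOnCopy(int maxIterations, Action betweenChecks)
    {
        bool copy = ok;
        for (int i = 0; i < maxIterations; i++)
        {
            if (copy) return true;
            betweenChecks();
        }
        return false;
    }

    public void Set(bool value)
    {
        Volatile.Write(ref ok, value);
    }

    public static void Main()
    {
        var sketch = new HoistedReadSketch();
        // The callback plays the role of the main thread setting the flag.
        Console.WriteLine(sketch.SpinOnField(10, () => sketch.Set(true))); // prints True
        sketch.Set(false);
        Console.WriteLine(sketch.SpinOnCopy(10, () => sketch.Set(true)));  // prints False
    }
}
```

The second method never notices the update, however many times the flag is set, because it only ever looks at its private copy. Without the iteration bound, that is exactly the hang observed above.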
Solution
So now you understand why your code was behaving “strangely” in some contexts.
Of course, you can’t release such code into the wild: you must fix it so that it behaves consistently whatever the CLR used to run it.
There is a well known solution for this “issue”: tagging the data you want to protect from any over-optimization with the “volatile” metadata.
The volatile concept exists in many languages: it tells compilers not to attempt overly clever optimizations that could completely mess up the program, as demonstrated above. Checking a copy of the value instead of the value itself is clearly not a good idea here, but the compiler does not understand your code’s semantics.
With languages that are compiled directly to native code, like C or C++, the volatile keyword is interpreted directly by their respective compilers when they generate the native library or executable.
But the C# compiler does far less work than a C/C++ compiler as most of the optimizations are deferred to the native compilation step done by the CLRs’ JITters.
And indeed the C# volatile modifier is simply forwarded, through some assembly metadata (System.Runtime.CompilerServices.IsVolatile), to the JITters, informing them that they should be cautious and ensure that:
- every read of the variable really reads it from memory instead of reusing a previously loaded copy (in C# terms, a volatile read has acquire semantics)
- every write to the variable really stores the value to memory (a volatile write has release semantics)
This means that the JITters can’t do some optimizations anymore, like caching the value in a register for faster access, which is of course a bad idea in our case.
So let’s try with this fix:
volatile bool ok = false;

void Spin()
{
    while (!ok) ;
}
If you now have a look at the IL generated by the C# compiler, you see:
.method private hidebysig instance void Spin() cil managed
{
.maxstack 8
IL_0000: ldarg.0
IL_0001: volatile.
IL_0003: ldfld bool modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) Tests.AwesomeSpinFixed::ok
IL_0008: brfalse.s IL_0000
IL_000a: ret
}
So this is the same code protected with some “volatile” metadata.
And now it works like a charm whatever the code you put before and after the spin loop, whatever the platform you set at compile time and whatever the CLR you use at runtime.
This time, you can gaze at your brilliant work with pride for good.
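As a variation on the same fix, the volatile accesses can also be written explicitly at each call site instead of on the field declaration. The sketch below is not the version used in this article: it assumes the System.Threading.Volatile class (.NET 4.5 or later), uses a made-up class name, and replaces the Console.ReadLine() call with a short sleep so that it can run unattended.

```csharp
using System;
using System.Threading;

public class ExplicitVolatileSpin
{
    private bool ok; // deliberately NOT declared volatile

    private void Spin()
    {
        // Each iteration performs a volatile read of the field,
        // so the JIT may not reuse a copy cached in a register.
        while (!Volatile.Read(ref ok)) ;
    }

    // Returns true if the spinning thread terminated within the timeout.
    public bool Run(TimeSpan timeout)
    {
        Thread thread = new Thread(Spin);
        thread.IsBackground = true;   // do not keep the process alive if it hangs
        thread.Start();
        Thread.Sleep(100);            // stands in for Console.ReadLine()
        Volatile.Write(ref ok, true); // notify the spinning thread
        return thread.Join(timeout);
    }

    public static void Main()
    {
        bool finished = new ExplicitVolatileSpin().Run(TimeSpan.FromSeconds(5));
        Console.WriteLine(finished ? "Thread notified and finished." : "Thread still spinning!");
    }
}
```

The explicit form makes every synchronizing access visible in the code, at the price of having to remember it at each read and write; the volatile modifier applies it everywhere the field is used.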
Conclusion
This was a tricky one, probably the trickiest thing I’ve done with .NET, but it’s a really interesting one for at least three reasons:
- It demonstrates the memory visibility issue
- It shows that multi-threaded code can be quite subtle, particularly when optimizations come into play, so that great care should be taken when writing it
- .NET does its best to encapsulate the underlying platform in a consistent manner, but like many abstractions it is not perfect: it is “leaky”, meaning that the programmer sometimes has to be aware of underlying details that the higher-level abstraction does not fully hide.
The last point is not an issue in itself, and there are more important leaky abstractions, like floating-point numbers (but that’s for another article).
Of course, you should never synchronize your threads with such a basic construct, and if after a thorough profiling of your code, you determine that you really need spinning, then you can use the .NET framework SpinLock; but be aware that it’s a value type so be very cautious when using it too.
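To illustrate the value-type caveat, here is a minimal sketch of a counter protected by a SpinLock. The SpinLockCounter class and the CountWithThreads driver are hypothetical names for this example; SpinLock.Enter(ref bool) and SpinLock.Exit() are the standard System.Threading members (.NET 4.0 or later).

```csharp
using System;
using System.Threading;

public class SpinLockCounter
{
    // SpinLock is a struct: keep it in a non-readonly field and never copy it,
    // otherwise each copy would be a separate, independent lock.
    private SpinLock spinLock = new SpinLock();
    private int count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);
            count++;
        }
        finally
        {
            if (lockTaken) spinLock.Exit();
        }
    }

    // Hypothetical driver: several threads increment the shared counter.
    public static int CountWithThreads(int threadCount, int incrementsPerThread)
    {
        var counter = new SpinLockCounter();
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++) counter.Increment();
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return counter.count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithThreads(4, 10000)); // prints 40000
    }
}
```

Declaring the field readonly, or passing the SpinLock by value to a helper method, would make each call operate on a copy of the struct and silently lose the mutual exclusion, which is the kind of trap the value-type warning is about.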
Kudos to Hans Passant for confirming the issue and to MagnatLU for providing the debug wisdom necessary to extract the native assembly code and make the issue “clearly” appear.
If you catch any typo or mistake, have additional questions or want to share this kind of crazy experience, feel free to leave a comment.
Note to any future employer: this is not a real-life story, I promise that, except for demonstration purposes, I’ve never tried to spin a thread this way!