Introduction
This bug was first reported by Jochen Kalmbach on April 12th 2002 (no links
available to the original posting), when VS.NET 7.0 was doing its initial rounds,
and it's hard to understand why the bug still exists in VS.NET 2003. Just about
every week, at least two people report issues related to this bug, so I thought
it might be a good idea to have an article on it here on CodeProject. What's
really annoying is that a developer might spend several hours or even a full day
on the problem before realizing that it is not a problem with their code.
The bug
The most common scenario where the bug is reported is when someone has a
mixed mode C++ program with a managed class that accesses an unmanaged class in
an unmanaged DLL. If the unmanaged class has a virtual function that returns a
bool, then irrespective of what value it returns, the managed caller *always*
gets back true. The code doesn't have to be in two separate entities (the EXE
and the DLL), though: the bug also occurs if the unmanaged class is defined in a
#pragma unmanaged block in a mixed mode EXE or DLL.
Minimal code to reproduce bug
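// Compile as a mixed mode program (/clr)
#using <mscorlib.dll>
#include <tchar.h>

using namespace System;

#pragma unmanaged
class Unmanaged
{
public:
    // Compiled as native code; the bool result comes back in AL
    virtual bool IsAlive()
    {
        return false;
    }
};

#pragma managed
__gc class Managed
{
public:
    void Test()
    {
        Unmanaged* um = new Unmanaged();
        // Managed-to-unmanaged call through the virtual function
        if(um->IsAlive())
        {
            Console::WriteLine("Function returned true. BUG!!!");
        }
        else
        {
            Console::WriteLine("Function returned false. No Bug :-)");
        }
        delete um;
    }
};

int _tmain()
{
    Managed* mg = new Managed();
    mg->Test();
    return 0;
}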
Trying to figure it out
Let's examine the disassembly for the IsAlive function :-
004010B0 push ebp
004010B1 mov ebp,esp
004010B3 push ecx
004010B4 mov dword ptr [ebp-4],ecx
004010B7 xor al,al
004010B9 mov esp,ebp
004010BB pop ebp
004010BC ret
As you can see, the result of the function is returned in the AL register and
this is what the contents of my registers looked like at this point :-
EAX = 00401000 EBX = 0012EFB4 ECX = 06C42C88 EDX = 00425410
ESI = 00168930 EDI = 00000000 EIP = 004010B9 ESP = 0012EFA8
EBP = 0012EFAC EFL = 00000246
Now let's see the disassembly for the caller code :-
00000065 mov eax,dword ptr [ebp-18h]
00000068 mov eax,dword ptr [eax]
0000006a mov esi,dword ptr [eax]
0000006c mov ecx,dword ptr [ebp-18h]
0000006f mov eax,esi
00000071 push 1692D0h
00000076 call F9759F50
0000007b movzx esi,al
0000007e test esi,esi
00000080 je 0000009A
The return value is obtained from the AL register.
Let's see the contents of the registers now :-
EAX = 00000001 EBX = 0012F0C8 ECX = 00000004 EDX = 00000000
ESI = 00000001 EDI = 04A719C8 EBP = 0012F070 ESP = 0012F044
Horror of horrors! AL is now 1 (more precisely, EAX has been set to 1). I had
stepped through the disassembly and AL was 0 at the time the RET instruction was
executed; therefore the register corruption must have occurred during the
managed-unmanaged transition.
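The caller's movzx esi,al / test esi,esi / je sequence only looks at the low byte of EAX. As a rough model in standard C++ (not Managed C++, and the helper names low_byte, takes_true_branch and check_register_dumps are made up for this sketch), plugging in the EAX values from the two register dumps shows why the branch goes the wrong way:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: what "movzx esi,al" leaves in ESI,
// i.e. the low byte of EAX, zero-extended.
constexpr std::uint32_t low_byte(std::uint32_t eax)
{
    return eax & 0xFFu;
}

// Hypothetical helper: "test esi,esi" / "je" skips the "true" block
// only when ESI is zero.
constexpr bool takes_true_branch(std::uint32_t eax)
{
    return low_byte(eax) != 0;
}

inline void check_register_dumps()
{
    // EAX inside IsAlive after "xor al,al" (first dump): AL is 0.
    assert(!takes_true_branch(0x00401000u));

    // EAX as seen by the managed caller (second dump): AL is 1.
    assert(takes_true_branch(0x00000001u));
}
```

So the callee really did return false in AL, and the value only changed somewhere between its RET and the caller's movzx.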
Workarounds
The simple workaround is to use a BOOL (typedef for an int) instead of a bool.
class Unmanaged
{
public:
    virtual int IsAlive()
    {
        return false;
    }
};
The conversion from an int to a bool is implicit, so the caller doesn't have to
do anything extra.
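To illustrate the "nothing extra" part, here is a small sketch in plain standard C++ (so it shows only the conversion, not the managed-unmanaged transition). The local BOOL typedef stands in for the Windows one, and CallerSeesAlive and check_int_return are hypothetical names used only for this example:

```cpp
#include <cassert>

typedef int BOOL;  // local stand-in for the Windows typedef

class Unmanaged
{
public:
    virtual ~Unmanaged() {}

    // Returns a full 4-byte int rather than a 1-byte bool.
    virtual BOOL IsAlive()
    {
        return false;  // converts to 0
    }
};

// Hypothetical caller: the int result is used directly as a condition,
// exactly as the bool result was before.
inline bool CallerSeesAlive(Unmanaged& um)
{
    if (um.IsAlive())
    {
        return true;
    }
    return false;
}

inline void check_int_return()
{
    Unmanaged um;
    assert(!CallerSeesAlive(um));
}
```

The calling code is unchanged from the bool version; only the declared return type differs.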
A slightly bizarre looking workaround (see the section titled "More info" for,
heheh, more info), suggested by someone (possibly Microsoft Support), is to set
EAX to a value under 256 before returning from the unmanaged function.
class Unmanaged
{
public:
    virtual bool IsAlive()
    {
        __asm mov eax,100
        return false;
    }
};
More info
I got some more information regarding this issue from Tom Archer (my friend,
fellow-CPian and co-author), who got it from a friend of his on the VC++
compiler team. It seems this bug occurs when any of the upper 24 bits of the EAX
register is non-zero. They have a hot-fix for this bug for both VC++.NET 7 and
VC++.NET Everett, but it might be a better idea to wait for the next service
pack.
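That explains why the mov eax,100 workaround helps. The sketch below models it in standard C++, assuming the compiler still emits xor al,al for the return false that follows the inline assembly (as in the disassembly shown earlier). The helper names xor_al_al, upper_24_bits_set and check_workaround_model are invented for this illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: effect of "xor al,al" on EAX.
// Only the low byte is cleared; the upper 24 bits are left alone.
constexpr std::uint32_t xor_al_al(std::uint32_t eax)
{
    return eax & 0xFFFFFF00u;
}

// Hypothetical helper: true when any of the upper 24 bits of EAX is set.
constexpr bool upper_24_bits_set(std::uint32_t eax)
{
    return (eax & 0xFFFFFF00u) != 0;
}

inline void check_workaround_model()
{
    // Without the workaround: EAX held 00401000 in the register dump,
    // so stale upper bits survive the "xor al,al".
    assert(upper_24_bits_set(xor_al_al(0x00401000u)));

    // With the workaround: "mov eax,100" followed by "xor al,al"
    // leaves the whole register at zero.
    assert(xor_al_al(100u) == 0u);
    assert(!upper_24_bits_set(xor_al_al(100u)));
}
```

Any value that fits in one byte would do in place of 100, since the point is only to wipe whatever was left in the upper 24 bits.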
Still more info (Thanks Jochen)
Jochen's post gave me a few links which provided even more info on this bug.
The bug occurs due to the way the CLR marshals boolean values. The CLR thinks
that a boolean is 4 bytes (as it is under .NET) but the C++ bool type is only a
single byte (so much for efficiency and the hassles it brings about). What
happens during marshalling is that the CLR examines the higher three bytes and,
if they contain any data, it assumes that the boolean value being passed is
true. As far as I understood from the postings made by MS support, there was a
sort of vague argument between the CLR team and the VC++ compiler team. The VC++
compiler team believed (and rightly so in my opinion) that the issue was with
the CLR's marshalling code, but it seems the CLR team wanted the VC++ team to
emit a custom MarshalAs attribute for the method that returns a bool. But
obviously you cannot apply .NET attributes to an unmanaged function, as methods
compiled as unmanaged don't appear in the meta-data. Anyway, now we know why
it's so important to clear the upper 3 bytes of the EAX register.
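The size mismatch can be modelled in a few lines of standard C++. This is only a sketch of the behaviour described above, not the CLR's actual marshalling code. All function names are hypothetical, and the normalisation of "true" to 1 is an assumption based on the EAX = 00000001 seen in the caller's register dump:

```cpp
#include <cassert>
#include <cstdint>

// What the C++ compiler meant: only the low byte (AL) carries the bool.
constexpr bool as_one_byte_bool(std::uint32_t eax)
{
    return (eax & 0xFFu) != 0;
}

// Model of the marshaller as described: all 4 bytes are treated as
// the boolean, so any set bit anywhere in EAX means "true".
constexpr bool as_four_byte_bool(std::uint32_t eax)
{
    return eax != 0;
}

// Assumed model of what the managed caller then finds in EAX.
constexpr std::uint32_t eax_after_transition(std::uint32_t eax)
{
    return as_four_byte_bool(eax) ? 1u : 0u;
}

inline void check_marshalling_model()
{
    // EAX from the callee's register dump: AL is 0, upper bytes are not.
    assert(!as_one_byte_bool(0x00401000u));   // callee said false
    assert(as_four_byte_bool(0x00401000u));   // marshaller reads true
    assert(eax_after_transition(0x00401000u) == 1u);

    // With the upper three bytes cleared, both readings agree.
    assert(eax_after_transition(0x00000000u) == 0u);
}
```

Under this model the two interpretations only disagree when AL is 0 and something is left over in the upper three bytes, which is exactly the situation in the IsAlive disassembly.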
Conclusion
What's really dangerous about this bug is that it's quite easy not to see it,
because most functions that return bool return false to indicate an error, and
thus by getting true all the time, we never realize that there is anything
amiss. It's quite easy to miss the bug until really late in the software
development cycle. I have been working with mixed mode programs for quite a
while now, especially since I began my book with Tom (Extending MFC Applications
With the .NET Framework), so this is an issue in which I am quite naturally
interested, and I would like to hear more intelligent analysis than mine from
some of the gurus that frequent CP.
History
- Aug 27 2003 - First published
- Aug 30 2003 - Updated with more info and related KB link
- Sep 03 2003 - Updated with more info provided by Jochen