Introduction
This article is about a strange bug and the scenario that reproduces it. "The pow(10, n) function does not work the sixth time." Strange, huh? That was the title of the bug reported to me, and here is the analysis I did.
Analysis
While debugging, I found that the result of pow(10, n), where n happens to be 1 or 2 in all test cases, is correct several times, but at some point it starts to return 1#INF instead of 10 or 100.
Going deeper into the pow disassembly, I found that at some point an fld1 instruction, which is intended to push 1.0 onto the coprocessor stack, pushes 1#IND instead.
My first thought was that something was corrupting the state of the coprocessor. So I decided to move the pow line up the call stack and count at which iteration it returns 1#INF. As I went up the call stack, I repeated the check before and after each function call. This is the investigation code I used:
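static int y = 0;        // iteration counter, to inspect in the debugger
y++;
double x = pow(10, 2);
ASSERT(x == 100.0);      // check before the call
// ... the call to the suspected function goes here ...
x = pow(10, 2);
ASSERT(x == 100.0);      // check after the call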
At last, I found a function for which the check before the call passed and the check after the call failed. This happened in the 6th iteration. But when I placed the checks inside the function, at its start and at its end, the failure came in the 7th iteration, at the start check.
It was now clear that returning from that function is where the coprocessor state gets corrupted. At first, I couldn't figure out why simply returning should be a problem.
Looking at the prototype of the function, it was quite simple. The function was declared to return double and was exported from a DLL. The calling EXE loaded the DLL with LoadLibrary and cast the pointer returned by GetProcAddress to a pointer to a function that returns void. I had always assumed that this was not a problem, but when I fixed it, the pow bug disappeared.
Conclusion
It is just that simple. When the C++ compiler compiles a function that returns an int, it puts the return value in the general-purpose register EAX. If you ignore the result, or even cast the function to a void-returning function, everything will be just fine. I thought the same was true for returning double, but it wasn't.
When the C++ compiler compiles a function that returns double, it leaves the return value on the coprocessor stack (i.e., in ST0). If the caller ignores the return value, the compiler generates instructions that pop it off the coprocessor stack. But if we cast the function pointer to a void-returning function, the caller won't pop it, so each call leaves one more value behind and the eight-register coprocessor stack overflows after a while. This is why fld1, which is intended to push 1.0, pushes 1#IND instead. Sometimes this throws an exception and sometimes it does not; I don't know why.
Example
This example shows how to reproduce the bug:
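#include <stdio.h>
#include <math.h>

/* Returns its result on the coprocessor stack (ST0). */
double function()
{
    return 0.0;
}

/* Deliberately wrong: this pointer type says the function returns void. */
typedef void (*LPVOIDPROC)();

int main()
{
    LPVOIDPROC lpVoidProc = (LPVOIDPROC)function;
    double dValue;
    int iCounter = 0;
    do
    {
        lpVoidProc();          /* the returned double is never popped */
        dValue = pow(10, 2);
        printf("Iteration %d -> Value %g\n", ++iCounter, dValue);
    }
    while (dValue == 100);
    return 0;
}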
You will find that the output is 100 in iterations 1 through 5, and that the 6th iteration gives 1#INF.
When calling the function that returns double, the compiler leaves the value on the coprocessor stack because we cast the function to a void-returning function. Of course, this causes other strange floating-point calculation problems, not only with the pow function; pow is just the example I stuck with.
History
- 31st July, 2006: Initial post