Introduction
In
Part 1, I discussed the problems of debugging a program that has been put into production or shipped to a customer. On the developer's workstation, the program might work fine, but on the customer's system, there are random crashes. To fix the problem, you need to know where the program is crashing, and in Part 1 I discussed an approach to determining this.
But what if that is not enough? Sometimes it would be helpful to know how you got to the crash point - in other words, you need to look at the call stack. In this article I will describe a simple method to make the call stack reveal itself.
The Next Step: Diving Into the Stack
As you saw from the above listing of ERRORLOG.TXT in Part 1, the stack is dumped and is available for inspection. In its present form, the stack dump would be very tedious to analyze. Without any outside intervention or third-party tools, is there anything we can do to make the stack analysis easier?
The answer is Yes, if you are willing to do some minor modifications to the source code. These types of modifications are called source code instrumentation, and various tools use instrumentation for purposes such as determining test coverage, measuring performance bottlenecks, etc. In our case, we will be instrumenting the source code to make the stack dump more readable.
Preparation
We use the same technique that is used by the ATL string conversion macros such as T2A
. The USES_CONVERSION
macro uses _alloca
to allocate bytes on the stack, which then is automatically freed when the function exits. We do the same thing with our FUNCTION_STACK_TRACE
macro:
#define FUNCTION_STACK_TRACE(x) \
char * function_stack_trace_buffer = (char *) _alloca(8); \
CopyMemory(function_stack_trace_buffer, x, 8);
This copies 8 bytes from x to the stack. But what are these 8 bytes? We choose the convention that the first 4 bytes represent the numeric module code, and the next 4 represent the alpha function code. So the Test2 program looks like:
void CTest2Dlg::OnButton1()
{
FUNCTION_STACK_TRACE("0001AAAA");
Func1(1);
}
void CTest2Dlg::Func1(int n)
{
FUNCTION_STACK_TRACE("0001AAAB");
Func2(n);
}
void CTest2Dlg::Func2(int n)
{
FUNCTION_STACK_TRACE("0001AAAC");
Func3(n);
}
void CTest2Dlg::Func3(int n)
{
FUNCTION_STACK_TRACE("0001AAAD");
int *p = 0;
if (n)
*p = n;
}
In this example, the first four bytes are always 0001
, because the functions are all in the same module.
Theory Into Practice
Here is what ERRORLOG.TXT looks like for Test2.exe:
Test2 caused an Access Violation (0xc0000005)
in module Test2.exe at 001b:00402da7.
Exception handler called in ExceptionAttacher.cpp - AfxWinMain.
Error occurred at 10/18/2003 21:06:56.
D:\temp1\XCrashReportTest\1.1\Test2\Release\Test2.exe, run by hdietrich.
Operating system: Windows XP (5.1.2600).
1 processor(s), type 586.
30% memory in use.
1024 MBytes physical memory.
712 MBytes physical memory free.
2462 MBytes paging file.
2221 MBytes paging file free.
2048 MBytes user address space.
2033 MBytes user address space free.
Write to location 00000000 caused an access violation.
Context:
EDI: 0x0012fe70 ESI: 0x0012fe70 EAX: 0x00000001
EBX: 0x00000001 ECX: 0x31303030 EDX: 0x44414141
EIP: 0x00402da7 EBP: 0x0012f7d4 SegCs: 0x0000001b
EFlags: 0x00010202 ESP: 0x0012f7cc SegSs: 0x00000023
Bytes at CS:EIP:
a3 00 00 00 00 8d 65 00 5d c2 04 00 90 90 90 90
Stack:
0x0012f7cc: 31303030 44414141 0012f7ec 00402d6e 0001AAAD....n-@.
0x0012f7dc: 00000001 31303030 43414141 0012fe70 ....0001AAACp...
0x0012f7ec: 0012f804 00402d2e 00000001 31303030 .....-@.....0001
0x0012f7fc: 42414141 0012fe70 0012f81c 00402cec AAABp........,@.
0x0012f80c: 00000001 31303030 41414141 004043c0 ....0001AAAA.C@.
0x0012f81c: 0012f82c 73dd23d8 004043c0 00000111 ,....#.s.C@.....
0x0012f82c: 0012f85c 73dd22ae 0012fe70 000003e8 \....".sp.......
0x0012f83c: 00000000 00402cc0 00000000 0000000c .....,@.........
0x0012f84c: 00000000 00000000 0012fe70 000003e8 ........p.......
0x0012f85c: 0012f880 73dd8fc5 000003e8 00000000 .......s........
0x0012f86c: 00000000 00000000 000003e8 0012fe70 ............p...
.
.
.
The stack dump shows the sequence: 0001AAAA, 0001AAAB, 0001AAAC, and finally 0001AAAD. Without even looking at the instruction pointer, it is pretty clear where the program crashed.
Summary
This is a simple method for using stack dumps in a post-mortem analysis. We write identifiable markers to the stack, and the markers lead us through the call stack.
There are some drawbacks. First, the source must be modified. This could probably be semi-automated with DevStudio macros, and possible even fully automated with an external tool. Second, you must keep track of all the module and function codes. Third, you must have the discipline to keep doing this with new source. Fourth, this technique is useless where you are calling third-party libraries that you do not have source code for.
In Part 3, I will show you how I overcame these drawbacks.
Revision History
Version 1.1 - 2003 October 19
Usage
This software is released into the public domain. You are free to use it in any way you like, except that you may not sell this source code. If you modify it or extend it, please to consider posting new code here for everyone to share. This software is provided "as is" with no expressed or implied warranty. I accept no liability for any damage or loss of business that this software may cause.