Introduction
Often an application needs to refer to objects indirectly. These can be textures, meshes, or other resources in a game engine, plug-ins in a CAD system, records in a database, controls in a GUI system, etc. This is done by assigning each object a unique ID.
One approach is to use text strings as IDs. The big advantage of text strings is that they are easy for us to read, both in the source and during debugging. However, they have numerous problems. The strings have different lengths. In order to have such IDs in a structure or a class, you either need a buffer big enough to store the longest string, or you need to allocate the string from the heap; both have their shortcomings. Comparing two strings is a costly operation. If you search for objects by name in a data structure, comparing keys is what takes most of the time.
Instead of text strings, you can use the hash values of strings. Identical strings produce identical hash values and different strings usually produce different values. CRC32 is one of the most commonly used hash functions for this purpose. It is a 32-bit value that fits nicely in any structure. The 32-bit values are easily stored in registers. Comparing two CRCs is also fast. For more on strings vs. CRCs, check out [1].
At Pandemic Studios (where I work) we've been using CRCs for years in our games. We found them to be a very versatile tool. We use them as resource names, object names, file names, for identifying AI scripts and for many other purposes. The main problem with CRCs is that during debugging you only see a plain number. Wouldn't it be nice if you could view the original strings in the debugger?
Let's first start with a simple CRC class implementation.
CRC class implementation
The presented CRC class is pretty straightforward. You can construct a CRC object from a string or from a direct numeric value, you can use a copy constructor, and also compare two CRC objects. It also has an Append method, which allows you to append more characters to a CRC object:
CRC crc1("Hello World");
CRC crc2("Hello");
crc2.Append(" World"); // Now crc1 and crc2 have the same value
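To make the Append behavior concrete, here is a minimal sketch of such a class. It is not the article's CRC class: the name CrcSketch and the Value method are made up for illustration, and it assumes the common reflected CRC-32 polynomial (0xEDB88320), which may differ from the variant in the downloadable source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the article's CRC class (string tracking omitted).
class CrcSketch
{
public:
    explicit CrcSketch(std::uint32_t value) : m_Crc(value) {}
    explicit CrcSketch(const char *text) : m_Crc(0) { Append(text); }

    // Continue the checksum with more characters, as if they had been
    // part of the original string.
    void Append(const char *text)
    {
        assert(text != nullptr);
        std::uint32_t crc = ~m_Crc; // undo the final inversion of the previous pass
        for (; *text != '\0'; ++text)
        {
            crc ^= static_cast<unsigned char>(*text);
            for (int bit = 0; bit < 8; ++bit)
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        m_Crc = ~crc;
    }

    std::uint32_t Value() const { return m_Crc; }
    bool operator==(const CrcSketch &other) const { return m_Crc == other.m_Crc; }
    bool operator!=(const CrcSketch &other) const { return m_Crc != other.m_Crc; }

private:
    std::uint32_t m_Crc;
};
```

Because the running value is re-inverted at the start of each Append call, hashing "Hello" and then appending " World" yields the same value as hashing "Hello World" in one go. The bit-by-bit loop is chosen for brevity; a table-driven version is the usual faster alternative.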
Once CRC_STRINGS is defined, the CRC class keeps a global tree structure (a std::map) that contains all the strings in the application. When a new CRC object is constructed from a string, the data is added to the tree. This way, the tree structure grows in size for the duration of the application. Every new unique string is added to it, and nothing is removed. You can use the GetStr function to look up the string from the CRC value:
#ifdef CRC_STRINGS
const char *CRC::GetStr( void ) const
{
    // Returned when the CRC value has no registered string
    static const char *null="NULL";
    CCRCMap::const_iterator it=s_CRCMap.find(m_Crc);
    if (it!=s_CRCMap.end())
        return it->second;
    else
        return null;
}
#endif
This is how you use it:
printf("%s", crc2.GetStr()); // => Hello World
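The registration side of the string tree is not shown above. A minimal sketch of how such a grow-only table might work follows. The names CrcStrings, RegisterCrcString and LookupCrcString are hypothetical, and the sketch stores std::string values for simplicity, whereas GetStr above suggests the real s_CRCMap holds const char* values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for s_CRCMap: hash value -> original text.
// Entries are only ever added, never removed.
inline std::map<std::uint32_t, std::string> &CrcStrings()
{
    static std::map<std::uint32_t, std::string> table; // created on first use
    return table;
}

// Remember the text behind a hash value. std::map::insert does nothing if
// the key already exists, so the first string registered for a value wins.
inline void RegisterCrcString(std::uint32_t crc, const char *text)
{
    assert(text != nullptr);
    CrcStrings().insert(std::make_pair(crc, std::string(text)));
}

// Mirrors GetStr(): returns the stored text, or "NULL" for unknown values.
inline const char *LookupCrcString(std::uint32_t crc)
{
    const std::map<std::uint32_t, std::string>::const_iterator it = CrcStrings().find(crc);
    return it != CrcStrings().end() ? it->second.c_str() : "NULL";
}
```

Keeping the first string on a duplicate key means two different strings with the same hash would silently display as the earlier one; a real implementation might want to report such a collision.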
Now comes the tricky question...
How can you do that during debugging?
First try: calling GetStr() directly in the watch window
During debugging, you can type crc1.GetStr() in the watch window to get the string. This is not very convenient, as you have to type GetStr() for each CRC object that you want to inspect. This approach also executes code inside the debugged process, so it might have unwanted side effects, and it doesn't work when inspecting a postmortem crash dump, where there is no live process to run the code in.
An improvement using Autoexp.dat
Visual Studio supports user-defined rules for displaying custom types through the file Autoexp.dat. If you add:
CRC=<GetStr()>
at the end of the [AutoExpand] section, then the debugger will call the GetStr function when it wants to display a CRC object. At first glance it all seems to work for the simple case above. However, it fails in a more complex situation. Let's have a structure with a CRC member inside it:
struct A
{
    CRC a1;
    CRC a2;
} a;
a.a1=CRC("aaa");
a.a2=CRC("bbb");
If you put the object a in the watch window (Visual Studio 2003 and Visual Studio 2005), a.a1 evaluates properly as a separate expression, but not when a is expanded. Strangely, in VC6 it works fine.
This approach also has the disadvantage of executing code in the debugged process, and it won't work for postmortem debugging.
The best solution so far: A custom Expression Evaluator DLL
An Expression Evaluator (EE) is a DLL that extends the Visual Studio debugger with support for new types. For more details on this, check out [2]. The evaluator must retrieve the 32-bit CRC value, locate the string tree, find the node for the given value and return the string. Unfortunately, the EE system in Visual Studio is very limited. It only lets you retrieve data from a given address through the ReadDebuggeeMemory function. So, how do you locate the string tree? That's where the VSHelper add-in comes in; check out [3].
Check out the file DebugData.cpp in the CRCTest source code. It creates a 4 KB buffer called g_DebugData. The buffer contains pairs of INT_PTRs: the first one is an identifier, and the second is a pointer to some data structure. In our case, the first one is 'CRCV', and the second is a pointer to the head node of the string tree. The last pair has a terminating identifier 0.
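The layout described above can be sketched as follows. This is an illustration under assumptions, not the contents of DebugData.cpp: DebugDataPair, MakeFourCC and LookupDebugData are hypothetical names, std::intptr_t stands in for INT_PTR, and the FourCC packing (first character in the most significant byte) is assumed to match how the compiler evaluates the literal 'CRCV'.

```cpp
#include <cassert>
#include <cstdint>

// One entry of the table: an identifier and a pointer-sized payload.
struct DebugDataPair
{
    std::intptr_t id;   // FourCC such as 'CRCV'; 0 terminates the list
    std::intptr_t data; // address of the registered data structure
};

// Pack four characters into one integer, first character in the high byte.
// (Assumed to match the value of a multi-character literal like 'CRCV'.)
constexpr std::intptr_t MakeFourCC(char a, char b, char c, char d)
{
    return static_cast<std::intptr_t>(
        (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
        (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
        (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
        static_cast<std::uint32_t>(static_cast<unsigned char>(d)));
}

// Walk the pairs until the terminating id 0.
// Returns the payload for 'id', or 0 if the identifier is not present.
inline std::intptr_t LookupDebugData(const DebugDataPair *table, std::intptr_t id)
{
    assert(table != nullptr && id != 0);
    for (; table->id != 0; ++table)
    {
        if (table->id == id)
            return table->data;
    }
    return 0;
}
```

On a 32-bit target each pair takes 8 bytes, so a 4 KB buffer holds 512 pairs including the terminator.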
When the debugger enters the break mode, either when it hits a breakpoint, when you hit Ctrl+Break, or when you do step by step debugging, it fires an OnEnterBreakMode event. The VSHelper add-in catches the event and evaluates the address of the g_DebugData buffer. The EE DLL can communicate with the VSHelper DLL and get the value.
That's how the CRCView DLL works. Once it gets the head node of the string tree, the rest is easy. It traverses the binary tree until it finds the right key and returns the string. To activate the evaluator add this to Autoexp.dat:
CRC=$ADDIN(<path to the DLL>\CRCView.dll,CRCView)
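The lookup the evaluator performs, walking a tree that lives in another process using nothing but memory reads, can be sketched as follows. This is a simplified illustration and not the CRCView source: the node layout is hypothetical (a real std::map node is implementation-specific and larger), and FakeDebuggee::Read merely stands in for ReadDebuggeeMemory by copying from a byte array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical node layout as it would appear in the debuggee's memory.
struct RemoteNode
{
    std::uint32_t left;  // debuggee address of the left child, 0 if none
    std::uint32_t right; // debuggee address of the right child, 0 if none
    std::uint32_t key;   // the CRC value
    std::uint32_t text;  // debuggee address of the zero-terminated string
};

// Stand-in for the debugger's memory access: every read can fail.
struct FakeDebuggee
{
    std::vector<unsigned char> memory;

    bool Read(std::uint32_t address, void *dest, std::size_t size) const
    {
        if (address > memory.size() || size > memory.size() - address)
            return false; // outside the readable range
        std::memcpy(dest, memory.data() + address, size);
        return true;
    }
};

// Binary-search the remote tree for 'crc'. On success stores the address of
// the text in *textAddress. The depth limit guards against a corrupted tree
// that contains a cycle.
inline bool FindRemoteText(const FakeDebuggee &target, std::uint32_t root,
                           std::uint32_t crc, std::uint32_t *textAddress)
{
    assert(textAddress != nullptr);
    const int kMaxDepth = 64;
    std::uint32_t address = root;
    for (int depth = 0; depth < kMaxDepth && address != 0; ++depth)
    {
        RemoteNode node;
        if (!target.Read(address, &node, sizeof(node)))
            return false; // unreadable node: treat the table as corrupted
        if (crc == node.key)
        {
            *textAddress = node.text;
            return true;
        }
        address = (crc < node.key) ? node.left : node.right;
    }
    return false; // key not present
}
```

The important property is that the evaluator never dereferences a debuggee pointer directly; every step copies a node into its own address space first, which is why the same code can work on a crash dump or a remote target.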
This solution works in all cases: for members of structures, for debugging crash dumps, with debug tooltips, and even for remote debugging.
At Pandemic Studios we've been using a similar add-in for a couple of years now, with no problems.
What about big-endian?
With Visual Studio you can debug a remote system that runs on a different kind of CPU. It can be an embedded system, a cell phone, or a game console. Sometimes the remote machine can be big-endian. The latest version of CRCView.dll will detect that by searching for both the 'CRCV' and 'VCRC' identifiers. If 'CRCV' is found then the target is little-endian. If 'VCRC' is found then the target is big-endian and the evaluator will byte-swap all values it gets from the debugger:
void BSwap( DWORD *data )
{
    __asm
    {
        mov esi,data    // esi = address of the value
        mov eax,[esi]   // load the 32-bit value
        bswap eax       // reverse its byte order
        mov [esi],eax   // store it back
    }
}
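The inline assembly above only builds for 32-bit x86 with the Microsoft compiler. A portable sketch of the same swap, together with the identifier check described in the text, might look like this. The names ByteSwap32, SwapInPlace, TargetOrder and DetectOrder are hypothetical, and the sketch assumes the debugger itself runs on a little-endian host.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 32-bit value using shifts and masks.
inline std::uint32_t ByteSwap32(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Same calling shape as BSwap above: swap the value in place.
inline void SwapInPlace(std::uint32_t *data)
{
    assert(data != nullptr);
    *data = ByteSwap32(*data);
}

enum class TargetOrder { Unknown, Little, Big };

// Classify the target by how the identifier read from its memory looks
// to a little-endian host.
inline TargetOrder DetectOrder(std::uint32_t id)
{
    const std::uint32_t crcv = 0x43524356u; // 'C','R','C','V' packed high to low
    if (id == crcv)
        return TargetOrder::Little; // bytes arrived in host order
    if (id == ByteSwap32(crcv))
        return TargetOrder::Big;    // reads as 'VCRC': every value needs a swap
    return TargetOrder::Unknown;
}
```

Most compilers recognize the shift-and-mask pattern and emit a single byte-swap instruction where one is available, so the portable form usually costs nothing.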
Extending the DebugData system
The g_DebugData buffer supports 511 data pairs. You can register your own data by calling AddDebugData. Give it a unique FourCC identifier and a pointer to your data. Then, write your own EE DLL that searches for that FourCC identifier. For an example of how to do that, check out the FindDebugData function in the CRCView project.
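For illustration, the registration side might look like the sketch below. This is not the AddDebugData from DebugData.cpp, whose exact signature and behavior are not shown in this article. The names AddDebugDataSketch and g_SketchDebugData are hypothetical, and replacing the pointer when an identifier is registered twice is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DebugDataPair
{
    std::intptr_t id;   // FourCC identifier, 0 marks the end of the list
    std::intptr_t data; // address of the registered data
};

// 511 usable pairs plus one terminator. A static array is zero-initialized,
// so the list is properly terminated from the start.
const std::size_t kMaxPairs = 511;
static DebugDataPair g_SketchDebugData[kMaxPairs + 1];

// Register 'data' under 'id'. Returns false when the table is full.
inline bool AddDebugDataSketch(std::intptr_t id, const void *data)
{
    assert(id != 0); // 0 is reserved as the terminator
    for (std::size_t i = 0; i < kMaxPairs; ++i)
    {
        DebugDataPair &pair = g_SketchDebugData[i];
        if (pair.id == id || pair.id == 0)
        {
            // Write the pointer before the id, so an entry with a valid
            // id always carries the matching pointer.
            pair.data = reinterpret_cast<std::intptr_t>(data);
            pair.id = id;
            return true;
        }
    }
    return false;
}
```

The loop never touches the last slot, so the terminating pair stays zero even when all 511 entries are in use.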
What about VC6?
VC6 doesn't support the OnEnterBreakMode event. The only solution I could find was to place the g_DebugData array at a fixed address (0x3FFF0000 for example). It worked fine when I tested it, but I'm not sure if that address will always be available.
Installation and usage
First download and install the VSHelper add-in [3]. Then, download the CRCView.zip. In the CRCView\Release folder, you'll find the CRCView.dll. Then add this to the [AutoExpand] section of Autoexp.dat:
CRC=$ADDIN(<path to the DLL>\CRCView.dll,CRCView)
Notice there is no space between the comma and CRCView.
In VS 2003 and 2005, the Autoexp.dat file is located in <Visual Studio folder>\Common7\Packages\Debugger. In VC6, the Autoexp.dat file is located in <Visual Studio folder>\Common\MSDev98\Bin.
The last step is to include the CRC class into your own project. Just copy the files CRC.cpp/h and DebugData.cpp/h from the CRCTest folder. Define CRC_STRINGS in the project settings. If you don't define it, the string tree will be disabled and GetStr will be unavailable. The evaluator will not work either. You may want to define CRC_STRINGS in your debug version, and leave it out of the release version to save memory.
Troubleshooting tips: what to do if the CRCs are not shown correctly?
Sometimes instead of the correct text you see {m_Crc=<some number>} or {???} in the debugger. If you see {m_Crc=<some number>} then the Autoexp.dat is not modified correctly. Probably:
- The CRC=$ADDIN... line was not added to the [AutoExpand] section. If you added the line at the very end, it most likely is inside the [hresult] section
- Maybe you put the CRC class in some namespace. You have to use the full class name in Autoexp.dat
- There may be more than one Autoexp.dat file. For example, an embedded system uses an alternate debugger within Visual Studio and has its own Autoexp.dat file
If instead you see {???} in the debugger, then the CRC class is found in Autoexp.dat, but there is another problem. Probably:
- There is a typo in Autoexp.dat (there must be no space between the comma and "CRCView" - see above)
- The CRCView.dll is not found (check if the path listed in Autoexp.dat is correct)
- The CRCView.dll doesn't export the CRCView function or exports it with a decorated name - this can happen if you compiled the DLL yourself and forgot to include the DEF file in the linker settings. Use "dumpbin /exports CRCView.dll" to verify what symbols are exported
- The CRCView function returns an error (no longer the case, see below)
The latest version of CRCView.dll comes with some troubleshooting features. It never returns an error. Instead, if an error is detected, it will put an error message in the output text and return S_OK. Possible error messages are:
- Can't access CRC value - ReadDebuggeeMemory failed. Most likely the address of the CRC value is invalid
- VSHelper is disabled - the evaluator detected the VSHelper add-in, but it did not provide a valid g_DebugData value. Most likely the functionality is disabled. Check the VSHelper settings [3]
- Can't find 'CRCV' data - the g_DebugData was not found. Most likely CRC_STRINGS is not defined in your project settings
- Can't access CRC table - the g_DebugData was found, but the string table (the std::map) it points to is corrupted
- No text available - the CRC is not in the string table. This can happen if you create a CRC object from a numeric value directly (for example, CRC crc(10); )
- Failed to retrieve text - ReadDebuggeeMemory failed to access the text from the string table. Most likely the table is corrupted
Licensing
The source code, the binaries and this article are owned by Pandemic Studios. They can be freely used for commercial and non-commercial purposes under the terms of the MIT license. A copy of the license is included in the readme.rtf file in CRCView.zip.
Future development
Currently all the memory for the string database (the map nodes and the strings themselves) is allocated from the CRT heap. This data structure only grows, and is freed at shutdown. It would be better to remove that load from the CRT heap and use a custom allocator optimized for such behavior. One way is to request big blocks of memory from the heap and do multiple sequential allocations inside them. Another way is to reserve a big chunk of the address space with VirtualAlloc and grow the number of physically allocated pages as needed.
It would also be nice to add the CRCView.dll to the installer of VSHelper. The installer would then also have to register the DLL in the Autoexp.dat file.
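The first allocator idea above (big blocks with sequential allocations inside them) could be sketched like this. StringArena is a hypothetical class, not part of the CRCView source, and it handles only the strings; the map nodes would need an allocator that also respects alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Grow-only storage for strings: memory is requested from the heap in big
// blocks, strings are packed one after another, and everything is released
// together when the arena is destroyed.
class StringArena
{
public:
    explicit StringArena(std::size_t blockSize = 64 * 1024)
        : m_blockSize(blockSize), m_capacity(0), m_used(0)
    {
        assert(blockSize > 0);
    }

    // Copies 'text' into the arena. The returned pointer stays valid for
    // the lifetime of the arena because blocks are never moved or freed.
    const char *Store(const char *text)
    {
        assert(text != nullptr);
        const std::size_t size = std::strlen(text) + 1; // include the terminator
        if (size > m_capacity - m_used)
        {
            // Start a new block; oversized strings get a block of their own.
            const std::size_t newSize = size > m_blockSize ? size : m_blockSize;
            m_blocks.push_back(std::unique_ptr<char[]>(new char[newSize]));
            m_capacity = newSize;
            m_used = 0;
        }
        char *dest = m_blocks.back().get() + m_used;
        std::memcpy(dest, text, size);
        m_used += size;
        return dest;
    }

    std::size_t BlockCount() const { return m_blocks.size(); }

private:
    std::vector<std::unique_ptr<char[]> > m_blocks;
    std::size_t m_blockSize; // preferred size of each block
    std::size_t m_capacity;  // size of the current block
    std::size_t m_used;      // bytes already handed out from the current block
};
```

The space left at the end of a block when a string doesn't fit is simply wasted, which is a reasonable trade for a table that never frees individual entries.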
The DebugData system provides a way for the EE add-ins to access arbitrary data from the application. Maybe someone can come up with another cool use of that feature.
Special thanks
My special thanks to Pandemic Studios and to the Full Spectrum Warrior engineering team with lead coder Alex Boczar.
Links
[1] Practical Hash IDs By Mick West, Game Developer Magazine, Dec 05
[2] EEAddIn Sample: Debugging Expression Evaluator Add-In
[3] VSHelper - Visual Studio IDE enhancements
History
- Jan, 2006: First version
- Simple CRC32 implementation with Expression Evaluator for Visual Studio
- Oct, 2006: New features
- Support for big-endian targets
- Troubleshooting features
- Feb, 2007: Published under the MIT license