Introduction
As Windows software developers, we all use Visual Studio and/or WinDbg extensively to step into our code, set breakpoints, watch variables, and perform many other
debugging tasks. We know, at least vaguely, that some internal mechanism lets debuggers map source code to binary and step into many of the available runtime
libraries. For this purpose, debuggers use Program Database (PDB) files, for managed as well as unmanaged code. PDB files for managed code contain less debug
information, since much of it is already stored in the metadata of the PE file.
This article has several goals:
- Show the existence of PDB files and how debuggers use them.
- Show the existing technology used to retrieve their content.
- Give an idea about the importance of PDB files while debugging and the kind of information embedded in them.
- Present a project that implements a comfortable C++ wrapper on top of the esoteric DIA classes as well as a PDB inspector front end.
This is the first part of a series dedicated to PDB files and their executable counterparts. This article concentrates on one aspect of PDB files, namely the modules they reference.
Background
As explained by John Robbins in the article mentioned below, "a native C++ PDB file contains a lot of information:
- Public, private, and static function addresses
- Global variable names and addresses
- Parameter and local variable names and offsets where to find them on the stack
- Source file names and their lines, etc..."
A .NET PDB file contains only two pieces of information (again from John Robbins in the article mentioned below):
- The source file names
- Their lines and the local variable names
All the other information is already in the .NET metadata so there is no need to duplicate the same information in a PDB file.
For those of you not familiar with the Microsoft Debug Interface Access (DIA) SDK, Program Database (PDB) files, and the basic ideas presented here, a few essential links:
When compiled with debugging information, an executable file contains two references to the associated PDB file:
- A GUID that matches the one placed in the expected PDB file
- The full path of the associated PDB file that will be used during the debugging session
When a program is launched under a debugger, the debugger reads these references from the executable file and tries to locate the matching PDB file before starting the debugging session.
The links above explain this process, along with how to set up a symbol server.
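As a rough illustration of how these two references are typically stored: in PE files produced by current Microsoft linkers, the debug directory points to a CodeView record that starts with the four-byte signature "RSDS", followed by the 16-byte GUID, a 32-bit "age" counter, and the null-terminated PDB path. The sketch below decodes such a record from a byte buffer that has already been extracted; locating the record inside the PE file is not shown. The age field is not discussed in this article, and the names PdbReference and ParseRsdsRecord are hypothetical; they are not part of the PdbParser project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decoded form of a CodeView "RSDS" record (hypothetical helper type).
struct PdbReference {
    std::uint8_t  guid[16];  // raw GUID bytes; must match the GUID stored in the PDB
    std::uint32_t age;       // revision counter stored next to the GUID
    std::string   pdbPath;   // PDB path as recorded at link time
};

// Parses a raw RSDS record. Returns false if the buffer is too short,
// the signature is wrong, or the path is not null-terminated.
bool ParseRsdsRecord(const std::vector<std::uint8_t>& data, PdbReference& out) {
    const std::size_t kHeaderSize = 4 + 16 + 4;  // "RSDS" + GUID + age
    if (data.size() <= kHeaderSize) return false;
    if (std::memcmp(data.data(), "RSDS", 4) != 0) return false;

    std::memcpy(out.guid, data.data() + 4, 16);

    // The age is stored as a little-endian 32-bit integer.
    out.age = static_cast<std::uint32_t>(data[20])
            | (static_cast<std::uint32_t>(data[21]) << 8)
            | (static_cast<std::uint32_t>(data[22]) << 16)
            | (static_cast<std::uint32_t>(data[23]) << 24);

    // The path follows the header and must end with a null byte.
    const std::uint8_t* begin = data.data() + kHeaderSize;
    const std::size_t remaining = data.size() - kHeaderSize;
    const void* nul = std::memchr(begin, 0, remaining);
    if (nul == nullptr) return false;

    out.pdbPath.assign(reinterpret_cast<const char*>(begin),
                       static_cast<const char*>(nul));
    return true;
}
```

A debugger or symbol server client would compare the decoded GUID (and age) with the values stored in a candidate PDB before loading it, falling back to other search locations if the recorded path does not exist.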
Using the Code
The PDB project presented here consists of three parts:
- PdbParser: C++ project that implements PdbParser.dll, a wrapper around the DIA interface.
- PdbInspectorConsole: C++ Win32 console project that consumes PdbParser and shows the modules referenced in a PDB file.
- PdbInspector: C++ MFC project that consumes PdbParser and shows the modules referenced in a PDB file along with a few of the available details about each module.
Environment
The project was originally developed and tested on Windows Vista Ultimate 32-bit only.
Classes Hierarchy
As mentioned earlier, the Microsoft DIA SDK is a COM-based interface for handling PDB files. The problem with this SDK is that it consists of a tremendous collection
of interfaces and functions. The PdbParser presented here abstracts these details and offers a simple, task-oriented set of interfaces. In this version,
the PdbParser concentrates on the collection of modules. The PdbParser is organized into a set of abstract layers. Opening a PDB file is done in two steps:
- Instantiate PdbParser using the IPdbParserFactory::Create() function:
IPdbParser* pIPdbParser = IPdbParserFactory::Create();
- Open a specific file using the IPdbParser::OpenFile() function:
IPdbParser* pIPdbParser = IPdbParserFactory::Create();
IPdbFile* pIPdbfile = pIPdbParser->OpenFile(L"test.pdb");
In order to retrieve details about a specific module referenced in a PDB file, one has to go through three additional steps:
- Collect the modules using the IPdbFile::GetModules() function.
- Collect the details about a specific module using the IPdbModule::GetModuleDetails() function.
- Use the available IPdbModuleDetails functions.
// Enumerate the modules referenced by the PDB file and print their names.
std::vector<IPdbModule*> vModules = pIPdbfile->GetModules();
std::vector<IPdbModule*>::iterator it = vModules.begin();
for( ; it != vModules.end(); ++it)
{
    IPdbModule* pIPdbModule = *it;
    wprintf(L"%ws\n", pIPdbModule->GetName().c_str());
}
In order to retrieve the source file names of a specific module, one has to go through three steps:
- Collect the modules using the IPdbFile::GetModules() function.
- Collect the files referenced by a specific module using the IPdbModule::GetSourceFiles() function.
- Use the IPdbSourceFile::GetFileName() function.
// Enumerate the source files referenced by one module and print their names.
std::vector<IPdbSourceFile*> vSources = pIPdbModule->GetSourceFiles();
std::vector<IPdbSourceFile*>::iterator it = vSources.begin();
for( ; it != vSources.end(); ++it)
{
    IPdbSourceFile* pIPdbSourceFile = *it;
    wprintf(L"%ws\n", pIPdbSourceFile->GetFileName().c_str());
}
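The two listings above can be combined into a single nested traversal from file to modules to source files. The following is only a sketch of that pattern: it is written as a template so that it compiles without PdbParser.dll, and the Fake* types are hypothetical in-memory stand-ins, not classes from the project. It assumes that GetName() and GetFileName() return std::wstring, which the .c_str() calls above suggest but do not prove.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Walks file -> modules -> source files and returns one line per source file.
// Works with any type that exposes the accessors used in the article.
template <typename PdbFileT>
std::vector<std::wstring> CollectModuleSources(PdbFileT* file) {
    std::vector<std::wstring> lines;
    for (auto* mod : file->GetModules()) {
        for (auto* src : mod->GetSourceFiles()) {
            lines.push_back(mod->GetName() + L" -> " + src->GetFileName());
        }
    }
    return lines;
}

// Hypothetical stand-ins used only to exercise the traversal.
struct FakeSourceFile {
    std::wstring name;
    std::wstring GetFileName() const { return name; }
};

struct FakeModule {
    std::wstring name;
    std::vector<FakeSourceFile*> sources;
    std::wstring GetName() const { return name; }
    std::vector<FakeSourceFile*> GetSourceFiles() const { return sources; }
};

struct FakeFile {
    std::vector<FakeModule*> modules;
    std::vector<FakeModule*> GetModules() const { return modules; }
};
```

With the real interfaces, the same function would be called as CollectModuleSources(pIPdbfile), provided the return types match the assumption stated above.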
When appropriate, the resources allocated by PdbParser are freed using one last step.
- Release the allocated resources using the IPdbParserFactory::Destroy() function.
IPdbParserFactory::Destroy();
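Because Destroy() has to be called on every exit path, one option is to tie the call to scope exit with a small guard object. The sketch below uses C++11 and is not part of the PdbParser project: ScopeGuard and InspectWithGuard are hypothetical names, and demo::IPdbParserFactory is a stand-in that only counts calls so the example can run without the DLL.

```cpp
#include <cassert>
#include <utility>

// Runs a cleanup callable when the guard goes out of scope.
template <typename Cleanup>
class ScopeGuard {
public:
    explicit ScopeGuard(Cleanup cleanup) : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { cleanup_(); }

    // Non-copyable so the cleanup runs exactly once.
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    Cleanup cleanup_;
};

namespace demo {
// Stand-in for the real factory; it only records how often Destroy() ran.
int g_destroyCalls = 0;
struct IPdbParserFactory {
    static void Destroy() { ++g_destroyCalls; }
};
}  // namespace demo

// Destroy() runs on both the early-return path and the normal path.
bool InspectWithGuard(bool failEarly) {
    ScopeGuard<void (*)()> guard(&demo::IPdbParserFactory::Destroy);
    if (failEarly) {
        return false;  // e.g. the PDB file could not be opened
    }
    // ... enumerate modules and source files here ...
    return true;
}
```

In real code the guard would wrap the actual IPdbParserFactory::Destroy, and it should be created right after IPdbParserFactory::Create() so that no exit path can skip the cleanup.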
The image below shows the accessors-based hierarchy:
History
- 19.06.2009 - The focus in this project is the enumeration of the modules and some of their details
- 23.06.2009 - Added the enumeration for the source file names
- 02.07.2009 - Corrected an open/close issue; added the IsStripped() method
- 20.08.2011
  - Added support for drag and drop of a PDB file on the UI
  - Removed the console demo
  - Updated my web address
- 30.08.2011
  - Shows compiler name and version
  - Shows checksum type and value
- 26.06.2013
  - Changed the path to the DIA SDK
  - Built and tested with VStudio 2k8 on Windows 7 64-bit in debug and release modes