
Symbols File Locator

28 Mar 2012
How to use the Debug Interface Access (DIA) Application Programming Interface to locate the debug symbols file referenced by an assembly.
Snapshot_Prompt.jpg

Introduction

My previous article was an introduction to the latest Microsoft Debug Interface Access (DIA) infrastructure. It focused on Program Database (PDB) files, showed a set of functions from the DIA family, and presented a project that wraps DIA in a set of programmer-friendly virtual C++ interfaces.

In this article, I continue to explore what Microsoft DIA can do, this time focusing on portable executable files. As in the previous article, I also present a project, with sources, that wraps the DIA interfaces in a set of virtual C++ interfaces. The snapshot shows the console application presented here. The sample retrieves the following information:

  • The location of the public Symbols Store
  • The location of the local Symbols Store
  • The GUID of the PDB symbols file that is referenced by the assembly
  • The full path of the PDB symbols file that is referenced by the assembly
  • The real path of the PDB symbols file that has been found by the system
  • The list of path attempts made by the system when searching for the PDB symbols file

Background

The Debug Interface Access (DIA) is a new application programming interface that client applications should use when working with the symbol information stored in Program Database (PDB) files. The image below shows an overview of the existing APIs.

Symbols_File_Locator/API_Available.jpg

For historical reasons, most developers working with symbols files use the well-known DbgHelp.dll interface. The new DIA interface can be used not only to query PDB files but also to inspect portable executable (EXE, DLL, SCR, CPL, SYS, ...) files and to collect debug-related information from them.

Global View

Just as a reminder, the image below shows the global view of the different parts involved when producing code and debugging it. The compiler creates an assembly together with its associated debug symbols file. The debugger opens the assembly and tries to find the file containing the debugging information.

Symbols_File_Locator/PDB_and_EXE_accessed.jpg

Search of the Symbols File

Before it can set a breakpoint or watch a variable, the debugger must first locate the debug symbols file associated with the assembly being debugged. As previously mentioned, native and managed assemblies embed the GUID and the full path name of the PDB file that contains the debug data for the code.
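As a rough illustration of what "embedded" means here, the sketch below decodes a CodeView "RSDS" (PDB 7.0) debug record by hand: a 4-byte signature, a 16-byte GUID, a 4-byte age and a NUL-terminated PDB path. This is not how the project in this article works (it goes through DIA). The names PdbReference, ReadLe, AppendHex and ParseRsdsRecord are hypothetical, and locating the record through the PE debug directory is left out. The code uses C++11.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decoded contents of a CodeView "RSDS" record (hypothetical helper type).
struct PdbReference {
    std::string guid;     // formatted as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    std::uint32_t age;    // revision counter stored next to the GUID
    std::string pdbPath;  // full path or bare file name written by the linker
};

// Reads a little-endian unsigned integer of 1 to 4 bytes.
static std::uint32_t ReadLe(const std::uint8_t* p, int size) {
    assert(size >= 1 && size <= 4);
    std::uint32_t v = 0;
    for (int i = size - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Appends the lowest `digits` hex digits of v, upper case, zero padded.
static void AppendHex(std::string& s, std::uint32_t v, int digits) {
    static const char kHex[] = "0123456789ABCDEF";
    for (int i = digits - 1; i >= 0; --i) s += kHex[(v >> (4 * i)) & 0xF];
}

// Parses a raw RSDS record. Returns false when the buffer is malformed.
bool ParseRsdsRecord(const std::vector<std::uint8_t>& rec, PdbReference& out) {
    const std::size_t kHeader = 4 + 16 + 4;  // signature + GUID + age
    if (rec.size() <= kHeader || std::memcmp(rec.data(), "RSDS", 4) != 0)
        return false;

    // The path must be NUL-terminated inside the buffer.
    const std::uint8_t* path = rec.data() + kHeader;
    if (std::memchr(path, 0, rec.size() - kHeader) == nullptr) return false;

    // GUID layout: 32-bit, 16-bit and 16-bit little-endian fields,
    // followed by 8 bytes that are printed in storage order.
    const std::uint8_t* g = rec.data() + 4;
    std::string guid;
    AppendHex(guid, ReadLe(g, 4), 8);      guid += '-';
    AppendHex(guid, ReadLe(g + 4, 2), 4);  guid += '-';
    AppendHex(guid, ReadLe(g + 6, 2), 4);  guid += '-';
    for (int i = 8; i < 10; ++i) AppendHex(guid, g[i], 2);
    guid += '-';
    for (int i = 10; i < 16; ++i) AppendHex(guid, g[i], 2);

    out.guid = guid;
    out.age = ReadLe(g + 16, 4);
    out.pdbPath.assign(reinterpret_cast<const char*>(path));
    return true;
}
```

A debugger compares the GUID (and age) from this record with the values stored in a candidate PDB file before accepting it.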

By default, the linker writes the fully qualified path of the PDB file into the image file. If you prefer, you can strip the path and keep only the name (and extension) of the PDB file, for example with a linker option such as /PDBALTPATH in the Microsoft toolchain. Have you ever taken a look at the Windows image files (e.g. notepad.exe, kernel32.dll, ...)? As far as I can see, Microsoft always strips the path to the PDB when building its images. This saves a few bytes in the image file and hides the name of the directory on the build machine where the image was built (e.g. "d:\temp\version2\free_demo_build_whithout_some_features\.... xy.pdb" ).

When a debugger attaches to an assembly, it reads the image file and checks whether it was built with debug information. If so, it reads the embedded GUID of the associated PDB file. Based on this information, it searches several locations for a PDB whose GUID matches the one found in the image file. The debug engine searches the following locations:

  • The location given by the full path (when available) of the PDB file as embedded in the image file
  • The directory from which the image is loaded
  • The local symbols cache, when available
  • The remote symbols server, when available. When the file is found on the remote symbols server, the symbols engine copies the PDB file to the local symbols cache. (The next time the process is debugged, the debug session will start much faster, since the PDB file will be found in the local symbols cache.)

Symbols_File_Locator/Symbols_search.jpg
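The search order listed above can be sketched as a function that builds the list of candidate paths. This is only an approximation under stated assumptions: SearchInputs, FileNameOf and CandidatePdbPaths are hypothetical names, the local cache is assumed to use the common symbol store layout of cache\file.pdb\GUID-and-age\file.pdb, and the remote server step is omitted because it needs network access. The real symbols engine may try more locations than shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Everything the sketch needs to know about one image (hypothetical type).
struct SearchInputs {
    std::string embeddedPdbPath;  // as stored in the image, possibly a bare name
    std::string imageDirectory;   // directory the image was loaded from
    std::string localCache;       // empty when no local cache is configured
    std::string guidAge;          // GUID without dashes followed by the age
};

// Returns the file name part of a Windows or POSIX style path.
static std::string FileNameOf(const std::string& path) {
    const std::size_t pos = path.find_last_of("\\/");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Builds the candidate PDB locations in the order they would be tried.
std::vector<std::string> CandidatePdbPaths(const SearchInputs& in) {
    assert(!in.embeddedPdbPath.empty());
    const std::string name = FileNameOf(in.embeddedPdbPath);
    std::vector<std::string> out;

    // 1. The embedded full path, only when the linker did not strip it.
    if (name != in.embeddedPdbPath) out.push_back(in.embeddedPdbPath);

    // 2. The directory from which the image is loaded.
    out.push_back(in.imageDirectory + "\\" + name);

    // 3. The local symbols cache (assumed two-level store layout).
    if (!in.localCache.empty())
        out.push_back(in.localCache + "\\" + name + "\\" + in.guidAge +
                      "\\" + name);
    return out;
}
```

Each candidate would then be opened and accepted only if its GUID matches the one embedded in the image.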

Architecture

The software architecture of the project presented here is built with the same concept as in my previous article. Interfaces cannot be directly created; they can only be obtained. The idea behind this concept is to free the consumer of any memory management and responsibility.

classes.jpg

All classes are abstract (they expose only pure virtual functions) and therefore cannot be instantiated directly.
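The "obtain, never create" idea can be sketched in a few lines. The example below is a generic illustration, not the code of the SymbolsParser project: IGreeter, GreeterFactory, GreeterImpl and UseGreeter are hypothetical names, and the sketch uses C++11 (override, nullptr). The protected destructor keeps consumers from calling delete on the interface, so the only way to release the object is through the library.

```cpp
#include <cassert>
#include <string>

// Public header: consumers see only this abstract interface.
class IGreeter {
public:
    virtual std::string Greet() const = 0;
    virtual void Destroy() = 0;  // the only way to release the object
protected:
    virtual ~IGreeter() {}       // prevents `delete` on the interface
};

// Public header: the factory is the only way to obtain an IGreeter.
class GreeterFactory {
public:
    static IGreeter* Create();
};

// Library source file: the concrete class never leaves this file.
namespace {
class GreeterImpl : public IGreeter {
public:
    std::string Greet() const override { return "hello"; }
    void Destroy() override { delete this; }
};
}  // namespace

IGreeter* GreeterFactory::Create() { return new GreeterImpl(); }

// Consumer side: obtain, use, hand back. No new or delete appears here.
std::string UseGreeter() {
    IGreeter* greeter = GreeterFactory::Create();
    assert(greeter != nullptr);
    const std::string text = greeter->Greet();
    greeter->Destroy();
    return text;
}
```

Because allocation and release both happen inside the library, the consumer and the DLL never have to agree on a heap or a runtime version.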

Using the Code

The project presented here consists of two parts:

  • SymbolsParser: C++ project - implements the SymbolsParser.dll which is a wrapper around a few DIA interfaces
  • ConsoleSymbolsParser: C++ Win32 console project - consumes the SymbolsParser and shows a little information about the PDB file referenced by an assembly.

Opening a Portable Executable (PE) file is done in two steps:

  • Instantiate the SymbolsParser using the ISymbolsParserFactory::Create() function:
    ISymbolsParser* pISymbolsParser = ISymbolsParserFactory::Create();
  • Open a specific executable file using the ISymbolsParser::OpenPeFile() function:
    wstring sFile = L"c:\\windows\\system32\\eventvwr.exe";
    IPeFile* pIPeFile = pISymbolsParser->OpenPeFile(sFile);

One can then invoke the IPeFile functions.

  • Retrieve the GUID of the referenced PDB file using the IPeFile::GetGuid() function.
    // Retrieve the file GUID
    wstring sGuid = pIPeFile->GetGuid();
    wcout << L"GUID:" << sGuid.c_str() << endl;
  • Retrieve the full path of the referenced PDB file using the IPeFile::GetBuiltinSymbolsPath() function.
    // Retrieve the Symbols Path referenced by the File
    wcout << L"Built-in PDB Path:" << 
        pIPeFile->GetBuiltinSymbolsPath().c_str() << endl;
  • Retrieve the full path of the location where the PDB file has been found using the IPeFile::GetFoundSymbolsPath() function.
    // Retrieve the Symbols Path found by the system
    wcout << L"Found PDB Path:" << pIPeFile->GetFoundSymbolsPath().c_str() << endl; 

While trying to locate the PDB file referenced by an executable file, the symbols engine searches several locations. Using the ISymbolsParser::GetSymbolsSearch() function, one can obtain an object that enumerates the paths visited during this search.

// Collect Search paths details
ISymbolsSearch* pISymbolsSearchPath = pISymbolsParser->GetSymbolsSearch();
typedef std::map<wstring, bool> Paths;
Paths paths = pISymbolsSearchPath->GetPaths();
// Print every path visited during the search
for (Paths::iterator it = paths.begin(); it != paths.end(); ++it)
{
    wcout << it->first.c_str() << endl;
}
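The snippet above prints only the keys of the map and ignores the bool stored with each path. The article does not say what that flag means. The sketch below assumes, as a guess, that it marks the paths where the PDB file was actually found, and filters on it. FlaggedPaths is a hypothetical helper, and the map is filled by hand in place of a real GetPaths() result.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::wstring, bool> Paths;

// Returns the paths whose flag is set.
// Assumption: the flag means "a matching PDB was found at this path".
std::vector<std::wstring> FlaggedPaths(const Paths& paths) {
    std::vector<std::wstring> out;
    for (Paths::const_iterator it = paths.begin(); it != paths.end(); ++it) {
        if (it->second) out.push_back(it->first);
    }
    return out;
}
```

Note that std::map iterates in key order, so the paths come out sorted alphabetically and not in the order in which they were tried.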

The location of the local and remote Symbols servers can also be retrieved by obtaining a pointer to the ISymbolsStore interface.

// Obtain a pointer
ISymbolsStore* pIEnvironment = pISymbolsParser->GetSymbolsStore();
// Collect data
wcout << L"Public Symbols Store:" << 
    pIEnvironment->GetPublicSymbolsStore().c_str() << endl;
wcout << L"Local Symbols Store:" << 
    pIEnvironment->GetLocalSymbolsStore().c_str() << endl;
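On many machines these two locations are configured through a symbol path of the form srv*localcache*server, typically held in the _NT_SYMBOL_PATH environment variable. Whether the SymbolsParser reads them from there is an assumption, since the article does not say. The sketch below only shows how such a string splits into a local and a public store. SymbolStores, Split and ParseSymbolPath are hypothetical names, and only the plain three-field srv form is handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The two locations extracted from a symbol path (hypothetical type).
struct SymbolStores {
    std::string localCache;    // e.g. c:\symbols
    std::string publicServer;  // e.g. an https URL of a symbol server
};

// Splits s at every occurrence of sep, keeping empty fields.
static std::vector<std::string> Split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Finds the first "srv*cache*server" element in a ';' separated symbol path.
// Returns false when no such element exists.
bool ParseSymbolPath(const std::string& symbolPath, SymbolStores& out) {
    const std::vector<std::string> elements = Split(symbolPath, ';');
    for (std::size_t i = 0; i < elements.size(); ++i) {
        const std::vector<std::string> fields = Split(elements[i], '*');
        if (fields.size() == 3 && (fields[0] == "srv" || fields[0] == "SRV")) {
            out.localCache = fields[1];
            out.publicServer = fields[2];
            return true;
        }
    }
    return false;
}
```

In a real program the input string could come from std::getenv("_NT_SYMBOL_PATH"), which returns a null pointer when the variable is not set.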

When appropriate, the resources allocated by the SymbolsParser must be freed using the ISymbolsParser::Destroy() function.
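One way to make sure Destroy() is always called, even on early returns or exceptions, is to hand the pointer to a std::unique_ptr with a custom deleter. The sketch below assumes that Destroy() is a non-static member function without parameters, which the article implies but does not show. The ISymbolsParser declared here is a reduced stand-in for the real interface, FakeParser exists only to make the sketch runnable, and ParserDestroyer and ParserPtr are hypothetical names. It requires C++11, which is newer than the Visual Studio 2008 toolchain used for the project.

```cpp
#include <cassert>
#include <memory>

// Reduced stand-in for the article's ISymbolsParser (assumed shape).
struct ISymbolsParser {
    virtual void Destroy() = 0;
protected:
    virtual ~ISymbolsParser() {}
};

// Deleter that routes unique_ptr cleanup through Destroy().
struct ParserDestroyer {
    void operator()(ISymbolsParser* parser) const { parser->Destroy(); }
};

// Owning handle: Destroy() runs automatically when the handle goes away.
typedef std::unique_ptr<ISymbolsParser, ParserDestroyer> ParserPtr;

// Hypothetical implementation that counts how often Destroy() was called.
struct FakeParser : ISymbolsParser {
    explicit FakeParser(int& counter) : destroyed(counter) {}
    void Destroy() override {
        ++destroyed;
        delete this;  // no member access after this line
    }
    int& destroyed;
};
```

With the real library, the handle would be created as ParserPtr parser(ISymbolsParserFactory::Create()); and the pointers obtained from it would be used only while the handle is alive.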

Environment

The project was developed with Visual Studio 2008 and tested only on Windows Vista Ultimate 32-bit.

Suggestion

Once an IPeFile has been obtained, one could retrieve an IPdbFile and continue to collect more details about the PDB file referenced by the executable. This project does not implement this bridge.

Symbols_File_Locator/From_IPeFile_to_IPdbFile.jpg

I left the implementation of this bridge as an exercise for the reader. One can have a look at my previous article, which presents the IPdbFile interface.

History

  • 6th July, 2009: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
