Optimized Enter/Leave Logger

Santhosh G_

4.73/5 (4 votes)

14 Jul 2013CPOL18 min read

28.1K

741

Optimized method to log Enter and Leave information of functions.

Introduction
Background
Details
Implementation

How to get instruction pointer of a function
Save the raw data whenever an issue occurs
Use PDB file to get the function name
Steps to get the function name from a Virtual Address
Load address of all modules
Optimized way to save the Time
Optimized way to save Thread ID

How to use LogEnterLeave in your application
LogParser application
Limitations
Points of Interest
References
History

Introduction

This article introduces an optimized enter and leave log in your code. Sometimes we put

TRACE()

, or other methods in all functions to identify the call sequence of our software system. If all functions call TRACE(), indicating some thread entered in this function or returned from a function, then at a later time, or when an issue happens, developer can check these log files to identify the improper call sequence or incorrect call sequence. Anyway, Enter and Leave logs in a software system will be helpful to identify issues at a later time. Here a simple concept is implemented with two applications (LogTest and LogParser). Please send your suggestions and comments to improve this logging mechanism and it will be my pleasure to correct/improve this Enter/Leave log method.

Background

There are many ways to prepare Enter and Leave log of all functions. Preparation of a formatted log will consume a small amount of processor time. If we can prepare required data in an encrypted or raw form at runtime, then it will reduce the processing cost. The encrypted or raw form of data will be converted to readable format at later time, or whenever needed with the help of PDB files. PDB files contain all information ( source file names, function names. etc ) of a binary, then It will be easy to convert these instruction address at runtime to a function name at a later time with the help of corresponding PDB files.

Details

There are many methods to provide Enter and Leave log of a function.

An example code will be like this:

C++

Class1::Fun1()
{
MyLogger::Log( "Entering  Class1::Fun1()");
// …
Do necessary things of Fun1()
// 
MyLogger::Log( "Leaving Class1::Fun1()");
}

And its log file will be like this.

 LogFile.txt
Time    	Log
---------------------------------------------
08:10:22:00 Entering Class1::Fun1()
08:10:22:10 Leaving  Class1::Fun1()

Here I present an optimized way to log Enter and Leave information of functions. It is intended in Windows operating systems. Currently it is implemented in 32 bit architecture, and it can be prepared in 64 bit versions also. There are many methods to prepare Log or Trace information in a software system. Most commonly it can be achieved by writing enter/ leave information to a log file.

Here I have provided a basic view of the Enter/Leave logging mechanism. Many other information can also be there in log files like Thread ID, entering time, leaving time, duration of function execution, etc.

Normal logging mechanism has some disadvantages.

Processing Cost: Loggers prepare a string indicating all information to the log file and save them to disk. The output text formatting of the function will be a little bit costly if the frequency of logging is too high.
Disk Space: All error logs will be saved to the hard disk (in most cases) and it will consume some hard disk space. If the size of the log file increases, the next file writing will be slower and will create performance issues.
Chance of Incorrect logging: If Enter or Leave log is placed in a wrong way, then Logger will provide the wrong information unless we find a problem in that function.

There is an interesting approach to reduce the processing cost of the function Enter/Leave Log. Prepare a unique number for each function and log these numbers to a log file. I have a suggestion to prepare a unique identifier for each function, and log this identifier instead of the entire string, and later you can parse these identifiers and generate the function names.

An example will be like this:

C++

// Enumeration for each functions.
// These enums will be saved at runtime.
enum
{
    FUNCTION_1,
    FUNCTION_2
};
 
// String equivalents.
char* pFunctionNames[]=
{
    "Class1::Function1",
    "Class1::Function2",
};
 
// In code.
Class::Func1()
{
    Logger::Log(FUNCTION_1);
    // Here logger will keep integer value of FUNCTION _1 instead of lengthy string.
}
Class::Func2()
{
    Logger::Log(FUNCTION_2);
    // Here logger will keep integer value of FUNCTION _1 instead of lengthy string.
}

In this case numbers (instead of the whole function name) will be saved to LogFile, and at a later time (after completion of process execution), these numbers will be converted to string.

The above suggestions are not good methods to implement in a real system. It requires a unique number for each and every function. When a new function is added or removed we have to modify the constants and their corresponding enumerations etc.. Therefore it will be very difficult to implement this method in a real software system.

Here is the new method.

An instruction pointer (virtual address of instruction) holds the address of the current instruction. If the PDB (Program Data Base) file of the corresponding binary is available, at a later time it is possible retrieve the function name from the Instruction Pointer. The basic concept is to save the instruction pointer of a function, instead of writing the entire function name. At a later stage, whenever the call sequence of the software is needed, the instruction address will be converted to the actual function names. This conversion can be performed with the help of PDB files. This way the call sequence of all functions and threads can be easily retrieved.

It will solve the above mentioned three problems to an extent.

Processing cost will be less because the address is saved instead of the formatted string.
Disk Space will be reduced: this method saves 4 bytes in x86 machines instead of a large class name and function name.
Chance of Incorrect Log will be avoided because it saves the address at runtime. Then the incorrect string will not be logged.

C++

Class1::Fun1()
{
    MyLogger::Log();
    // …
    Do necessary things of Fun1()
        // 
        MyLogger::Log();
}
MyLogger::Log();
// It will retrieve the instruction pointer
// of calling function and save to file or memory.

Implementation

1. How to get the instruction pointer of a function

When debugging an application, register windows show a register named EIP (Extended Instruction Pointer), but its contents are not accessible programmatically. I tried to retrieve EIP like other registers by writing inline assembly code to move the contents to my integer or DWORD members. But it gives a compilation error that EIP label is undefined etc.

When a function is called, a new "stack frame" will be created to allow access to both function parameters and local function variables. When a function is called, a new stack frame is created and its address will be saved in ESP. The following figure shows the layout of the stack in X86 machines. The EBP register in x86 machines holds the stack base pointer and can be used to access different information related to the currently executing function. Here the return address of the called function is placed in the very next location of EBP. Therefore anyone can get the address of the returning function by analyzing the EBP register.

Layout of stack in X86 machines.

Then the following code can be used to get the return address of a function.

C++

int* nAddress = new int;
__asm
{
    mov nAddress,EBP
}
nAdress++; // To get next location of EBP.
// Now nAdress pointing to the address of returned function.

The above information can be proved by debugging a code and analyzing the contents of the EBP register and memory. Following is a demonstration of the above scenario, Can I get the instruction pointer of the caller address by analyzing the EBP register and memory?

The above code is used to analyse the EBP register and memory. Here the _tmain() function calls MyFunction1(), and it will call GetCallerAddress(). Put a breakpoint in MyFunction1() and note down the address of the very next instruction after calling GetCallerAddress(). That is 0x0033144A. The following screenshot is the disassembly of MyFunction1(). It shows the instruction address in the left column and its corresponding assembly instruction in the next right columns. Bold lines are the actual source code, and its assembly will follow it in next row(s). Therefore calling GetCalllerAddress() is at 0x331445. And its assembly code is as follows.

00331445 call GetCallerAdddress(3310Dch)

After execution of GetCallerAddress(), the program execution should switch to the instruction at 0x0033144A.

Then put a breakpoint in GetCallerAddress(), and note down the contents of the EBP register. And the contents near EBP is watched with the memory watcher window. The very near location of EBP holds the address of instruction which should be invoked on returning GetCallerAddress(), which is 0x0033144A. In the below screenshot, the contents of the next location of EBP is watched with the memory watch window, and it shows 4A 14 33 00 in the nearest location. 0x0033144A is represented in the above format, because Windows NT is designed around Little Endian Memory architecture.

Screenshot of Memory Watcher shows the contents of the EBP register

It is possible to retrieve the address of the caller function by processing the EBP register. The following code retrieves the caller address from the EBP register and retrieves the return address.

C++

void GetCallerAddress()
{
    int nEBPContents;
    __asm mov nEBPContents, EBP; // Retriving 
    // Contents of EBP is converted as a pointer. Then very next location
    // indicates return address the current function 
    int* ptrAddress = (int*)nEBPContents;
    int nReturnIP = *( ++ptrAddress );
    // Here nReturnIP holds the address of caller function.
}

Now it is very easy to retrieve the address of a function from a called function.

2. Save the raw data whenever an issue occurs

We can save the function address to a file in sequential order with some other additional information like thread ID, entering time, leaving time, etc. These log information are written to a log file. But a user or developer can't identify the order of the call sequence from this function addresses. It is not possible to find out the name of a function from its address. Here comes the next problem. How to retrieve the function name from the address of a function (Instruction Pointer)?.

3. Use PDB file to get the function name from the virtual address of that function

Now we have retrieved the virtual address of all the called functions, and we will save these information to a log file. Instead of saving the entire function name [like Entering MyClass::MyFunction1] and Enter Leave information, we will save the virtual address of the function [as mentioned in Steps 1 and 2] with a few added bytes indicating the time, thread ID, enter or leave flag, etc. In a conventional Enter/Leave logging mechanism, the name of the function will be logged to the file, and its size will be greater than four bytes in all cases. In most cases it will take 20 to 30 bytes of data. This mechanism saves the virtual address of the function. It will be 4 bytes for all functions, even if the name of the function is “MyClasssBlaBlaBlaBlaBlaBlaBlaBla::MyFunctionFooFooFooooFoooOooooo”. It just saves the address of the function at runtime. At a later time another application (LogParser, which is responsible to parse the 4 byte virtual address of all functions to its original name) should read these addresses and retrieve the actual name of the function. Here the PDB file and DIA SDK play their role to provide the function name from the virtual address.

Before that let me say a few words related to a PDB file, a PDB file provides debug information required for a binary (DLL or exe). It contains debug information, including the function name, source file details, source file line numbers, function prototype, and other information. The file format of PDB is undocumented and Microsoft proprietary, information can be extracted from a PDB file using the DIA (Debug Interface Access) interfaces. The PDB format is hidden, and Microsoft is not willing to share the file format with us. Microsoft DIA SDK provides different interfaces to access debug information of a binary.

More information related to the DIA SDK is available at Debug Interface Access SDK.

Steps to get the function name from a Virtual Address

The following steps are required to get the function name of an instruction address from its PDB and binaries with DIA SDK.

Create an object of IDiaDataSource, COM interface.

C++

HRESULT hr = CoCreateInstance( CLSID_DiaSource, NULL,
                                   CLSCTX_INPROC_SERVER,
                                   __uuidof( IDiaDataSource ),
                                   (void **) &m_pSource);

Call m_pSource->loadDataForExe( binaryFileName, PdbSearchPath,0 ) to load the debugging information of the module.

C++

if( FAILED( m_pSource->loadDataForExe( m_ptszModuleName, m_ptszPDBPath, 0 )))
{
    return false;
}

Create a session to query symbols from the PDB by calling openSession.

C++

if( FAILED( m_pSource->openSession( &m_pSession )))
{
    return false;
}

Provide the load address of the corresponding module by calling m_pSession->put_loadAddress(nLoadAddress).

C++

HRESULT hReturn = m_pSession->put_loadAddress( nLoadAdddress_i );

Call pSession->findSymbolByVA() to get the symbol of a particular virtual address. The first argument is the virtual address to the function, and the second argument is SymTagFunction indicating the function details are required. An object to IDiaSymbol will be returned.

C++

CComPtr<IDiaSymbol> pSymbol = 0;
HRESULT hResult = m_pSession->findSymbolByVA( nFunctionVA_i, SymTagFunction, &pSymbol );

Call get_name() of the returned symbol and it will provide the function name of the corresponding virtual address.

C++

BSTR bFunctionName = 0;
pSymbol->get_name( &bFunctionName );

4. Load address of all modules

In step 4, the put_loadAddress() function is called with the load address of the module. Therefore we have to retrieve the load address of the module to process it with the DIA SDK. On starting an application, Windows loader identifies dependent modules (DLLs) and loads them into the process memory. Imports section of the PE holds details of the imported functions from different DLLs. These dependent DLLs will be loaded by the Windows loader, and made available to the process to call at a later time. Dependent DLLs will be loaded to the preferred load address of the corresponding DLL. If the preferred load address is not vacant, then the Windows loader will load that particular DLL in any free address in the process, therefore the loaded address of a DLL cannot be identified after or before the execution of the process.

In DIA (Debug Interface Access) Interfaces, prior to findSybolByVA(), the Debug session should call put_loadAddress() of the corresponding module.

Therefore it is necessary to retrieve the load address of all modules at runtime and save them into a log file in some format. Later LogParser (application responsible to parse the log) should read these module details and use them to call DIA SDK functions (loadDataForExe, put_loadAddress).

Initialization of the Logger will read the load address of all the modules and save them to a log file and at a later time, LogParser will read the module information and call the necessary functions (put_loadAddress) of the DIA SDK.

GetModuleInformation and EnumProcessModules APIs are used to read the information (load address, size of image) of all the modules in the current process. The source code to enumerate all modules of the process is as follows.

C++

// To Enumerate all modules in the process.
bReturn = EnumProcessModules( hCurrentProcess, phModuleArray, sizeof( HMODULE ) * nModules, &nModules ); 
 
for( int nIndex =0; nIndex < nModules; nIndex++ )
{
    // Retrieve information of a module( Load address and Size ).
    GetModuleInformation( hCurrentProcess, phModuleArray[nIndex], &stModuleInfo, sizeof( stModuleInfo ));
    GetModuleFileName( phModuleArray[nIndex], tcBufferModule, MAX_BUFFER_SIZE );
 
}

findSymbolByVA() accepts a virtual address, and returns the IDiaSymbol interface to the corresponding symbol belonging to the virtual address, or its nearest address. Before calling this function, the load address of the module should be given the IDiaSession, then conversion from the virtual address to relative virtual address will be performed by the DIA SDK. After that get_name() of the corresponding IDiaSymbol will provide the name of the function. That’s all! We got the name of the function from its virtual address (address of the function at runtime) with the help of the PDB file.

The retrieved name of the function will be saved to the output file with additional information like Enter or Leave, Time, Thread ID. This way a raw format log file is converted to a readable log file.

5. Optimized way to save the time

Our objective is to save minimal data with less processing cost, i.e., Enter/Leave information of a function should be saved with minimal instructions. GetLocalTime() can be used to get the current local time in a structure, SYSTEMTIME. This structure holds Hour, Minute, Second, and Millisecond information.

C++

typedef struct _SYSTEMTIME {
    WORD wYear;
    WORD wMonth;
    WORD wDayOfWeek;
    WORD wDay;
    WORD wHour;
    WORD wMinute;
    WORD wSecond;
    WORD wMilliseconds;
} SYSTEMTIME, *PSYSTEMTIME, *LPSYSTEMTIME;

If we call the GetLocalTime() API, it will retrieve the current system time in the SYSTEMTIME structure. The size of the SYSTEMTIME structure is 16 bytes, therefore each function should save 16 bytes of data on entering and leaving the function. Here is another optimized way to get the current system time. The GetTickCount() API returns the number of milliseconds (as a DWORD) that has elapsed since the system was started. If we know the start-up time as SYSTEMTIME, it is easy to convert the returned tick count to the current SYSTEMTIME structure.

In our case, we can save the tick count and local time at start-up. Conversion of a new tick count (retrieved while entering/leaving a function) to SYSTEMTIME will be as follows.

DWORD TickCountStart indicates the tick count at the time of startup of the application.
SYSTEMTIME stStartupTime holds the SYSEMTIME structure, which contains the Minute, Second, Millisecond information.
DWORD dwNewTickCount, New tick count, retrieved when entering to a function.

Find difference between TickCountStart and dwNewTickCount, and convert it seconds
Prepare CTimeSpan instance timeSpan with the calculated seconds, in step 1.
Prepare CTime instance with stStartupTime and add it with timeSpan, returned CTime object will be the new time corresponding to dwNewTickCount.

C++

 /* 
This function receives a TickCount( DWORD ) and provides SYSTEMTIME as output.
The starttime( SYSTEMTIME instance) and its tickcount is used to convert
current tickcount to SYSTEMTIME.
 */
SYSTEMTIME TimeParser::ConvertTickCountToSysTime( const DWORD dwTickCount_i )
{
    DWORD dwDiff = dwTickCount_i - m_dwStartTime; // Ms after startup Time.
    // CTimeSpan timeSpanObj( dwDiff / 1000 );
    // CTime timeInSec = m_SystemStartTime + timeSpanObj;
    SYSTEMTIME stToAdd = m_stStartTime;
    stToAdd.wMilliseconds += dwDiff;
    long lSeconds = 0;
    if( stToAdd.wMilliseconds > 1000 )
    {
        lSeconds = ( stToAdd.wMilliseconds / 1000 );
    }
    // LONG lDays, int nHours, int nMins, int nSecs
    CTimeSpan timeSpan( 0, 0, 0, lSeconds ); 
    int nMsOutput = stToAdd.wMilliseconds % 1000;
 
    // Reset Millisecond to 0, avoid calculation from CTime class.
    stToAdd.wMilliseconds = 0;
    CTime       stToAddObj = stToAdd;
    stToAddObj = stToAddObj + timeSpan;
 
    stToAddObj.GetAsSystemTime( stToAdd );
    stToAdd.wMilliseconds = nMsOutput;
    return stToAdd;
}

6. Optimized way to save Thread ID

Windows provides an API to get the ID of the current thread with the GetCurrentThreadId() function. After some searching in programming forums, I got a method to get the Thread ID without calling any Windows functions: Thread Information Block, which holds many information related to a thread. TIB can be used to get a lot of information on the process without calling the Win32 API. To retrieve TIB, we have to write inline assembly in our code as follows. TIB holds the thread ID in its 0x24 location, and its address is moved to the EAX register. The contents of EAX is moved to dwThreadID and it can be saved to our log file. Following is the code to retrieve the current Thread ID in a fast way.

C++

// Get Thread ID from TIB.
DWORD dwThreadID;
__asm {
    mov EAX, FS:[0x24]
    mov [dwThreadID], EAX
}

How to use LogEnterLeave in your application

Add Log.h, Logger.h to your project. These files can be downloaded from the Download HeaderFiles section of this article.

Logger.h should be included once in your application, say in App.cpp. Log.h contains the LogEnterLeave class declaration and definition. To create Enter and Leave log of a function, create an instance of LogEnterLeave in the first line of the function. That’s all, the LogEnterLeave constructor and destructor will do the required things. LogDetails.bin and LegDetails.dat files will be created, which contain enter and log information of the functions with LogEnterLeave objects. Following is the class declaration of LogEnterLeave:

C++

// Declaration of LogEnterLeave in Log.h
class LogEnterLeave
{
 
public:
    LogEnterLeave()
    {
        int nEBPContents;
        __asm mov nEBPContents, EBP;
        int* ptrAddress = (int*)nEBPContents;
        const int nReturnIP = *( ++ptrAddress );
        // Here nReturnIP holds the address of caller function.

        WriteToFile( nReturnIP, true ); // Here Enter flag is true
    }
    ~LogEnterLeave()
    {
        int nEBPContents;
        __asm mov nEBPContents, EBP;
        int* ptrAddress = (int*)nEBPContents;
        const int nReturnIP = *( ++ptrAddress );
        // Here nReturnIP holds the address of caller function.

        WriteToFile( nReturnIP, false ); // Here Enter flag is false
    }
 
private:
 
    void WriteToFile( const int nAddress_i, const bool bEnter_i )
    {
        // Get Thread ID from TIB.
        DWORD dwThreadID;
        __asm {
            mov EAX, FS:[0x24]
            mov [dwThreadID], EAX
        }
 
        const int nTickCount = GetTickCount();
        const LOG_DETAILS_t stLogDetails = { nAddress_i, dwThreadID, nTickCount, bEnter_i };
 
        FILE* pFile = fopen( "LogDetails.dat", "ab" );
        fwrite( &stLogDetails, sizeof( stLogDetails ), 1, pFile );
        fclose( pFile );
    }
};

Now LogDetails.bin and LegDetails.dat are created, these are small sized log information. The LogParser application should be used to convert the LogDetails.bin file to a readable log file. LogParser expects the PDB and exe files are kept in the same folder. If PDB files are in a different path, the “PDB Search Path” button can be used to provide the search path of the PDB files.

A sample dialog based application is created to demonstrate the usage of LogEnterLeave.

Please download the LogTest application and check its code changes in LogTestDlg.cpp and LogTest.cpp. LogTest.cpp adds a header file Logger.h, and it will perform the necessary initialization of the logger. Please do not add Logger.h in other header or source files. After that “Log.h” is added in LogTestDlg.cpp and its functions are modified to create a LogEnterLeave object to log Enter and Leave information. A code portion of LogTestDlg.cpp is shown below.

C++

void CLogTestDlg::OnBnClickedButton2()
{
    LogEnterLeave LogObj; // To log Enter and Leave Information.
    AfxMessageBox( L"CLogTestDlg::OnBnClickedButton2()" );
}
 
void CLogTestDlg::OnBnClickedOk()
{
    LogEnterLeave LogObj; // To log Enter and Leave Information.
    AfxMessageBox( L"CLogTestDlg::OnBnClickedOk()" );
}
 
void CLogTestDlg::OnBnClickedCancel()
{
    LogEnterLeave LogObj; // To log Enter and Leave Information.
    AfxMessageBox( L"CLogTestDlg::OnBnClickedCancel()" );
    OnCancel();
}

Following is the output prepared by a sample execution of the LogTest application.

LogParser Application

The LogParser application is responsible to convert the binary format log information to a readable format. It reads the module details and log details and prepares the output file in the required format.

Screenshot of LogParser application

Classes and their responsibilities:

LogParserDlg: Display main dialog of the application and handle different GUI events. This class uses all other supporting classes (ModuleParser, PdbParser, and TimeParser) to prepare output log file.
ModuleParser: It reads LogDetails.bin and saves module information (load address, module name, image size). This file contains the module information of all dependent modules in the process. GetModuleFromVA receives a virtual address and returns the MODULE_INFO_t that holds the details of the module loaded in the specified virtual address.
PdbParser: This class is a wrapper of the DIA SDK, it provides two public member functions, Initialize() and GetFunctionNameFromVA(). Initialize() receives a module name and load address of the module. DIA component instances (IDiaDataSource, IDiaSession) are created for that module. Then querying function names will be made with these component instances. GetFunctionNameFromVA() reads a virtual address and returns the name of the corresponding function.
TimeParser: This class receives the start time (as SYSTEMTIME) and start tick count (a DWORD) in its constructor. Then ConvertTickCountToSysTime() can be used to convert a new Tickcount to SYSTEMTIME. Details are explained in an optimized way to save the time section in this article.

Limitations

This method cannot be used for modules loaded with explicit linking. If a DLL is loaded at runtime by LoadLibrary(), then its module load address cannot be tracked at startup. And the same load address may be assigned to other modules at a later time. I have a plan to hook LoadLibrary() and FreeLibarary() to track the load address of the modules with an explicit link method.
This method cannot be used to provide RealTime logging (like logging to console) that we can see during execution of the program. It saves the data in raw format and is not understandable directly. LogParser is required to convert the log to readable format.
Logs from multiple threads are not implemented in LogEnterLeave. It is possible to add a CriticalSection instance to lock the LogFile written by multiple threads.
If PDBParser uses an incorrect PDB file, then this mechanism will provide the wrong information. There is another mechanism to verify the PDB and its corresponding binary. It is well explained in the debuginfo article.

Points of Interest

Writing text contents and binary data to a single file causes problem. Therefore split up the log file and module details to two files. Initially I prepared a single file to hold the module information and log data. But reading binary format provided incorrect data, and I prepared two files, one for binary writing (log data) and another for text writing (module information).

References

History

14-July-2013 - Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)