Microsoft Symbol Engine

Introduction

Last year, I wrote an article detailing some code to provide a stack trace with symbols in Microsoft Windows [Orr2004].

On reflection, I think the Microsoft symbol engine deserves greater explanation so this article discusses more about the symbol engine, what it does and where to get it from. The ultimate aim is to provide useful information which helps you diagnose problems in your code more easily.

What Are Symbols?

When a program is compiled and linked into executable code, a large part of the process is turning human readable symbols into machine instructions and addresses. The CPU, after all, does not care about symbolic names but operates on a sequence of bytes. In systems that support dynamic loading of code, some symbols may have to remain in the linked image in order for functions to be resolved into addresses when the module is loaded. Typically though, even this is only a subset of the names appearing in the source code.

When everything works perfectly this is usually fine; the difficulties occur when the program contains a bug as we would like to be able to work back from the failing location to the relevant source code, and identify where we are, how we got there and what are the names and locations of any local variables. These pieces of information can all be held as symbol data and interrogated, usually by a debugger, to give human readable information in the event of a problem.

Most programmers using Microsoft C++ on Windows are familiar with the Microsoft Debug/Release paradigm (many other environments have a similar split). In this model of development, you begin by compiling a 'Debug' build of the code base in which there is no optimisation and a full set of symbols is emitted for each compiled binary. This generally gives the debugger the ability to work backwards from a logical address and stack pointer to give the source line, stack trace and contents of all variables. Later in the development process you switch over to building the 'Release' version of the code, which typically has full optimisation and generates no symbolic information in the output binaries.

There are several pitfalls with this approach. In my experience the most serious is when you have problems which are only reproducible in the release build and not in the debug build. Since there are no symbols in the release build it can be very hard to resolve the problem.

Fortunately this is easily resolved. It is relatively easy to change the project settings to generate symbolic information for the release build as well as for the debug build. An alternative approach is to abandon (or at least modify) the Debug/Release split, perhaps material for another article!

For Microsoft Visual C++ .NET 2003 you enable symbols in the release build by setting options for the compile and link stages. First set 'Debug Information Format' to 'Program Database' in the C/C++ 'General' folder. Then set the linker setting 'Generate Debug Info' to 'Yes' in the 'Debugging' folder, and specify a .PDB filename for the program database file name. Finally you must set 'References' to 'Eliminate Unreferenced Data' and 'Enable COMDAT Folding' to 'Remove redundant COMDATs' in the 'Optimization' folder because the Microsoft linker changes its default behaviour for these two options when debugging is enabled. (Settings exist in other versions of the Microsoft C++ compiler, and also for VB.NET and C#. See [Robbins] for more details.)

I also recommend removing one other optimisation setting, that of stack frame optimisation, to greatly improve the likelihood of being able to get a reliable stack trace in a release build. If performance is very important in your application, measure the effect of this optimisation to see whether it makes a sufficient difference to be worth retaining. With these settings applied to a release build the compiler generates a PDB file for each built EXE or DLL, in a similar manner to the default behaviour for a debug build. The PDB file, also known as the symbol file, is referred to in the header records in the EXE/DLL but none of the symbols are loaded by default, so simply having a PDB file has no impact on performance.

The Symbol Engine

Microsoft do not document the format of the PDB file and it often seems to change from release to release. However they do provide an API for accessing most of the information held in the PDB file, and the key to this is the file DbgHelp.dll. This library contains functions to unpack symbol information for addresses, local variables, etc. A version of this DLL is present in Windows 2000, XP and 2003 but Microsoft make regular updates available via their website as 'Debugging Tools for Windows' [DbgHelp]. Note that if you want to write code using the API you need to install the SDK (by using the 'Custom' installation).

However it is hard to update DbgHelp.dll in place in a running system (and attempts to do so can render some other Windows tools inoperable) so it is recommended that you either:

ensure the correct version of the DLL is placed with the EXE which is going to use it, or
load the DLL explicitly from a configured location.

Personally, I find both these solutions cause unnecessary complications so I simply copy the DLL to DbgCopy.dll and generate a corresponding Dbgcopy.lib file from this DLL, which is included at link time. The makefile included in the source code for this article has a target dbgCopy which builds this pair of files.

The debug help API usually expects to find the PDB file for a binary EXE/DLL by looking for the file in its original location, or along the path. However the Debugging Tools for Windows package also contains a DLL that can connect to a so-called 'Symbol Server' to get the PDB file. Microsoft provide a publicly accessible symbol server containing all the symbols for the retail versions of their operating system, which lets you get symbolic names (and improved stack walking) for addresses in their DLLs. This is invaluable when you get problems inside a system DLL; usually, but not always, caused by providing it with bad data!

This DLL, SYMSRV.DLL, is activated by setting the environment variable _NT_SYMBOL_PATH to tell DbgHelp to use the symbol server. Note that this only works correctly if DbgHelp.DLL and SymSrv.DLL are both loaded from the same location and are from the same version of 'Debugging Tools'.

The environment variable can be set from the command line for the current command prompt window, or more typically set via the control panel for the current user or even for the current machine. An example setting that loads symbols from the Microsoft site, using a local cache in C:\Symbols, is:

set _NT_SYMBOL_PATH=SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

There are a couple of problems with this simple approach. Firstly, the Microsoft site may not be available (for example, a company firewall may not grant access to the location specified) so the symbols for system DLLs are inaccessible to the symbol engine. Secondly, the symbol engine tries to access the Microsoft site for every EXE or DLL that it loads for which it cannot find local symbols. This can take quite a long time if you have many DLLs that do not have any debugging information.

As an alternative you can set up the path as above and use the Symchk program to load symbols for a number of common DLLs (for example KERNEL32, MSVCRT, NTDLL), and then remove the http://... portion of the environment variable to just access the local cache.
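The two forms of the setting can be captured in a small helper. This is only a sketch: makeSymbolPath is a hypothetical name of my own, not part of DbgHelp, and I am assuming the cache-only form is written as SRV* followed by the cache directory.

```cpp
#include <string>

// Hypothetical helper (not part of DbgHelp): builds a symbol path of the
// form SRV*<local cache>*<server URL>, or SRV*<local cache> when no
// server is supplied (assumed form for the cache-only setting).
std::string makeSymbolPath( const std::string &localCache,
                            const std::string &serverUrl = "" )
{
    std::string path = "SRV*" + localCache;
    if ( !serverUrl.empty() )
        path += "*" + serverUrl;
    return path;
}
```

The resulting string is the value you would assign to _NT_SYMBOL_PATH.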

A more advanced technique which is also available is to set up a symbol server, running on your own network. You can then publish symbol files, built in-house or arriving with third party libraries, to this symbol server for use throughout your company without needing to explicitly install them on every machine.

Using the Symbol Engine

I present some basic code to use the symbol engine, show how to convert an address to a symbol and show a simple example of the stack walking API. Please refer to the help for the debugging DLL (DbgHelp.chm, provided with the Debugging Tools SDK) for more information and a description of other methods that I am not covering in this introductory article.

The symbol engine needs initialising for each process you wish to access. Each call to the symbol engine includes a process handle as one of the arguments. This does not have to be a real process handle in every case, but I find it much easier to stick to that convention. Calls to initialise the symbol engine for a given process 'nest', and only when each initialisation call is matched with its corresponding clean-up call does the symbol engine close down the data structures for the process.

Note: there are a small number of resource leaks in DbgHelp.dll , some of which are retained after a clean-up, so I would advise you to try and reduce the number of times you initialise and clean up the symbol engine. My simple example code uses the singleton pattern for this reason.
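As a rough illustration of the singleton idea, the sketch below uses a function-local static so that the constructor runs exactly once. SymbolEngineSketch and initCount are hypothetical names, and a counter stands in for the real initialisation call so that the sketch is portable; it is not the code from the download.

```cpp
// Hypothetical stand-in for the symbol engine class; the counter replaces
// the real initialisation call so that the sketch is portable.
class SymbolEngineSketch
{
public:
    // Returns the single instance, constructing it on first use.
    static SymbolEngineSketch &instance()
    {
        static SymbolEngineSketch theEngine;
        return theEngine;
    }

    // Number of times the constructor has run (for illustration only).
    static int initCount() { return initialisations; }

private:
    SymbolEngineSketch() { ++initialisations; }   // initialise once

    SymbolEngineSketch( const SymbolEngineSketch & );             // not copyable
    SymbolEngineSketch &operator=( const SymbolEngineSketch & );  // not assignable

    static int initialisations;
};

int SymbolEngineSketch::initialisations = 0;
```

However many times instance() is called, the engine is only initialised on the first call and cleaned up when the program exits.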

Here is a class definition for a simple symbol engine:

/** Symbol Engine wrapper to assist with 
    processing PDB information 
*/
class SimpleSymbolEngine
{
public:
    /** Get the symbol engine for this process
    */
    static SimpleSymbolEngine &instance();

    /** Convert an address to a string */
    std::string addressToString(
        void *address );

    /** Provide a stack trace for the
        specified stack frame 
    */
    void StackTrace(
        PCONTEXT pContext, 
        std::ostream & os );

private:
   // not shown 
};

This class can be used to provide information about the calling process like this:

void *some_address = ...;
std::string symbolInfo=
    SimpleSymbolEngine::instance().
        addressToString( some_address );

I've picked a simple format for the symbolic information for an address - here is an example:

    0x00401158 fred+0x56 at testSimpleSymbolEngine.cpp(13)

The first field is the address, then the closest symbol found and the offset of the address from that symbol and finally, if available, the file name and line number for the address.
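To make 'closest symbol and offset' concrete, here is a portable sketch of the lookup using an ordinary std::map in place of the symbol engine. SymbolTable and nearestSymbol are hypothetical names, and the start address I give fred in the usage note below is made up so that the arithmetic matches the example above.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical symbol table: start address of each symbol -> its name.
typedef std::map<std::uint32_t, std::string> SymbolTable;

// Formats an address as "<name>+0x<offset>", using the closest symbol that
// starts at or below the address; returns an empty string if there is none.
std::string nearestSymbol( const SymbolTable &symbols, std::uint32_t address )
{
    SymbolTable::const_iterator it = symbols.upper_bound( address );
    if ( it == symbols.begin() )
        return std::string();   // every symbol starts above the address
    --it;                       // last symbol starting at or below it

    std::ostringstream oss;
    oss << it->second;
    const std::uint32_t offset = address - it->first;
    if ( offset != 0 )
        oss << "+0x" << std::hex << offset;
    return oss.str();
}
```

With fred registered at 0x00401102, looking up 0x00401158 gives fred+0x56, the middle field of the example line.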

Using the stack trace is more difficult as you must provide a context for the thread you wish to stack trace.

The context structure is architecture-specific and can be obtained using the GetThreadContext() API when you are trying to debug another thread.

CONTEXT context = {CONTEXT_FULL};
::GetThreadContext( hOtherThread, &context );
SimpleSymbolEngine::instance().
    StackTrace ( &context, std::cout );

You have to be slightly more devious to trace the stack of the calling thread since the GetThreadContext() API will return the context at the point when the API was called, which will no longer be valid by the time the stack trace function is executed.

One approach is to start another thread to print the stack trace. Another approach, which is architecture-specific, is to use a small number of assembler instructions to set up the instruction pointer and stack addresses in the context registers. You have to be careful if you wish to provide this as a callable method to ensure the return address of the function is correctly obtained; for this article I simply use some inline assembler.

Here is a simple way (for Win32) to use the symbol engine to print the call stack at the current location:

CONTEXT context = {CONTEXT_FULL};
::GetThreadContext( 
    GetCurrentThread(), &context );
_asm call $+5
_asm pop eax
_asm mov context.Eip, eax
_asm mov eax, esp
_asm mov context.Esp, eax
_asm mov context.Ebp, ebp

SimpleSymbolEngine::instance().
    StackTrace( &context, std::cout );

In this case the tip of the call stack will be the pop eax instruction since this is the target of the call $+5 which I use to get the instruction pointer.

Implementation Details

The constructor initialises the symbol engine for the current process and the destructor cleans up.

SimpleSymbolEngine::SimpleSymbolEngine()
{
    hProcess = GetCurrentProcess();
    DWORD dwOpts = SymGetOptions();
    dwOpts |=
        SYMOPT_LOAD_LINES |
        SYMOPT_DEFERRED_LOADS;
    SymSetOptions( dwOpts );

    ::SymInitialize( hProcess, 0, true );
}

SimpleSymbolEngine::~SimpleSymbolEngine()
{
    ::SymCleanup( hProcess );
}

I am setting the flag to defer loads which delays loading symbols until they are required. Typically symbols are only used from a small fraction of the DLLs loaded when the process executes.

The code to get symbolic information from an address uses two APIs: SymGetSymFromAddr and SymGetLineFromAddr. Between them these APIs get the nearest symbol and the closest available line number/source file information for the supplied address.

   std::string SimpleSymbolEngine::addressToString( void *address )
   {
       std::ostringstream oss;

       // First the raw address
       oss << "0x" << address;

       // Then any name for the symbol
       struct tagSymInfo
       {
           IMAGEHLP_SYMBOL symInfo;
           char nameBuffer[ 4 * 256 ];
       } SymInfo = { { sizeof( IMAGEHLP_SYMBOL ) } };

       IMAGEHLP_SYMBOL * pSym = &SymInfo.symInfo;
       pSym->MaxNameLength = sizeof( SymInfo ) - offsetof( tagSymInfo, symInfo.Name );

       DWORD dwDisplacement;
       if ( SymGetSymFromAddr( hProcess, (DWORD)address, &dwDisplacement,  pSym) )
       {
           oss << " " << pSym->Name;
           if ( dwDisplacement != 0 )
               oss << "+0x" << std::hex << dwDisplacement << std::dec;
       }
        
       // Finally any file/line number
       IMAGEHLP_LINE lineInfo = { sizeof( IMAGEHLP_LINE ) };
       if ( SymGetLineFromAddr( hProcess, (DWORD)address, &dwDisplacement, &lineInfo ) )
       {
           char const *pDelim = strrchr( lineInfo.FileName, '\\' );
           oss << " at " << ( pDelim ? pDelim + 1 : lineInfo.FileName ) << "(" << lineInfo.LineNumber << ")";
       }
       return oss.str();
   }

The main complication with the two APIs used is that both need the size of the data structures to be set up correctly before the call is made.

Failure to do this leads to rather inconsistent results. Particular care is needed for the IMAGEHLP_SYMBOL since the structure is variable size.
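The variable-size idiom is easier to see in isolation. The sketch below uses a mock structure of my own (MockSymbol, with invented field types) that merely has the same general shape as IMAGEHLP_SYMBOL: a fixed header ending in a one-character name array, followed by extra storage. It shows how both size fields would be filled in before a call.

```cpp
#include <cstddef>
#include <cstring>

// Mock structure with the same general shape as IMAGEHLP_SYMBOL: a fixed
// header that ends with a one-character name array. Illustration only.
struct MockSymbol
{
    unsigned long SizeOfStruct;
    unsigned long Address;
    unsigned long MaxNameLength;
    char          Name[ 1 ];
};

// The header plus extra storage that the name can spill into.
struct MockSymbolBuffer
{
    MockSymbol sym;
    char       nameBuffer[ 4 * 256 ];
};

// Zeroes the buffer and fills in both size fields; returns the symbol.
MockSymbol *prepare( MockSymbolBuffer &buffer )
{
    std::memset( &buffer, 0, sizeof( buffer ) );
    buffer.sym.SizeOfStruct = sizeof( MockSymbol );
    // Bytes available from the start of Name to the end of the buffer.
    buffer.sym.MaxNameLength = static_cast<unsigned long>(
        sizeof( buffer ) - offsetof( MockSymbolBuffer, sym )
                         - offsetof( MockSymbol, Name ) );
    return &buffer.sym;
}
```

The point of the arithmetic is that the name length is measured from the start of the Name member to the end of the whole buffer, so it includes any padding as well as the extra array.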

Note too that the documentation for DbgHelp refers to some newer APIs (SymFromAddr, SymGetLineFromAddr64) which do the same thing as these two.

I have used the older calls here since they are available on a much wider range of versions of the DbgHelp API.

All the source code for this article is available at:

http://www.howzatt.demon.co.uk/articles/SimpleSymbolEngine.zip

The stack walking code sets up the structure used to hold the current stack location and then uses the stack walking API to obtain each stack frame in turn.

   void SimpleSymbolEngine::StackTrace( PCONTEXT pContext, std::ostream & os )
   {
       os << "  Frame       Code address\n";

       STACKFRAME stackFrame = {0};

       stackFrame.AddrPC.Offset = pContext->Eip;
       stackFrame.AddrPC.Mode = AddrModeFlat;

       stackFrame.AddrFrame.Offset = pContext->Ebp;
       stackFrame.AddrFrame.Mode = AddrModeFlat;

       stackFrame.AddrStack.Offset = pContext->Esp;
       stackFrame.AddrStack.Mode = AddrModeFlat;

       while ( ::StackWalk(
          IMAGE_FILE_MACHINE_I386,
          hProcess,
          GetCurrentThread(), // this value doesn't matter much if previous one is a real handle
          &stackFrame, 
          pContext,
          NULL,
          ::SymFunctionTableAccess,
          ::SymGetModuleBase,
          NULL ) )
       {
           os << "  0x" << (void*) stackFrame.AddrFrame.Offset << "  "
              << addressToString( (void*)stackFrame.AddrPC.Offset ) << "\n";
       }

       os.flush();
   }

The code provided here is specific to the x86 architecture - stack walking is available for the other Microsoft platforms but the code to get the stack frame structure set up is slightly different.

The context record is used to assist with providing a stack trace in certain 'corner cases'. Note that the stack walking API may modify this structure and so for a general solution you might take a copy of the supplied context record.

The stack walking API has a couple of problems. Firstly, it quite often fails to complete the stack walk for EXEs or DLLs compiled with full optimisation. The presence of the PDB files can enable the stack walker to continue successfully even in such cases, but this is not always successful. Secondly, the stack walker assumes the Intel stack frame layout used by Microsoft products and may not work with files compiled by tools from other vendors.
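To show why the frame layout matters, here is a simplified model of walking a conventional x86 frame-pointer chain, where each frame holds the caller's saved EBP at [EBP] and the return address at [EBP+4]. It runs over simulated memory and is not how StackWalk itself is implemented; Memory, read32 and walkFrames are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Simulated 32-bit memory: address -> value stored there.
typedef std::map<std::uint32_t, std::uint32_t> Memory;

// Reads a 32-bit value; returns false if the address is not mapped.
bool read32( const Memory &memory, std::uint32_t address, std::uint32_t &value )
{
    Memory::const_iterator it = memory.find( address );
    if ( it == memory.end() )
        return false;
    value = it->second;
    return true;
}

// Follows the chain of saved frame pointers starting at 'ebp' and collects
// the return address from each frame. Stops at a null or unreadable frame,
// or when the chain fails to move towards higher addresses.
std::vector<std::uint32_t> walkFrames( const Memory &memory, std::uint32_t ebp )
{
    std::vector<std::uint32_t> returnAddresses;
    while ( ebp != 0 )
    {
        std::uint32_t savedEbp = 0, returnAddress = 0;
        if ( !read32( memory, ebp, savedEbp ) ||
             !read32( memory, ebp + 4, returnAddress ) )
            break;
        returnAddresses.push_back( returnAddress );
        if ( savedEbp != 0 && savedEbp <= ebp )
            break;              // corrupt chain: the stack grows downwards
        ebp = savedEbp;
    }
    return returnAddresses;
}
```

If a function is compiled without a frame pointer, EBP no longer links to the caller's frame and a simple walk like this one loses its way, which is why I suggest turning off stack frame optimisation.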

Conclusion

I hope that this article enables you to get better access to symbolic information when diagnosing problems in your code.

Various tools in the Windows programmer's arsenal use the DbgHelp DLL. Examples are: the debugger 'WinDbg' from the Microsoft Debugging Tools, the pre-installed tool 'Dr. Watson', and Process Explorer, from www.sysinternals.com.

If you build symbol files for your own binaries, tools like these can then provide you with additional information with no additional programming effort.

You can also provide symbolic names for runtime diagnostic information in a similar manner to these tools with a small amount of programming effort. I have shown here a basic implementation of a symbol engine class you can use to map addresses to names or provide a call stack for the current process.

I intended it to be easy to understand both what the code does and how it works. This example can be used as a basis for more complicated solutions, which could also address the following issues:

the code is currently not thread-safe, since the DbgHelp APIs require synchronisation.
the code only handles the current process; it can be generalised to cope with other processes. Incidentally this provides a good example of why the singleton is sometimes described as an anti-pattern!
no use is made of the APIs giving access to the local variables in each stack frame.
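For the first of these issues, one possible starting point is to funnel every call through a single lock. The sketch below assumes a C++11 compiler; symbolEngineLock and withSymbolLock are hypothetical names and not part of the downloadable code.

```cpp
#include <mutex>

// One process-wide lock guarding every call into the symbol engine.
std::mutex &symbolEngineLock()
{
    static std::mutex theLock;
    return theLock;
}

// Runs 'call' while holding the lock and returns its result.
template <typename Call>
auto withSymbolLock( Call call ) -> decltype( call() )
{
    std::lock_guard<std::mutex> guard( symbolEngineLock() );
    return call();
}
```

A caller would wrap each use of the engine, for example passing a lambda that calls SimpleSymbolEngine::instance().addressToString() for the address of interest. This only helps if every thread that touches DbgHelp goes through the same lock.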

Happy debugging!

References

[Orr2004] 'Microsoft Visual C++ and Win32 Structured Exception Handling', Overload 64, Oct 2004

[DbgHelp] http://www.microsoft.com/whdc/devtools/debugging/default.mspx

[Robbins] Debugging Applications for Microsoft .NET and Microsoft Windows , John Robbins, Microsoft Press