Although we try not to put bugs into software, sometimes we must track them down to remove them. Roger Orr considers the difficulties when they can only be seen in release builds.
Most programmers are familiar with debugging, although the amount of time spent on it depends on the programmer as well as on the environment and the problem domain. However, in a number of different segments of the I.T. industry, there is a dichotomy between 'Debug' and 'Release' builds. This is most often found in development with a compiled language rather than an interpreted one.
The phrasing implies you debug using the 'Debug' build and then release software built with the 'Release' build. I personally don't like this split - my own preference is to have a single build - but in particular this nomenclature is misleading.
Experience shows that it's not this simple: not all the bugs are removed during development, and some will only be discovered using the release build. Unfortunately the phrasing (and some of the tool chains) makes it harder than it needs to be to debug any problems found in the release version of the product.
I will re-examine the difference between the two builds and then provide some examples of things that can be done to make it easier to find and fix faults in the 'Release' build. The examples are for C/C++ but similar concerns exist in build environments for other languages.
What is the difference between a 'Debug' and a 'Release' build?
The idea behind the split builds is fairly sound for all but the most agile of development processes. There are two main target groups for software - the developers and the users - whose use of the software places different requirements on it. For example, during software development it is usually preferable to stop the program as soon as possible after a problem is detected to make the job of detecting - and removing - the cause of the fault as easy as possible. By contrast, most users of the program would prefer that some attempt is made to recover from the fault and to ensure no valuable data is lost.
A second difference is the level of access that should be granted to the two teams. The developers usually have full access to the original source code, and can be allowed access to the internals of the program at runtime. The users are probably not interested in the internal workings of the program and, for commercial programs, there may be strong reasons to restrict such access to try and retain intellectual property rights.
Hence many of the tool chains provide two (or sometimes more than two) targets with different characteristics. A 'Debug' build is designed for developers and typically:
- contains full symbolic information for the binary files
- has not been optimised
- provides additional tracing and debugging functionality
- often contains checks for memory use (stack, heap or both)
A 'Release' build is designed for users and typically:
- is smaller in size and built with optimisation
- is provided as an installable package
- may contain other artifacts, such as documentation and release notes
- may take longer to build
While agreeing that developers and users may have different requirements for the software, I consider that the phrase 'Debug build' is a poor choice.
As an example, I was recently helping to solve a problem which had been detected while running the release build of a product. The developer tried to reproduce the problem by running the debug build of the program under a debugger, but this did not fail. I suggested running the release build under a debugger (since it was the release build which demonstrated the fault) but the developer hadn't realised you could do this - they had assumed only a debug build could be debugged.
I prefer to use the descriptions 'Developer' and 'Retail' build to the more traditional 'Debug' and 'Release' build as, to my mind, this moves the spotlight onto the target audiences rather than focussing on the specific issue of debugging the program. I'll generally be using these phrases in the rest of the article.
There are several disadvantages with having two builds. Firstly, there is some duplication within the build process itself, and there is a danger that the two build streams will diverge. If you are fortunate the divergence will be caught by a compilation failure; if you are unlucky a necessary change will be made to the developer build only and the same change will not occur in the retail build. More importantly, you now have two different executables and they may not have the same bugs. The developer build usually has much more testing during product development so any problems specific to the retail build are typically only found late in the release cycle. To make this worse, these problems are build related, and so will not be found if debugging is attempted with the developer build. Some of the common causes of bugs that are visible only in a retail build are:
- Optimisation: either caused by compiler bugs, or exposing an existing bug hidden in the non-optimised build
- Use of assert or conditional code (e.g. trace or logging statements) with unforeseen side-effects
- Memory set to fixed 'fill' values in a developer build and uninitialised in a retail build causing different behaviour
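The second of these causes is easy to reproduce. The sketch below is illustrative only: DEV_CHECK and RETAIL_CHECK are hypothetical stand-ins for assert() as it behaves without and with NDEBUG defined, and take_one is an invented helper. Both flavours are spelled out so that the difference does not depend on how the example itself is compiled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for assert(): the developer flavour evaluates its
// argument, the retail flavour (like assert() under NDEBUG) discards it.
#define DEV_CHECK(expr)    ((expr) ? (void)0 : std::abort())
#define RETAIL_CHECK(expr) ((void)0)

// Invented helper: removes one item and reports whether it succeeded.
inline bool take_one(std::vector<int>& queue) {
    if (queue.empty()) return false;
    queue.pop_back();
    return true;
}

// Bug: the call that does the real work lives inside the check.
std::size_t remaining_after_dev(std::vector<int> queue) {
    DEV_CHECK(take_one(queue));      // side effect happens
    return queue.size();
}

std::size_t remaining_after_retail(std::vector<int> queue) {
    RETAIL_CHECK(take_one(queue));   // side effect silently disappears
    return queue.size();
}

// Fix: do the work unconditionally and check only the result.
std::size_t remaining_after_fixed(std::vector<int> queue) {
    const bool ok = take_one(queue);
    RETAIL_CHECK(ok);
    (void)ok;                        // avoid an unused-variable warning
    return queue.size();
}
```

Given a queue of three items, the developer flavour leaves two behind while the retail flavour leaves three; once the call is moved out of the check, both builds behave alike.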
Over the years I have encountered many problems that were only present in the retail build, as well as some application bugs with different symptoms in developer and retail builds (for example, local variable layout re-ordering meaning that pointer errors corrupted different variables). These bugs are often expensive to find and to fix because they do not occur in the standard development environment.
As I mentioned at the start of the article, my own preference where possible is to avoid having two separate builds and just have a single build. This simplifies the build process and also means the end user gets the same code that we've been testing during development. Where this is not achievable I try and bring the two builds as close to each other as possible, at least in terms of code generation. In practice nearly all tool chains provide ways to configure the system to select which characteristics are part of which build.
A 'release' build can be debugged
Even where having two separate builds makes good sense, there is often no technical reason why a retail ('release') build of the program cannot be run under a debugger. The main problems using a debugger with the usual default retail build configuration are:
- the order of execution may not match the source code (retail builds are usually optimised)
- there are no names for some (or all) functions (symbols are usually omitted in retail builds)
- variables may be missing or appear to have the wrong values (a combination of the above reasons)
These problems may also affect dependent components - for example in the Microsoft world there are different C runtime libraries for their 'Debug' and 'Release' builds and there is much more symbolic information in the debug library than the release one.
If you have the source for a dependent component you can simply change the retail build settings; in some cases you may be able to push back on the supplier of third party components to deliver builds containing more symbols or using different optimisation levels.
The first problem to deal with is the effect of optimisation. There are three potential problems with optimisation: firstly, it makes the resultant code harder to debug; secondly, optimising the code may introduce bugs; and finally, running an optimiser can make the whole build process take longer. How much optimising does your program really need - and where?
Much of the code in your program may not gain value from optimising and you may decide that the benefits of the developer and retail builds being more similar are worth a slight performance degradation or a slight gain in the size of the binary. Indeed, with most tool chains you can selectively enable optimising just for the parts of the program that benefit from it.
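Selective optimisation is most often arranged with per-file compiler options in the build, but it can also be done inside a source file. The sketch below is one possible approach rather than a recommendation: it switches optimisation off around a single function using compiler-specific pragmas and restores the command-line settings afterwards. The function byte_sum is a made-up example, and the exact behaviour of each pragma should be checked against your own compiler's documentation.

```cpp
#include <cstddef>
#include <cstdint>

// Switch optimisation off for the code that follows (compiler specific).
#if defined(_MSC_VER)
#pragma optimize("", off)
#elif defined(__clang__)
#pragma clang optimize off
#elif defined(__GNUC__)
#pragma GCC push_options
#pragma GCC optimize ("O0")
#endif

// Hypothetical routine we want to be able to step through easily,
// even when the rest of the file is built with optimisation.
std::uint32_t byte_sum(const unsigned char* data, std::size_t length) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < length; ++i) {
        total += data[i];
    }
    return total;
}

// Restore whatever optimisation settings the command line asked for.
#if defined(_MSC_VER)
#pragma optimize("", on)
#elif defined(__clang__)
#pragma clang optimize on
#elif defined(__GNUC__)
#pragma GCC pop_options
#endif
```

The same idea works in the other direction: build most of the program without optimisation and raise the level only for the few files or functions that measurement shows to be performance critical.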
The actual decision that you make will depend on factors such as how important performance is to users of your application and how much time is spent finding and fixing problems in the retail build.
One specific optimisation which should be considered for disabling, especially on the Intel x86 architecture, is frame pointer optimisation. On stack based machines each function has a 'frame' of memory which contains the local variables, function arguments and the return address. On entry to the function the pointer to the previous stack frame is saved, and it will be restored when the function returns.
The stack frame pointers in un-optimised code normally form a chain through the stack, allowing tools to work out the call chain for the current function and identify the function arguments. This makes many debugging tasks easier as knowing 'how you got here' is often a key component to working out the root cause of a problem.
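As a rough illustration of why this chain matters, here is a toy model of it. Frame and walk_call_chain are invented for this sketch; real stack frames hold raw addresses rather than names, and their layout is specific to the platform and calling convention.

```cpp
#include <string>
#include <vector>

// Toy model of a stack frame.
struct Frame {
    const Frame* caller;     // saved frame pointer of the calling function
    std::string  function;   // stands in for the return address
};

// Follow the saved frame pointers from the innermost frame outwards,
// which is essentially what a tool does to reconstruct the call stack.
std::vector<std::string> walk_call_chain(const Frame* innermost) {
    std::vector<std::string> chain;
    for (const Frame* frame = innermost; frame != nullptr; frame = frame->caller) {
        chain.push_back(frame->function);
    }
    return chain;
}
```

If one function in the middle does not save the link, the walk stops there (or wanders off into unrelated memory), and everything further up the call stack is lost to the tool.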
When code is optimised the stack frames can be set up in non-standard ways - the code in the function itself knows how to unwind the frame but a general purpose tool, such as a debugger, can't work back up the call stack. Both g++ (on many architectures) and Microsoft Visual C++ allow you to turn this optimisation off.
I have measured the impact of turning this optimisation off and, in my own experience, the impact has been minimal. As always with optimisation you need to measure the impact in your own specific cases. The options to turn it off are:
- for MSVC use /Oy-
- for g++ use -fno-omit-frame-pointer [ WildingBehman ]
Microsoft themselves seem to consider the ease of debugging outweighs the performance improvement - starting with Windows XP service pack 2 the operating system itself has been compiled with frame pointer optimisation disabled. This makes it much easier for debuggers to work back up the stack from a problem detected in a system component to the application code that, for example, passed a bad parameter to a Windows API.
Debugging programs without symbolic information is hard as all you have are assembler mnemonics and memory addresses with no idea of their usage. As John Robbins puts it: ' If you're paid by the hour, spending forever at the assembly language level could do wonders for paying your mortgage. ' [ Robbins ]
Both g++ and Microsoft Visual Studio allow you to add symbolic debug information to retail builds. In both cases you don't need to ship all the resultant information to your customers if you wish to preserve your company's intellectual property. I strongly recommend that you check the retail builds of your software do provide as much symbolic information as possible.
With g++, use the -g debugging option(s) in combination with the -O[n] optimising options. The resultant program will be optimised and contain debugging symbols.
The Unix model by default puts all the debugging information into the executable program. This does not usually cause any execution time overhead since the data is not loaded from disk into memory unless a debugger is being used. It does mean that the executable may be larger; in some cases considerably larger, depending on how much debugging information was created.
On many platforms the debugging information can be extracted from the executable enabling use of the debug information only at the time when debugging is required. This also provides a way to restrict access to the debugging symbolic information - simply don't ship the debug information to your customers!
An example of splitting the debug information out:
- Build the executable with -ggdb -O2 producing, for example, prog.
- Run "objcopy --only-keep-debug prog prog.dbg" to create a file containing all the debugging info.
- Run "objcopy --strip-debug prog" to remove the debugging info from the executable.
- Run "objcopy --add-gnu-debuglink=prog.dbg prog" to link the debugging info with the executable.
Visual Studio 2005 now puts debug information into Release builds by default. For projects created with earlier versions of the tools you must:
- Use /Zi at compile time (in the C/C++ 'General' tab under 'Debug Information Format')
- Use /DEBUG at link time (in the Linker 'Debugging' tab under 'Generate Debug Info')
- Additionally, to reduce the executable size, set the linker /OPT:REF and /OPT:ICF options (in the 'Optimization' tab)
Under Visual Studio the debugging symbols are stored in the PDB file with a link record in the binary file. Note that the debuggers verify that the PDB file was created by exactly the same linker execution that produced the executable file. Some debuggers do allow you to use mismatched files, but when this is possible the symbols in the PDB file may no longer have any connection with the binary addresses in the executable program, so make sure you always keep the two files together.
Just as in the g++ case, you can choose whether or not to ship the symbolic information to your customers: simply leave out the PDB file. There are also ways to provide PDB files with less information for public consumption and retain the full symbol files for internal use. See the program pdbcopy.exe in Microsoft Debugging Tools for Windows [ MSDebug ], which allows you to strip private symbols from your PDB files. Microsoft use this technique themselves for the symbols they make publicly available.
Microsoft symbol servers
There are a couple of main problems with the debugging symbol files. Firstly, it can be quite hard to ensure that the right version of the file is always kept with the corresponding executable. Secondly the files are quite large - often larger than the executable binaries - but only required when someone is actually debugging the application.
Microsoft have addressed these problems in their recent debuggers with the result that you need never be without the right symbols at the right time. The secret weapon is the symbol server engine (symsrv.dll) which is shipped with the Visual Studio Debugger and Windbg. This engine is able to locate the right version of the symbol file for the executable being debugged, either from a subdirectory on the hard disk or a networked drive, or using HTTP across the network (either an intranet or the Internet).
Microsoft have been providing symbols for all their retail releases of Windows and other of their products for some time now, and setting up your machine to access this information can greatly improve debugging on the Windows platform.
The engine uses the environment variable _NT_SYMBOL_PATH. This environment variable can contain multiple paths (semicolon delimited) and any path can be marked for the symbol server engine by using the syntax SRV*cache location[*server].
For example, setting the value to SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols tells the debugger to look for symbol files in the cache directory of C:\Symbols and, if not found there, to look on the Microsoft Web site and download (and cache) any matching debug files.
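To make the syntax concrete, the following sketch splits a value of this form into its parts. It is not how the symbol server engine itself is implemented: SymbolPathEntry, split and parse_symbol_path are names invented here, only the SRV*cache[*server] form described above is handled, and treating the SRV keyword as case-insensitive is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct SymbolPathEntry {
    bool        uses_symbol_server = false; // true for SRV*... entries
    std::string location;                   // plain directory, or the local cache
    std::string server;                     // optional upstream store; may be empty
};

// Split text on a single-character delimiter, keeping empty fields.
inline std::vector<std::string> split(const std::string& text, char delimiter) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = text.find(delimiter, start);
        if (end == std::string::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
}

// Assumption: the keyword is matched without regard to case.
inline bool is_srv_keyword(const std::string& word) {
    return word.size() == 3 &&
           std::toupper(static_cast<unsigned char>(word[0])) == 'S' &&
           std::toupper(static_cast<unsigned char>(word[1])) == 'R' &&
           std::toupper(static_cast<unsigned char>(word[2])) == 'V';
}

// Parse a semicolon-delimited symbol path into its entries.
std::vector<SymbolPathEntry> parse_symbol_path(const std::string& value) {
    std::vector<SymbolPathEntry> entries;
    for (const std::string& element : split(value, ';')) {
        if (element.empty()) continue;          // tolerate stray semicolons
        const std::vector<std::string> fields = split(element, '*');
        SymbolPathEntry entry;
        if (fields.size() >= 2 && is_srv_keyword(fields[0])) {
            entry.uses_symbol_server = true;
            entry.location = fields[1];
            if (fields.size() >= 3) entry.server = fields[2];
        } else {
            entry.location = element;           // ordinary directory
        }
        entries.push_back(entry);
    }
    return entries;
}
```

For the example value above this yields a single entry with C:\Symbols as the cache and the Microsoft URL as the server; an entry with no server part describes a cache that is consulted but never refilled from the network.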
The symbol server makes sure the right PDB is always used for the executable file by using subdirectories in the local cache and using information put in the binary by the linker to access the correct subdirectory for the executable.
So, for example, my cache directory contains several different versions of kernel32.pdb reflecting different versions of Windows and various hot fixes which have been applied (Figure 1).
Directory of U:\Symbols\kernel32.pdb

14/06/2007  20:03    <DIR>          .
14/06/2007  20:03    <DIR>          ..
02/07/2006  14:42    <DIR>          3E8016FF2
25/09/2006  22:37    <DIR>          44C5EB742
02/07/2006  14:42    <DIR>          75CFE96517E5450DA600C870E95399FF2
14/06/2007  20:03    <DIR>          7FD4C98964054C24B2C472948D829DF52
13/06/2007  01:15    <DIR>          DAE455BF1E4B4E249CA44790CD7673182

Figure 1
Using the internal timestamp of the DLL automatically makes sure the right symbols are always used with no need for input from the programmer during debugging.
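The cache layout seen in the directory listing can be summarised in a few lines of code. This is only a sketch of the naming scheme, with an invented function name; the identifier is treated as an opaque string taken from the binary, and a real store may also hold compressed files or pointer files instead of the symbol file itself.

```cpp
#include <string>

// Build the location of one symbol file inside a symbol store or cache:
//   <store root>\<file name>\<identifier>\<file name>
// 'identifier' names the per-version subdirectory and is not interpreted here.
std::string symbol_store_path(const std::string& store_root,
                              const std::string& file_name,
                              const std::string& identifier) {
    return store_root + "\\" + file_name + "\\" + identifier + "\\" + file_name;
}
```

Because the identifier comes from the binary being debugged, two versions of the same DLL map to two different subdirectories and can never pick up each other's symbols.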
The downside with this approach is that the symbol server engine will look on the Microsoft website for all symbol files, even for third party DLLs. This can significantly slow down starting the debugger. My own technique is to do most debugging with _NT_SYMBOL_PATH containing the directory of the symbol cache but not the Microsoft website: _NT_SYMBOL_PATH=SRV*U:\Symbols . If I find symbols are missing for a Microsoft DLL or EXE then I attach a debugger with the full symbol path to force a download of the relevant symbols.
It can make a lot of sense to automatically add the symbols from your own application builds to a symbol store so that people debugging the program have access to the right symbols. This also enables easier debugging of mini-dumps from customers since the debugger can automatically find and load the right symbols for the actual versions of the program being run at the time of the crash. There are several ways to do this, with varying levels of complexity, and I refer anyone interested in this to the Microsoft Debugging Tools documentation for a fuller explanation than this article can provide.
The simplest way is to add your own builds into the symbol store used for the files downloaded from Microsoft. The symstore program can be used to add files to the symbol store. For example, Figure 2 shows the command to add all the binary and symbol files from version 1.2 of 'my product'.
C:> symstore add /r /s C:\symbols /t MyProduct /v 1.2 /f C:\MyProduct\Build
Finding ID...  0000000321
SYMSTORE: Number of files stored = 107
SYMSTORE: Number of errors = 0
SYMSTORE: Number of files ignored = 576

Figure 2
This step can be added to the automated build for retail versions of your product to ensure the binaries are collected. Depending upon disk space you might need to purge old versions (or pulled releases) from the symbol server, but compare the costs of disk space to programmer time before deleting any files.
The common paradigm of having 'Debug' and 'Release' builds has some utility, reflecting the different needs of developing or testing code and running it 'for real'. I prefer to name the two builds 'Developer' and 'Retail' builds to express their intent more clearly.
However, there are downsides to having two different builds and it is worth making an informed choice about whether the benefits outweigh the costs.
Should you choose to retain two builds, the retail build is likely to need some debugging and it is well worth spending some time up-front to make sure that this task will be as easy as possible. An important part of this is to ensure that the debugger has maximal access to any available symbolic information.
Thanks to the Overload review team for the various improvements they suggested for this article.