Introduction
A few weeks ago, I upgraded a program that I originally developed using Visual Studio 6, and wrote entirely in C++, to use the new CRT library that ships with Visual Studio 2013. Since the Security Development Lifecycle checks are enabled by default (even in a project that is an upgrade from Visual C++ 6), the first compiler log flagged all of its many calls to swprintf [1] as potentially insecure, and recommended replacing them with calls to swprintf_s [2].
Although the Visual C++ team thoughtfully provided convenience macros to ease the transition, they cannot be used unless the output buffer is a static array [3]. Unfortunately, all but two of the swprintf calls in question use static buffers that are accessed through pointers. Nevertheless, those weren’t the problem, because the DLL that owns the buffers exports companion functions that return their sizes.
The problem arose with the two others, both of which use smaller buffers, also accessed through pointers. Alas, when I added the sizeOfBuffer argument to them, I habitually specified the same size as the bigger buffers exported by the DLL. Suddenly, I had a very unexpected, and ugly, buffer overflow exception. What happened?
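The restriction to true arrays makes sense once you see how a buffer size can be recovered at compile time. The following is a minimal portable sketch, not the CRT's own implementation: it uses the standard std::swprintf, which already takes a size argument, and the helper names format_into and format_through_pointer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: the compiler deduces N from a true array argument,
// so the caller cannot assert a size that disagrees with the declaration.
template <std::size_t N, typename... Args>
int format_into(wchar_t (&buffer)[N], const wchar_t* format, Args... args)
{
    return std::swprintf(buffer, N, format, args...);
}

// Hypothetical helper: through a pointer the array size is lost, so the
// caller must supply it, and nothing verifies that the number is honest.
template <typename... Args>
int format_through_pointer(wchar_t* buffer, std::size_t sizeInWords,
                           const wchar_t* format, Args... args)
{
    assert(buffer != nullptr && sizeInWords > 0);
    return std::swprintf(buffer, sizeInWords, format, args...);
}
```

The second helper is where a habitual, but wrong, size can slip in unnoticed, which is exactly the situation described above.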
Background
Since the overwritten memory included the stack frame of the function that owned the buffer, the immediate cause was obvious, but why was a CRT routine creating a buffer overflow? The answer lay deep in the new code that differentiates swprintf_s from its legacy predecessor, swprintf. To eliminate distractions, I wrote a very simple test program that allocates a static buffer on the local stack, as did the program in which the problem arose, and calls its test routine from a loop that varies the value to use for the sizeOfBuffer argument to swprintf_s. Table 1 lists the 4 cases; only the last one matters to the issue at hand, though case 1 also deserves a word.
Table 1 is a summary of the test cases implemented by the demonstration program.
Case | sizeOfBuffer | Relative to szBuffer | Test Outcome
1 | 128 | Smaller | Since the test message contains more than 128 characters, the call fails, and the handler installed by _set_invalid_parameter_handler is invoked [4].
2 | 255 | Smaller | Since the test message fits comfortably, printing succeeds without causing a buffer overflow.
3 | 260 | Same | The outcome is the same as for case 2.
4 | 384 | Bigger | Output is formatted, and the subsequent call to _tprintf to display it on the console succeeds. The overflow isn’t caught and reported until the test routine attempts to return to the loop in the main routine.
Cautious single-step debugging identified the culprit, deep in the CRT library. A new feature of swprintf_s is macro _SECURECRT__FILL_STRING, which hides a call to memset. However, it isn’t in swprintf_s itself; you must drill down a level, to _vswprintf_s_l, and follow that routine almost to its very end.
- The last statement in _vswprintf_s_l (Listing 2) that does anything significant is implemented as a macro, _SECURECRT__FILL_STRING(string, sizeInWords, retvalue + 1), shown in Listing 3, that expands into a call to CRT function memset. The prototype of memset is as follows.
    void *memset(
        void *dest,
        int c,
        size_t count
    );
- In the context of _vswprintf_s_l, the argument values are as follows.
    dest = the byte just past the null character that terminates the output written starting at the address given by string
    c = the fill character, expressed as an integer, 0xfe
    count = the number of bytes to fill with character c, starting at address dest, given by the formula discussed next
- The third argument to memset, which specifies the number of bytes to write, is a ternary expression, of which the significant part is the false block that follows the colon: ((_Size) - (_Offset)) * sizeof(*(_String)).
- Substituting the macro arguments into the expression yields the following expression, which becomes part of the C code that replaces the macro: ((sizeInWords) - (retvalue + 1)) * sizeof(*(string)), where:
    sizeInWords = the sizeInWords argument (buffer size), expressed in characters (TCHARs)
    retvalue = characters written by _vswprintf_helper (the workhorse of the formatted printing routines), also expressed in characters (TCHARs), which eventually becomes the return value of swprintf_s.
- Since _UNICODE is defined, sizeof(*(string)) corresponds to sizeof ( TCHAR ), which is equal to 2.
The following example uses actual values from the test program, which should make it a lot easier to visualize what happens.
Table 2 lists actual values taken from notes made during a careful debugging session, which are the values used in the example that follows, in which the string value plays no role.
Base | string | sizeInWords | retvalue | sizeof *(string)
Decimal | 4,454,748 | 384 | 148 | 2
Hexadecimal | 0x0043f95c | 0x00000180 | 0x00000094 | 0x00000002
The following example uses the decimal values shown in Table 2.
Macro expression | ((_Size) - (_Offset)) * sizeof(*(_String))
Expanded expression | ((sizeInWords) - (retvalue + 1)) * sizeof(*(string))
Substituting values | ((384) - (148 + 1)) * 2
Evaluation, step 1 | (384 - 149) * 2
Evaluation, step 2 | 235 * 2
Result | 470
Contrast this result with the actual number of slack bytes in the buffer.
Macro expression | ((_Size) - (_Offset)) * sizeof(*(_String))
Expanded expression | ((sizeInWords) - (retvalue + 1)) * sizeof(*(string))
Substituting values | ((260) - (148 + 1)) * 2
Evaluation, step 1 | (260 - 149) * 2
Evaluation, step 2 | 111 * 2
Result | 222
To summarize, the size of the buffer overrun is 248 bytes, more than enough to trample the stack frame that sits above it.
Slack space computed based on invalid size argument of 384: 470
Slack space computed based on actual buffer size of 260: 222
Amount of overrun: 248
Proof:
Invalid buffer size (TCHARs): 384
Correct buffer size (TCHARs): 260
Excess TCHARs: 124
Size of TCHAR (bytes): 2
Overrun, in bytes: 248
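The arithmetic above can be checked mechanically. This is a small sketch that hard-codes the 2-byte character size of a Windows build with _UNICODE defined (an assumption, since wchar_t is wider on some other platforms); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 2-byte characters, as in a Windows build with _UNICODE defined.
constexpr std::size_t kBytesPerChar = 2;

// Bytes that the debug backfill writes after the terminating null:
// (sizeInWords - (retvalue + 1)) * sizeof(character).
constexpr std::size_t backfill_bytes(std::size_t sizeInWords, std::size_t retvalue)
{
    return (sizeInWords - (retvalue + 1)) * kBytesPerChar;
}

// Bytes written past the end of the real buffer when the asserted size
// is larger than the actual size.
std::size_t overrun_bytes(std::size_t assertedChars, std::size_t actualChars,
                          std::size_t retvalue)
{
    assert(assertedChars >= actualChars && actualChars > retvalue);
    return backfill_bytes(assertedChars, retvalue)
         - backfill_bytes(actualChars, retvalue);
}

// The values from Table 2: 148 characters written, 384 asserted, 260 actual.
static_assert(backfill_bytes(384, 148) == 470, "slack for the asserted size");
static_assert(backfill_bytes(260, 148) == 222, "slack for the actual size");
```

Calling overrun_bytes(384, 260, 148) yields the same 248 bytes as the proof, because the characters already written cancel out of the subtraction.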
The code shown in Listing 1 through Listing 3 is copied verbatim from the CRT library source files that ship with Microsoft Visual Studio 2013. Their default installation directory is C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src. Function _vswprintf_s_l and the other formatted printing routines call upon one function, _vswprintf_helper, to process the format control string and optional arguments. Since that routine is long, complex, and has no bearing on the buffer overflow, I omitted it from these listings. If you are curious, it is also in the CRT source files, in vswprint.c.
To keep the source listings close to these examples, the narrative resumes below Listing 3.
int __cdecl swprintf_s (
        wchar_t *string,
        size_t sizeInWords,
        const wchar_t *format,
        ...
        )
{
    va_list arglist;
    va_start(arglist, format);
    return _vswprintf_s_l(string, sizeInWords, format, NULL, arglist);
}
Listing 1 is all of function swprintf_s, which creates a private reference to the optional arguments that follow the format control string, then returns through _vswprintf_s_l.
int __cdecl _vswprintf_s_l (
        wchar_t *string,
        size_t sizeInWords,
        const wchar_t *format,
        _locale_t plocinfo,
        va_list ap
        )
{
    int retvalue = -1;
    _VALIDATE_RETURN(format != NULL, EINVAL, -1);
    _VALIDATE_RETURN(string != NULL && sizeInWords > 0, EINVAL, -1);
    retvalue = _vswprintf_helper(_woutput_s_l, string, sizeInWords, format, plocinfo, ap);
    if (retvalue < 0)
    {
        string[0] = 0;
        _SECURECRT__FILL_STRING(string, sizeInWords, 1);
    }
    if (retvalue == -2)
    {
        _VALIDATE_RETURN(("Buffer too small", 0), ERANGE, -1);
    }
    if (retvalue >= 0)
    {
        _SECURECRT__FILL_STRING(string, sizeInWords, retvalue + 1);
    }
    return retvalue;
}
Listing 2 is every line of function _vswprintf_s_l, which is also deceptively simple, but includes macro _SECURECRT__FILL_STRING, which is the source of the overrun. If it succeeds, _vswprintf_helper returns the number of characters actually written into the output buffer, excluding the trailing null character, for which the invocation of macro _SECURECRT__FILL_STRING compensates by passing retvalue + 1 as its offset. The first and second arguments, string and sizeInWords, are passed along unchanged from swprintf_s.
#if !defined (_SECURECRT_FILL_BUFFER_THRESHOLD)
#ifdef _DEBUG
#define _SECURECRT_FILL_BUFFER_THRESHOLD __crtDebugFillThreshold
#else /* _DEBUG */
#define _SECURECRT_FILL_BUFFER_THRESHOLD ((size_t)0)
#endif /* _DEBUG */
#endif /* !defined (_SECURECRT_FILL_BUFFER_THRESHOLD) */
#if _SECURECRT_FILL_BUFFER
#define _SECURECRT__FILL_STRING(_String, _Size, _Offset) \
    if ((_Size) != ((size_t)-1) && (_Size) != INT_MAX && \
        ((size_t)(_Offset)) < (_Size)) \
    { \
        memset((_String) + (_Offset), \
            _SECURECRT_FILL_BUFFER_PATTERN, \
            (_SECURECRT_FILL_BUFFER_THRESHOLD < ((size_t)((_Size) - (_Offset))) ? \
                _SECURECRT_FILL_BUFFER_THRESHOLD : \
                ((_Size) - (_Offset))) * sizeof(*(_String))); \
    }
#else /* _SECURECRT_FILL_BUFFER */
#define _SECURECRT__FILL_STRING(_String, _Size, _Offset)
#endif /* _SECURECRT_FILL_BUFFER */
#if _SECURECRT_FILL_BUFFER
#define _SECURECRT__FILL_BYTE(_Position) \
    if (_SECURECRT_FILL_BUFFER_THRESHOLD > 0) \
    { \
        (_Position) = _SECURECRT_FILL_BUFFER_PATTERN; \
    }
#else /* _SECURECRT_FILL_BUFFER */
#define _SECURECRT__FILL_BYTE(_Position)
#endif /* _SECURECRT_FILL_BUFFER */
Listing 3 is macro _SECURECRT_FILL_BUFFER and its dependents, which are defined in internal.h. (Look for internal.h in the directory where your CRT source code is installed. Mine is in C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\.)
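To see the effect of the macro without corrupting a real stack, the backfill can be simulated in a scratch arena. The sketch below is an illustration only, not CRT code: the function simulate_backfill is hypothetical, a std::vector stands in for the stack, 2-byte characters are assumed, and the bytes that follow the pretend buffer play the part of the stack frame above it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulation only: returns how many bytes beyond the real buffer are
// overwritten when the backfill trusts assertedChars instead of actualChars.
std::size_t simulate_backfill(std::size_t actualChars, std::size_t assertedChars,
                              std::size_t charsWritten)
{
    const std::size_t charSize = sizeof(std::uint16_t);  // 2, like a Windows wchar_t
    const std::size_t guardBytes = 1024;                 // stand-in for the stack frame
    const std::size_t bufferBytes = actualChars * charSize;

    assert(charsWritten + 1 <= actualChars);
    assert(charsWritten + 1 <= assertedChars);
    assert(assertedChars * charSize <= bufferBytes + guardBytes);

    std::vector<unsigned char> arena(bufferBytes + guardBytes, 0x00);

    // Same arithmetic as the false branch of the macro's ternary expression.
    const std::size_t offset = charsWritten + 1;
    const std::size_t count = (assertedChars - offset) * charSize;
    std::memset(arena.data() + offset * charSize, 0xFE, count);

    // Count the fill bytes that landed past the end of the real buffer.
    std::size_t trampled = 0;
    for (std::size_t i = bufferBytes; i < arena.size(); ++i)
    {
        if (arena[i] == 0xFE)
        {
            ++trampled;
        }
    }
    return trampled;
}
```

With the values from Table 2, simulate_backfill(260, 384, 148) reports 248 trampled bytes, while an honest size of 260 reports none.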
In most cases, it is enough to know that arguments passed into functions and automatic variables are allocated somewhere on the stack, and their machine addresses are not especially important. In this case, the addresses matter a great deal, and the order in which variables are defined has significant consequences, and exposes the risk of using automatic variables as buffers.
For the benefit of readers who aren’t thoroughly familiar with how the C and C++ compilers assign memory to arguments and variables, the following section offers a brief illustrated tutorial.
Please feel free to skip the next section (Kaboom!) if you know this cold.
Memory for Arguments and Automatic Variables
Management of function arguments (parameters) and automatic variables in Windows programs revolves largely around two CPU registers, while five others play minor parts, summarized in Table 3.
Table 3 summarizes the roles played by CPU registers in managing argument lists and automatic variables.
Abbr. | Full Name | Role
EBP | Extended Base Pointer | Within each function, the addresses of its arguments and local (automatic) variables are usually encoded as offsets relative to the address stored in this register, which lies somewhere within the address space reserved for the stack.
ESP | Extended Stack Pointer | The stack pointer is primarily used to get function arguments and return addresses on and off the stack. A secondary, but related role of the stack pointer is to mark off the boundaries of a function’s working storage, from which its automatic variables are allocated.
ESI | Extended Source Index | In the function body of a debug build, the value of the ESP register is saved in this register for sanity checking the stack pointer when control returns from a function that follows the __cdecl calling convention.
EDI | Extended Destination Index | This register plays two roles. (1) In the function prologue of a debug build, the EDI register is used, along with the ECX register (discussed next), to initialize the memory that is being set aside for its automatic variables. (2) It fulfills the roles usually fulfilled by the ESI register when a function call is nested inside another function call, since ESI is tied up tracking the stack frame for the outer function.
ECX | Extended Counter | (1) In the prologue of a debug build, the ECX register is loaded with the number of machine words that are being reserved for the function’s automatic variables, which tells the rep stosd instruction that initializes it when to stop. (2) When an instance method is called, a pointer to the object (the ubiquitous this variable) goes into the ECX register immediately before the method is invoked via the call instruction. Since the prologue of a debug build needs ECX, it is the last item pushed onto the stack before the memory initialization code is set up and run. Following the initialization, ECX is popped off the stack, and then a copy is saved into the very top of the function’s automatic storage area.
EIP | Extended Instruction Pointer | Throughout its run, the EIP register points to the instruction that is about to execute. Since the calling routine expects to resume where it left off when the function returns, before it transfers control to the first instruction of the called function, the call instruction pushes the address of the instruction that immediately follows the call onto the stack. The call, itself, is executed by jumping to the address of the first instruction in the called routine. Like any other jump, this changes the EIP to refer to the instruction at that location.
EAX | Extended Accumulator | Regardless of which of several common calling conventions a function follows, almost all functions place their return value into the EAX register. Both of the most common conventions, __cdecl and __stdcall, return through EAX.
The stack is just a big block of memory, allocated to the process by the loader when it starts, and mapped to an address well above the program’s code. By default, most applications get a one megabyte stack, which sounds like a lot, until you realize how many things go into it. When a process starts, its stack pointer (ESP) points to the highest address in the space reserved for the stack.
Since the stack pointer points to its top address, its value decreases as things are added to the stack (by pushing them onto it) and increases when items are removed from it (by popping them off).
- When it is first allocated to a process, the stack pointer (ESP) and base pointer (EBP) point to the same location, but this state of affairs is short lived. The first thing that any routine does is push the current value of EBP onto the stack, then set EBP to the new value of ESP, which is 4 bytes less than it was before the push executed.
- The EBP register doesn’t change again until another routine is called, when the process just described is repeated. This process is repeated each time your code dives deeper into its lower level routines, makes a system call, or invokes a runtime library routine, such as printf. Since many library routines use helper routines, calls can go much deeper in a hurry.
- There are two circumstances that cause the stack pointer (ESP) to change during the lifetime of a function.
  - The prologue decreases its value by the number of bytes that it needs to reserve for automatic variables. The effect of this adjustment is that subsequent additions to the stack happen beneath the local storage used by the function, preventing subsequent uses of the stack from overwriting its data.
  - When one function calls another, the arguments, if any, are pushed onto the stack, working from right to left as they appear in its prototype, so that the first argument goes on last. For example, when you call printf with a format control string and a series of variables to substitute into it, the format string goes onto the stack last. As described previously, once the last argument is on the stack, the call instruction pushes the address of the instruction that immediately follows it, and hands control off to the called routine.
- As each function completes its work and prepares to return, the processes that happened during its prologue are reversed. Items are popped off the stack in reverse order, and the decrement that moved the stack pointer below the caller’s reserved code is reversed by adding the same amount to the stack pointer. Finally, the function copies its own base pointer into the stack pointer, which then points to the location where the caller’s base pointer was pushed. It is then popped off, and the function returns. If the calling convention is __stdcall, the return instruction has one modifier, which tells it how many bytes to add to the stack pointer to account for the function’s arguments. Otherwise, the return simply pops off the return address, which becomes the target of a reverse jump.
Important: Although the arguments and return address are not actually removed from the stack, increasing the stack pointer as each function returns to its caller conserves the finite space reserved for the stack, which is reused for subsequent function calls.
Memory for Automatic Variables
The preceding section alluded to a block of memory set aside by the function prologue for use by its local (automatic) variables. The last concept to be grasped in order to understand why the buffer overflow happened is how memory from this block is assigned to variables.
Since it uses memory allocated from the stack, it is not very surprising to learn that variable assignments begin at the top, and work down, so that the address of each new variable is lower than that of the one defined above it. Significantly, address assignments are made as soon as a variable is defined, even if initialization is deferred, as it is in the case of the second variable, szBuffer, shown in Listing 4. This is necessary because the compiler must avoid assigning another variable to the same address, or there would be serious and frequent chaos. Table 4 shows how this plays out for the local variables defined at the top of the demonstration routine in the included sample application, Exercise_stprintf_s. Especially noteworthy is that the address of szBuffer is 528 bytes below that of rintResult. The reason for the large gap is that szBuffer needs 520 bytes, enough room for MAX_PATH wide characters, plus the 8 byte gap left by the compiler between variables.
int rintResult = SPH_TEST_SUCCEEDED;
TCHAR szBuffer [ SPH_BUFFER_ACTUAL_SIZE ] ;
INT32 intNumericVariable1of2 = SPH_NUMERIC_VARIABLE_VALUE_1 ;
INT32 intNumericVariable2of2 = SPH_NUMERIC_VARIABLE_VALUE_2 ;
_invalid_parameter_handler oldHandler , newHandler ;
oldHandler = _set_invalid_parameter_handler ( newHandler ) ;
Listing 4 is the local variables that are defined at the top of function Exercise_stprintf_s, giving them function scope.
Table 4 lists the machine addresses of the local (automatic) variables defined by the test function, Exercise_stprintf_s, and reported by it. Note that the value of the stack pointer is lower than the base pointer by 864, which is 28 bytes more than the space reserved by its prologue. The extra space is occupied by hidden data structures used to manage its exception handlers.
Label | Address (Hexadecimal) | Address (Decimal) | Contents (Hexadecimal) | Contents (Decimal)
Base Pointer (EBP) | 0x0031FAF0 | 3,275,504 | |
Stack Pointer (ESP) | 0x0031f78c | 3,274,636 | |
rintResult | 0x0031fad4 | 3,275,476 | 0x00000000 | 0
szBuffer | 0x0031f8c4 | 3,274,948 | |
intNumericVariable1of2 | 0x0031f8b8 | 3,274,936 | 0x00001000 | 4,096
intNumericVariable2of2 | 0x0031f8ac | 3,274,924 | 0x0000ffff | 65,535
newHandler | 0x0031f894 | 3,274,900 | 0x010010B9 | 16,781,497
oldHandler | 0x0031f8a0 | 3,274,912 | 0x00000000 | 0
Taking into account where the output buffer intended for use by swprintf is allocated, the puzzle almost solves itself. When the code generated by macro _SECURECRT__FILL_STRING (Listing 3) calls on memset to backfill the buffer, it uses the buffer capacity that it is told, through the new sizeInWords argument, to derive the number of bytes to use for the count argument of memset. Like any good program, memset obeys its master, filling the specified number of bytes from the end of the text written into it by swprintf. What happens next is made painfully clear by the last column in Table 5. Since the backfill value is 0xfe (decimal 254), the outcome is enough to cause the machine to contain the damage by forcibly killing the application. The specific message is, “Run-Time Check Failure #2 - Stack around the variable 'rintResult' was corrupted.”
Table 5 summarizes the relationship between the asserted buffer size and the stack frame that sits above the buffer, and holds the argument list, and points the way back to the caller.
Test Case | 1 | 2 | 3 | 4
Address of szBuffer | 3,274,948 | 3,274,948 | 3,274,948 | 3,274,948
Asserted buffer size (TCHARs) | 128 | 255 | 260 | 384
Size of 1 TCHAR (bytes) | 2 | 2 | 2 | 2
Asserted buffer size (bytes) | 256 | 510 | 520 | 768
Actual buffer size (bytes) | 520 | 520 | 520 | 520
Underrun or overrun (bytes) | -264 | -10 | 0 | 248
Headroom (bytes) | 36 | 36 | 36 | 36
Overlap beyond headroom (bytes) | | | | 212
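The last three rows of Table 5 can be reproduced from the addresses in Table 4. The sketch below assumes, as those addresses suggest, that the headroom is the distance from the end of szBuffer to the address held in EBP; the helper overlap_beyond_headroom is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Addresses and sizes taken from Table 4; the 2-byte character size is the
// Windows TCHAR size with _UNICODE defined.
constexpr std::uint32_t kBasePointer   = 0x0031FAF0;
constexpr std::uint32_t kBufferAddress = 0x0031F8C4;
constexpr std::uint32_t kBufferBytes   = 520;
constexpr std::uint32_t kBytesPerChar  = 2;

// Distance from the end of the buffer up to the address held in EBP.
constexpr std::uint32_t kHeadroom = kBasePointer - (kBufferAddress + kBufferBytes);
static_assert(kHeadroom == 36, "matches the Headroom row of Table 5");

// Hypothetical helper: bytes by which an asserted size reaches past EBP,
// or 0 when it stays within the buffer plus its headroom.
std::uint32_t overlap_beyond_headroom(std::uint32_t assertedChars)
{
    assert(assertedChars > 0);
    const std::uint32_t assertedBytes = assertedChars * kBytesPerChar;
    const std::uint32_t limit = kBufferBytes + kHeadroom;
    return assertedBytes > limit ? assertedBytes - limit : 0;
}
```

Only the asserted size of 384 characters reaches past the headroom, by 212 bytes, which is why only test case 4 tramples the frame.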
Thankfully, the buffer overflow is easy to spot in the Visual Studio debugger, though you will need to display the Memory window, and enter the address of the buffer to see it. In case you’ve missed it, the memory window is accessible during any debugging session by pressing ALT-6 (that is, the ALT key and the numeral 6 key on the top row of your keyboard).
Figure 1 shows the overrun buffer as it appears in the Visual Studio 2013 debugger. The legitimate text is in the top portion of the buffer, followed by the backfill. For the eagle eyes among you, the machine address shown here differs from that shown in the other examples, because my development machine has EMET installed and configured to enforce mandatory Address Space Layout Randomization (ASLR).
When the code executes from the desktop or a command prompt, the error report comes in the form of the large, rather ugly message box shown in Figure 2. Since the message box is displayed with its Application Modal flag disabled, you can get an unobstructed view of the output window shown in Figure 3. This is very handy, since the default action is Abort, which promptly terminates the program, causing its output to disappear if it launched from a desktop.
Figure 2 is the message box that reports the fatal error when you run the debug build from a command prompt.
Figure 3 is the command window, which can be activated because the message box is displayed with its application modal flag switched off. The bogus test number is further evidence of the buffer overrun, since the message should read, “Test # 4 Done.”
Important: The release build of swprintf doesn’t backfill, because the release version of the _SECURECRT__FILL_STRING macro is null (that is, it generates no code).
There are two reasons that I was relieved to discover that a retail build doesn’t backfill buffers.
- Setting the buffer size too high doesn’t cause an overrun.
- Backfilling wastes processor cycles and time.
As with most such engineering matters, this, too, is a compromise.
- Although no backfilling occurs, if a print operation uses more space than is actually reserved for the buffer, you can still get a buffer overrun. If you’re lucky, the overrun will cause a spectacular crash.
- On the plus side, even the retail build of the new overloads of swprintf and its cousins fail, reporting a trappable error, if the specified size indicates that the buffer is too small to accommodate the formatted output. There are two ways to detect this error.
  - The value returned by swprintf is negative, which can be evaluated without creating a scratch variable by wrapping the function call in a switch or if statement. Internally, _vswprintf_helper reports the condition as -2, which the _VALIDATE_RETURN call in Listing 2 translates to the -1 that appears in the last lines of Listing 5.
  - When the buffer is too small, swprintf invokes the _invalid_parameter_handler routine. The CRT library provides a default invalid parameter handler that raises an assert in a debug build, and fails silently in a release build. However, a program, or one of its functions, can install its own handler. I did so, and its output, when it executes in a retail build, is at the bottom of Listing 5. The output generated in a debug build, shown in Listing 6, is a bit more useful.
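The first detection method, testing the return value in place, can be sketched as follows. This is a portable illustration that uses the standard std::swprintf as a stand-in, which also returns a negative value when the output does not fit but does not invoke the Microsoft invalid parameter handler; the wrapper try_format is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical wrapper: reports success only when the formatted text and its
// terminating null fit within sizeInWords characters.
bool try_format(wchar_t* buffer, std::size_t sizeInWords, const wchar_t* text)
{
    assert(buffer != nullptr && sizeInWords > 0);

    // The return value is tested in place, so no scratch variable is needed.
    if (std::swprintf(buffer, sizeInWords, L"%ls", text) < 0)
    {
        buffer[0] = L'\0';  // leave a well-formed empty string behind
        return false;
    }
    return true;
}
```

Note that this check protects you only when the asserted size is too small; it cannot notice a size that is larger than the real buffer, which is the hazard this article is about.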
Begin Test # 1: Asserted buffer size = 0x00000080 (128 decimal):
Buffer Address = 0x003efc50 (4127824 decimal)
Actual Size (bytes) = 0x00000208 (520 decimal)
Actual Size (TCHARs) = 0x00000104 (260 decimal)
Actual Top = 0x003efe58 (4128344 decimal)
Numeric Variable 1 of 2: Address = 0x003efc4c (4127820 decimal)
Value = 0x00001000 (4096 decimal)
Numeric Variable 2 of 2: Address = 0x003efc48 (4127816 decimal)
Value = 0x0000ffff (65535 decimal)
Base Pointer (EBP) = 0x003efe5c (4128348 decimal)
Stack Pointer (ESP) = 0x003efc34 (4127796 decimal)
ERROR: Invalid parameter detected in function (null).
File: (null)
Line: 0
Expression: (null)
ERROR: Nothing printed!
Test # 1 Done
Total characters printed by last output statement = -1
Outcome of test # 1 = Success
Listing 5 is the output of the first of the four tests that uses swprintf, which fails because the specified buffer size of 128 characters is too small, by 21 characters, to accommodate the formatted output and its terminal null character, which requires a buffer size of at least 149.
ERROR: Invalid parameter detected in function _vswprintf_s_l.
File: f:\dd\vctools\crt\crtw32\stdio\vswprint.c
Line: 280
Expression: ("Buffer too small", 0)
Listing 6 is the output generated by my invalid parameter handler when it runs in a debug build. The output of the debug version is considerably more useful, although it still leaves a lot to be desired. Nevertheless, compared to the output generated by the same routine when it runs in a release build, shown in Listing 5, its first line gives you a place to start.
Using the Code
The demonstration project is the program that generated all of the output shown in the foregoing tables and listings. Since only the debug build exhibits the buffer overrun, its output directory (the \Debug directory off the main project directory) deserves the most attention. Nevertheless, I left the retail build, so that you can quickly see for yourself that it completes without incident.
The first time you open the project in Visual Studio, the size of the solution directory will balloon when the IntelliSense database file, SecurePrintFHazard.sdf, is regenerated. To reduce the overall size of the package, I deleted it from the distribution package, because Visual Studio recreates it from scratch when it is missing.
Unlike many of my projects, including the demonstrations for the last two articles I wrote about C++ applications, this solution is completely self-contained. However, there are a few things that I must call to your attention.
- Mixed Languages: The modules that comprise this project target three distinct programming languages, each with its own compiler.
  - The two main modules, SecurePrintFHazard.cpp and Exercise_stprintf_s.CPP, are implemented in C++.
  - One of two helper routines, ProgramIDFromArgV, defined in module ProgramIDFromArgV.C, is imported from another project, and is implemented in straight ANSI C.
  - The other helper routine, CPURegisterPeek, defined in module CPURegisterPeek.ASM, is written in assembly language, and must be assembled by a downlevel assembler, MASM 6.11. The need for the downlevel assembler is explained in the last item of this list.
- No precompiled headers: Due to the unavoidable overlap in header usage between the C and C++ modules, precompiled headers are impractical. Since this project contains only 3 modules that target the C/C++ compiler, the whole project builds from scratch in only a few seconds, and they aren’t missed.
- No stdafx.h: Since I dispensed with precompiled headers, I renamed stdafx.h to SecurePrintFHazard.h. This is something that I almost always do, to remind myself that precompiled headers are disabled. Concurrently, I delete stdafx.cpp, which generates a fatal compiler error and fails the project build if you use the file explorer to delete it, and forget to remove it from the Solution Explorer before your next attempt to build the solution.
- CPURegisterPeek.ASM is incompatible with MASM 12.0.31101.0: I used the copy of Microsoft ® Macro Assembler Version 6.11 that I have installed on an older machine to assemble it. The newer assembler emitted error A2071: initializer magnitude too large for specified size, calling out the endp directive at the bottom of the source file, although I suspect the real issue is the expression at line 167, _ARG_UPPER_LIMIT equ ( __REG_INDEX_END - _REG_INDEX ) / _SIZEOF_DWORD. However, since the older assembler assembled a provably correct version, I didn’t investigate further. Today, I removed CPURegisterPeek.ASM from the solution, so that you and I don’t have to deal with it when the build engine decides that it needs to rebuild from scratch. This is more common than you might guess; it happens when anything in the project configuration changes.
Points of Interest
The main routine has the generic TCHAR mapped name _tmain and the standard two-argument signature, and is defined in module SecurePrintFHazard.cpp. Following four static arrays that require no further explanation is the first executable block, which is well protected by guard code that ensures that it is excluded from the compilation unless both preprocessor symbols _DEBUG and _PROGRAMIDFROMARGV_DBG are defined. I could have condensed the guard code into a single line on each side, but I didn’t, since the outer test was an afterthought.
#if defined ( _DEBUG )
#if defined ( _PROGRAMIDFROMARGV_DBG )
#pragma message ( "Preprocessor symbol _PROGRAMIDFROMARGV_DBG is defined.")
DebugBreak ( );
#else
#pragma message ( "Preprocessor symbol _PROGRAMIDFROMARGV_DBG is UNdefined.")
#endif /* #if defined ( _PROGRAMIDFROMARGV_DBG ) */
#endif /* #if defined ( _DEBUG ) */
Listing 7 is the guard code around the call to Windows API routine DebugBreak
, which I used to help me coerce a debug build of the program started from a command prompt into the Visual Studio debugger.
Next comes the first call into an application defined function, ProgramIDFromArgV
, which extracts the base name of the program, SecurePrintFHazard
, from the name by which it was invoked, which it receives in the form of a pointer to the first element in the argv array. Since it is unrelated to the subject at hand, I leave its analysis as an exercise for insatiably curious readers.
The main body of the routine is the switch
statement nestled inside the two nested for
loops shown in Listing 8. The outermost of the two for
statements defines and uses unsigned integer uintOutputMethod
to iterate the elements of two-element array s_enmOutputMethod
. This array is populated with one each of the nonzero members of the SPH_OUTPUT_METHOD
enumeration.
The innermost of the two for
statements defines and uses uintTestIndex
, another unsigned integer, to iterate the s_auintAssertedSizes
array, a collection of unsigned integers representing buffer sizes to be passed, in turn, into wsprintf
on the second iteration of the outer loop.
The main thing that happens within the innermost loop is a call to the other major application defined function, Exercise_stprintf_s
, which is the first, and principal, routine defined in Exercise_stprintf_s.CPP
. The only task remaining for the inner loop is to use the value returned by Exercise_stprintf_s
to determine which of three messages to display about the outcome. Other than calling to your attention that the calls to wprintf
are made through its generic text mapping, the print statements are unremarkable.
Since I do my best to avoid calculating anything more than once, uintTestNumber
is defined and used to store the ordinal test number (used twice), which starts at one, even though deriving it from the index of the inner loop, an array index that starts at zero, is trivial. All that remains to be said about the main routine is that, although calculation of the limit values of the two loops is nominally data driven, they appear as constants in the emitted code, because the calculation depends entirely on values that are known at compile time, and the compiler performs them and writes the answers into the generated code as constants.
In general, this is true of any value that is expressed as either sizeof
a variable or type, or an expression composed entirely of such expressions and basic arithmetic operators. This concept plays a key role in Exercise_stprintf_s
, too, as well as a great many of the macros that I usually employ.
for ( unsigned int uintOutputMethod = 0 ;
      uintOutputMethod < sizeof ( s_enmOutputMethod ) / sizeof ( SPH_OUTPUT_METHOD ) ;
      uintOutputMethod++ )
{
    _tprintf ( TEXT ( "\nTest group %d: %s\n\n" ) ,
               ( uintOutputMethod + 1 ) ,
               s_szOutputMethodMsg [ s_enmOutputMethod [ uintOutputMethod ] ] ) ;
    for ( unsigned int uintTestIndex = 0;
          uintTestIndex < sizeof ( s_auintAssertedSizes ) / sizeof ( unsigned int );
          uintTestIndex++ )
    {
        unsigned int uintTestNumber = uintTestIndex + 1;
        switch ( int intResult = Exercise_stprintf_s ( uintTestNumber ,
                                                       s_auintAssertedSizes [ uintTestIndex ] ,
                                                       s_enmOutputMethod [ uintOutputMethod ] ) )
        {
            case SPH_TEST_SUCCEEDED:
            case SPH_TEST_FAILED:
                _tprintf ( TEXT ( " Outcome of test # %d = %s\n\n" ) ,
                           uintTestNumber ,
                           s_szResultMsg [ intResult ] );
                break;
            case SPH_TEST_REPORTING_ERROR:
                _tprintf ( TEXT ( " Test # %d reported that a call to function _tprintf produced nothing.\n\n" ) ,
                           uintTestNumber );
                break;
            default:
                _tprintf ( TEXT ( " Test # %d reported an unexpected result code of 0x%08x (%d decimal)\n\n" ) ,
                           uintTestNumber ,
                           intResult ,
                           intResult );
        }
    }
}
Listing 8 is the core of the main routine, consisting of a switch
statement that evaluates the value returned by application defined function Exercise_stprintf_s
, which runs in the innermost of two for
loops that index its two key arguments, which come from a pair of arrays iterated by the two loops.
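The point about sizeof-based loop limits can be checked with a small sketch. The array below is a hypothetical stand-in for s_auintAssertedSizes, and its values, along with the names EXAMPLE_SIZE_COUNT and SumExampleSizes, are assumptions made only for illustration. Because static_assert accepts nothing but constant expressions, the fact that it compiles suggests that the element count is settled before the program ever runs.

```cpp
#include <cstddef>

// Hypothetical stand-in for s_auintAssertedSizes; the values are illustrative only.
static const unsigned int s_auintExampleSizes [ ] = { 8 , 16 , 64 , 256 } ;

// Element count, computed the same way as the loop limits in Listing 8.
static const std::size_t EXAMPLE_SIZE_COUNT =
    sizeof ( s_auintExampleSizes ) / sizeof ( s_auintExampleSizes [ 0 ] ) ;

// static_assert accepts only constant expressions, so this line compiles
// only because the compiler itself carries out the division.
static_assert ( EXAMPLE_SIZE_COUNT == 4 , "element count is known at compile time" ) ;

// Walks the array, using the compile-time count as the loop limit.
unsigned int SumExampleSizes ( )
{
    unsigned int uintTotal = 0 ;
    for ( std::size_t uintIndex = 0 ; uintIndex < EXAMPLE_SIZE_COUNT ; uintIndex++ )
    {
        uintTotal += s_auintExampleSizes [ uintIndex ] ;
    }
    return uintTotal ;
}
```

Dividing by the size of the first element, rather than by a named type, keeps the count correct if the element type of the array ever changes.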
Function Exercise_stprintf_s
, the heart of the test program, was almost completely dissected above, in the section titled “Anatomy of a Buffer Overflow.” This routine takes three arguments, all effectively unsigned integers, as shown in Listing 9.
- The first argument,
puintTestNumber
, goes into a couple of messages, and is otherwise ignored.
- The second argument,
puintAssertedSize
, is ignored unless the third argument, penmOutputMethod
, is SPH_INDIRECT
(2), which is true on the second iteration of the outermost loop in the main routine.
- You have probably already guessed that
penmOutputMethod
determines whether the result of the simple math problem represented by the first statement within the try
block is printed directly, via wprintf
, or indirectly, by calling swprintf_s
, then sending the buffer to wprintf
.
int __stdcall Exercise_stprintf_s
(
    const unsigned int      puintTestNumber ,
    const unsigned int      puintAssertedSize ,
    const SPH_OUTPUT_METHOD penmOutputMethod
) ;
Listing 9 is the prototype of function Exercise_stprintf_s
, the real workhorse of this program.
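The difference between an asserted size and the true capacity of a buffer can be made explicit in code. The following is a minimal sketch, not the test program's method: FormatProductChecked and its parameter names are hypothetical, and it uses the standard std::swprintf as a portable stand-in for swprintf_s. The idea is simply to carry the true capacity alongside the pointer and to refuse any asserted size that exceeds it.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: formats "a x b = product" into pszBuffer.
// Returns the number of characters written, or a negative value when the
// asserted size is zero, exceeds the true capacity, or is too small to
// hold the formatted text.
int FormatProductChecked ( wchar_t *   pszBuffer ,
                           std::size_t cchTrueCapacity ,
                           std::size_t cchAssertedSize ,
                           int         intLeft ,
                           int         intRight )
{
    if ( pszBuffer == nullptr
      || cchAssertedSize == 0
      || cchAssertedSize > cchTrueCapacity )
    {
        return -1 ;     // Never let the formatter believe the buffer is bigger than it is.
    }

    // std::swprintf is the portable stand-in here; it reports failure with a
    // negative value when the text does not fit in cchAssertedSize characters.
    return std::swprintf ( pszBuffer ,
                           cchAssertedSize ,
                           L"%d x %d = %d" ,
                           intLeft ,
                           intRight ,
                           intLeft * intRight ) ;
}
```

A caller that owns a 32 character buffer would pass 32 for both sizes. Passing the size of some other, larger buffer, which is the mistake described in the introduction, is rejected before the formatter runs.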
Though it is by far the biggest function in the entire project, Exercise_stprintf_s
is straightforward.
- Four variables are declared with function scope, three of which are scalars (two
INT32
and one int
), all three of which are initialized by the declaration.
- Next, two
_invalid_parameter_handler
function pointers are defined, the first of which is set aside to hold a pointer to the default invalid parameter handler, while the second is initialized with the address of a custom handler, SPH_InvalidParameterHandler
, which is declared in SecurePrintFHazard.h
, and defined near the end of Exercise_stprintf_s.CPP
. In retrospect, I would have been better off writing _CrtSetReportMode ( _CRT_ASSERT , _CRTDBG_MODE_FILE )
, followed by _CrtSetReportFile ( _CRT_ASSERT , _CRTDBG_FILE_STDERR )
, to divert the standard assertion message to the console window.
- Next come five fairly unremarkable calls to
wprintf
(through generic TCHAR
mapping macro _tprintf
). The accompanying format string contains two format items, 0x%08x
, followed shortly by %d
. The first format item renders its argument as an eight-digit, zero-padded hexadecimal number, while the second renders the same value as a plain decimal integer.
- Although the foregoing technique works well for pointers to strings, displaying the address and value of an integer takes a bit more work. This is the domain of
SPH_ShowAddressAndValueOfInt32
, which takes the address of the integer (e.g., &rintResult
for the first call) and pointers to two format strings (e.g., ( LPCTSTR ) &m_szRetCdeAddrTpl1
and ( LPCTSTR ) &m_szScalarValueTpl
).
SPH_ShowAddressAndValueOfInt32
first calls SPH_ShowAddressOfScalar
with the address of the variable and the first of the two format strings.
SPH_ShowAddressOfScalar
wraps a simple call to wprintf
(through the _tprintf
macro, as above), returning the number of characters that it wrote. This function could be folded into SPH_ShowAddressAndValueOfInt32
, or replaced with a macro. I did neither, because it was easier to thoroughly test and document it as a separate routine, and because the same routine, or the macro that supersedes it, can be applied to displaying the address of any scalar. Anticipating that, I declared its plpScalar
argument to const void *
instead of const INT32 *
.
- The second call to
wprintf
uses the second format string, dereferencing pintValue
along the way to wprintf
(hence, the asterisk preceding it in the argument list), so that the print statement renders its value. This is why pintValue
is declared as const INT32 *
, instead of const void *
.
- Next is a pair of print statements that display the current values of the EBP and ESP registers. While the print statements are more of the same,
CPURegisterPeek
, the function that reads the CPU registers, deserves a short explanation. The original version of this routine used the two short bits of straightforward inline assembly shown in Listing 12. Its successor, CPURegisterPeek
, can report the current contents of any of the general purpose registers except EFL, the flags register. Because it is complex and written entirely in assembly language, CPURegisterPeek
is here treated as a black box and out of scope. I have tested it sufficiently to cover the use cases applicable to this program, and the source code is in the main solution directory. I may dissect it in a future article.
- Finally, a simple multiplication problem is solved to generate enough material for a small, but nontrivial report, followed by one or two function calls to print it. The first four cases print the report directly, via
wprintf
, which succeeds for all four cases. The second four cases repeat the same calculation, and invoke swprintf_s
to write the report into a buffer, which is expected to fail for the first and fourth buffer sizes. The first case is expected to write nothing, reporting that the buffer is too small, while the fourth case is the overflow that motivated me to create this program.
Since I wasn’t completely sure how the test routine would behave, I put the math problem and the routines that print the report inside a try
block, followed by an ellipsis catch
block. I discovered that neither C++ try/catch
blocks, nor C style Structured Exception Handling plays any role, because changes in the new CRT library force any program in which a buffer overflow is detected to terminate. However, since its presence was harmless, I left the try/catch block in place.
_invalid_parameter_handler oldHandler , newHandler ;
newHandler = SPH_InvalidParameterHandler ;
oldHandler = _set_invalid_parameter_handler ( newHandler ) ;
_CrtSetReportMode ( _CRT_ASSERT , 0 ) ;
Listing 10 is the section of Exercise_stprintf_s
that registers a custom invalid parameter handler and disables the assertion message box, which it replaces.
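The alternative mentioned earlier, diverting the assertion report to the console instead of suppressing it, might look like the following minimal sketch. SketchDivertAssertsToStderr is a hypothetical name that is not part of the test program. The sketch assumes a debug build with the Microsoft CRT; everywhere else it does nothing and returns false.

```cpp
#if defined ( _MSC_VER )
#include <crtdbg.h>
#endif

// Hypothetical helper: asks the debug CRT to write assertion reports to
// STDERR instead of showing the assertion message box.
// Returns true when the redirection was requested, false when it is not
// available (release builds and other compilers).
bool SketchDivertAssertsToStderr ( )
{
#if defined ( _MSC_VER ) && defined ( _DEBUG )
    // Both calls must name the same report type for the redirection to apply.
    if ( _CrtSetReportMode ( _CRT_ASSERT , _CRTDBG_MODE_FILE ) == -1 )
    {
        return false ;
    }
    _CrtSetReportFile ( _CRT_ASSERT , _CRTDBG_FILE_STDERR ) ;
    return true ;
#else
    return false ;
#endif
}
```

The guard on _DEBUG reflects the fact that the report functions do real work only in the debug CRT, so a release build has nothing to redirect.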
int __stdcall SPH_ShowAddressAndValueOfInt32
(
    const INT32 * pintValue ,
    LPCTSTR       plpAddressFormat ,
    LPCTSTR       plpValueFormat
)
{
    if ( int rintRC = SPH_ShowAddressOfScalar ( pintValue , plpAddressFormat ) )
    {
        rintRC = _tprintf ( plpValueFormat ,
                            *pintValue ,
                            *pintValue ) ;
        return rintRC ;
    }
    else
    {
        return SPH_ERROR_NOTHING_PRINTED ;
    }
}
int __stdcall SPH_ShowAddressOfScalar
(
    const void * plpScalar ,
    LPCTSTR      plpFormat
)
{
    return _tprintf ( plpFormat ,
                      plpScalar ,
                      plpScalar ) ;
}
Listing 11 is function SPH_ShowAddressAndValueOfInt32
, followed by its dependent function, SPH_ShowAddressOfScalar
.
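The routines in Listing 11 print addresses through an integer format item, which suits a 32 bit build. A variation that does not depend on pointer width is sketched below. It is an assumption-laden illustration, not the article's code: DescribeInt32 is a hypothetical name, and it swaps the printf family for a string stream, which also removes the fixed-size output buffer from the picture.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns a one-line report containing the address of
// an integer, followed by its value in hexadecimal and in decimal.
// Returns an empty string when given a null pointer.
std::string DescribeInt32 ( const std::int32_t * pintValue )
{
    if ( pintValue == nullptr )
    {
        return std::string ( ) ;
    }

    std::ostringstream report ;

    // Inserting a void pointer prints the address in an implementation-defined
    // form that is wide enough for the platform's pointers.
    report << "address = " << static_cast<const void *> ( pintValue )
           << ", value = 0x"
           << std::hex << std::setw ( 8 ) << std::setfill ( '0' )
           << static_cast<std::uint32_t> ( *pintValue )
           << " (" << std::dec << *pintValue << " decimal)" ;

    return report.str ( ) ;
}
```

The cast to an unsigned type before the hexadecimal insertion makes a negative value print as its eight-digit bit pattern instead of with a sign.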
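{
    VOID * szEBPAddress = NULL ;                    // Receives the current value of EBP.
    __asm
    {
        lea eax , [ EBP ]                           // Load the address held in EBP into EAX.
        mov dword ptr [ szEBPAddress ] , eax        // Store it in the local variable.
    }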
Listing 12 is the inline assembly code that CPURegisterPeek replaced.
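Inline assembly of this kind is unavailable in 64 bit Visual C++ builds, so a compiler intrinsic is one way to take a similar peek at the current stack frame. The sketch below is an assumption, not a replacement for CPURegisterPeek: SketchPeekFrameAddress is a hypothetical name, and the value it returns is a frame-related address rather than the literal contents of EBP. With the Microsoft compiler it is the address of the slot that holds the return address, which sits just above the saved EBP in a conventional x86 frame.

```cpp
#if defined ( _MSC_VER )
#include <intrin.h>
#endif

// Hypothetical helper: returns an address inside the current stack frame,
// or a null pointer when the compiler offers no suitable intrinsic.
const void * SketchPeekFrameAddress ( )
{
#if defined ( _MSC_VER )
    // Address of the slot that holds this function's return address.
    return _AddressOfReturnAddress ( ) ;
#elif defined ( __GNUC__ ) || defined ( __clang__ )
    // Frame address of the current function (level 0).
    return __builtin_frame_address ( 0 ) ;
#else
    return nullptr ;
#endif
}
```

Because the two intrinsics report slightly different locations, the result is best treated as a landmark for inspecting nearby memory, not as an exact register value.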
To close this section, I call to your attention that all function prototypes and macros are in SecurePrintFHazard.h
. This practice affords maximum flexibility, because having the prototype in the master header means that the function can be defined in any source file. Unless the function needs macros or typedefs
defined in the master header, the source file that defines the function can omit it. SecurePrintFHazard.h
is omitted from ProgramIDFromArgV.C
, which compiles and links just fine. Declaring ProgramIDFromArgV
in SecurePrintFHazard.h
and including ProgramIDFromArgV.C
in the project is enough to get it compiled and linked.
Lessons Learned
I learned several things from this exercise, some of which were more like blunt force reminders.
- The new “secure” functions present a two edged sword.
- They are no panacea, because they introduce new hazards, which won’t necessarily manifest as profoundly as they did in this case. The bugs they introduce may be more subtle, harder to identify, and more dangerous than the deficiencies they are intended to address.
- Pay close attention to the output buffer when you test a new call to any of the functions that incorporate this backfilling technique. This is sound practice whenever you use a function that outputs to a buffer. Thankfully, the fill pattern is easy to identify; Figure 1 was generated from the test program.
- I discovered a new way to report invalid parameters, although the limitations imposed by its interface make it unlikely that I will use it, unless I find a way to use the knowledge that went into this article to extend its capabilities by peering into the previous stack frame.
- In a debug build, its signature exposes details that would be significantly more useful when user code invokes the handler. Nevertheless, even in a debug build, the information directly available through its arguments is only marginally useful.
- Since its arguments are null in a release build, a garden variety implementation such as the one in the test program is useless in a release build, and the code to hook it may as well be suppressed by wrapping it in a test for presence of the
_DEBUG
preprocessor symbol. To fight code bloat, the routine itself should also be guarded, although its prototype may be safely left unguarded, since its purpose is to provide the compiler with a template against which to validate the syntax of a call.
- Studying its signature to discover what useful tidbits are available for use in the error report reminded me of the limitations that all callback functions impose on the code that registers and invokes them, and the desirability of anticipating future requirements when designing a callback interface.
- I discovered CRT function
_CrtSetReportMode
, which can disable the assertion message box if it gets to be too annoying, or send the information to STDERR
.
- A callback function that is used before it is defined needs a prototype, even if it implements a system defined interface such as
_invalid_parameter_handler
.
- I rediscovered
DebugBreak
, a Windows API function that can be used to force any process into a debugger. Since the Visual Studio IDE insists on a fully qualified file name for the program to load into the debugger, this is the only way I could monitor its behavior when called from a command prompt or batch file by its unqualified name. I needed to do this to find and fix a character truncation error in helper function ProgramIDFromArgV
.
- Assembly language modules can be incorporated into a Visual Studio project, which ships with an assembler. I could use the current assembler to assemble
CPURegisterPeek.ASM
until I got cute with an equate that simulates the behavior of an expression constructed from several sizeof
expressions.
- If you delete an extra file directly from the file system, even one that is neither code nor content, such as the
readme.txt
created by the New Project Wizard, it remains visible in the Solution Explorer, and causes the solution to be perpetually marked as out of date, prompting a request to rebuild whenever you start the debugger. Use the context menu in the Solution Explorer to remove the file from the solution, and the message goes away.
- You can force the build engine to skip a target by setting its modified date to the current time, which makes it appear newer than its inputs, and therefore up to date. I used the 64 bit version of
FSTouch
, a free utility published by Funduc Software, and available at http://www.funduc.com/fstouch.htm.
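Two of the lessons above, that a callback used before its definition needs a prototype, and that hooking the invalid parameter handler is worthwhile only in a debug build, can be combined in one minimal sketch. All of the names below are hypothetical and are not taken from the test program. The handler has the shape required by _invalid_parameter_handler, and it tolerates the null arguments that a release build supplies.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Prototype first, because the hook routine refers to the handler before
// the handler is defined.
void SketchInvalidParameterHandler ( const wchar_t * pszExpression ,
                                     const wchar_t * pszFunction ,
                                     const wchar_t * pszFile ,
                                     unsigned int    uintLine ,
                                     std::uintptr_t  pReserved ) ;

// Counts the reports received, so that callers can tell whether one arrived.
static unsigned int s_uintInvalidParameterCount = 0 ;

// Registers the handler in debug builds made with the Microsoft CRT.
// Returns true when the handler was registered, false otherwise.
bool SketchHookInvalidParameterHandler ( )
{
#if defined ( _MSC_VER ) && defined ( _DEBUG )
    _set_invalid_parameter_handler ( SketchInvalidParameterHandler ) ;
    return true ;
#else
    return false ;      // Nothing useful to report in a release build.
#endif
}

void SketchInvalidParameterHandler ( const wchar_t * pszExpression ,
                                     const wchar_t * pszFunction ,
                                     const wchar_t * pszFile ,
                                     unsigned int    uintLine ,
                                     std::uintptr_t  pReserved )
{
    ( void ) pReserved ;                // Reserved by the CRT; unused here.
    ++s_uintInvalidParameterCount ;

    if ( pszExpression != nullptr && pszFunction != nullptr && pszFile != nullptr )
    {
        std::fprintf ( stderr ,
                       "Invalid parameter: %ls in %ls (%ls, line %u)\n" ,
                       pszExpression , pszFunction , pszFile , uintLine ) ;
    }
    else
    {
        // The details are null in a release build.
        std::fprintf ( stderr , "Invalid parameter detected; no details are available.\n" ) ;
    }
}
```

Keeping the handler itself free of compiler-specific code means that it can be exercised directly, without provoking a real invalid parameter error.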
References
[1] “sprintf, _sprintf_l, swprintf, _swprintf_l, __swprintf_l,” the classic family of buffered formatted printing functions, is documented at https://msdn.microsoft.com/en-us/library/ybk95axf.aspx and elsewhere.
[2] “sprintf_s, _sprintf_s_l, swprintf_s, _swprintf_s_l,” the corresponding collection of “secure” overloads, is documented at https://msdn.microsoft.com/en-us/library/ce3zzk1k.aspx.
[3] “Secure Template Overloads” documents the template macros intended to simplify upgrading at https://msdn.microsoft.com/en-us/library/ms175759.aspx.
[4] “Howto prevent process crash on CRT error C++,” at http://stackoverflow.com/questions/10719626/howto-prevent-process-crash-on-crt-error-c, gets credit for steering me to the solution that enabled me to create a detailed report about an expected invalid parameter error in the first of four test cases in the demonstration program, even though I eventually concluded that it would have been simpler to redirect the assert to STDERR.
[5] “Using Static Buffers to Improve Error Reporting Success,” David A. Gray, 9 April 2015, http://www.codeproject.com/Articles/894564/Using-Static-Buffers-to-Improve-Error-Reporting-Su.
History
20 January 2016 - Corrected a technical error in the table that describes the uses of the various CPU registers as they relate to memory management.