Introduction
You may have met programmers who believe that memory corruption will always, immediately, produce some kind of visible result (most likely a program crash). I have.
I wish they were right.
Background
Memory corruption changes the contents of memory at unwanted locations, thus changing the values of the variables stored at those locations. In real life, variables hold meaningful data,
and a change to those data will have some bad results. Among others: calculations returning the wrong results, programs crashing, programmers losing jobs, and hackers getting access
to sensitive information. The sample project shows a case of memory corruption that actually does nothing but change the value
of a few variables. No Hollywood-style explosions, no loud screeching noises.
Wikipedia says: "Memory corruption happens when the contents of a memory location are unintentionally modified due to programming errors."
Don't take my word for it, go ahead, check it. The link is right here: http://en.wikipedia.org/wiki/Memory_corruption.
You're back already? Let's go on.
Again from Wikipedia: "In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer,
overruns the buffer's boundary and overwrites adjacent memory." http://en.wikipedia.org/wiki/Buffer_overflow
So far I've proved to you that I can copy and paste. But what's the point? Well, the point of this article is that memory corruption, like other kinds of corruption,
can and should be prevented. We'll get to that eventually. Just bear with me for a while. I hope you'll have some fun along the way.
Let's corrupt some memory!
Let's have a look at some parts of the sample program (.sln and .dsw provided):
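void OverflowMyBuffer(char *szTest)
{
    // Copies 14 bytes (13 characters plus the terminating zero) without checking the size of the destination
    strcpy(szTest, "Hello, world!");
}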
This is the easiest way to overflow a buffer: strcpy (or memcpy) into its address (the address of its first byte) something bigger than its allocated size.
For instance, the string "Hello, world!" is 14 bytes long (counting the ending zero); if the buffer szTest is shorter than that, you'll get a buffer overflow.
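To make the arithmetic concrete, here is a minimal sketch of the check that OverflowMyBuffer() never performs. The helper names BytesNeeded and FitsInBuffer are made up for this illustration; they are not part of the sample project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: bytes needed to store a C string,
// including its terminating zero.
std::size_t BytesNeeded(const char *text)
{
    assert(text != nullptr);
    return std::strlen(text) + 1;
}

// Hypothetical helper: true when `text` (terminator included)
// fits in a buffer of `capacity` bytes.
bool FitsInBuffer(const char *text, std::size_t capacity)
{
    return BytesNeeded(text) <= capacity;
}
```

With these, BytesNeeded("Hello, world!") is 14, so FitsInBuffer("Hello, world!", 3) is false: a 3-byte buffer is 11 bytes too small.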
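void test ()
{
    const char format[] = "* * * * * nine = %d, eight = %d, seven = %d, szSmallBuffer = '%s'\n";
    const int const arr[3] = { 9, 8, 7 };   // const twice over; compilers may warn about or reject the repeated qualifier
    char szSmallBuffer[3];
    strcpy(szSmallBuffer, "abc");           // "abc" plus its terminating zero is 4 bytes: already one too many for this buffer
    printf(format, arr[0], arr[1], arr[2], szSmallBuffer);
    OverflowMyBuffer(szSmallBuffer);        // writes 14 bytes into the 3-byte buffer
    printf(format, arr[0], arr[1], arr[2], szSmallBuffer);
    printf("arr as a string: '%s'\n", (const char *) arr);   // read the array's bytes as text
}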
The function test() is the interesting one. First a few variables get declared and initialized, then a call to printf() shows that their values are as expected. So far, so good.
Then OverflowMyBuffer() is called: this function does not receive a pointer to arr[], and has no inkling of its existence.
After that call, the values in the array arr[], which was never passed to the function, have changed: the last two printf() calls show that fact, along with the new contents of the double-const array.
int _tmain(int argc, _TCHAR* argv[])
{
    printf("\n-----\nIn main, before test()\n-----\n");
    test();
    printf("\n-----\nIn main, after test()\n-----\n");
    return 0;
}
Not much to see here. The function test() above was the interesting one.
The output of the program is:
-----
In main, before test()
-----
* * * * * nine = 9, eight = 8, seven = 7, szSmallBuffer = 'abc'
* * * * * nine = 1998597231, eight = 1684828783, seven = 33, szSmallBuffer = 'Hello, world!'
arr as a string: 'o, world!'
-----
In main, after test()
-----
What if the contents of arr[] were not just a few idle integers, but important data? For instance: a corrupt pointer may make a program crash,
a corrupt variable holding the distance between your ship and the iceberg may send you swimming in very cold waters.
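The strange numbers in the output are not random. The last line, 'o, world!', is "Hello, world!" minus its first four characters, which suggests that on this build arr[] started four bytes after the start of szSmallBuffer. The sketch below rebuilds that text from the three printed integers. It assumes 32-bit integers stored least significant byte first (as on x86), and the function name BytesAsText is invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: read an array of 32-bit values the way
// printf("%s") would on a little-endian machine: byte by byte,
// lowest byte first, stopping at the first zero byte.
std::string BytesAsText(const std::uint32_t *values, std::size_t count)
{
    assert(values != nullptr);
    std::string text;
    for (std::size_t i = 0; i < count; ++i) {
        for (int shift = 0; shift < 32; shift += 8) {
            const char byte = static_cast<char>((values[i] >> shift) & 0xFF);
            if (byte == '\0') {
                return text;   // terminating zero found
            }
            text += byte;
        }
    }
    return text;
}
```

Feeding it the values 1998597231, 1684828783 and 33 gives back "o, world!": the first value is the bytes 'o', ',', ' ', 'w', the second is 'o', 'r', 'l', 'd', and 33 is '!' followed by zeros.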
Beginner's aside
"Now, wait a minute!" someone may say. "If the function OverflowMyBuffer() has no access to the variable arr[], how can it mess with its contents?"
Well, that's exactly the problem with memory corruption. Using pointers, you can mess with memory anywhere in your process. Consider this: *((int *) rand()) = 0xDeadBeef;
Pretty, right? This code tries to plant an arbitrary value at a random position in the memory of your process: if there is a variable there (for all I know, there might be)
its value will be modified; if the random pointer points to unallocated memory, or memory occupied by code, strange things will happen, most likely a crash.
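If you would rather see the same idea without rolling the dice, here is a tamer sketch. The type Record and the functions FillBytes and DistanceSurvives are invented for this illustration. FillBytes() only ever sees an address and a byte count, yet it wipes out a member it has never heard of.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type, for illustration only.
struct Record
{
    char name[4];
    int  distance;
};

// Receives a raw address and a byte count; it knows nothing about Record.
void FillBytes(unsigned char *address, std::size_t count, unsigned char value)
{
    assert(address != nullptr);
    std::memset(address, value, count);
}

// Overwrites the whole object through a byte pointer, then reports
// whether `distance` still holds its original value.
bool DistanceSurvives()
{
    Record record = { "abc", 1000 };
    FillBytes(reinterpret_cast<unsigned char *>(&record), sizeof record, 0);
    return record.distance == 1000;
}
```

DistanceSurvives() returns false: the member distance was changed by code that never mentioned it by name. The sample program does the same thing by accident, with a byte count that is simply too large.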
The point at last
Considering that there are Wikipedia articles on buffer overflow and memory corruption, what's the point of this article?
There are two:
A memory corruption will not necessarily generate an immediate, repeatable crash
Some time ago, a colleague, looking for a bug, searched a few thousand lines of code for all appearances of an integer variable whose value was mysteriously changing,
set breakpoints everywhere the variable appeared, and saw the value of the variable change without any apparent reason. He asked me to take a look: how could that happen?
Since you're reading this article, you may have deduced that it was memory corruption due to a buffer overflow. At the time, it wasn't obvious: the code was something like this:
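int x = 14;
char str[20];
// If the field does not fit in 20 bytes, the copy runs past the end of str
LoadDatabaseFieldIntoVariable(SomeDatabaseField, str);
// ...so x may no longer be 14 by the time it is used here
char * ptr = (char *) malloc(x);
strcpy(ptr, "Hello, world!");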
We set breakpoints wherever str was touched, added a watch for x, and the bug was solved.
If you see production code randomly failing at customer sites, then succeeding a couple of minutes later with identical input; if you see simple calculations
sometimes returning weird results, and a minute later the right result; if you cannot reproduce the unwanted behavior under your debugger; if your teammates
are blessed with a healthy programmer's ego ("if there's a bug, it's not in my code"): those are the usual symptoms of memory issues. The cause may be corruption,
a failed and unchecked memory allocation (which is a subject for a totally different article), or, in multithreaded code, a race condition (again, outside the scope of this article).
Buffer overflows can be avoided
With a few simple precautions, you can make buffer overflows a thing of the past:
- Don't use C-style strings: use std::string, or WTL/ATL/MFC CString, or CComBSTR/_bstr_t. All of them manage their own memory.
- Don't use C-style arrays: use STL containers. They manage their own storage too, and checked accessors such as at() reject an out-of-range index instead of silently writing past the end.
- If you absolutely must write a function that receives a C-style array as a parameter and modifies its contents, also take a second parameter of type size_t, with the count of elements, and use it to avoid buffer overflow, just as strncpy() does. (Keep in mind that strncpy() does not zero-terminate the destination when the source is too long.)
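Here is a minimal sketch of the three precautions above, side by side. The function names MakeGreeting, OutOfRangeIsCaught and CopyBounded are invented for this example, and CopyBounded() is just one possible policy (refuse the copy when the text does not fit), not the only reasonable one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// std::string grows as needed, so a longer text cannot overrun a neighbor.
std::string MakeGreeting()
{
    std::string text = "abc";
    text = "Hello, world!";   // the string resizes its own storage
    return text;
}

// std::vector::at() checks the index and throws std::out_of_range
// instead of touching memory outside the container.
// (operator[] performs no such check.)
bool OutOfRangeIsCaught(const std::vector<int> &values, std::size_t index)
{
    try {
        (void) values.at(index);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}

// Copies src into dest only if it fits, terminator included.
// Otherwise leaves dest as an empty string and returns false.
bool CopyBounded(char *dest, std::size_t destSize, const char *src)
{
    assert(dest != nullptr && src != nullptr);
    if (destSize == 0) {
        return false;
    }
    const std::size_t needed = std::strlen(src) + 1;
    if (needed > destSize) {
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, needed);
    return true;
}
```

Called as CopyBounded(szSmallBuffer, sizeof szSmallBuffer, "Hello, world!") with the 3-byte buffer from test(), the function returns false and arr[] keeps its values. Note that sizeof only gives the buffer size where the array itself is visible; inside a function that receives a char *, the size has to travel in the extra parameter.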
Another frequent cause of memory corruption is dangling pointers; I didn't bump into any in the last few weeks. If you're interested in their prevention,
and a Google or Bing search for "how to prevent dangling pointers" didn't help, leave me a comment below.
I hope you enjoyed this article: in any case, thank you for taking the time to read it. Happy programming!
History