
A Study on Corruption

30 Jan 2012
Do you believe that memory corruption will generate an immediate, repeatable crash? Some programmers actually do...

Introduction

You may have met programmers who believe that a memory corruption will always, immediately, generate some kind of visible result (most likely a program crash): I have. I wish they were right.

Background

Memory corruption changes the contents of memory at unwanted locations, thus changing the values of the variables stored at those locations. In real life, variables hold meaningful data, and a change to those data will have some bad results. Among others: calculations returning the wrong results, programs crashing, programmers losing jobs, and hackers getting access to sensitive information. The sample project shows a case of memory corruption that actually does nothing but change the value of a few variables. No Hollywood-style explosions, no loud screeching noises.

Wikipedia says: "Memory corruption happens when the contents of a memory location are unintentionally modified due to programming errors."

Don't take my word for it, go ahead, check it. The link is right here: http://en.wikipedia.org/wiki/Memory_corruption. You're back already? Let's go on.

Again from Wikipedia: "In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory." http://en.wikipedia.org/wiki/Buffer_overflow

So far I've proved to you that I can copy and paste. But what's the point? Well, the point of this article is that memory corruption, like other kinds of corruption, can and should be prevented. We'll get to that eventually. Just bear with me for a while. I hope you'll have some fun along the way.

Let's corrupt some memory!

Let's have a look at some parts of the sample program (.sln and .dsw provided):

C++
#include <cstring> // strcpy

void OverflowMyBuffer(char *szTest)
{
    // Copy a 13-character string (14 bytes, counting the ending zero)
    // into a buffer of unknown size. Its size may be just 3 bytes...
    strcpy(szTest, "Hello, world!");  
}

This is the easiest way to overflow a buffer: strcpy (or memcpy) something bigger than the buffer's allocated size to its address (the address of its first byte). For instance, the string "Hello, world!" is 14 bytes long (counting the ending zero); if the buffer szTest is shorter than that, you get a buffer overflow. strcpy() cannot warn you: all it receives is a pointer, with no size attached.
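The difference between characters and bytes is easy to check. The helper names below are made up for illustration; they simply encode the rule that strcpy() writes strlen() + 1 bytes, ending zero included:

```cpp
#include <cstddef>
#include <cstring>

// Bytes that strcpy() writes for a given source string: the characters
// counted by strlen() plus one for the terminating zero.
std::size_t BytesWrittenByStrcpy(const char *source)
{
    return std::strlen(source) + 1;
}

// True when strcpy()-ing 'source' into a buffer of 'bufferSize' bytes
// would write past the end of that buffer.
bool StrcpyWouldOverflow(const char *source, std::size_t bufferSize)
{
    return BytesWrittenByStrcpy(source) > bufferSize;
}
```

By the same rule, the strcpy(szSmallBuffer, "abc") call in the next listing overflows too: 4 bytes go into a 3-byte buffer.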

C++
#include <cstdio> // printf

void test ()
{
    const char format[] = "* * * * * nine = %d, eight = %d, seven = %d, szSmallBuffer = '%s'\n";
    // Const everywhere: this cannot change, right?
    // The compiler would tell us, wouldn't it? Well, the contents
    // of this array will be corrupted by a buffer overflow. Stay tuned...
    const int arr[3] = { 9, 8, 7 };  
    char szSmallBuffer[3];
    // "abc" needs 4 bytes (3 characters plus the ending zero), so this strcpy
    // already writes one byte past the end of the buffer. VC++ doesn't seem
    // to mind: in my build the zero apparently landed in alignment padding,
    // so nothing visible happened.
    strcpy(szSmallBuffer, "abc"); 
    // The next line prints * * * * * nine = 9, eight = 8,
    //       seven = 7, szSmallBuffer = 'abc'; not bad.
    printf(format, arr[0], arr[1], arr[2], szSmallBuffer); 
    OverflowMyBuffer(szSmallBuffer);
    // The next line prints * * * * * nine = 1998597231,
    // eight = 1684828783, seven = 33, szSmallBuffer = 'Hello, world!'
    // 33 is the ASCII code for '!'; suspicion arises...
    printf(format, arr[0], arr[1], arr[2], szSmallBuffer); 
    // The next line prints: arr as a string: 'o, world!'
    printf("arr as a string: '%s'\n", (const char *) arr); 
}

The function test() is the interesting one. First a few variables get declared and initialized, then a call to printf() shows that their values are as expected. So far, so good.

Then OverflowMyBuffer() is called: this function does not receive a pointer to arr[], and has no inkling of its existence.

After that call, the values in arr[], which was never passed to the function, have changed: the last two printf() calls show that fact, and the new contents of the const array.
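Where did those particular numbers come from? The sketch below replays the overflow on a plain byte array instead of the real stack, so the sketch itself is well defined. The layout (3 bytes of szSmallBuffer, one padding byte, then arr[]) is an assumption inferred from the printed values; another compiler or build setting may arrange the locals differently. FrameModel, ModelStrcpy() and the other names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of test()'s locals, inferred from the program output:
// szSmallBuffer[3] at offset 0, one byte of alignment padding at offset 3,
// then arr[0], arr[1], arr[2] as 4-byte little-endian ints from offset 4.
// A plain byte array stands in for the stack, so this sketch is well defined.
struct FrameModel
{
    unsigned char bytes[16];
};

const std::size_t kArrOffset = 4;

// Stores v in little-endian (x86) byte order: least significant byte first.
void StoreLE32(unsigned char *p, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        p[i] = static_cast<unsigned char>(v >> (8 * i));
}

// Reads a little-endian 32-bit value, whatever the host's own byte order.
std::uint32_t LoadLE32(const unsigned char *p)
{
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// The modeled frame right after "const int arr[3] = { 9, 8, 7 };".
FrameModel MakeFrame()
{
    FrameModel f = {};
    const std::uint32_t values[3] = { 9, 8, 7 };
    for (int i = 0; i < 3; ++i)
        StoreLE32(f.bytes + kArrOffset + 4 * i, values[i]);
    return f;
}

// Replays strcpy(szSmallBuffer, text): strlen(text) + 1 bytes are written
// from offset 0 with no check against the 3-byte buffer. The copy is only
// clamped to the size of the model itself.
void ModelStrcpy(FrameModel &f, const char *text)
{
    std::size_t n = std::strlen(text) + 1;
    if (n > sizeof f.bytes)
        n = sizeof f.bytes;
    std::memcpy(f.bytes, text, n);
}

// Reads arr[i] back out of the modeled frame.
std::uint32_t ArrElement(const FrameModel &f, int i)
{
    return LoadLE32(f.bytes + kArrOffset + 4 * i);
}
```

Under that layout, copying "abc" leaves arr[] intact (its zero lands in the padding byte), while copying "Hello, world!" leaves arr[0] holding the bytes "o, w", arr[1] holding "orld", and arr[2] holding '!', the ending zero, and the two high bytes of the old 7, which were already zero.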

C++
int _tmain(int argc, _TCHAR* argv[])
{
    printf("\n-----\nIn main, before test()\n-----\n");
    test();
    printf("\n-----\nIn main, after test()\n-----\n");
    return 0;
}

Not much to see here.

The output of the program is:

-----
In main, before test()
-----
* * * * * nine = 9, eight = 8, seven = 7, szSmallBuffer = 'abc'
* * * * * nine = 1998597231, eight = 1684828783, seven = 33, szSmallBuffer = 'Hello, world!'
arr as a string: 'o, world!'

-----
In main, after test()
-----

What if the contents of arr[] were not just a few idle integers, but important data? A corrupt pointer may make a program crash; a corrupt variable holding the distance between your ship and the iceberg may send you swimming in very cold water.

Beginner's aside

"Now, wait a minute!" someone may say. "If the function OverflowMyBuffer() has no access to the variable arr[], how can it mess with its contents?"

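Well, that's exactly the problem with memory corruption. Through a pointer you can write anywhere in your process's memory, and nothing checks that the address still belongs to the variable you meant. OverflowMyBuffer() received a pointer to szSmallBuffer, and strcpy() simply kept writing past its end, into whatever came next: arr[]. Or consider this: *((int *) rand()) = 0xDeadBeef; Pretty, right? This code tries to plant an arbitrary value at a random address in your process: if a variable happens to live there (for all I know, one might), its value will be silently modified; if the address is unmapped, or belongs to read-only memory such as code, the program will most likely crash with an access violation.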

The point at last

Considering that there are Wikipedia articles on buffer overflow and memory corruption, what's the point of this article?

There are two:

A memory corruption will not necessarily generate an immediate, repeatable crash

Some time ago, a colleague looking for a bug searched a few thousand lines of code for every occurrence of an integer variable whose value was mysteriously changing, set breakpoints everywhere the variable appeared, and still saw its value change for no apparent reason. He asked me to take a look: how could that happen? Since you're reading this article, you may have guessed that it was memory corruption due to a buffer overflow. At the time, it wasn't obvious; the code was something like this:

C++
// This is pseudo-code. 
int x = 14;   // 32 bits = 4 bytes. Contents: { 14, 0, 0, 0 };
              // the x86 processor family is little endian.
char str[20]; // Declared after x. With our compiler this put str right
              // below x, so (str + 20 == &x). The language guarantees no
              // particular layout of local variables, so don't count on it,
              // and don't expect (str - 4 == &x) either.

// ... many, many lines of code: x is not touched ...

// The database field holds up to 20 characters, and the function
// LoadDatabaseFieldIntoVariable() calls strcpy: a full field needs 21 bytes,
// so the ending zero of str[] overflows into the least significant byte of x.
LoadDatabaseFieldIntoVariable(SomeDatabaseField, str); 

// ... many, many lines of code: x is not touched ...

// x is no longer 14, but zero. Huh?
// Nobody ever touched it! What happened?
char * ptr = (char *) malloc(x); // malloc(0): NULL, or a block you must not write to
// .. a few more lines of code ...
strcpy(ptr, "Hello, world!"); // Crash! (This time. Writing 14 bytes into a
                              // zero-size block could also corrupt the heap silently.)

We set breakpoints wherever str was touched, added a watch for x, and the bug was solved.
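The same kind of byte-level replay shows why this bug was so quiet. It assumes, as the pseudo-code does, that str is immediately followed by a little-endian x; a 20-character field then writes its ending zero over the least significant byte of x. LocalsModel and the function names are hypothetical, and a plain byte array stands in for the real locals:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the pseudo-code's locals: char str[20] immediately
// followed by the 4 bytes of x, least significant byte first.
const std::size_t kStrSize = 20;

struct LocalsModel
{
    unsigned char bytes[kStrSize + 4];
};

LocalsModel MakeLocals(std::uint32_t x)
{
    LocalsModel m = {};
    for (int i = 0; i < 4; ++i)   // little endian: low byte at the lowest address
        m.bytes[kStrSize + i] = static_cast<unsigned char>(x >> (8 * i));
    return m;
}

std::uint32_t ReadX(const LocalsModel &m)
{
    std::uint32_t x = 0;
    for (int i = 3; i >= 0; --i)  // rebuild from the high byte down
        x = (x << 8) | m.bytes[kStrSize + i];
    return x;
}

// Replays strcpy(str, field): strlen(field) + 1 bytes are written, so a
// 20-character field puts its ending zero on the first byte of x.
void ModelLoadField(LocalsModel &m, const char *field)
{
    std::size_t n = std::strlen(field) + 1;
    if (n > sizeof m.bytes)       // keep the model itself in bounds
        n = sizeof m.bytes;
    std::memcpy(m.bytes, field, n);
}
```

A 19-character field fits and leaves x alone; a 20-character one turns 14 into 0. Had x started at 300 (bytes 0x2C, 0x01, 0, 0), it would have silently become 256: a perfectly plausible value that might never crash anything, and would simply flow into later calculations.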

If production code fails at random at customer sites, only to succeed a couple of minutes later with identical input; if simple calculations sometimes return weird results, and the right result a minute later; if you cannot reproduce the unwanted behavior under your debugger; if your teammates are blessed with a healthy programmer's ego ("if there's a bug, it's not in my code"): those are the usual symptoms of memory issues. The culprit is typically memory corruption, a failed and unchecked memory allocation (a subject for a totally different article), or, in multithreaded code, a race condition (again, out of the scope of this article).

Buffer overflows can be avoided

With a few simple precautions, you can make buffer overflows a thing of the past:

  • Don't use C-style strings: use std::string, or WTL/ATL/MFC CString, or CComBSTR/_bstr_t. All of them manage their own memory.
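  • Don't use C-style arrays: use STL containers such as std::vector. They keep track of their own size and grow as needed; when an index might be out of range, use at(), which throws std::out_of_range, rather than operator[], which does no checking.
  • If you absolutely must write a function that receives a C-style array as a parameter and modifies its contents, also take a second parameter of type size_t with the count of elements, and use it to avoid overflowing the buffer, just like strncpy() does. Beware of strncpy() itself, though: when the source is too long, it fills the buffer without writing the ending zero.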
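Here is a sketch of these precautions applied to the sample's OverflowMyBuffer(). FillBuffer(), MakeGreeting() and AtRejectsIndex() are illustrative names, not part of the sample project. The bounded version uses snprintf() rather than strncpy(), because snprintf() always terminates its output when the size is non-zero, and returns the length the full string would have needed:

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Bounded replacement for OverflowMyBuffer(): the caller passes the buffer
// size, and snprintf() truncates if necessary, always writing an ending zero.
// Returns true if the whole string fit.
bool FillBuffer(char *szTest, std::size_t size)
{
    if (szTest == nullptr || size == 0)
        return false;
    const int needed = std::snprintf(szTest, size, "%s", "Hello, world!");
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}

// The std::string version: the string allocates whatever it needs.
std::string MakeGreeting()
{
    return "Hello, world!";
}

// std::vector::at() checks the index and throws std::out_of_range;
// operator[] does not. Returns true if at() rejected the index.
bool AtRejectsIndex(const std::vector<int> &v, std::size_t i)
{
    try
    {
        (void)v.at(i);
        return false;
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
}
```

With a 3-byte buffer, FillBuffer() stores "He" and returns false, instead of scribbling over whatever follows the buffer.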

Another frequent cause of memory corruption is dangling pointers; I haven't bumped into any in the last few weeks. If you're interested in preventing them, and a Google or Bing search for "how to prevent dangling pointers" didn't help, leave me a comment below.

I hope you enjoyed this article: in any case, thank you for taking the time to read it. Happy programming!

History

  • 2012, January - Posted.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)