Foreword
Language May Offend: This article contains language that may be offensive to some people, in particular, to programmers that strongly prefer languages that do not expose pointers. Reader's discretion is advised.
Introduction
You're at work. Work, for you, includes maintenance of a legacy behemoth with decades of accumulation of global variables, C-style strings, C-style arrays, do-it-yourself containers, functions with obscure names, and many other things that were written before anyone thought of 'anti patterns'.
The phone rings. You answer. You know that voice.
- "It sometimes crashes."
You know what "it" means: a DLL deep in the heart of the product. The one that implements the core business logic. You know where it crashes: at the newest machine of your company's most valued customer.
And you know what "sometimes" means, but you need a few seconds to evaluate a course of action.
- "Please define 'sometimes'.", you say.
Nine times out of ten, when you're told that a program 'sometimes crashes', those crashes are random, and apparently they do not depend on a particular user's action. This one is no exception.
The caller is one of your top customer support people. You both have already been through this conversation: this program works fine in thousands of installations, but some machines seem to suffer from some kind of allergy to it, and in those few machines the program will crash three times out of seven runs. Customers, who have paid for the product, get angry.
You make a mental list of your options:
- Have the customer call an exorcist. Since computers are
deterministic, and the program crashes are random, their cause
is obviously supernatural: maybe the server room is built on top of an ancient
cemetery, or something like that. That may be fine, if you don't mind
people thinking that you believe in ghosts.
- Blame a virus. You will look like a clown when the
customer notices that the alleged 'virus' only seems to affect one
program.
- Recommend upgrading the hardware. The program works fine
in almost all the other places where it is installed, so there must be a
hardware contribution to the crashes. Since hardware costs money,
business customers will have to go through some kind of approval
process: this would send the ball to the customer's field. Some people
actually see this as a good thing.
- Roll up your sleeves and actually try to solve the problem.
If I was in your shoes, I'd rule the first three options out. I wouldn't be a programmer if I didn't prefer solving problems by fixing whatever is wrong with the code: and if a program crashes, there's always something wrong with the code. I'd start by implementing a structured error translator that writes relevant information to the Application Event Log, and by reviewing the code.
Let's assume your choice is the same as mine.
Random crashes are usually the result of resource management issues, among others:
- A memory allocation (
malloc
, calloc
, strdup
, and realloc
) returns NULL
, and this return value is ignored. - A resource allocation (
fopen
, or CreateFile
, for instance) returns NULL
(or INVALID_HANDLE_VALUE
), and this return value is ignored. - A race condition: one thread is trying to use a resource that was released by another thread.
- A memory corruption, for instance a buffer overflow (see A Study on Corruption).
- A dangling pointer.
If you feel like playing with dangling pointers, read on. Like any other pointer issues, if you work with a language that does not expose pointers (Java, Visual Basic, COBOL, C#), you won't meet it.
Background
What are dangling pointers?
Let's start at the source: http://en.wikipedia.org/wiki/Dangling_pointer
Dangling pointers are one aspect of a bigger subject: resource ownership.
When the ownership rules are clear, dangling pointers are scarce.
When you have legacy, C-style code, sprinkled with global variables, and rich in stack allocations, the ownership rules are seldom (if ever) clear.
The simple case
Let's say you have a function like this:
int foo()
{
MyClass pClass = new MyClass;
int x = pClass->bar();
int y = pClass->baa();
delete pClass;
switch (x)
{
case 1:
return y;
case 2:
return y * 2;
case 3:
default:
return x + y;
}
}
After some routine maintenance, someone (carefully respecting the
open-closed principle), adds a new case to the switch:
int foo()
{
MyClass pClass = new MyClass;
int x = pClass->bar();
int y = pClass->baa();
delete pClass;
switch (x)
{
case 1:
return y;
case 2:
return y * 2;
case 4:
if (17 == rand() % 99 )
{
y = pClass->baz(x);
}
return y / 3;
case 3:
default:
return x + y;
}
}
Many compilers will happily accept this.
You may expect this code to crash one time in a hundred. In that case, you would be wrong. Run the attached project (compile it in release mode) and see.
The simplest fix for code like this: use a stack variable (see below). Ownership is clear: when the stack unrolls, and only when the stack unrolls, the destructor for the stack variable mClass
will be called.
int foo()
{
MyClass mClass;
int x = mClass.bar();
int y = mClass.baa();
switch (x)
{
case 1:
return y;
case 2:
return y * 2;
case 4:
if (17 == rand() % 99 )
{
y = mClass.baz(x);
}
return y / 3;
case 3:
default:
return x + y;
}
}
Sometimes you cannot use a stack variable. For instance, when you use a factory method, that returns a pointer to something that implements an interface (inherits an abstract class, actually).
In this case, the simple fix is to use a smart pointer (like std::auto_ptr
, boost::shared_ptr
, the C++ 11 std::unique_ptr
, and others). Choose your smart pointer carefully: std::unique_ptr
is the cheapest and fastest, if you can use it.
MyAbstractClass *FactoryMethod()
{
return new SomethingDerivedFromMyAbstractClass();
}
int foo()
{
const std::auto_ptr<myabstractclass> pClass = std::auto_ptr<myabstractclass>(FactoryMethod());
int x = pClass->bar();
int y = pClass->baa();
switch (x)
{
case 1:
return y;
case 2:
return y * 2;
case 4:
if (17 == rand() % 99 )
{
y = pClass->baz(x);
}
return y / 3;
case 3:
default:
return x + y;
}
}
Dangling pointers are the result of explicit heap memory allocation. Stay away from explicit memory allocation, and you're safe.
There are three ways I know of to steer clear from explicit heap memory allocation:
- Use the stack.
- Use RAII (like
auto_ptr
, shared_ptr
, unique_ptr
, and all other smart pointers). - Use a language that manages memory for you (Java, C#, VB, COBOL).
If you can't choose your language, that makes two.
- So, we're done, ain't we? Can we have our cofee break now?
- Well, if you are thirsty, by all means do; but we're not done yet. Not by a long shot: Googling for 'dangling pointer' brings about 1,340,000 results; Bing found 298,000 results. That's a lot of results for a non-issue...
The real mess begins when functions start passing pointers around, and some function that received a pointer as a parameter deletes it. const
doesn't help us there, and in legacy code you will find sometimes beautiful, brilliant inventions - and other times horrible, terrible things that you can only hope did not go unpunished.
A few cases not so simple
Managing resource ownership, in short, is making clear which part of a program is in charge of allocating resources, and which part is in charge of releasing them.
The bigger your program, the messier this may get: as usual, we'll learn from the experience of others, and use the patterns that other people have found.
There are several patterns of ownership, the ones I've met most frequently are:
- Single owner: the same part of the program (class, function) that allocated a resource is in charge of releasing it. RAII is a good example: allocate on initialization, release in the destructor; also the
const std::auto_ptr
idiom, that Sutter refers to. - The producer-consumer (or source-sink) pattern: one part of the program allocates a resource, another releases it. I've used this pattern frequently in multi-threaded programming: one thread allocates a
new Job
instance, then adds it to a queue; a worker thread pops the Job
from the queue, calls pJob->Execute()
and deletes the Job
pointer. - Shared resources (like
shared_ptr
, or COM AddRef()/Release()
) have multiple owners: the last one to release the resource actually deletes the pointer, closes the handle, etc. There is a danger of leaks if there are circular references (a -> b -> c -> a), which usually imply that the pointers will never be released.
In real life, I seldom use explicit memory allocation. I use STL containers and smart pointers, that manage the nasty little details for me. The uses of new()
that I couldn't avoid are the following:
- The producer-consumer pattern, mentioned above: the producer either is or uses a factory, that will call
new
. - A legacy API requires a
char *
, and the buffer size is not known until run time. - Run time polymorphism, which often requires choosing the implementation of an interface at run time.
Let's see some examples.
Calling a legacy API
ShFileOperation
(for instance) receives a zero-delimited list of strings in its parameter pFrom
, with a double-zero to end the list.
The problem is: allocation and deallocation of that parameter.
When confronted with something this ugly, we programmers don't sweep it under the rug: we encapsulate it in a class.
In this case, there is a reasonably easy solution: encapsulate the call to the legacy API in a class that manages the buffer as an instance member. The member may be an auto_ptr
; the class may even expose a method that receives a std::list<std::string>
and does all the allocation and resource management inside it.
The producer-consumer pattern
The source/producer is usually a function that allocates a pointer, passes it to another function, and exits without deleting it. This is not a problem until, during maintenance, someone notices that the original author 'forgot a delete', and 'fixes that leak'.
Fortunately, deleting twice the same pointer will raise an immediate exception, so this particular error can be caught in tests.
On the other hand, avoiding is better than fixing: let's see if we find a solution.
Approach 1.1: Add extensive comments to the code, make ownership explicit.
In the producer:
Job *pShootAndForget = new Job(params);
m_workerThread.AddToQueue(pShootAndForget);
pShootAndForget = NULL;
Elegant and refined, right?
Approach 1.2: Don't declare a temporary variable, do it inline.
m_workerThread.AddToQueue(new Job(params));
The risk here, again, is that someone may find a 'leak', and 'fix' it.
Approach 2: Move the burden of creation to m_worker. Ugly.
m_worker.AddToQueue(params);
I don't like this approach, for several reasons:
- It breaks the single responsibility principle (m_workerThread is now both a factory and a consumer of jobs); formerly, it was in charge of consuming (executing and deleting) them only.
- Adds coupling: In the original version,
Job
could be any abstract class with a virtual Execute()
method, and m_worker 'knew' nothing about its creation. You could use the same worker class with any number of different clients, without changing one line of code.
Approach 3: Use std::auto_ptr, it yields ownership.
If your development environment supports unique_ptr
(C++ 11) you can use unique_ptr
instead of auto_ptr
.
Those of us still using compilers that do not support C++ 11, in business environments where you're not allowed to use the Boost libraries, will have no option but to stick with auto_ptr
.
class WorkerThread
{
void AddToQueue(std::auto_ptr<job> pJob)
{
m_queue.push(pJob.release());
}
};
std::auto_ptr<job> pShootAndForget ( new Job(params) );
m_workerThread.AddToQueue(pShootAndForget);
This approach is not foolproof, but it doesn't make tripping too easy.
Choosing an implementation at run time
Enter run-time polymorphism.
Let's say you have a program that connects to a database, and let's say your program may use SQL Server, Oracle, or MySQL, depending on configuration: you implemented a data access layer that connects to the right database, generates and executes all the SQL required, and stores and retrieves objects, leaving client code oblivious of all the nasty database details.
You did so by defining a data access interface, and implementing a different concrete class for each database engine.
You also implemented a factory class, with a static method that returns an instance of the right implementation of your interface.
Hey, you did a very nice job there! Let's see some pseudo-code.
std::string databaseEngine = Configuaration::GetInstance().GetDatabaseEngine(); IDatabaseAccess *pDatabaseAccess = DatabaseAccessFactory::GetIDatabaseAccess(databaseEngine.c_str());
Since calling the configuration and factory may have a measurable overhead, someone may 'improve' the program by caching the database access pointer, for instance, in a static variable.
static IDatabaseAccess *pDatabaseAccess = NULL;
if (NULL == pDatabaseAccess)
{
std::string databaseEngine = Configuaration::GetInstance().GetDatabaseEngine(); pDatabaseAccess = DatabaseAccessFactory::GetIDatabaseAccess(databaseEngine.c_str());
}
The problem is that at some other piece of code, far far away, someone may delete this pointer.
Here, using a
shared_ptr
instead of a raw pointer would be of assistance.
How to make a mess with auto_ptr
Now, here's a riddle for you: what will be the output of this program? (It's part of the sample project).
namespace mess_with_auto_ptr
{
struct X
{
void Foo()
{
printf("X::Foo(%#p);\n", this);
fflush(stdout);
}
virtual ~X()
{
printf("X::~X(%#p);\n", this);
fflush(stdout);
}
};
void Consumer(std::auto_ptr<x> px)
{
px->Foo();
}
void Producer()
{
printf("Producer();\n");
std::auto_ptr<x> px(new X);
px->Foo(); Consumer(px); px->Foo(); printf("Ready to exit Producer();\n");
}
void Producer2()
{
printf("Producer2();\n");
std::auto_ptr<x> px(new X);
px->Foo(); Consumer(std::auto_ptr<x>(px.get())); px->Foo(); printf("Ready to exit Producer2();\n");
fflush(stdout);
}
}
int main(int argc, char* argv[])
{
mess_with_auto_ptr::Producer();
mess_with_auto_ptr::Producer2();
return 0;
}
And here is the output:
Producer();
X::Foo(0X00347B08);
X::Foo(0X00347B08);
X::~X(0X00347B08);
X::Foo(00000000);
Ready to exit Producer();
Producer2();
X::Foo(0X00347B08);
X::Foo(0X00347B08);
X::~X(0X00347B08);
X::Foo(0X00347B08);
Ready to exit Producer2();
At the end, it generates a fault - and offers the 'Please tell Microsoft' dialog.
Look at the fifth line of the output: X::Foo(00000000);
We actually see a call to (NULL)->Foo()
. And it seems to work (at least, as long as you don't use the 'this' pointer inside Foo()
).
Does this make any sense? Actually, yes.
This is how C++ works: a member function is actually a regular, global, C function that receives a pointer to 'this' as its first parameter.
How to make a mess with stack variables
Now, for another riddle: what do you think this will do?
void Producer3()
{
X x;
Consumer(&x); }
It compiles fine. You won't know there is a problem until run time.
In debug mode (VS2008), you'll get an assertion, with this content:
In release mode, it runs to the end.
What about Linux?
In Linux, it's about the same. Compiling the code in Release mode with Eclipse/GCC, and running it on a terminal, this is what we see:
We see here:
- Instead of
X::Foo(00000000)
, it prints X::Foo(nil)
- Instead of showing a message box, the message Segmentation fault is written to
stderr
.
Trying to delete a pointer to a variable in the stack caused an immediate crash.
This similarity in behavior is not a surprise: there is only one C++ standard.
Points of Interest
- Dangling pointers will not crash a program immediately. If the memory they formerly occupied is still there, they may even work. The crash will be, according to one of Murphy's laws, when and where it will hurt the most.
- Avoiding explicit memory allocation whenever possible may save us quite a few headaches.
- Standard smart pointers are a useful tool. Smart people have spent time designing, writing and testing them for us. Not using them is just plain rude.
- When compelled to use explicit memory allocation (like when calling
ShFileOperation
or FormatMessage
), encapsulating each function in a class and testing this class thouroughly may save some pain.
The ways I've found so far to avoid and fix dangling pointers are: for new code, good design (with clear ownership), and for legacy code, code reviews. And always, everywhere, unit tests.
Code reviews, in particular, shed light on otherwise obscure areas of code, and let programmers who are not the original authors of the code get familiar with it. There are many misconceptions about code reviews, one of them being that the only way to review code is to have a senior programmer go over the code of a junior programmer, but that issue belongs in another article.
Thanks for reading, and please share your opinion.
History
2012, May 4th - First version.
2012, May 18th - Added a few lines about Linux.