Introduction
A few weeks ago, I had to write some code where several threads are
manipulating a single instance of some class. The synchronization problems were
pretty much self-solved, so the main issue I was facing, was: How can I know
when is it safe to delete that shared instance?
The obvious solution for this problem involves the usage of a reference
counting mechanism of some sort. This mechanism will make sure my object is
deleted if, and only if, the number of references to that object has reached
zero. As a keen C/C++ programmer, I have been (traditionally) reluctant to use
this kind of solution, since I was always concerned about the performance costs
of the underlying algorithms. Although Boost::shared_ptr
is a very good implementation, I couldn't use it, since my project is required
to compile without additional libraries.
So, I sat down to write my own smart pointer which will specifically
address the performance issue. When debugging was over, I took the resulting
class (named: AUTO_REF
) to the test track, to go head to head with
other similar classes. The demo project attached to this article, is actually my
benchmarking application which I used for timing these classes. The results are
covered in the Benchmarks section of this article.
Interface
"Give me a break", I hear you say, "I just want to know how to integrate
AUTO_REF
into my code". Well, here are the details. Let's assume
you have just written a class called MY_CLASS
, which has a member
function called run(int x)
. Here's how to access
MY_CLASS
objects via AUTO_REF
pointers.
- Setting up your project:
- Add auto_ref.cpp to your project.
- Make sure auto_ref.h and quick_mem_manager.h are located on
your project's include files search path.
- Setting up an
AUTO_REF
pointer:
MY_CLASS* raw1 = new MY_CLASS();
MY_CLASS* raw2 = new MY_CLASS();
AUTO_REF<MY_CLASS> p1 = raw1;
AUTO_REF<MY_CLASS> p2 = p1;
- Changing the referenced object:
AUTO_REF<MY_CLASS> p3 = raw2;
p2 = p3;
- Dereferencing:
p1->run(5);
(*p1).run(5);
p2->run(3);
- Checking validity (implemented via the automatic bool casting)
if(p1)
cout << "p1 is valid";
if(!p2)
cout << "p2 is invalid";
- What not to do:
AUTO_REF<MY_CLASS> pt1(raw1);
AUTO_REF<MY_CLASS> pt2(pt1);
AUTO_REF<MY_CLASS> pt3(raw1);
The design philosophy
So what actually is a smart pointer? A smart pointer is a pointer which frees
the programmer of the need to call delete()
to release the
pointed-to object. In a typical implementation, the smart pointer's constructor
increases the object's reference counter, while its destructor decreases it. The
destructor also destroys the object when the count reaches zero, thus allowing
the programmer to totally forget about releasing the object. Sounds wonderful?
Well, there is one pitfall: If your data structure has circular pointing
(A -> B -> C -> A
), smart pointers will not work as
expected, but we will get to that later. For
the meantime, let's assume your data structure is circle free.
Before getting into the design of the AUTO_REF
class, let's go
over some typical approaches for implementing smart pointers:
Ownership: Instead of using a reference counter, these smart pointers
keep track of which pointer instance has 'ownership' over the pointed to object.
When a pointer is copy constructed or copy assigned it 'takes' the ownership
from the other pointer, which virtually becomes invalid. If a pointer is
destroyed while owning an object, the object is destroyed along. For a
representing sample of this approach, take a look at STL's auto_ptr
class. This is a simple and robust approach with minimal (zero) memory overhead,
but it's not a real smart pointer, since you cannot have two pointers pointing
to the same object at a given time. This disadvantage makes it impossible to
store auto_ptr
's in an STL container. (read
more).
Inheritance: This kind of a smart pointers can point only at classes
which were derived from a dedicated base class (let's say:
SP_BASE
). From the smart pointer's point of view, it points to an
instance of SP_BASE
. While this keeps the implementation simple, it
has significant perfromance costs, due to the use of inheritance and virtual
functions (SP_BASE
's destructor must be
virtual, for start).
Templates: The smart pointer class takes the type of the pointed-to
object as a template parameter, So the pointer is actually 'aware' of what type
of object it is pointing to. This design offers better performances (when
compared with the Inheritance design), but, on the other hand, it can lead to an
increased executable size. A typical example: Boost::shared_ptr
AUTO_REF
's implementation attempts to meet the following
guidelines/requirements:
- [GL-1] Minimal performance degradation when compared to raw pointers.
- [GL-2] Favor performance over memory overhead.
- [GL-3] Complete exception safety.
- [GL-4]
AUTO_REF
's source file must be small in size. (This
requirement should keep most developers happy)
- [GL-5]
AUTO_REF
can be stored inside STL containers.
- [GL-6] The code should compile on a standard Visual C++ compiler with no
additional libraries.
For those of you not familiar with the term "exception safety" here is a
short, intuitive (totally informal) definition: if an exception is thrown your
objects will remain in a stable state, with no unwanted side affects taking
place (e.g: memory leaks).
Throughout development, [GL-1] was the dominant requirement of the six. It
almost immediately implied using the template approach, and - in addition - I
decided to hold the raw pointer value at each AUTO_REF
object. I
could have placed that pointer at the shared structure, along with the counter,
but that would have caused an extra memory access operation for each pointer
dereferencing. Taking the pointer off the shared structure, made dereferencing
faster, and reduced that shared structure to (merely) a single LONG
The next major call was to employ a custom memory allocator. Allocating the
counter (type: LONG
) is the most frequent, time consuming step in
the AUTO_REF
algorithm. Since all allocations are of the same size
it seemed like the perfect situation for using a memory manager customized for
allocating buffers of constant size. As before, I was not allowed to use boost
, so reusing boost::pool
was out of the question. Hence, I supplied a similar allocator of my own.
The other requirements, [GL-3] to [GL-5] pretty much came along with code,
with just minimal effort. Making the code exception safe required some more
thought, but luckily, I have just read Herb Sutter's excellent book "Exceptional C++". By
following the book's suggestions it wasn't too hard to make
AUTO_REF
comply with this requirement as well. As for the code size
- it turned out to be a mere 3,886 bytes.
Implementation
So how does the code inside auto_ref.cpp/.h achieve all that?
The key is is in auto_ref.h. This file defines the
AUTO_REF
class which does most of the work. When a new
AUTO_REF
pointer is created it receives a pointer to an existing
class/struct (the 'raw' pointer) and stores it in the m_inst_p
data
member. Then, the constructor allocates a small LONG
sized buffer
for holding the reference counter. The counter initial value is set to 1 since
there is one AUTO_REF
object pointing at the object.
When a new AUTO_REF
object is initialized with an existing
AUTO_REF
object (i.e: copy constructed), the new object takes the
raw pointer and the counter pointer from the existing one. Additionally, the
counter is increased by 1, since the newly created object is actually another
reference to the pointed-to object. Note that the code uses
InterlockedIncrement
to make this operation safe even if two
threads create two new AUTO_REF
instances simultaneously.
The destructor is similarly simpler: It decreases the counter (again, using
InterlockedDecrement
), and - if the counter has reached zero - it
deletes both the pointed-to object and the reference counter.
Copy assignment: I could have used a code which combines the destructor with
the copy constructor. This would lead to something along these lines:
if(!InterlockedDecrement(m_ref_p)) {
delete m_inst_p;
g_the_auto_ref_pool.free(m_ref_p);
}
m_ref_p = other.m_ref_p;
m_inst_p = other.m_inst_p;
InterlockedIncrement(m_ref_p);
return *this;
Looks OK? Wrong! If self-assignment is performed, this code will decrease the
counter, delete the pointed-to object (assuming we started with a reference
count of one), and then increase the counter back to one. Thus,
m_inst_p
becomes invalid, as it now holds a pointer to a deleted
object. This is certainly not a desirable outcome.
The common way to fix this problem, is to enclose this piece of code in an if
statement: if(this != &other) {...} return *this;
. However,
there is another way:
The creative way, requires the use of what Herb Sutter is calling "The
Canonical form of strongly exception-safe copy assignment". The book
("Exceptional C++") argues that if you provide a non throwing
swap()
function, you can always write your copy assignment using
the following lines:
T &operator=(const T &rhs) {
T temp(rhs);
swap(temp);
return *this;
}
This is neat, totally exception free and - as a special bonus - you get the
swap functionality. So - Herb - this assignemt operator is for you.
My code defines one more class: QUICK_MEM_MANAGER
. This class
implements the custom allocator functionality to speed up memory manipulation
required by AUTO_REF
. The algorithm goes like this:
QUICK_MEM_MANAGER
uses C's standard malloc()
to
allocate memory in chunks of several Kilobytes. It also keeps a LIFO list of
free buffers.
When QUICK_MEM_MANAGER::malloc()
is called, it checks to see if
the free buffers list is empty. If not, it removes the first buffer from the
list and returns that buffer to the caller. Otherwise, it finds a new buffer on
the recently allocated chunk (A new chunck is allocated if needed). Since all
buffers are of the same size, we can be absolutely sure that the buffer we
remove form the free list, will meet the caller's requested size.
When the QUICK_MEM_MANAGER::free()
function is called, the
specified buffer is placed on the free list, a simple operation which requires
O(1) time. The list is implemented as singly linked list where the pointers are
stored inside the free buffers themselves, thus, eliminating the need for
extra memory.
Restrictions
Circular pointers
I promised to get back to this issue. To illustrate
the problem, let us consider the following example: You have defined a class
MY_CLASS
which has an
AUTO_REF<MY_CLASS>
pointer
as a data member, called
next
:
struct MY_CLASS {
AUTO_REF<MY_CLASS> next;
};
void demo_func() {
AUTO_REF<MY_CLASS> pa(new MY_CLASS);
AUTO_REF<MY_CLASS> pb(new MY_CLASS);
AUTO_REF<MY_CLASS> pb(new MY_CLASS);
pa->next = pb;
pb->next = pc;
pc->next = pa;
}
Having executed this code, we created a circle of pointers: A -> B
-> C -> A
. Now, let's see what is the reference count for each
object: A
has pa
pointing at it, but also
pc->next
, so the total for A
is 2. The same goes
for B
(pb
and pa->next
) and
C
.
When demo_func()
returns, pa, pb
and
pc
go out of scope. Hence, the reference counters are decreased,
but no object gets deleted since all the counters go from 2 to 1. So, as far as
the AUTO_REF
mechanism is concerned these objects are still in use.
On the other hand, at this point, there's no local/global/static pointer which
references either A
, B
or C
. This means
that the code has no way to access/delete any of these objects. On the bottom
line - we have a leak since these 3 objects will never be released.
You should note that if we didn't make the last assignment pc->next
= pa;
(i.e: we did not close the chain), then A
's reference
counter would have become zero when pa
goes out of scope.
Consequently, A
would have been destroyed, a thing that would have
decreased B
's counter to 0 (since pa->next
's
destructor has been called by A
's destructor). Thus, B
would have been destroyed as well, and eventually C
(due to the
same reason as B
).
Solving this problem (reference counters on circular data structures)
requires the employment of more sophisticated 'garbage collection' algorithms,
at the expense of simplicity and speed. This certainly is not what I had in
mind, so I decided to leave this issue open. The developer must be aware of this
limit, as it imposes restrictions on the usage of AUTO_REF
pointers
in certain data structures.
Thread safety
- Construction, Copy-Construction and Destruction of an
AUTO_REF
object are thread safe.
- Assignment (
operator=
) is not thread-safe: If two threads are
concurrently assigning a value to an existing AUTO_REF
instance the
result is unpredictable.
- Dereferencing is not synchronized: When two (or more) threads are invoking
requests (i.e.: member functions) on an object X thru
AUTO_REF
pointer/pointers, the requests will be executed with no predictable order,
possibly in an interleaving manner. Thus, the design of X's class must support
thread-safety, if you want your program to use it in such a way.
Memory management
QUICK_MEM_MANAGER
's allocation scheme
proves to be fairly quick, but this speed gain comes at the expense of memory
consumption: If you allocate a lot of memory (via
QUICK_MEM_MANAGER::malloc()
) and then release most of it, the
released portion will not be returned to the system, only
QUICK_MEM_MANAGER::malloc()
will be able to use this portion of
memory. If this is your application's objects usage profile, you should either
consider using other smart pointers, or supply a custom implementation of
QUICK_MEM_MANAGER
which will behave better under the required
circumstances.
Pointers to arrays
AUTO_REF
is not aware whether it points
to a single struct (or class) or to the first element in an array of structs.
Hence, if you create an array by using the
new[]
operator, you
should not use the pointer you received to initialize an
AUTO_REF
.
If you do so, you'll have a memory leak since only the very first element of the
array will be destroyed when the reference counter reaches zero. This is not a
major limit since you can (a) 'wrap' your dynamically allocated array inside a
class, or (b) just use
std::vector
, which is probably the simplest,
safest solution
Multiple counters
As illustrated in
code sample f -
"what not to do" (under the interface section of this article), you cannot
initialize an
AUTO_REF
pointer with a raw pointer to an object
which is already pointed to by another
AUTO_REF
object. This will
lead to an undefined behavior. The rule is: If you create an
AUTO_REF
pointer to an object, make sure all other
AUTO_REF
s pointing at this object are assigned by either a copy
constructor or a copy assignment operator.
Benchmarks
The main motivation for developing
AUTO_REF
, was
to create a smart pointer class which offers minimal speed overhead over raw 'C'
pointers. Does
AUTO_REF
's implementation delivers what it was
expected to do ?
To answer this intriguing question I wrote the test application. The program
creates N (set to 320,000) instances of a class, and store the pointers
to these objects in an STL vector. The code then goes into a loop, which
dereferences the (vector stored) pointers, and invokes a member function on the
pointed objects. This loop is repeated N*60 times.
The type of the pointer used by the code is determined by the
PTR_SWITCH
preprocessor symbol, so the application can be
configured to use any of the following pointer types: AUTO_REF
pointers (AUTO), raw 'C' pointers (RAW), RefCountPtr
(RCP),
idllib::smart_ptr
(IDLLIB), and boost::shared_ptr
pointers (BOOST).
I measured average run time for each configuration when running on a 700MHz
Win2000 machine. The code was compiled with MSVC6 using STLport 4.5. I have
taken the result of the RAW configuration as 100%, and calculated speed in
percentage, relative to this result. When I modified the number of times the
main loop is performed (currently set to N*60), The difference between
RAW and the other pointers had changed, but the overall ranking remained the
same. You should bear in mind that although my application tries to simulate a
typical, 'real life' usage of pointers, the results are true for this specific
test only.
Configuration Run time (seconds) %
==================================================
RAW 10.3 100%
AUTO_REF 11.7 113%
BOOST 12.7 123%
RCP 15.2 147%
IDLLIB 16.6 161%
The table shows that AUTO_REF
pointers performed very well:
AUTO_REF
yielded performance reduction of only 13%, which is 10%
better than its nearest competitor (BOOST
). The other two
configurations reduced performance by more than 45%.
So, when should you use AUTO_REF
? The answer is simple: Whenever
you do not want to explicitly delete your pointers, but you still want to keep
your program as fast as possible.
References
- Herb Sutter, Exceptional
C++
- Greg Colvin and Beman Dawes, Smart pointer library
- Steve Cleary, Boost
Pool Library
- Herb Sutter, Effective
memory management
- Stefan Chekanov, A Smart Pointer Capable of Object
Level Thread Synchronization and Reference Counting Garbage Collection - A
CodeProject article.
- Sandu Turcan, A Simple Smart Pointer - A CodeProject article.