(untagged)

The fastest smart pointer in the west

Itay Maman

0.00/5 (No votes)

7 Oct 2003

Need a super quick implementation of a reference counting smart pointer? AUTO_REF delivers just that, with a 'thin' source code of just 3,886 bytes (< 4KB)

Introduction

A few weeks ago, I had to write some code where several threads are manipulating a single instance of some class. The synchronization problems were pretty much self-solved, so the main issue I was facing, was: How can I know when is it safe to delete that shared instance?

The obvious solution for this problem involves the usage of a reference counting mechanism of some sort. This mechanism will make sure my object is deleted if, and only if, the number of references to that object has reached zero. As a keen C/C++ programmer, I have been (traditionally) reluctant to use this kind of solution, since I was always concerned about the performance costs of the underlying algorithms. Although Boost::shared_ptr is a very good implementation, I couldn't use it, since my project is required to compile without additional libraries.

So, I sat down to write my own smart pointer which will specifically address the performance issue. When debugging was over, I took the resulting class (named: AUTO_REF) to the test track, to go head to head with other similar classes. The demo project attached to this article, is actually my benchmarking application which I used for timing these classes. The results are covered in the Benchmarks section of this article.

Interface

"Give me a break", I hear you say, "I just want to know how to integrate AUTO_REF into my code". Well, here are the details. Let's assume you have just written a class called MY_CLASS, which has a member function called run(int x). Here's how to access MY_CLASS objects via AUTO_REF pointers.

Setting up your project:
1. Add auto_ref.cpp to your project.
2. Make sure auto_ref.h and quick_mem_manager.h are located on your project's include files search path.
Setting up an AUTO_REF pointer:

MY_CLASS* raw1 = new MY_CLASS();         // First object (obj1)

MY_CLASS* raw2 = new MY_CLASS();         // Second object (obj2)


AUTO_REF<MY_CLASS> p1 = raw1;                
AUTO_REF<MY_CLASS> p2 = p1;              // p1,p2 both point at obj1

Changing the referenced object:

AUTO_REF<MY_CLASS> p3 = raw2;            // p3 points at obj2

p2 = p3;                                 // p2,p3 both point at obj2

Dereferencing:

p1->run(5);                              //Invokes run(5) on obj1. 

(*p1).run(5);                            //   same thing


p2->run(3);                              //invokes run(3) on obj2

Checking validity (implemented via the automatic bool casting)

if(p1)                               
      cout << "p1 is valid";       // p1 is dereferenceable 



if(!p2)
      cout << "p2 is invalid";     // p2 is empty. Dereferencing 

                                   // is not allowed

What not to do:

AUTO_REF<MY_CLASS> pt1(raw1);       // First pointer - no problem!


AUTO_REF<MY_CLASS> pt2(pt1);        // Declaring an additional

                                    // pointer thru an

                                    // existing AUTO_REF pointer - ok!


AUTO_REF<MY_CLASS> pt3(raw1);       // Declaring an additional pointer

                                    // WITHOUT using the existing 

                                    // AUTO_REF pointer

                                    //            - Wrong !!!!

The design philosophy

So what actually is a smart pointer? A smart pointer is a pointer which frees the programmer of the need to call delete() to release the pointed-to object. In a typical implementation, the smart pointer's constructor increases the object's reference counter, while its destructor decreases it. The destructor also destroys the object when the count reaches zero, thus allowing the programmer to totally forget about releasing the object. Sounds wonderful? Well, there is one pitfall: If your data structure has circular pointing (A -> B -> C -> A), smart pointers will not work as expected, but we will get to that later. For the meantime, let's assume your data structure is circle free.

Before getting into the design of the AUTO_REF class, let's go over some typical approaches for implementing smart pointers:

Ownership: Instead of using a reference counter, these smart pointers keep track of which pointer instance has 'ownership' over the pointed to object. When a pointer is copy constructed or copy assigned it 'takes' the ownership from the other pointer, which virtually becomes invalid. If a pointer is destroyed while owning an object, the object is destroyed along. For a representing sample of this approach, take a look at STL's auto_ptr class. This is a simple and robust approach with minimal (zero) memory overhead, but it's not a real smart pointer, since you cannot have two pointers pointing to the same object at a given time. This disadvantage makes it impossible to store auto_ptr's in an STL container. (read more).

Inheritance: This kind of a smart pointers can point only at classes which were derived from a dedicated base class (let's say: SP_BASE). From the smart pointer's point of view, it points to an instance of SP_BASE. While this keeps the implementation simple, it has significant perfromance costs, due to the use of inheritance and virtual functions (SP_BASE's destructor must be virtual, for start).

Templates: The smart pointer class takes the type of the pointed-to object as a template parameter, So the pointer is actually 'aware' of what type of object it is pointing to. This design offers better performances (when compared with the Inheritance design), but, on the other hand, it can lead to an increased executable size. A typical example: Boost::shared_ptr

AUTO_REF's implementation attempts to meet the following guidelines/requirements:

[GL-1] Minimal performance degradation when compared to raw pointers.
[GL-2] Favor performance over memory overhead.
[GL-3] Complete exception safety.
[GL-4] AUTO_REF's source file must be small in size. (This requirement should keep most developers happy)
[GL-5] AUTO_REF can be stored inside STL containers.
[GL-6] The code should compile on a standard Visual C++ compiler with no additional libraries.

For those of you not familiar with the term "exception safety" here is a short, intuitive (totally informal) definition: if an exception is thrown your objects will remain in a stable state, with no unwanted side affects taking place (e.g: memory leaks).

Throughout development, [GL-1] was the dominant requirement of the six. It almost immediately implied using the template approach, and - in addition - I decided to hold the raw pointer value at each AUTO_REF object. I could have placed that pointer at the shared structure, along with the counter, but that would have caused an extra memory access operation for each pointer dereferencing. Taking the pointer off the shared structure, made dereferencing faster, and reduced that shared structure to (merely) a single LONG

The next major call was to employ a custom memory allocator. Allocating the counter (type: LONG) is the most frequent, time consuming step in the AUTO_REF algorithm. Since all allocations are of the same size it seemed like the perfect situation for using a memory manager customized for allocating buffers of constant size. As before, I was not allowed to use boost, so reusing boost::pool was out of the question. Hence, I supplied a similar allocator of my own.

The other requirements, [GL-3] to [GL-5] pretty much came along with code, with just minimal effort. Making the code exception safe required some more thought, but luckily, I have just read Herb Sutter's excellent book "Exceptional C++". By following the book's suggestions it wasn't too hard to make AUTO_REF comply with this requirement as well. As for the code size - it turned out to be a mere 3,886 bytes.

Implementation

So how does the code inside auto_ref.cpp/.h achieve all that?

The key is is in auto_ref.h. This file defines the AUTO_REF class which does most of the work. When a new AUTO_REF pointer is created it receives a pointer to an existing class/struct (the 'raw' pointer) and stores it in the m_inst_p data member. Then, the constructor allocates a small LONG sized buffer for holding the reference counter. The counter initial value is set to 1 since there is one AUTO_REF object pointing at the object.

When a new AUTO_REF object is initialized with an existing AUTO_REF object (i.e: copy constructed), the new object takes the raw pointer and the counter pointer from the existing one. Additionally, the counter is increased by 1, since the newly created object is actually another reference to the pointed-to object. Note that the code uses InterlockedIncrement to make this operation safe even if two threads create two new AUTO_REF instances simultaneously.

The destructor is similarly simpler: It decreases the counter (again, using InterlockedDecrement), and - if the counter has reached zero - it deletes both the pointed-to object and the reference counter.

Copy assignment: I could have used a code which combines the destructor with the copy constructor. This would lead to something along these lines:

    //Destruction starts here

    if(!InterlockedDecrement(m_ref_p)) {
        delete m_inst_p;
        g_the_auto_ref_pool.free(m_ref_p);
    }

    //Copy construction starts here

    m_ref_p = other.m_ref_p;
    m_inst_p = other.m_inst_p;
    InterlockedIncrement(m_ref_p);

    return *this;

Looks OK? Wrong! If self-assignment is performed, this code will decrease the counter, delete the pointed-to object (assuming we started with a reference count of one), and then increase the counter back to one. Thus, m_inst_p becomes invalid, as it now holds a pointer to a deleted object. This is certainly not a desirable outcome.

The common way to fix this problem, is to enclose this piece of code in an if statement: if(this != &other) {...} return *this; . However, there is another way:

The creative way, requires the use of what Herb Sutter is calling "The Canonical form of strongly exception-safe copy assignment". The book ("Exceptional C++") argues that if you provide a non throwing swap() function, you can always write your copy assignment using the following lines:

    T &operator=(const  T &rhs) {
        T temp(rhs);
        swap(temp);
        return *this;
    }

This is neat, totally exception free and - as a special bonus - you get the swap functionality. So - Herb - this assignemt operator is for you.

My code defines one more class: QUICK_MEM_MANAGER. This class implements the custom allocator functionality to speed up memory manipulation required by AUTO_REF. The algorithm goes like this: QUICK_MEM_MANAGER uses C's standard malloc() to allocate memory in chunks of several Kilobytes. It also keeps a LIFO list of free buffers.

When QUICK_MEM_MANAGER::malloc() is called, it checks to see if the free buffers list is empty. If not, it removes the first buffer from the list and returns that buffer to the caller. Otherwise, it finds a new buffer on the recently allocated chunk (A new chunck is allocated if needed). Since all buffers are of the same size, we can be absolutely sure that the buffer we remove form the free list, will meet the caller's requested size.

When the QUICK_MEM_MANAGER::free() function is called, the specified buffer is placed on the free list, a simple operation which requires O(1) time. The list is implemented as singly linked list where the pointers are stored inside the free buffers themselves, thus, eliminating the need for extra memory.

Restrictions

Circular pointers

I promised to get back to this issue. To illustrate the problem, let us consider the following example: You have defined a class MY_CLASS which has an AUTO_REF<MY_CLASS> pointer as a data member, called next:

struct MY_CLASS {
   AUTO_REF<MY_CLASS> next;
};

void demo_func() {
    AUTO_REF<MY_CLASS> pa(new MY_CLASS);  // object A

    AUTO_REF<MY_CLASS> pb(new MY_CLASS);  // object B

    AUTO_REF<MY_CLASS> pb(new MY_CLASS);  // object C

    pa->next = pb;
    pb->next = pc;
    pc->next = pa;
}

Having executed this code, we created a circle of pointers: A -> B -> C -> A. Now, let's see what is the reference count for each object: A has pa pointing at it, but also pc->next, so the total for A is 2. The same goes for B (pb and pa->next) and C.

When demo_func() returns, pa, pb and pc go out of scope. Hence, the reference counters are decreased, but no object gets deleted since all the counters go from 2 to 1. So, as far as the AUTO_REF mechanism is concerned these objects are still in use. On the other hand, at this point, there's no local/global/static pointer which references either A, B or C. This means that the code has no way to access/delete any of these objects. On the bottom line - we have a leak since these 3 objects will never be released.

You should note that if we didn't make the last assignment pc->next = pa; (i.e: we did not close the chain), then A's reference counter would have become zero when pa goes out of scope. Consequently, A would have been destroyed, a thing that would have decreased B's counter to 0 (since pa->next's destructor has been called by A's destructor). Thus, B would have been destroyed as well, and eventually C (due to the same reason as B).

Solving this problem (reference counters on circular data structures) requires the employment of more sophisticated 'garbage collection' algorithms, at the expense of simplicity and speed. This certainly is not what I had in mind, so I decided to leave this issue open. The developer must be aware of this limit, as it imposes restrictions on the usage of AUTO_REF pointers in certain data structures.

Thread safety

Construction, Copy-Construction and Destruction of an AUTO_REF object are thread safe.
Assignment (operator=) is not thread-safe: If two threads are concurrently assigning a value to an existing AUTO_REF instance the result is unpredictable.
Dereferencing is not synchronized: When two (or more) threads are invoking requests (i.e.: member functions) on an object X thru AUTO_REF pointer/pointers, the requests will be executed with no predictable order, possibly in an interleaving manner. Thus, the design of X's class must support thread-safety, if you want your program to use it in such a way.

Memory management

QUICK_MEM_MANAGER's allocation scheme proves to be fairly quick, but this speed gain comes at the expense of memory consumption: If you allocate a lot of memory (via QUICK_MEM_MANAGER::malloc()) and then release most of it, the released portion will not be returned to the system, only QUICK_MEM_MANAGER::malloc() will be able to use this portion of memory. If this is your application's objects usage profile, you should either consider using other smart pointers, or supply a custom implementation of QUICK_MEM_MANAGER which will behave better under the required circumstances.

Pointers to arrays

AUTO_REF is not aware whether it points to a single struct (or class) or to the first element in an array of structs. Hence, if you create an array by using the new[] operator, you should not use the pointer you received to initialize an AUTO_REF. If you do so, you'll have a memory leak since only the very first element of the array will be destroyed when the reference counter reaches zero. This is not a major limit since you can (a) 'wrap' your dynamically allocated array inside a class, or (b) just use std::vector, which is probably the simplest, safest solution

Multiple counters

As illustrated in code sample f - "what not to do" (under the interface section of this article), you cannot initialize an AUTO_REF pointer with a raw pointer to an object which is already pointed to by another AUTO_REF object. This will lead to an undefined behavior. The rule is: If you create an AUTO_REF pointer to an object, make sure all other AUTO_REFs pointing at this object are assigned by either a copy constructor or a copy assignment operator.

Benchmarks

The main motivation for developing AUTO_REF, was to create a smart pointer class which offers minimal speed overhead over raw 'C' pointers. Does AUTO_REF's implementation delivers what it was expected to do ?

To answer this intriguing question I wrote the test application. The program creates N (set to 320,000) instances of a class, and store the pointers to these objects in an STL vector. The code then goes into a loop, which dereferences the (vector stored) pointers, and invokes a member function on the pointed objects. This loop is repeated N*60 times.

The type of the pointer used by the code is determined by the PTR_SWITCH preprocessor symbol, so the application can be configured to use any of the following pointer types: AUTO_REF pointers (AUTO), raw 'C' pointers (RAW), RefCountPtr (RCP), idllib::smart_ptr (IDLLIB), and boost::shared_ptr pointers (BOOST).

I measured average run time for each configuration when running on a 700MHz Win2000 machine. The code was compiled with MSVC6 using STLport 4.5. I have taken the result of the RAW configuration as 100%, and calculated speed in percentage, relative to this result. When I modified the number of times the main loop is performed (currently set to N*60), The difference between RAW and the other pointers had changed, but the overall ranking remained the same. You should bear in mind that although my application tries to simulate a typical, 'real life' usage of pointers, the results are true for this specific test only.

Configuration        Run time (seconds)        %
==================================================
RAW                       10.3                100%
AUTO_REF                  11.7                113%
BOOST                     12.7                123%
RCP                       15.2                147%
IDLLIB                    16.6                161%

The table shows that AUTO_REF pointers performed very well: AUTO_REF yielded performance reduction of only 13%, which is 10% better than its nearest competitor (BOOST). The other two configurations reduced performance by more than 45%.

So, when should you use AUTO_REF? The answer is simple: Whenever you do not want to explicitly delete your pointers, but you still want to keep your program as fast as possible.

References

Herb Sutter, Exceptional C++
Greg Colvin and Beman Dawes, Smart pointer library
Steve Cleary, Boost Pool Library
Herb Sutter, Effective memory management
Stefan Chekanov, A Smart Pointer Capable of Object Level Thread Synchronization and Reference Counting Garbage Collection - A CodeProject article.
Sandu Turcan, A Simple Smart Pointer - A CodeProject article.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here