Designing Efficient Thread-Safe Reference Counting System

EgorYusov

5.00/5 (16 votes)

29 Jun 2017CPOL20 min read

27.8K

205

This article describes an efficient thread-safe reference counting system.

This article presents an implementation of an efficient reference counting system which supports user-provided allocators, weak references, and is thread-safe.

Download source code - 30.1 KB

Disclaimer: This article uses material published on this page.

Background

This article uses some advanced C++ concepts such as variadic templates, perfect forwarding, inplace new.

Introduction

Reference counting is a well-known method that automatically handles lifetimes of objects with complex dependencies. C++ standard library provides std::shared_ptr class that is a smart pointer retaining shared ownership of a dynamically allocated object. The class maintains internal reference counters, but does not allow accessing them directly. The counters can only be changed by creating new smart pointers, which limits their application to C++ language. There are, however, certain situations when custom reference counting system is needed. Diligent Engine is a cross-platform rendering engine. It is packed into dynamic libraries, which can be used in any language or system, and thus needs to provide simple and efficient method for managing object lifetimes. That system could be implemented with C++ smart pointers, however that would require introducing additional level of abstraction: instead of exposing native interfaces, the engine would have to expose smart pointers, which is undesirable. Besides that, implementing own resource counting system gives more flexibility and provides better possibilities for optimizations. Diligent Engine for performance reasons implements its own atomic operations. On Win32 platform, both Diligent Engine and standard C++ atomics use InterlockedIncrement() Windows function, but because DE introduces less overhead, its implementation is more than two times faster than one provided by C++ standard library.

Basic Reference Counting System

A very basic resource counting system can be implemented using the base class that may look like this:

C++

template<typename Base>
class RefCountedObject : public Base
{
public:
    void AddRef()
    {
        ++m_NumReferences;
    }
    void Release()
    {
        if(--m_NumReferences == 0)
        {
            delete this;
        }
    }
private:
    virtual ~RefCountedObject() = 0;
    int m_NumReferences = 0;
};

The class defines virtual destructor that allows deleting any instance of derived class through the base class pointer. The class counts the number of outstanding references and destroys itself when the last reference is released. That kind of implementation provides very limited functionality and only works in a single-threaded scenario. Real resource counting system should support many more features:

Allow different types of memory allocation and deallocation methods
Provide weak pointers that implement non-owning references
Object ownership flexibility (an object may be part of another reference counted object)
Provide exception and multi-threaded safety
Efficiency

Designing a system that meets all the requirements above is an interesting and challenging task. This article presents a solution implemented in Diligent Engine.

Overview

Diligent Engine implements thread-safe reference counting system that supports weak pointers. It includes the following classes and interfaces:

IObject is the base interface to the reference counted object
IReferenceCounters is the interface to the helper object that implements reference counting and controls the lifetime of an object

RefCountedObject is the template base class for all reference counted objects
RefCountersImpl is the template class that implements actual reference counting functionality defined by the IReferenceCounters interface
MakeNewRCObj is the template class that is responsible for creating object + reference counters pair
RefCntAutoPtr is the template class that implements strong pointer
RefCntWeakPtr is the template class that implements weak pointer

The following diagram illustrates different components of the system:

IReferenceCounters Interface

The declaration of IReferenceCounters interface is given in the listing below:

C++

class IReferenceCounters
{
public:
    virtual Atomics::Long AddStrongRef() = 0;
    virtual Atomics::Long ReleaseStrongRef() = 0;
    virtual Atomics::Long AddWeakRef() = 0;
    virtual Atomics::Long ReleaseWeakRef() = 0;
    virtual void GetObject(class IObject **ppObject) = 0;
    virtual Atomics::Long GetNumStrongRefs()const = 0;
    virtual Atomics::Long GetNumWeakRefs()const = 0;
};

The interface defines the following functions:

AddStrongRef() - Increments the number of strong references by 1. The method returns the number of strong references after incrementing the counter. It is thread-safe and does not require explicit synchronization.
ReleaseStrongRef() - Decrements the number of strong references by 1 and destroys the referenced object when the counter reaches zero. If there are no more weak references, destroys the reference counters object itself. The method is thread-safe and does not require explicit synchronization.
AddWeakRef() - Increments the number of weak references by 1 and returns the number of weak references after incrementing the counter. The method is thread-safe and does not require explicit synchronization.
ReleaseWeakRef() - Decrements the number of weak references by 1. If there are no more strong and weak references, destroys the reference counters object itself.
GetObject() - Gets the pointer to the IUnknown interface of the referenced object. If the object was destroyed, nullptr will be returned. If the object is alive, the pointer to the object's IUnknown interface will be returned. In this case, the number of strong references to the object will be incremented by 1.
GetNumStrongRefs() - Returns the number of outstanding strong references. This method is intended for debug purposes only. In a multithreaded environment, the returned number may not be reliable as other threads may simultaneously change the actual value of the counter. The only reliable value is 0 as the object is destroyed when the last strong reference is released.
GetNumWeakRefs() - Returns the number of outstanding weak references. This method is intended for debug purposes only. In a multithreaded environment, the returned number may not be reliable as other threads may simultaneously change the actual value of the counter.

IObject Interface

The declaration of IObject interface is given in the listing below:

C++

class IObject
{
public:
    virtual void QueryInterface
            ( const Diligent::INTERFACE_ID &IID, IObject **ppInterface ) = 0;
    virtual Atomics::Long AddRef() = 0;
    virtual Atomics::Long Release() = 0;
    virtual IReferenceCounters* GetReferenceCounters()const = 0;
};

The interface defines the following functions:

QueryInterface() - Queries the specific interface. The method increments the number of strong references by 1. The interface must be released by a call to Release() method when it is no longer needed.
AddRef() - Increments the number of strong references by 1. This method is equivalent to GetReferenceCounters()->AddStrongRef(). It is thread-safe and does not require explicit synchronization.
Release() - Decrements the number of strong references by 1 and destroys the object when the counter reaches zero. This method is equivalent to GetReferenceCounters()->ReleaseStrongRef(). It is thread-safe and does not require explicit synchronization.
GetReferenceCounters() - Returns the pointer to IReferenceCounters interface of the associated reference counters object. The method does not increment the number of strong references to the returned object.

RefCountedObject Class

RefCountedObject is the base template class that implements common functionality of all reference counted objects. Its source code is given in the listing below:

C++

/// Base class for all reference counting objects
template<typename Base>
class RefCountedObject : public Base
{
public:
    RefCountedObject(IReferenceCounters *pRefCounters)noexcept :
        m_pRefCounters( ValidatedCast<RefCountersImpl>(pRefCounters) )
    {
    }

    // Virtual destructor makes sure all derived classes can be destroyed
    // through the pointer to the base class
    virtual ~RefCountedObject()
    {
    }

    inline virtual IReferenceCounters* GetReferenceCounters()const override final
    {
        return m_pRefCounters;
    }

    inline virtual Atomics::Long AddRef()override final
    {
        // Since type of m_pRefCounters is RefCountersImpl,
        // this call will not be virtual and should be inlined
        return m_pRefCounters->AddStrongRef();
    }

    inline virtual Atomics::Long Release()override final
    {
        // Since type of m_pRefCounters is RefCountersImpl,
        // this call will not be virtual and should be inlined
        return m_pRefCounters->ReleaseStrongRef();
    }

private:
    template<typename AllocatorType, typename ObjectType>
    friend class MakeNewRCObj;

    friend class RefCountersImpl;
   
    // Operator new is private, and can only be called by MakeNewRCObj
    void* operator new(size_t Size)
    {
        return new Uint8[Size];
    }

    // This operator delete can only be called from MakeNewRCObj 
    // if an exception is thrown,
    // or from RefCountersImpl when object is destroyed
    void operator delete(void *ptr)
    {
        delete[] reinterpret_cast<Uint8*>(ptr);
    }
           

    // Operator new is private, so that an object 
    // can only be constructed by MakeNewRCObj
    template<typename ObjectAllocatorType>
    void* operator new(size_t Size, ObjectAllocatorType &Allocator, 
                       const Char* dbgDescription,
                       const char* dbgFileName, const  Int32 dbgLineNumber)
    {
        return Allocator.Allocate(Size, dbgDescription, dbgFileName, dbgLineNumber);
    }

    // This operator is only called when an exception is thrown 
    // from an object constructor
    template<typename ObjectAllocatorType>
    void operator delete(void *ptr, ObjectAllocatorType &Allocator, 
                         const Char* dbgDescription,
                         const char* dbgFileName, const  Int32 dbgLineNumber)
    {
        return Allocator.Free(ptr);
    }

    // Note that the type of the reference counters is RefCountersImpl,
    // not IReferenceCounters. This avoids virtual calls from
    // AddRef() and Release()
    RefCountersImpl *const m_pRefCounters;
};

Few comments regarding the implementation:

AddRef(), Release(), and GetReferenceCounters() methods are declared as final and inline. The first keyword tells the compiler that the methods will never be overridden in derived classes, so that the compiler can avoid virtual calls. inline is another hint for the compiler to inline the methods.
Calls to AddRef() and Release() are delegated to the reference counters object, pointer to which is kept in m_pRefCounters member. Since the type of this pointer is RefCountersImpl* rather than IReferenceCounters*, a good compiler will be able to avoid virtual calls and potentially inline them.
Valid pointer to the reference counters object must be provided to the constructor. Note that m_pRefCounters does not necessarily have to track the object itself. It may be another object that owns this one, directly or indirectly.
The class defines two versions of operator new and two operators delete. These operators are private and are only accessible by MakeNewRCObj class that is responsible for constructing new objects. The first pair uses standard C++ operator new to allocate required amount of memory. The second pair uses custom allocator. These versions take reference to the allocator as an argument. They also use debug information such as allocation description, file name and line number where allocation was performed. Also note that either operator delete is only called if an exception is thrown during the object construction.

RefCountersImpl Class

RefCountersImpl class is responsible for the bulk of the actual reference counting functionality and implements IReferenceCounters interface.

Destroying Controlled Object

The most important operation that RefCountersImpl class must support is destroying the object it controls. This operation needs to be flexible enough to support different types of objects as well as different allocators. One way to achieve this is to make the class a template class parameterized by the type of the controlled object as well as by the allocator type:

C++

template<typename ControlledObjectType, typename ObjectAllocatorType>
class RefCountersImpl : public IReferenceCounters

Such template class can hold pointers to the object and to the allocator:

C++

ControlledObjectType* m_pObject;
ObjectAllocatorType * const m_pObjectAllocator;

so that the object can be destroyed as follows:

C++

auto *pObj = m_pObject;
auto *pAllocator = m_pObjectAllocator;
// ...
if (pAllocator)
{
    pObj->~ControlledObjectType();
    pAllocator->Free(pObj);
}
else
{
    delete pObj;
}

As long as ControlledObjectType is a type derived from RefCountedObject, it will have a virtual destructor and will be successfully destroyed by pObj->~ControlledObjectType(). This approach also handles any allocator and is thus a legitimate method to handle different types of objects and different types of allocators. In fact, this is how it was originally implemented in Diligent Engine. It however has one drawback: the type of RefCountedObject::m_pRefCounters member needs to be IReferenceCounter. The reason is that when an object is part of another reference counting object (for example, default view of a texture is part of the texture object), it keeps pointer to the reference counters of the parent object. Since the types of the two objects are unrelated, the only alternative left is using IReferenceCounter class which is common ancestor for all reference counters. The biggest disadvantage of this method is that AddRef() and Release() methods of RefCountedObject class perform virtual calls to AddStrongRef() and ReleaseStrongRef() of a specific RefCountersImpl instantiation, which is suboptimal. To the contrary, in the implementation of RefCountedObject class shown above, the type of RefCountedObject::m_pRefCounters member is RefCountersImpl* (non template). All methods of IReferenceCounter interface implemented by RefCountersImpl class are labeled as final, and inline, so a good compiler will not only be able to eliminate virtual call, but to eliminate the function call altogether by inlining the methods.

The question now is: if RefCountersImpl class does not depend on the type of the object and allocator, how can we provide the required flexibility? The answer is using object wrapper template class that knows how to destroy the object. First of all, we will define non-template abstract base class as shown in the listing below:

C++

class ObjectWrapperBase
{
public:
    virtual void DestroyObject() = 0;
    virtual void QueryInterface
    ( const Diligent::INTERFACE_ID &iid, IObject **ppInterface )=0;
};

This class only provides two pure virtual methods: one to destroy the object, and the second to query the specified interface (which we will use to implement GetObject() method later). So as long as RefCountersImpl class keeps pointer to the object wrapper of type ObjectWrapperBase, it can destroy controlled object as easy as pWrapper->DestroyObject(). Specific details about destroying particular object with particular allocator are defined by the inherited template class given in the listing below:

C++

template<typename ObjectType, typename AllocatorType>
class ObjectWrapper : public ObjectWrapperBase
{
public:
    ObjectWrapper(ObjectType *pObject, AllocatorType *pAllocator) :
        m_pObject(pObject),
        m_pAllocator(pAllocator)
    {}
    virtual void DestroyObject()
    {
        if (m_pAllocator)
        {
            m_pObject->~ObjectType();
            m_pAllocator->Free(m_pObject);
        }
        else
        {
            delete m_pObject;
        }
    }
    virtual void QueryInterface
    ( const Diligent::INTERFACE_ID &iid, IObject **ppInterface )override final
    {
        return m_pObject->QueryInterface(iid, ppInterface);
    }
private:
    // It is crucially important that the type of the pointer
    // is ObjectType and not IObject, since the latter
    // does not have virtual dtor.
    ObjectType* const m_pObject;
    AllocatorType * const m_pAllocator;
};

You can see that all the object destruction logic moved to DestroyObject() virtual method. Since the class is parameterized by the type of object and allocator (the same parameters that RefCountersImpl class had originally), it can handle all object types and all allocator types. The only question we need to answer is how to initialize specific instance of ObjectWrapper class. For that, RefCountersImpl class provides template method Attach() that has the same template parameters and initializes specific instance of ObjectWrapper class. However, instead of creating the class on the heap, the method initializes it in place in the raw memory buffer provided by RefCountersImpl class:

C++

static const size_t ObjectWrapperBufferSize = 
    sizeof(ObjectWrapper<IObject, IMemoryAllocator>) / sizeof(size_t);
size_t m_ObjectWrapperBuffer[ObjectWrapperBufferSize];

template<typename ObjectType, typename AllocatorType>
void Attach(ObjectType *pObject, AllocatorType *pAllocator)
{
    new(m_ObjectWrapperBuffer) ObjectWrapper<ObjectType, AllocatorType>
       (pObject, pAllocator);
    m_ObjectState = ObjectState::Alive;
}

Typical size of m_ObjectWrapperBuffer buffer will be the size of three pointers (vtbl pointer plus m_pObject and m_pAlloctor members). In-place new operation

C++

new(m_ObjectWrapperBuffer) ObjectWrapper<ObjectType, AllocatorType>(pObject, pAllocator);

initializes raw memory with an instance of ObjectWrapper<ObjectType, AllocatorType> class, whose m_pObject and m_pAlloctor members will store pointers provided to the constructor from the calling function. Also, the compiler will generate specific DestroyObject() method instantiation for the ObjectType and AllocatorType types, which will be accessible though the virtual table of the ObjectWrapper<ObjectType, AllocatorType> class, pointer to which will in turn be stored in the first element of m_ObjectWrapperBuffer buffer. So putting everything together, RefCountersImpl::ReleaseStrongRef() will execute the following code to destroy the controlled object:

C++

// Copy the object wrapper and release the object after unlocking the
// reference counters
size_t ObjectWrapperBufferCopy[ObjectWrapperBufferSize];
for(size_t i=0; i < ObjectWrapperBufferSize; ++i)
    ObjectWrapperBufferCopy[i] = m_ObjectWrapperBuffer[i];
auto *pWrapper = reinterpret_cast<ObjectWrapperBase*>(ObjectWrapperBufferCopy);
//...
pWrapper->DestroyObject();

Class Members and Methods

Now when we described how RefCountersImpl class manages various types of objects allocated with different allocators, we can describe the implementation of class methods. But first, let's take a look at other members that the class defines:

C++

Atomics::AtomicLong m_lNumStrongReferences;
Atomics::AtomicLong m_lNumWeakReferences;
ThreadingTools::LockFlag m_LockFlag;
enum class ObjectState : Int32
{
    NotInitialized,
    Alive,
    Destroyed
}m_ObjectState = ObjectState::NotInitialized;

m_lNumStrongReferences is the strong reference counter
m_lNumWeakReferences is the weak reference counter
m_LockFlag is the lock flag used to gain exclusive access to the object members
m_ObjectState is the state of the object (not initialized, alive or destroyed)

Let's now look at the implementation of the class methods. AddStrongRef(), AddWeakRef(), GetNumStrongRefs(), and GetNumWeakRefs() are straightforward:

C++

inline virtual Atomics::Long AddStrongRef()override final
{
    return Atomics::AtomicIncrement(m_lNumStrongReferences);
}

inline virtual Atomics::Long AddWeakRef()override final
{
    return Atomics::AtomicIncrement(m_lNumWeakReferences);
}

inline virtual Atomics::Long GetNumStrongRefs()const override final
{
    return m_lNumStrongReferences;
}

inline virtual Atomics::Long GetNumWeakRefs()const override final
{
    return m_lNumWeakReferences;
}

Note that inline and final keywords are very strong hints to a good compiler that these methods need to be inlined when they are called using the pointer of RefCountersImpl* type.

The real work is done by ReleaseStrongRef(), ReleaseWeakRef() and GetObject() methods. At first glance, the implementation may seem to be straightforward, but in a multithreaded environment, there may be a lot of tricky situations. Consider few examples:

There is one strong reference and one weak reference remaining. One thread releases the last strong reference, while the second one releases weak reference. The first thread destroys controlled object, but which one should destroy the reference counter itself? If the threads simply check that both counters have reached 0, then the object will be released twice, which will cause undefined behavior.
There is one strong reference and one weak reference remaining. One thread releases the last strong reference, while the second one attempts to get the object by calling GetObject(). Who wins in this situation? How to avoid returning reference to the destroyed object?

These were just few examples, but there exists many more complex cases. We will now give implementations of all three functions for reference, and then give detailed explanation for every function:

C++

inline virtual Atomics::Long ReleaseStrongRef()override final
{
    // Decrement strong reference counter without acquiring the lock.
    auto RefCount = Atomics::AtomicDecrement(m_lNumStrongReferences);
    if( RefCount == 0 )
    {
        // Acquire the lock.
        ThreadingTools::LockHelper Lock(m_LockFlag);

        // Copy the object wrapper and release the object after unlocking the
        // reference counters
        size_t ObjectWrapperBufferCopy[ObjectWrapperBufferSize];
        for(size_t i=0; i < ObjectWrapperBufferSize; ++i)
            ObjectWrapperBufferCopy[i] = m_ObjectWrapperBuffer[i];
        auto *pWrapper = reinterpret_cast<ObjectWrapperBase*>(ObjectWrapperBufferCopy);

        // Mark object as destroyed
        m_ObjectState = ObjectState::Destroyed;

        bool bDestroyThis = m_lNumWeakReferences == 0;
       
        // Unlock the object now to avoid deadlocks
        Lock.Unlock();

        // Destroy referenced object
        pWrapper->DestroyObject();

        if( bDestroyThis )
            SelfDestroy();
    }

    return RefCount;
}

inline virtual Atomics::Long ReleaseWeakRef()override final
{
    // The method must be serialized!
    ThreadingTools::LockHelper Lock(m_LockFlag);
    // It is essentially important to check the number of weak references
    // while holding the lock. Otherwise reference counters object
    // may be destroyed twice if ReleaseStrongRef() is executed by other
    // thread.
    auto NumWeakReferences = Atomics::AtomicDecrement(m_lNumWeakReferences);

    if( NumWeakReferences == 0 && m_ObjectState == ObjectState::Destroyed )
    {
        Lock.Unlock();
        SelfDestroy();
    }
    return NumWeakReferences;
}

inline virtual void GetObject( class IObject **ppObject )override final
{
    if( m_ObjectState != ObjectState::Alive)
        return; // Early exit

    ThreadingTools::LockHelper Lock(m_LockFlag);

    auto StrongRefCnt = Atomics::AtomicIncrement(m_lNumStrongReferences);
    if( m_ObjectState == ObjectState::Alive && StrongRefCnt > 1 )
    {
        auto *pWrapper = reinterpret_cast<ObjectWrapperBase*>(m_ObjectWrapperBuffer);
        pWrapper->QueryInterface(Diligent::IID_Unknown, ppObject);
    }
    Atomics::AtomicDecrement(m_lNumStrongReferences);
}

Let's now take a look at how these functions work. Let's start with ReleaseStrongRef(). The function first atomically decrements the strong reference counter without acquiring the lock. This is important as locking is an expensive operation and you want to pay this cost only when it is absolutely necessary. If the decremented value is 0, which means that the function released the last reference, object should be destroyed. Note that Atomics::AtomicDecrement() decrements the counter atomically. This means that if several threads reach the instruction at the same time, access to the counter will be serialized. The function returns decremented value, and only one thread will read 0. Note, that it is crucially important to use the value returned by the function, because if we used m_lNumStrongReferences in comparison, an object could be destroyed multiple times as more than one thread could read 0. The thread that reads 0, starts destroying the object:

C++

auto RefCount = Atomics::AtomicDecrement(m_lNumStrongReferences);
if( RefCount == 0 )
{
    // Start destroying the object...

Interference between ReleaseStrongRef() and GetObject() methods

This is the first situation where we need to carefully design interference with other functions. What happens if another thread has a weak reference and starts getting strong reference to an object through GetObject()? Without special care, the method can potentially return reference to an object that will very soon be destroyed by ReleaseStrongRef() from another thread. After ReleaseStrongRef() starts destroying the object, it acquires exclusive access to the class using the lock. GetObject() also acquires the lock, so only one method in one of the threads can now be running. Since ReleaseStrongRef() cannot be stopped and the object will be destroyed anyway, it is the responsibility of GetObject() to detect this situation and avoid returning reference to the soon-to-be destroyed object. After obtaining the lock, GetObject() first increments the number of strong references (we will discuss later why it is important to increment the counter while having exclusive access to the counters) and examines the returned value. There are two cases now:

Returned value is 1. This means that there are no more alive strong reference to the object, and the object is either already destroyed or will be destroyed very soon. In this case, we should not return reference to the object.
Returned value is greater than 1. This means that there exists at least one another strong reference to the object. Since we already incremented the counter, no ReleaseStrongRef() in other threads will be able to decrement it to 0 (assuming the calls to AddRef() and Release() are correctly balanced), so it is safe to return the reference to the object.

Note that QueryInterface() also increments the strong reference counter, but before calling the method we need to be sure that the object is alive.

The following two figures illustrates two possible scenarios that show what happens if one thread is releasing the last strong reference to the object, while the second one is attempting to get strong reference to the object through a weak reference using GetObject(). In the first scenario, ReleaseStrongRef() decrements strong reference counter first, and the object will be released:

In the second scenario, GetObject() increments strong reference counter first, and the object is kept alive:

Note that it is crucial that GetObject() increments the strong reference counter only after acquiring the lock. Consider what may have happened if GetObject() did not acquire the lock and there were several threads running GetObject() while another thread was releasing the last reference to the object:

As you can see, what happens in the scenario above is that two threads running GetObject() increment strong reference counter. As a result, the thread that increments the counter second sees two strong reference and starts returning the object. The object, however will have been destroyed by that time or will be destroyed soon after by the thread running ReleaseStrongRef(). This scenario is not possible if GetObject() acquires the lock:

The crucial difference is that if the lock is acquired, the second thread (thread 3) running GetObject() will only be able to increment reference counter after the first thread running GetObject() (thread 2) has decremented it. As a result, thread 3 will also see only one strong reference and will not return the object. In the alternative scenario where one of the threads running GetObject() increments the counter before ReleaseStrongRef() decrements it, the object will be kept alive and both threads 2 and 3 will obtain valid object reference.

There may be two questions that a careful reader may ask at this point. First, if GetObject() acquires the lock, why do we use atomic operations to increment and decrement the counter? This is because other methods (AddStrongRef() and ReleaseStrongRef()) access the counter without acquiring the lock. The second question is: if AddStrongRef() increments reference counter without acquiring the lock, is it possible that the same faulty scenario happens when one thread runs ReleaseStrongRef(), second runs GetObject() while the third one runs AddStrongRef()? The answer is no because since the third thread runs AddStrongRef(), there exist at least one another strong reference to the object. So the first thread running ReleaseStrongRef() is not releasing the last reference.

Interference between ReleaseStrongRef() and ReleaseWeakRef() Methods

Without special care, there may be a problem if two threads are simultaneously releasing the last strong and weak reference through ReleaseStrongRef() and ReleaseWeakRef() methods. If both methods see there are no more strong and weak references, two steps are taken to assure that only one method destroys the reference counting object itself. First, ReleaseStrongRef() checks the number of strong references while still holding the lock and sets the flag indicating if reference counting object needs to be released:

C++

bool bDestroyThis = m_lNumWeakReferences == 0;

Second, ReleaseWeakRef() decrements weak reference counter only after acquiring the lock. If two threads are running ReleaseStrongRef() and ReleaseWeakRef(), there are two scenarios possible depending on which method first acquires the lock. In the first scenario, ReleaseStrongRef() acquires the lock first. Since number of weak references it will see will not be equal zero, the method will not destroy the reference counting object:

In the second scenario, ReleaseWeakRef() acquires the lock first. However, ReleaseStrongRef() must first destroy the controlled object, so ReleaseWeakRef() must not self destroy reference counting object. To detect this situation, the methods use m_ObjectState flag. The state is atomically set to ObjectState::Destroyed by ReleaseStrongRef() method. If ReleaseWeakRef() sees that the state is not ObjectState::Destroyed, it means there are alive strong references or ReleaseStrongRef() method is not completed as in the scenario below:

Note that it is crucial that m_ObjectState is accessed when lock is acquired and that weak reference counter is only decremented while lock is acquired as well. For example, consider the following scenario where weak reference counter is decremented without acquiring the lock, which leads to self-destroying reference counters object twice:

Acquiring the lock has two safety effects:

If ReleaseStrongRef() sets bDestroyThis flag to true, this means there are no other threads that may run code related to weak references, because ReleaseWeakRef() decrements weak reference counter after acquiring the lock. So reference counters object can be safely destroyed
If ReleaseWeakRef() sees that m_ObjectState is set to ObjectState::Destroyed, then all strong reference-related code must be completed by this time because the object state is updated while keeping the lock. In this case, the reference counters can be safely destroyed as well

Interference between ReleaseWeakRef() and GetObject() methods

There is in fact no interference possible between these two methods if they are used correctly. If GetObject() is called in one thread, while another thread is running ReleaseWeakRef(), this means there are at least two weak references. Therefore, ReleaseWeakRef() will only decrement the counter, but will not destroy the reference counters object as there will be at least one another outstanding weak reference.

MakeNewRCObj Class

MakeNewRCObj is a class that is responsible for creating object + reference counters pair. The class is a template class parameterized by the type of the object and type of the memory allocator:

C++

template<typename ObjectType, typename AllocatorType = IMemoryAllocator>
class MakeNewRCObj

The class defines the following private members:

C++

AllocatorType* const m_pAllocator;
IObject* const m_pOwner;
const Char* const m_dbgDescription;
const char* const m_dbgFileName;
const Int32 m_dbgLineNumber;

where:

m_pAllocator is the pointer to the allocator that will be used to allocate memory for the object. If nullptr, default system allocator is used.
m_pOwner is the pointer to the object that owns the one to be created. If nullptr, the object has no owner.
m_dbgDescription, m_dbgFileName, and m_dbgLineNumber are debug members provided to the allocator to describe the allocation for debug purposes.

The class provides two constructors to initialize its members:

C++

MakeNewRCObj(AllocatorType &Allocator,
             const Char* dbgDescription,
             const char* dbgFileName,
             const Int32 dbgLineNumber,
             IObject* pOwner = nullptr)noexcept :
    m_pAllocator(&Allocator),
    m_pOwner(pOwner),
    m_dbgDescription(dbgDescription),
    m_dbgFileName(dbgFileName),
    m_dbgLineNumber(dbgLineNumber)
{
}

MakeNewRCObj(IObject* pOwner = nullptr)noexcept :
    m_pAllocator(nullptr),
    m_pOwner(pOwner),
    m_dbgDescription(nullptr),
    m_dbgFileName(nullptr),
    m_dbgLineNumber(0)
{}

The class defines template operator () that performs allocation of the object:

C++

template<typename ... CtorArgTypes>
ObjectType* operator() (CtorArgTypes&& ... CtorArgs)
{
    RefCountersImpl *pNewRefCounters = nullptr;
    IReferenceCounters *pRefCounters = nullptr;
    if(m_pOwner != nullptr)
        pRefCounters = m_pOwner->GetReferenceCounters();
    else
    {
        // Constructor of RefCountersImpl class is private and only accessible
        // by methods of MakeNewRCObj
        pNewRefCounters = new RefCountersImpl();
        pRefCounters = pNewRefCounters;
    }
    ObjectType *pObj = nullptr;
    try
    {
        // Operators new and delete of RefCountedObject are private and only accessible
        // by methods of MakeNewRCObj
        if(m_pAllocator)
            pObj = new(*m_pAllocator, m_dbgDescription, m_dbgFileName, m_dbgLineNumber)
                   ObjectType(pRefCounters, std::forward<CtorArgTypes>(CtorArgs)... );
        else
            pObj = new ObjectType
                  ( pRefCounters, std::forward<CtorArgTypes>(CtorArgs)... );
        if(pNewRefCounters != nullptr)
            pNewRefCounters->Attach<ObjectType, AllocatorType>(pObj, m_pAllocator);
    }
    catch (...)
    {
        if(pNewRefCounters != nullptr)
            pNewRefCounters->SelfDestroy();
        throw;
    }
    return pObj;
}

There are few interesting things about the operator. First, it is a variadic template function. It can take any number of arguments of any types. The function passes all its arguments to the object constructor using perfect forwarding mechanism:

C++

ObjectType(pRefCounters, std::forward<CtorArgTypes>(CtorArgs)... )

Perfect forwarding assures that every parameter is passed as either lvalue, or rvalue, depending on its original type. Second, the method uses in-place new to create object using custom allocator in case it is provided:

C++

pObj = new(*m_pAllocator, m_dbgDescription, m_dbgFileName, m_dbgLineNumber)
       ObjectType(pRefCounters, std::forward<CtorArgTypes>(CtorArgs)... );

If allocator is not provided, default allocator is used:

C++

pObj = new ObjectType( pRefCounters, std::forward<CtorArgTypes>(CtorArgs)... );

Recall that RefCountedObject defines two private versions of operator new, and that MakeNewRCObj is a friend class of RefCountersImpl. MakeNewRCObj is thus the only class that has access to these operators, and is the only place where instances of classes derived from RefCountersImpl can be created on the heap.

After the object is created, the method attaches this object to the reference counters using template Attach() method that we discussed earlier.

If an exception is thrown by the object constructor, the method catches it, destroys reference counters object, and re-throws the exception. Note that in our original implementation, reference counters object was created by the object constructor. This resulted in a memory leak if constructor threw an exception, because destructor would never be called and memory never freed.

There is one tricky situation related to correct exception handling. Consider a scenario, where object A owns object B that keeps weak reference WP to object A. If constructor of A throws an exception, then destructor of weak reference may try to destroy the reference counting object. As we know, this operation is performed by the MakeNewRCObj class, and we must avoid destroying the same object twice. What helps here is that ReleaseWeakRef() checks if the state of the object is ObjectState::Destroyed. Since the object has never been constructed, its state will be ObjectState::NotInitialized, and ReleaseWeakRef() will not destroy the reference counting object.

The following macros is defined to facilitate usage of MakeNewRCObj class:

C++

#define NEW_RC_OBJ(Allocator, Desc, Type, ...)\
    MakeNewRCObj<Type, typename std::remove_reference<decltype(Allocator)>::type>\
    (Allocator, Desc, __FILE__, __LINE__, ##__VA_ARGS__)

With the help of this macros, typical allocation looks like this:

C++

BufferD3D11Impl *pBufferD3D11 = 
    NEW_RC_OBJ(m_BufObjAllocator, "BufferD3D11Impl instance", BufferD3D11Impl)
              (m_BuffViewObjAllocator, this, BuffDesc, BuffData );

Smart Pointer Classes

The reference counting system provides two smart pointer classes, RefCntAutoPtr and RefCntWeakPtr that implement strong reference and weak reference functionality. The class implementation is relatively straightforward as they are thin wrappers over RefCountersImpl and RefCountedObject classes.

Conclusion

This article presented an implementation of an efficient reference counting system. The system supports user-provided allocators, weak references, and is thread-safe. The source code for the classes described in the article can be found in the attached archive. It is also available on GitHub.

History

17^th June, 2017: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)