Registered Pointers - High-Performance C++ Smart Pointers That Can Target The Stack

Noah L

20 Mar 2016 | MIT | 6 min read

An introduction to new smart pointers meant to be safe replacements for raw pointers and (raw) references.

Download source - 22.4 KB

Quick summary

mse::TRegisteredPointer is a smart pointer that behaves just like a raw pointer, except that its value is automatically set to nullptr when the target object is destroyed. It can be used as a general replacement for raw pointers in most situations. Like a raw pointer, it has no intrinsic thread safety. In exchange, it has no problem targeting objects allocated on the stack (and obtaining the corresponding performance benefit). With the default run-time checks enabled, this pointer is safe from accessing invalid memory.

mse::TRegisteredFixedPointer is a derivative of mse::TRegisteredPointer that is a functional equivalent of a C++ reference. That is, it may only be constructed to point at an existing object and cannot be retargeted after construction. While these properties make it unlikely that a C++ reference will end up being used to access invalid memory, it is, of course, not impossible. mse::TRegisteredFixedPointer, on the other hand, inherits mse::TRegisteredPointer's safety with respect to invalid memory access.
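
To make the self-nulling behavior concrete, here is a minimal sketch of one way such a pointer could work. It is not the library's implementation: SketchObj, SketchPtr and demo_self_nulling are hypothetical names invented for this illustration, and the sketch ignores thread safety, const pointers and copying of target objects. The idea is simply that the target keeps a list of the pointers aimed at it and resets them in its destructor.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

template <class T> class SketchPtr;

// Hypothetical stand-in for a "registered object": it derives publicly from T
// and remembers every SketchPtr that currently targets it.
template <class T>
class SketchObj : public T {
public:
    using T::T;                                   // reuse T's constructors
    SketchObj() = default;
    SketchObj(const SketchObj&) = delete;         // copying left out of the sketch
    SketchObj& operator=(const SketchObj&) = delete;
    ~SketchObj() {
        // Reset every pointer that still targets this object.
        for (SketchPtr<T>* p : m_trackers) p->m_target = nullptr;
    }
private:
    friend class SketchPtr<T>;
    std::vector<SketchPtr<T>*> m_trackers;
};

// Hypothetical stand-in for a "registered pointer".
template <class T>
class SketchPtr {
public:
    SketchPtr() = default;
    SketchPtr(SketchObj<T>* target) { attach(target); }
    SketchPtr(const SketchPtr& other) { attach(other.m_target); }
    SketchPtr& operator=(const SketchPtr& other) {
        if (this != &other) { detach(); attach(other.m_target); }
        return *this;
    }
    ~SketchPtr() { detach(); }

    explicit operator bool() const { return m_target != nullptr; }
    T& operator*() const { return *checked(); }
    T* operator->() const { return checked(); }
private:
    friend class SketchObj<T>;
    // Run-time check: refuse to dereference once the target is gone.
    SketchObj<T>* checked() const {
        if (!m_target) throw std::runtime_error("target no longer exists");
        return m_target;
    }
    void attach(SketchObj<T>* target) {
        m_target = target;
        if (m_target) m_target->m_trackers.push_back(this);
    }
    void detach() {
        if (!m_target) return;
        auto& v = m_target->m_trackers;
        v.erase(std::remove(v.begin(), v.end(), this), v.end());
        m_target = nullptr;
    }
    SketchObj<T>* m_target = nullptr;
};

// Returns true if the pointer was nulled when its stack-allocated target was
// destroyed and a later dereference was rejected with an exception.
bool demo_self_nulling() {
    SketchPtr<std::string> p;
    {
        SketchObj<std::string> name("Fred");      // lives on the stack
        p = SketchPtr<std::string>(&name);
        if (p->size() != 4) return false;         // valid while name exists
    }                                             // name is destroyed here
    if (p) return false;                          // p should now be null
    try { (void)p->size(); }
    catch (const std::runtime_error&) { return true; }
    return false;
}
```

Calling demo_self_nulling() from your own main() exercises both the nulling and the checked dereference. The bookkeeping in attach() and detach() is where the modest run-time cost of this kind of design comes from.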

Who should use registered pointers?

Registered pointers are appropriate for use by two groups of C++ developers - those for whom safety and security are critical, and also everybody else.
Registered pointers can help eliminate many of the opportunities for inadvertently accessing invalid memory.
Using registered pointers can incur a modest performance cost. However, because registered pointers behave the same as raw pointers when pointing to valid objects, they can be "disabled" (automatically replaced with the corresponding raw pointer) with a compile-time directive. This allows them to be used to help catch bugs in debug/test/beta modes while incurring no overhead cost in release mode. So there is really no excuse for not using them.
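
The "disable at compile time" idea can be sketched with an alias that switches between a checking wrapper and a plain raw pointer. Everything here is an assumption made for illustration: the macro name SKETCH_DISABLE_CHECKED_PTR, the checked_ptr type and the Sample struct are hypothetical, and the wrapper only checks for null rather than tracking target lifetime as the real registered pointers do.

```cpp
#include <stdexcept>

// Hypothetical macro name, chosen for this sketch only. Define it (for
// example on the compiler command line) to build without the checks.
// #define SKETCH_DISABLE_CHECKED_PTR

#ifdef SKETCH_DISABLE_CHECKED_PTR
template <class T>
using checked_ptr = T*;              // release build: a plain raw pointer
#else
template <class T>
class checked_ptr {                  // debug/test build: checks before use
public:
    checked_ptr(T* p = nullptr) : m_p(p) {}
    explicit operator bool() const { return m_p != nullptr; }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
private:
    T* get() const {
        if (!m_p) throw std::logic_error("null dereference");
        return m_p;
    }
    T* m_p;
};
#endif

struct Sample { int b = 3; };

// Client code is written once and compiles against either definition,
// because both support the same pointer syntax.
int read_b(checked_ptr<Sample> p) {
    return p ? p->b : -1;
}
```

Because the calling code only uses operations that a raw pointer also supports, switching the macro changes the cost of the pointer without touching any call sites.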

Usage

Using registered pointers is easy. Just copy two files, mseprimitives.h and mseregistered.h, into your project (or "include" directory). There are no other dependencies. Registered pointer usage is very similar to raw pointer usage and they can generally be used as a "drop-in" substitute. Note that the target object does have to be declared as a "registered object". Because the registered object type is publicly derived from the original object's type, it remains compatible with it.

C++

#include "mseregistered.h"
...

    class A {
    public:
        int b = 3;
    };

    A a;
    mse::TRegisteredObj<A> registered_a;

    A* A_native_ptr1 = &a;
    mse::TRegisteredPointer<A> A_registered_ptr1 = &registered_a;

    A* A_native_ptr2 = new A();
    mse::TRegisteredPointer<A> A_registered_ptr2 = mse::registered_new<A>();

    delete A_native_ptr2;
    mse::registered_delete<A>(A_registered_ptr2);
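
The compatibility mentioned above follows from ordinary public inheritance. The sketch below uses a hypothetical Wrapped<> template (not the library's type) to show why an object of the derived wrapper type can be handed to existing code that expects the original type.

```cpp
// Hypothetical wrapper, standing in for the idea of a "registered object"
// type that derives publicly from the type it wraps.
template <class T>
struct Wrapped : T {
    using T::T;                  // reuse T's constructors
};

struct A {
    int b = 3;
};

// Existing code that knows nothing about the wrapper.
int read_b(const A& a) { return a.b; }

int demo_compatibility() {
    Wrapped<A> w;                // a wrapped object is still an A
    A& as_base = w;              // implicit derived-to-base conversion
    as_base.b = 7;               // modifies the same object
    return read_b(w);            // accepted wherever an A is expected
}
```

Since as_base and w refer to the same object, demo_compatibility() returns 7.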

If you prefer to do less typing, shorter aliases are available:

C++

#include "mseregistered.h"
using namespace mse;
...

    class A {
    public:
        int b = 3;
    };

    ro<A> registered_a;
    rp<A> A_registered_ptr1 = &registered_a;
    rp<A> A_registered_ptr2 = rnew<A>();
    rdelete<A>(A_registered_ptr2);

The example project included with this article contains a comprehensive set of examples of registered pointers in action.

Discussion

These days C++ stands out as a uniquely dangerous language, at least compared to other modern languages. By "dangerous", I mean the ever-present, significant possibility of accessing invalid memory. The potential consequences of invalid memory access can be severe, ranging from exposure of sensitive data to complete compromise of the run-time environment.

Presumably this is the main reason C++ is not a popular language for (server side) web applications. Yet curiously, it is still the language used for critical parts of the web infrastructure. Web servers and web browsers, for example. Why is that? I suggest that it's simply because no other language is really up to the job. One issue in particular is that a lot of the other languages depend on garbage collection to achieve their language safety, which is arguably not appropriate for writing complex systems that need to be reliably responsive.

But C++ is still dangerous, and there have been countless security exploits that have taken advantage of that.

Since C++11, C++ has become a much more powerful language. Is there really still no practical way to avoid using C++'s dangerous elements? Well, let's consider the most dangerous element of all, the pointer. Experienced (older) C++ programmers know how easy it can be to unintentionally end up with a pointer pointing to invalid memory. The situation is better now that the STL provides well-tested versions of many of the commonly used dynamic data structures, so you don't have to implement your own, which eliminates much of the need to use pointers at all.

And when using dynamic allocation, std::shared_ptr can often be a great substitute for raw pointers that helps ensure you don't accidentally deallocate the target object prematurely. Using std::shared_ptr essentially gets you the safety benefits of garbage collection, but, like garbage collection, there is a performance cost. In my opinion the safety benefit is worth it in pretty much all situations, but others would disagree.
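
As a small illustration of that lifetime guarantee, the sketch below uses only the standard library. The function name demo_shared_ownership is made up for this example. It shows that the target stays alive as long as any std::shared_ptr owns it, and that a std::weak_ptr can observe when the last owner lets go.

```cpp
#include <memory>
#include <string>

// Returns true if the string stayed alive while an owner remained and was
// reported as expired once the last owner released it.
bool demo_shared_ownership() {
    std::weak_ptr<std::string> observer;
    std::shared_ptr<std::string> second_owner;
    {
        auto first_owner = std::make_shared<std::string>("Amy");
        second_owner = first_owner;   // two owners share the string
        observer = first_owner;       // weak: does not keep the string alive
    }                                 // first_owner is gone, string survives

    bool alive_while_owned = second_owner.use_count() == 1
                             && !observer.expired()
                             && *observer.lock() == "Amy";

    second_owner.reset();             // last owner released, string destroyed

    bool expired_after = observer.expired() && observer.lock() == nullptr;
    return alive_while_owned && expired_after;
}
```

The reference count that makes this work is part of the performance cost mentioned above: it lives in a separately managed control block and is updated on every copy of the pointer.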

C++

#include <string>
#include <vector>

class CNames : public std::vector<std::string> {
public:
    void addName(const std::string& name) {
        (*this).push_back(name);
    }
};

class CQuarantineInfo {
public:
    void add_quarantine_patient(const std::string* p_patient_name) {
        if (p_patient_name) {
            if ((3 * supervising_doctors.size()) <= quarantined_patients.size()) {
                /* The policy is to have at least one supervising doctor for every 3 patients. */
                if (1 <= available_reserve_doctors.size()) {
                    supervising_doctors.addName(available_reserve_doctors.back());
                    supervising_doctors.shrink_to_fit(); /* Just to increase the likelihood of exposing
                        the bug. */
                    available_reserve_doctors.pop_back();
                }
            }
            quarantined_patients.addName(*p_patient_name);
        }
    }

    CNames quarantined_patients;
    CNames supervising_doctors;
    CNames available_reserve_doctors;
};

int main(int argc, char* argv[]) {
    CQuarantineInfo quarantine_info;
    quarantine_info.available_reserve_doctors.addName("Dr. Bob");
    quarantine_info.available_reserve_doctors.addName("Dr. Dan");
    quarantine_info.available_reserve_doctors.addName("Dr. Jane");
    quarantine_info.available_reserve_doctors.addName("Dr. Tim");

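    /* Named objects are used here because taking the address of a temporary is not valid
    standard C++ (some compilers accept it as an extension). */
    const std::string amy("Amy"), carl("Carl"), earl("Earl");
    quarantine_info.add_quarantine_patient(&amy);
    quarantine_info.add_quarantine_patient(&carl);
    quarantine_info.add_quarantine_patient(&earl);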

    /* Suppose the supervising doctor contracts the infection and becomes a patient too. */
    const std::string* p_name_of_doctor_that_contracted_the_infection = &(quarantine_info.supervising_doctors.front());
    quarantine_info.add_quarantine_patient(p_name_of_doctor_that_contracted_the_infection);

    /* The problem here is that the add_quarantine_patient() function might first add another doctor to
    the set of supervising_doctors. But because supervising_doctors is ultimately implemented as an
    std::vector<>, an insert (or push_back) operation could cause a "reallocation" event which would
    invalidate any references to any member of the vector. So the add_quarantine_patient() function
    could inadvertently invalidate its parameter before it is finished using it. */
}

It may never have occurred to the author of the add_quarantine_patient() function that the reference to the new patient could also be a reference to a supervising doctor, in which case the function can inadvertently cause the target of its p_patient_name parameter to be invalidated before it's finished using it.
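
The reallocation at the heart of this bug can be observed safely with the standard library alone. The following sketch (demo_reallocation and ReallocationReport are names invented for this illustration) fills a vector exactly to capacity, records where its storage lives, and then adds one more element. It never dereferences a stale pointer; it only compares addresses saved as integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ReallocationReport {
    std::size_t capacity_before;
    std::size_t capacity_after;
    bool storage_moved;
};

ReallocationReport demo_reallocation() {
    std::vector<std::string> doctors;
    doctors.push_back("Dr. Tim");
    // Fill the vector exactly to capacity so the next push_back must reallocate.
    while (doctors.size() < doctors.capacity()) doctors.push_back("Dr. Filler");

    const std::size_t capacity_before = doctors.capacity();
    // Save the address as an integer; the old pointer itself is never used
    // after the reallocation.
    const auto address_before = reinterpret_cast<std::uintptr_t>(doctors.data());

    doctors.push_back("Dr. Jane");    // size exceeds capacity: reallocation

    const auto address_after = reinterpret_cast<std::uintptr_t>(doctors.data());
    return {capacity_before, doctors.capacity(), address_before != address_after};
}
```

Any pointer or reference obtained from front() before that last push_back would now refer to freed storage, which is exactly the situation add_quarantine_patient() can create for its own parameter.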

It's a contrived example, but this kind of thing can easily happen in more complex situations. Of course using raw pointers is perfectly safe in the vast majority of cases. The problem is that there are a minority of cases where it's easy to assume that it's safe when it really isn't. So the prudent policy is to simply not use raw pointers (unless you're going to do some very thorough testing).

Again, using std::shared_ptr in place of raw pointers everywhere would be a simple way to solve the problem, but with a performance cost. A lot of that performance cost comes from the constraint that std::shared_ptr target objects cannot (or should not) be allocated on the stack. So when considering performance, registered pointers can often be a better alternative.

Here's what the above example looks like when substituting raw pointers (and references) with registered pointers:

C++

#include <string>
#include <vector>
#include "mseregistered.h"
using namespace mse;
/* Note that "ro<>" is aliased to mse::TRegisteredObj<>, "rcp<>" to mse::TRegisteredConstPointer<> and
"rfcp<>" to mse::TRegisteredFixedConstPointer<>. */

class CNames : public std::vector<ro<std::string>> {
public:
    void addName(rfcp<std::string> p_name) {
        (*this).push_back(*p_name);
    }
};

class CQuarantineInfo {
public:
    void add_quarantine_patient(rcp<std::string> p_patient_name) {
        if (p_patient_name) {
            if ((3 * supervising_doctors.size()) <= quarantined_patients.size()) {
                /* The policy is to have at least one supervising doctor for every 3 patients. */
                if (1 <= available_reserve_doctors.size()) {
                    supervising_doctors.addName(&available_reserve_doctors.back());
                    supervising_doctors.shrink_to_fit(); /* Just to increase the likelihood of exposing the bug. */
                    available_reserve_doctors.pop_back();
                }
            }
            quarantined_patients.addName(&*p_patient_name);
        }
    }

    CNames quarantined_patients;
    CNames supervising_doctors;
    CNames available_reserve_doctors;
};

int main(int argc, char* argv[]) {
    CQuarantineInfo quarantine_info;
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Bob"));
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Dan"));
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Jane"));
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Tim"));

    quarantine_info.add_quarantine_patient(&ro<std::string>("Amy"));
    quarantine_info.add_quarantine_patient(&ro<std::string>("Carl"));
    quarantine_info.add_quarantine_patient(&ro<std::string>("Earl"));

    /* Suppose the supervising doctor contracts the infection and becomes a patient too. */
    rcp<std::string> p_name_of_doctor_that_contracted_the_infection = &(quarantine_info.supervising_doctors.front());
    try {
        quarantine_info.add_quarantine_patient(p_name_of_doctor_that_contracted_the_infection);
        /* The problem here is that the add_quarantine_patient() function might first add another
        doctor to the set of supervising_doctors. But because supervising_doctors is ultimately
        implemented as an std::vector<>, an insert (or push_back) operation could cause a
        "reallocation" event which would invalidate any references to any member of the vector. So the
        add_quarantine_patient() function could inadvertently invalidate its parameter before it is
        finished using it. */
        /* By default, registered pointers will throw an exception on any attempt to access invalid
        memory. */
    }
    catch (...) {
        /* Whether the bug is exposed depends on the implementation of std::vector<>. Under msvc2015 in
        debug mode (March 2016), the bug does manifest and an exception is caught here. */
    }

    /* Just to demonstrate that registered pointers also support stack allocated objects. */
    ro<std::string> patient_fred("Fred");
    quarantine_info.add_quarantine_patient(&patient_fred);
}

By default, registered pointers will throw an exception on any attempt to access invalid memory.

So there you go, C++'s most dangerous element made safe. Without sacrificing the performance benefit of stack allocation. Used along with the rest of the "SaferCPlusPlus" library, it is now practical to write C++ code with greatly reduced risk of accessing invalid memory.

Before we finish up, every good data type plugging article needs a benchmark chart:

Allocation, deallocation, pointer copy and assignment:

Pointer Type	Time
mse::TRegisteredPointer (stack)	0.027 seconds
native pointer (heap)	0.049 seconds
mse::TRegisteredPointer (heap)	0.074 seconds
std::shared_ptr (heap)	0.087 seconds

So as we can see, mse::TRegisteredPointers targeting stack allocated objects easily outperform even native (aka raw) pointers targeting heap allocated objects.

That's it. Let's code safely out there.

License

This article, along with any associated source code and files, is licensed under The MIT License