C++ references, some clarity on use and safety.

john morrison leon

5 May 2014 | CPOL | 15 min read

Some clear criteria for choosing when to use C++ references and how to know when they are safe.

Introduction

I wrote an article a few years ago about C++ references while I was still a little mystified by when to use them and when they are safe. It was a bit journalistic in style and unfortunately elicited an outbreak of a familiar flame war over the identity, meaning and compiled interpretation of C++ references.

Since then I have taken some trouble to de-mystify the issue and here are my conclusions.

I don't wish to ignite that flame war again – I think that what I write here gives credibility to both sides.

Dry statement of criteria for using C++ references

Use a reference rather than a pointer whenever there is nothing to be gained by testing its validity. This could be:

Because you have good reason to know it will always be valid and therefore testing it would be a waste of time (the primary and most intended use of C++ references).
Or alternatively because testing the pointer for non-zero would not be a reliable test of its validity.

Apart from providing some syntactical comfort, the use of a C++ reference rather than a pointer clarifies the situation. In the first case it prevents the programmer from making unnecessary tests of validity and in the second case it prevents the programmer from being fooled into making unreliable tests of validity.

I think that is a reliable design criterion for choosing a C++ reference rather than a pointer, and it is illustrated by its most common applications, the first of which, the copy constructor, actually requires the use of a C++ reference and effectively defines its syntax and grammar.

The copy constructor

A copy constructor cannot take a pass by value argument because pass by value is what it is defining.

class CClass
{
public:
    CClass(CClass c) //cannot be allowed to compile
        //because passing c by value would itself need the copy constructor (infinite recursion)
    {
        //Initialisation code
    }
};

It could be written to receive a pointer argument instead, but then it would no longer be recognised by the compiler as a copy constructor and none of its implicit applications (creation of temporary copies etc.) would work.

class CClass
{
public:
    CClass() {}
    CClass(CClass* pC) //not a copy constructor
    {
        //Initialisation code
    }
};
CClass Function()
{
    CClass c;
    return c;
} //does not use the constructor defined above to make the temporary copy returned

The C++ reference is required to provide a coherent expression of a copy constructor:

class CClass
{
public:
    CClass() {}
    CClass(CClass const & c)
    {
        //Initialisation code
    }
};
CClass Function1()
{
    CClass c;
    return c;
} //uses the copy constructor defined above (unless the compiler elides the copy)

The reference argument CClass const & c specifies that the type to match and receive is a CClass object but a reference to it (yes, an underlying pointer) is what is actually passed in. As the type passed to the argument is a CClass value, there is no point in having pointer semantics that allow you to test and zero its address. You work with it within the constructor as if it is a value, which it is.

The const modifier is important. It declares that the reference passed in will not be used to alter what it refers to. This tells the compiler that it is ok to pass a const object. Without it, the compiler will not match a const object (or a temporary) with this copy constructor. Because declaring any copy constructor of your own suppresses the compiler-generated one, an attempt to copy a const object will then fail to compile.

It seems paradoxical at first that if you want a reference argument to be indifferent to constness then you must give it the const modifier. Only if you want it to reject const objects do you omit the const modifier. Of course what is really happening is that if there is no const modifier then you are indicating an intention to modify the referenced object and const objects cannot allow this.
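
The effect of the const modifier on what a copy constructor will accept can be checked at compile time. The following is a minimal sketch, assuming C++11 or later; the class names ConstRefCopy and MutableRefCopy are hypothetical and exist only for this illustration.

```cpp
#include <type_traits>

// Hypothetical classes, named for illustration only.
struct ConstRefCopy {
    ConstRefCopy() = default;
    ConstRefCopy(ConstRefCopy const& other) { (void)other; }  // accepts const and non-const sources
};

struct MutableRefCopy {
    MutableRefCopy() = default;
    MutableRefCopy(MutableRefCopy& other) { (void)other; }    // accepts non-const lvalues only
};

// Both can be copied from a non-const object...
static_assert(std::is_constructible<ConstRefCopy, ConstRefCopy&>::value, "");
static_assert(std::is_constructible<MutableRefCopy, MutableRefCopy&>::value, "");

// ...but only the const& version can be copied from a const object.
static_assert(std::is_constructible<ConstRefCopy, ConstRefCopy const&>::value, "");
static_assert(!std::is_constructible<MutableRefCopy, MutableRefCopy const&>::value, "");
```

If all four assertions compile, the class without the const modifier has no constructor at all that accepts a const source.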

Passing arguments into functions

The most common situation in which we can choose between C++ references and pointers is in the type of arguments passed into functions and methods. The solid case for using a C++ reference is where you are passing in something declared by value in the calling scope or a wider scope that embraces it.

void Function1(CClass& c)
{
    c.DoSomething();
}

CClass c;
Function1(c);

In this case scope assures that the C++ references passed in will always be valid throughout the execution of the function or method. There would be no value in testing their validity within the function or method, and it is appropriate to access them with the dot operator as if they were ordinary named variables, because that is what they are, only declared in a wider scope. In low level terms, if they are locals of a calling function, they are sitting very firmly further down the stack.

You could use a pointer to do the same thing...

void Function1(CClass* pC)
{
    pC->DoSomething();
}

CClass c;
Function1(&c);

...but both the call and the function definition are uglier and for no gain. You know the variable c is valid so there is no point in suggesting that it might not be.

You can of course use a reference and still think of it as a pointer in disguise. It is more useful, though, to take the perspective suggested in the official language definition: passing a reference into a function gives that function direct access to a variable already declared in a wider scope. The variable in a wider scope is always there and the reference gives you a name by which you can refer to it.
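
As a compilable version of the comparison above, here is a minimal sketch. The Counter type and the function names are hypothetical, chosen only for this example.

```cpp
#include <cassert>

struct Counter {             // hypothetical type for illustration
    int value = 0;
};

// Reference parameter: 'c' is another name for the caller's object.
void BumpByReference(Counter& c)
{
    ++c.value;
}

// Pointer parameter: same effect, but the signature admits a null argument.
void BumpByPointer(Counter* pC)
{
    assert(pC != nullptr);   // this check exists only because null is expressible
    ++pC->value;
}

int BumpBoth()
{
    Counter c;               // declared in the calling scope, outlives both calls
    BumpByReference(c);
    BumpByPointer(&c);
    return c.value;          // both calls acted on the same object
}
```

Both functions act on the caller's variable; only the pointer version has to say something about the possibility of having no object.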

Structurally safe C++ references

Uses of references so far described above are examples of using a reference because the validity of the variable referenced is assured and there would be no point in testing it. They are entirely safe, that is they are guaranteed by scope to be structurally safe. Now the interesting thing here is that if C++ references had been designed with the following operations prohibited...

• conversion from a pointer

CClass* pC=NULL;
CClass& c= *(pC); //c is invalid object

• return from a function

CClass& Function3()
{
    CClass c;
    return c; //c is destroyed on return and a reference to where it used to be is returned
}

• pointer dereference involved in initialisation

CClass& c=Parent.pChild->Object.c; //Object is not guaranteed to always exist

...then C++ references would have been guaranteed to be safe at all times for all usage. This is because a C++ reference is effectively a const. Its declaration must include its initialisation and it cannot be changed thereafter. This means you can't declare it and then assign it later to something in a narrower scope. You can only initialise it to something that already exists in the same scope or a wider embracing scope and is therefore guaranteed to exist as long as the reference itself.
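
The point that a reference cannot be changed after initialisation is easy to misread, because assignment to a reference compiles without complaint. A minimal sketch of what that assignment actually does (the function name is hypothetical):

```cpp
#include <cassert>

int ReseatAttempt()
{
    int a = 1;
    int b = 2;
    int& r = a;         // r is bound to a, permanently
    r = b;              // not a rebinding: copies the value of b into a
    b = 99;             // has no effect on r, which still refers to a
    assert(&r == &a);   // the reference never moved
    return a;           // a now holds the value copied from b
}
```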

Access to dynamic collection elements – not always safe

Back in the real world of C++, those operations described above are not prohibited. They are not only allowed, they are put to very important use. You can of course prohibit them yourself in much of your own use of C++ references to ensure their safety. This is recommended and I will return to that later. First though let us take a look at some of the benefits of allowing these dangerous operations and also become fully aware that the price is the loss of a blanket guarantee that all C++ references will always remain valid.

With ordinary static arrays we use the [] operator both to read and to alter the elements of an array or to execute their methods...

CClass a[8];
a[0].intval=5;
a[0].DoSomething();

...but much of the time we use dynamic collections rather than arrays and we find it very comfortable that we can use the same syntax:

vector<CClass> v;
v.resize(8);
v[0].intval=5;
v[0].DoSomething();

Dynamic collections achieve this by overloading the [] operator. Now if the [] operator were to return the CClass object by value then it would return a copy and the assignment of 5 to intval and call to DoSomething() would be carried out on a temporary copy before throwing it away – not what we really want. So instead it returns a C++ reference to the CClass object in the array. Furthermore, to provide that C++ reference it will have had to convert the pointer it holds to the dynamically created object into a C++ reference or alternatively use a pointer dereference in initialisation of the C++ reference. Either way it is a wholehearted breach of the prohibitions described above that would keep a C++ reference structurally safe.
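
To make the mechanism concrete, here is a minimal sketch of a container whose [] operator returns a reference obtained by dereferencing an internal pointer. IntBuffer and copy_at are hypothetical names; this is not how std::vector is actually implemented, only the smallest thing that shows the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal container for illustration.
class IntBuffer
{
    std::unique_ptr<int[]> m_data;   // pointer to the dynamically created elements
    std::size_t m_size;
public:
    explicit IntBuffer(std::size_t n) : m_data(new int[n]()), m_size(n) {}

    // Returns a reference produced by dereferencing the internal pointer.
    int& operator[](std::size_t i)
    {
        assert(i < m_size);
        return m_data[i];
    }

    // A by-value accessor for contrast: the caller receives a copy.
    int copy_at(std::size_t i) const
    {
        assert(i < m_size);
        return m_data[i];
    }
};
```

Writing buf[0] = 5 changes the stored element, whereas anything done to the result of buf.copy_at(0) affects only a temporary.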

Now for any direct usage of the [] operator to carry out an operation on an unnamed element of a collection, this is perfectly safe. The C++ reference returned is temporary, unnamed and unseen and only exists while a single operation is carried out during which nothing else can happen to the element being referenced that may invalidate it. The problem comes when you decide to do something clever to avoid repeated dereferencing of the same element.

You could make a copy, work on it and then copy it back but the two copies will increase the execution overhead.

CClass c=v[0];
c.intval=5;
c.DoSomething();
v[0]=c;

A very efficient solution is to declare a named reference to the element returned and work with that.

CClass& c=v[0];
c.intval=5;
c.DoSomething();

There is no copy made and no need to copy back, you worked directly on the array element itself. This is supremely efficient and also pleasing to the eye. The example above is also perfectly safe but if the collection is disturbed in any way while that reference is in use...

CClass& c=v[0];
c.intval=5;
v.resize(16);       //growing may cause v[0] to be moved in memory
c.DoSomething();    //c may now refer to an invalid object

...then we may find ourselves with an invalid C++ reference, quite an ugly thing because we like to think of them as being safe, even giving them the syntax of a statically declared variable.

You could use a pointer instead...

CClass* pC=&v[0];
pC->intval=5;
v.resize(16);           //growing may cause v[0] to be moved in memory
if(pC)                  //passes test because pC still points at where the element was
                        //it is non NULL
    pC->DoSomething();  //pC may now point at invalid memory

...but all you gain for the ugly syntax is a false sense of security from a test that is useless. When the array rearranges itself it does not inform your pointer pC that it should set itself to NULL.

There is no avoiding the fact that as soon as you initialise a named C++ reference with a C++ reference returned from a function or method, including the [] operator, you leave the comfort zone of structural safety and have to take your own informed measures to keep things safe. One solution is to use braces to keep named references to collection elements very tightly scoped and avoid any intrusion of collection operations during their lifetime:

{
    CClass& c=v[0];     //take a reference to the element
    c.intval=5;         //work on it
    c.DoSomething();
}                       //close scope before touching the array
v.resize(16);           //array operation, may move elements
{
    CClass& c=v[0];     //take a reference to the element again
    c.DoSomething();    //work on it
    c.intval=5;         //safe as long as DoSomething() didn't disturb the array
                        //watch out for this gotcha!
}
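
A complementary measure, sketched below under stated assumptions, is to carry an index rather than a reference across any operation that may move the elements, and to take a fresh reference afterwards. Item, ResizeWouldReallocate and SetThenGrow are hypothetical names for this illustration; the capacity test relies on the std::vector rule that growth reallocates only when the new size exceeds capacity().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Item { int intval = 0; };   // hypothetical element type

// True if growing v to newSize must reallocate, which invalidates existing references.
bool ResizeWouldReallocate(const std::vector<Item>& v, std::size_t newSize)
{
    return newSize > v.capacity();
}

// Holds an index, not a reference, across the resize.
int SetThenGrow(std::vector<Item>& v, std::size_t index, std::size_t newSize)
{
    assert(index < v.size() && newSize >= v.size());
    v[index].intval = 5;       // temporary reference, used and dropped at once
    v.resize(newSize);         // may move every element
    return v[index].intval;    // the index is still meaningful after the move
}
```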

Having presented this efficient but, shall we say, unprotected use of C++ references, I should point out another hazard that it introduces:

This is beautiful:

CClass& c=v[0];
c.intval=5;
c.DoSomething();

but just make one mistake in typing it:

CClass c=v[0];
c.intval=5;
c.DoSomething();

and it will compile and run but not do what you want. It will make changes to a copy and then throw it away. Take care!
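
The difference made by the single missing & can be shown with a short sketch. Elem and the two function names are hypothetical and used only for this illustration.

```cpp
#include <cassert>
#include <vector>

struct Elem { int intval = 0; };   // hypothetical element type

void SetThroughReference(std::vector<Elem>& v)
{
    assert(!v.empty());
    Elem& c = v[0];   // c names the element itself
    c.intval = 5;
}

void SetThroughCopy(std::vector<Elem>& v)
{
    assert(!v.empty());
    Elem c = v[0];    // missing '&': c is an independent copy
    c.intval = 7;     // changes the copy only; lost when c goes out of scope
}
```

After SetThroughCopy the element is unchanged; after SetThroughReference it holds 5.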

Safe long term references to dynamic collection elements

If you really want to take a reference to an element of a dynamic collection, hold on to it while things might happen to the collection and have it auto zero if it becomes invalid, then neither C++ references nor raw pointers will do; you need to use a reference counted smart pointer.

Within the standard library you can take safe references to elements of an array as long as they are declared as std::shared_ptr<T>:

vector<std::shared_ptr<CClass> > v;
v.resize(8);
v[0]=std::make_shared<CClass>();        //a raw 'new CClass' cannot be assigned to a shared_ptr
v[1]=std::make_shared<CClass>();
std::shared_ptr<CClass> spC0= v[0];    //shared ownership reference
std::weak_ptr<CClass> wpC1= v[1];    //observing reference

There are some drawbacks with this. The elements of the vector are vulnerable to shared ownership so you lose the assurance that resetting an element will delete it. Using a shared_ptr to hold a long term reference does just that, it keeps the object alive. For an observing reference you will need to use std::weak_ptr and accept having to convert it into a std::shared_ptr each time you want to dereference it.
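
The conversion step for an observing reference looks like this in practice. This is a minimal sketch using only std::shared_ptr and std::weak_ptr; Node, ReadIfAlive and the demo function are hypothetical names, and the value -1 is an arbitrary sentinel chosen for the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Node { int intval = 0; };   // hypothetical element type

// Returns the value seen through the observer, or -1 if the element is gone.
int ReadIfAlive(const std::weak_ptr<Node>& observer)
{
    if (std::shared_ptr<Node> locked = observer.lock())   // temporary shared ownership
    {
        return locked->intval;
    }
    return -1;
}

bool ObserverExpiresWhenElementIsReset()
{
    std::vector<std::shared_ptr<Node> > v(2);
    v[1] = std::make_shared<Node>();
    std::weak_ptr<Node> observer = v[1];
    assert(ReadIfAlive(observer) == 0);   // element alive, default intval
    v[1].reset();                         // the sole owner lets go
    return observer.expired();
}
```

The observer reports expiry only because the vector element was the sole owner; any other shared_ptr still holding the object would keep it alive.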

If you are not sharing your object across threads and your design is fundamentally single ownership then these are grave and unnecessary drawbacks. An alternative to this is to make use of a smart pointer system I have recently published on the Code Project:

A Sensible Smart Pointer Wrap for Most of Your Code

vector<std::owner_ptr<CClass, ElementType> > v; //array of exclusive owners 
//that will survive STL collections.
v.resize(8);
v[0]=new CClass;
v[1]=new CClass;
ref_ptr<CClass> rC0= v[0];    // observing reference supporting direct dereference with the -> operator
//sharing ownership is expressly prohibited

Or if you want an array of values rather than pointers:

vector<super_gives_ref_ptr<CClass > > v; //array of values, super classed to provide  ref_ptr_to_this() method.
v.resize(8); 
ref_ptr<CClass> rC0= v[0].ref_ptr_to_this();    // observing reference supporting direct dereference with the -> operator
//sharing ownership is expressly prohibited

These smart pointers will be safe at all times in that they will either be valid or test as zero.

Other unsafe uses of C++ references

We can start with obvious cases of shooting yourself in the foot:

CClass* pC=NULL;
CClass& c= *(pC); //c is invalid object

CClass& Function3()
{
    CClass c;
    return c; //c is destroyed on return and a reference to where it used to be is returned
}

And a less obvious case...

CClass& c=Parent.pChild->Object.c; //Object is not guaranteed to always exist

...the -> in the initialisation indicates that dynamic creation may have been involved and therefore the reference could be left invalid.

Probably the most common hazard occurs when you have functions and methods that take a reference parameter as an argument and you need to pass an object referenced by a pointer.

void Function1(CClass& c)
{
    c.DoSomething();
}
CClass* pC=GetObject();
Function1(*pC);

If pC turns out to be NULL then Function1 will receive an invalid reference.

So to prevent this from happening we can add a test first and only call the function if the pointer is non zero:

CClass* pC=GetObject();
if(pC) 
    Function1(*pC);

OK but remember that pointers don’t always get reset when the object they point at is destroyed so the non-null test is not reliable in the general case. You really do have to take a good look at where that pointer comes from and what you really have to do to test its validity.
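
For the null case alone, the test can be wrapped up so that the conversion from pointer to reference happens in exactly one place. This is a minimal sketch with hypothetical names (Gadget, UseGadget, UseGadgetIfPresent); it does nothing about pointers that are non-null but already dangling.

```cpp
#include <cassert>

struct Gadget {                  // hypothetical type for illustration
    int calls = 0;
    void DoSomething() { ++calls; }
};

void UseGadget(Gadget& g)        // assumes a valid object
{
    g.DoSomething();
}

// Converts the pointer to a reference only after the null check.
bool UseGadgetIfPresent(Gadget* pG)
{
    if (pG == nullptr)
        return false;
    UseGadget(*pG);
    return true;
}

int DemoGuardedCall()
{
    Gadget g;
    bool called = UseGadgetIfPresent(&g);
    bool skipped = !UseGadgetIfPresent(nullptr);
    assert(called && skipped);
    return g.calls;              // one successful call was made
}
```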

Further structurally safe uses of C++ references

There is a very common scenario in which a C++ reference is absolutely the right thing to use and that is as a back reference to a parent. Note that unlike the human analogy of parents and children, we are talking of a relationship in which the child can only exist as long as its parent is alive so typically the child would be a member of the parent class or would be dynamically created and held by a smart pointer that is a member of the parent class. That is to say that the parent is guaranteed to exist as long as the child. Typically programmers use raw pointers...

class CParent;
class CChild
{
public:
    CParent* m_pParent; //public so that CParent can set it
};
class CParent
{
    CChild m_Child;
public:
    CParent()
    {
        m_Child.m_pParent=this;
    }
};

Or if the child is dynamically created:

class CParent;
class CChild
{
    CParent* m_pParent;
public:
    CChild(CParent* pParent)
    {
        m_pParent=pParent;
    }
};
class CParent
{
    owner_ptr<CChild> m_apChild;
public:
    CParent()
    {
        m_apChild = NULL;
    }
    void CreateChild()
    {
        m_apChild = new CChild(this);
    }
};

...but we know the parent is always there so there is never any need to check the back pointer as valid and there is no sense in exposing it to the danger of being zeroed. Sometimes smart pointers are used as back pointers which can be disastrous...

std::shared_ptr<CParent> m_spParent; //provokes cyclic references causing memory leaks

...or pointless...

std::weak_ptr<CParent> m_wpParent;

The correct solution is to use a C++ reference, but the knowledge of how to initialise a C++ reference that is a member of a class is not widespread and requires an understanding of initialiser lists. The problem is that a reference must be initialised to refer to a valid object as soon as it exists and initialising it in the body of the constructor is already too late. At this point all members have already been created and are ready for use. Fortunately C++ allows you to define an initialiser list outside of the body of the constructor in which you can specify initial values for any members that will be assigned as they are created and it is here that we can initialise any reference members.

class CChild
{
    CParent& Parent;
public:
    CChild(CParent& parent)  //names such as _Parent (underscore then capital) are reserved
        : Parent(parent)     //initialiser list
    {
    }
};

Note that I have called the back reference simply Parent and not m_Parent or m_rParent. This is because it isn’t really a member and it doesn’t need to be thought of as a reference. It is a direct reference to the parent itself and so should be thought of as simply being the Parent.

In the case of a dynamically created child, the parent code would look something like this...

class CParent
{
    owner_ptr<CChild> m_apChild;
public:
    CParent()
    {
        m_apChild = NULL;
    }
    void CreateChild()
    {
        m_apChild = new CChild(*this); //the constructor now takes a reference
    }
};

...but if the child is a member of the parent class then we also need to use the parent's initialiser list to initialise it.

class CParent
{
    CChild m_Child;
public:
    CParent()
        : m_Child(*this)
    {
    }
};
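
Putting the two halves together gives the following self-contained sketch. Owner and Part are hypothetical stand-ins for the parent and child classes. Deleting the copy operations is an assumption added here, on the grounds that a copied child would otherwise keep referring to the original parent.

```cpp
#include <cassert>

class Owner;                         // hypothetical names throughout

class Part
{
    Owner& m_owner;                  // back reference, bound once at construction
public:
    explicit Part(Owner& owner) : m_owner(owner) {}
    Owner& owner() const { return m_owner; }
};

class Owner
{
    Part m_part;                     // member child
public:
    Owner() : m_part(*this) {}       // only binds the reference; *this is not used yet
    Owner(Owner const&) = delete;    // a copied Part would still refer to the original Owner
    Owner& operator=(Owner const&) = delete;
    Part& part() { return m_part; }
};

void DemoBackReference()
{
    Owner o;
    assert(&o.part().owner() == &o); // the child's reference leads back to its parent
}
```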

It is also possible for two sibling child members of a parent class to be initialised to reference each other.

class CChild
{
    CChild& OtherChild;
public:
    CChild(CChild& otherChild)
        : OtherChild(otherChild)
    {
    }
};
class CParent
{
    CChild m_Child1;
    CChild m_Child2;
public:
    CParent()
        : m_Child1(m_Child2), m_Child2(m_Child1) //only binds references; neither child is used yet
    {
    }
};

There are other safe uses of C++ references. Although they can be seen as trivial, they can be useful as compile time switches:

A reference to something in the same scope...

CClass C; 
CClass& c=C;

...at first this may seem pointless because anywhere you can refer to the reference c, you can also refer to the original variable C, but if we have ...

CClass C;
CClass C1;
CClass& c=C;
// loads of code working on c which refers to C

Then if we want to work with C1 instead of with C then we just change one line of code

CClass& c=C1;
// loads of code working on c which refers to C1

There are also some cases where a C++ reference may be returned by a function or method and be perfectly safe:

When it is a reference to a global variable. This may be a function or method that chooses which global variable to return to work with, depending on various conditions...

CClass g_C;
CClass g_C1;
CClass& GetAppropriateClassObjectToWorkWith()
{
    if(Condition)
        return g_C;
    else
        return g_C1;
}

... in either case it will return a reference to a global variable which will be perfectly safe.
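
A compilable sketch of the same pattern follows. Settings, g_primary, g_fallback and SelectSettings are hypothetical names used only for this example.

```cpp
#include <cassert>

struct Settings { int level = 0; };   // hypothetical type for illustration

Settings g_primary;                   // namespace-scope objects live until program exit
Settings g_fallback;

// Safe to return by reference: both candidates outlive every caller.
Settings& SelectSettings(bool usePrimary)
{
    return usePrimary ? g_primary : g_fallback;
}

void DemoSelectSettings()
{
    SelectSettings(true).level = 1;   // writes straight into g_primary
    SelectSettings(false).level = 2;  // writes straight into g_fallback
    assert(g_primary.level == 1 && g_fallback.level == 2);
    assert(&SelectSettings(true) == &g_primary);
}
```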

It is also safe for a private non-static method of a class to return a reference to a member variable as it will only be possible to assign it to a local variable that can only exist during the life of the calling method ... during which time the class object and all its members must still exist.

class CParent
{
private:
    CChild m_Child1;
    CChild m_Child2;
    CChild& GetAppropriateChildObjectToWorkWith()
    {
        if(Condition)
            return m_Child1;
        else
            return m_Child2;
    }
public:
    void DoSomething()
    {
        CChild& Child=GetAppropriateChildObjectToWorkWith();
        //Work with Child.
    }
};

You can also usefully have a public method returning a reference to a class member but that will allow you to initialise a reference that may live beyond the class object. You no longer have structural safety and will have to resort to your own informed measures to keep things safe.

Can the use of C++ references optimise compiled code?

The short answer with current compilers is no. For instance, in the general case of passing a reference into a function it will be necessary to store an underlying pointer because the calling context is not known when the function is compiled and it may have several different calling contexts. However there are many functions and methods, particularly private methods of classes, which are always called with the same calling context. It would be possible for future compilers to detect this and, instead of storing an underlying pointer passed in for each call, simply hard code an offset into the stack where the referenced variable is always found, eliminating the need to pass in an underlying pointer on each call.

Of course if you express the same thing with pointers then you are explicitly asking the compiler to create storage for them and unless it can easily see a complete redundancy in your request, that is exactly what it will do. The more specific you are about what you really want from a compiler...

void func1(A& a);  //I want 'a' to refer to the existing variable of type A referenced in the call.
void func2(A* pA); //I want a new variable A* pA which is a copy of the variable of type A*
                   //referenced in the call which may or may not hold the address of a variable of type A.

...the better chance it has of giving it to you in the most efficient manner. More importantly the better chance you and other programmers have of understanding your real intention.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)