Preface
The C and C++ languages make powerful use of pointers, and many programmers find them hard to understand. The Java language does not expose pointers directly, because its designers considered them a source of buggy code. Two essentials need reviewing in order to understand how to use pointers: variables and memory. On a typical 32-bit x86 operating system, there are 2 raised to the 32nd power (4,294,967,296, or 4 GB) possible memory addresses. Computer memory is divided into sequentially numbered locations, and each location has a fixed address, so a program needs some way to refer to the locations it uses. Look at the following statement:
int number = 5;
Here an area of memory is allocated to store an integer, and you can access it using the name number. The value of 5 is stored in that 32 bit memory location. The computer references the area using an address. The specific address where this data will be stored depends on your computer and what operating system and compiler you're using. Even though the variable name is fixed in the source program, the address is likely to be different on different systems (even those of the same architecture). In the most basic sense, variables that can store addresses are called pointers, and the address that’s stored in a pointer is usually that of another variable. To get a variable’s memory address, use the address-of operator (&), which returns the address of an object in memory. Below is the code that illustrates the use of the address-of operator:
#include <iostream>
int main()
{
using namespace std;
short shortVar = 5;
long longVar=65535;
long sVar = -65535;
cout << "shortVar:\t" << shortVar;
cout << "\tAddress of shortVar:\t";
cout << &shortVar << endl;
cout << "longVar:\t" << longVar;
cout << "\tAddress of longVar:\t" ;
cout << &longVar << endl;
cout << "sVar:\t\t" << sVar;
cout << "\tAddress of sVar:\t" ;
cout << &sVar << endl;
return 0;
}
When using the cl.exe compiler, the result is:
- shortVar: 5 Address of shortVar: 0012FF34
- longVar: 65535 Address of longVar: 0012FF38
- sVar: -65535 Address of sVar: 0012FF3C
Three variables are declared and assigned a value. Each requires three cout statements to print the name and value of the variable and its address. When you declare a variable, the compiler determines how much memory is allocated based on the data type.
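How much memory the compiler reserves for each type can be checked with the sizeof operator. The following is a minimal sketch, not part of the original listing; the helper name describe_sizes is hypothetical, and the exact sizes depend on the compiler and platform.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds a small report of how many bytes the
// compiler reserves for a few built-in types on the current platform.
std::string describe_sizes()
{
    std::ostringstream out;
    out << "short: " << sizeof(short) << " bytes\n";
    out << "int:   " << sizeof(int)   << " bytes\n";
    out << "long:  " << sizeof(long)  << " bytes\n";
    out << "int*:  " << sizeof(int*)  << " bytes\n";
    return out.str();
}
```

Printing the result with std::cout << describe_sizes(); on a 32-bit cl.exe build would typically report 2, 4, 4 and 4 bytes; other compilers and 64-bit builds may report different values.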
Storing a Variable’s Address in a Pointer
Every variable has an address. Even without knowing the specific address, you can store a variable’s address in a pointer. One point should be made here. C++ is based on the C language, which also uses the & operator. When it is used to take an address, & is a unary operator: it takes a single operand, the variable whose address is wanted. The * used for indirection is likewise unary; the binary forms of & and * (bitwise AND and multiplication) are different operators. Suppose that howOld is an integer. To declare a pointer called pAge (just as you would a variable) to hold its address, you write:
int *pAge = 0;
This declares pAge to be a pointer to an int data type. That is, pAge is declared to hold the address of an integer. In this example, pAge is initialized to zero. A pointer whose value is zero is called a null pointer. Safe programming practices dictate that pointers always be initialized. For a pointer to hold an address, the address must be assigned to it. In the current example, you must specifically assign the address of howOld to pAge, as shown in the following:
int howOld = 60;
int *pAge = 0;
pAge = &howOld;
We see that the value of howOld is 60, and pAge holds the address of howOld. So a pointer is no different from any other variable, apart from the kind of data it holds. In C++, the unary * is called the indirection operator (or dereference operator). When a pointer is dereferenced, the value at the address stored in the pointer is retrieved. A pointer thus provides indirect access to the value of the variable whose address it stores. That is, the indirection operator (*) in front of the pointer variable pAge means “the value stored at”. Here is an example using some very basic Visual C++ syntax. Notice the “stdafx.h” header file. This file is located in the MFC directory. To make the code compile on the command line and recognize this header file, I made a copy of it by opening it with ‘notepad stdafx.h’ and then copied it to the default c:\Program Files\Microsoft Visual Studio 9.0\VC\Include directory.
#include "stdafx.h"
int main(int argc, _TCHAR* argv[])
{
int j = 5;
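// Note: %X with a pointer argument matches the 32-bit build used here;
// portable code would use %p with a cast to void*.
printf(" j : %8.X (value)\n", j);
printf(" &j : %8.X (address)\n", &j);
int *p = &j;
printf(" p : %8.X (value)\n", p);
printf(" &p : %8.X (address)\n", &p);
printf(" *p : %8.X (indirection)\n", *p);
return 0;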
}
I compiled this code on the command line using the /EHsc switch of the cl.exe compiler. The output was:
j : 5 (value)
&j : 12FF38 (address)
p : 12FF38 (value)
&p : 12FF3C (address)
*p : 5 (indirection)
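Indirection works in both directions: *p can be read, and it can also be assigned to, which changes the variable that p points at. The following is a minimal sketch rather than part of the original listing; the function name add_through_pointer is hypothetical, and nullptr assumes a C++11 or later compiler (an older compiler such as the one used in this article would need 0 or NULL instead).

```cpp
// Hypothetical helper: adds `amount` to the int that `target` points at.
// Returns false and does nothing if `target` is a null pointer.
bool add_through_pointer(int *target, int amount)
{
    if (target == nullptr)       // never dereference a null pointer
        return false;
    *target = *target + amount;  // read through *target, then write through it
    return true;
}
```

For example, given int j = 5; the call add_through_pointer(&j, 2) leaves j equal to 7, while add_through_pointer(nullptr, 2) returns false without touching memory.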
What is a Reference?
A reference is an alias; when you create a reference, you initialize it with the name of another object, the target (in our case, the variable j). From that point on, the reference functions as an alternative name for the target, and anything you do to the reference is really done to the target. Examine the code below:
#include "stdafx.h"
int main(int argc, _TCHAR* argv[])
{
int j = 5;
printf(" j : %8.X (value)\n", j);
printf(" &j : %8.X (address)\n", &j);
int *p = &j;
printf(" p : %8.X (value)\n", p);
printf(" &p : %8.X (address)\n", &p);
printf(" *p : %8.X (indirection)\n", *p);
int &r = j;
printf(" r : %8.X (value)\n", r);
printf(" &r : %8.X (address)\n", &r);
getc(stdin);
return 0;
}
Now let’s examine the output:
j : 5 (value)
&j : 12FF34 (address)
p : 12FF34 (value)
&p : 12FF3C (address)
*p : 5 (indirection)
r : 5 (value)
&r : 12FF34 (address)
First, notice that the address of p differs from the address it points to. The two addresses are close because both variables are on the stack. So j stores 5 at address 12FF34. The declared pointer p points at that address. The address of p is different from what it stores, which is the address of j. Next we declare a reference to j through the variable r. r has the same value as j, as it functions as an alias, or alternative name, for the target object. The address of r is therefore the same as the value of the pointer p: the address of j. Now let’s examine the use of double indirection (using **pp):
#include "stdafx.h"
class CTest
{
public:
CTest() { printf("CTest constructor\n"); }
~CTest() { printf("CTest destructor\n"); }
};
void RunTest()
{
CTest test;
CTest *ptest = new CTest();
delete ptest;
}
int _tmain(int argc, _TCHAR* argv[])
{
int i = 1;
printf(" i : %8.X (value)\n", i);
printf(" &i : %8.X (address)\n", &i);
int *p = &i;
printf(" p : %8.X (value)\n", p);
printf(" &p : %8.X (address)\n", &p);
printf(" *p : %8.X (indirection)\n", *p);
int &r = i;
printf(" r : %8.X (value)\n", r);
printf(" &r : %8.X (address)\n", &r);
int **pp = &p;
printf(" pp : %8.X (value)\n", pp);
printf(" &pp : %8.X (address)\n", &pp);
printf(" *pp : %8.X (indirection)\n", *pp);
printf("**pp : %8.X (double indirection)\n", **pp);
*p = 2;
printf(" i : %8.X (value)\n", i);
**pp = 3;
printf(" i : %8.X (value)\n", i);
getc(stdin);
return 0;
}
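The declaration of the string of characters is omitted for now. Notice that the listing defines a class, CTest, and a function, RunTest, that constructs one object on the stack and one on the heap with the new keyword, then destroys the heap object with the delete keyword. As written, _tmain never calls RunTest, so the constructor and destructor messages appear only if you add a call to it.
[Image: the output, as seen in memory]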
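The alias and double-indirection behaviour described in this section can be reduced to a few lines. The following is a minimal sketch, not the article's code; the function names set_via_reference, set_via_double_indirection and retarget are hypothetical.

```cpp
// Hypothetical helpers, not from the original listing.

// Writes through a reference: `alias` is another name for the caller's int.
void set_via_reference(int &alias, int value)
{
    alias = value;
}

// Writes through two levels of indirection: *pp is the pointer,
// **pp is the int that the pointer refers to.
void set_via_double_indirection(int **pp, int value)
{
    **pp = value;
}

// Re-seats the caller's pointer: after the call, *pp points at `other`.
void retarget(int **pp, int *other)
{
    *pp = other;
}
```

With int i = 1; int *p = &i; int **pp = &p; the call set_via_double_indirection(pp, 3) changes i itself, which mirrors the **pp = 3 line in the listing. retarget(pp, &k) changes where p points and leaves i untouched.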
C++ Code can be Vulnerable to Memory Leaks
Local variables are on the stack, along with function parameters (together they make up the stack frame). In C++, just about all of the remaining memory is given to the free store, which is also called the heap. When a function is called, its parameters are pushed onto the stack. The stack is cleaned up automatically when the function returns. That is, the function returns to the return address stored on the stack, so that the next instruction pointed at by the Instruction Pointer (IP register) can then execute. Local variables do not persist: when a function returns, its local variables are destroyed. This is good because it relieves the programmer of managing that memory, but it is also bad because it makes it hard for functions to create objects for use by other objects or functions without the extra overhead of copying objects from the stack into the return value and then into the caller's destination object. So while the stack is cleaned automatically, the heap is not cleaned until your application ends, and it is your responsibility to free any memory that you've reserved when you're done with it. This is where destructors are important, because they provide a place where any heap memory allocated in a class can be reclaimed.
You allocate memory on the heap in C++ by using the new keyword. new is followed by the type of the object that you want to allocate, so the compiler knows how much memory is required. The return value of new is a memory address. Because memory addresses are stored in pointers, the return value of new should be assigned to a pointer. To create a short int on the heap, you could write:
short int *pPointer;
pPointer = new short int;
Memory obtained this way has to be given back with the delete keyword. That is, when you are finished with an area of memory, you must free it back to the system. You do this by calling delete on the pointer. delete returns memory to the heap (free store). So, to prevent memory leaks, release any memory that you have allocated with the new keyword by using the delete keyword. For example:
delete pPointer;
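The remark that destructors give a class a place to reclaim heap memory can be sketched as a small class that allocates in its constructor and releases in its destructor. This is an illustration under stated assumptions, not code from the article: the class name IntHolder and its members are hypothetical, and copying is disabled in the pre-C++11 style so that it would also compile with an older compiler.

```cpp
// Hypothetical class: owns exactly one int on the heap.
class IntHolder
{
public:
    explicit IntHolder(int value) : m_pValue(new int(value)) {}  // allocate
    ~IntHolder() { delete m_pValue; }                            // release

    int get() const { return *m_pValue; }
    void set(int value) { *m_pValue = value; }

private:
    // Copying is declared private and left undefined in this sketch, so
    // two holders can never end up deleting the same address twice.
    IntHolder(const IntHolder &);
    IntHolder &operator=(const IntHolder &);

    int *m_pValue;
};
```

An IntHolder declared as a local variable releases its heap memory automatically when the enclosing function returns, because the destructor runs as part of stack cleanup.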
Now if we combine the class instantiation and object construction shown earlier with a listing that points at a string of characters, we get:
#include "stdafx.h"
class CTest
{
public:
CTest() { printf("CTest constructor\n"); }
~CTest() { printf("CTest destructor\n"); }
};
void RunTest()
{
CTest test;
CTest *ptest = new CTest();
delete ptest;
}
int _tmain(int argc, _TCHAR* argv[])
{
char str[] = "This is a test string";
char *p = str;
printf(" str = \"%s\"\n", str);
printf(" str = %8.X (address)\n", str);
printf("*str = %8.X (indirection)\n", *str);
printf(" p = %8.X (value)\n", p);
printf(" *p = %8.X (indirection)\n", *p);
p[4] = '1';
printf(" str = \"%s\"\n", str);
*(p + 4) = '2';
printf(" str = \"%s\"\n", str);
getc(stdin);
return 0;
}
The output:
str = "This is a test string"
str = 12FF20 (address)
*str = 54 (indirection)
p = 12FF20 (value)
*p = 54 (indirection)
str = "This1is a test string"
str = "This2is a test string"
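The two writes in the listing, p[4] and *(p + 4), address the same element, because subscripting a pointer is defined in terms of pointer arithmetic. The following is a minimal sketch of that equivalence; the function name replace_char is hypothetical, and the caller is assumed to pass an index that lies inside the array.

```cpp
#include <cstddef>

// Hypothetical helper: overwrites the character at `index` using pointer
// arithmetic. *(text + index) is by definition the same as text[index].
// No bounds checking is done; the caller must keep `index` in range.
void replace_char(char *text, std::size_t index, char replacement)
{
    *(text + index) = replacement;
}
```

Called as replace_char(str, 4, '1') on the string from the listing, it overwrites the space after "This", just as p[4] = '1' does.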
In RunTest, the memory allocated from the heap with the new keyword is released with the delete keyword. Finally, here is more basic code that allocates and then frees memory:
c:\>type con > free_memory.cpp
#include <iostream>
int main()
{
using namespace std;
int localVariable = 5;
int * pLocal= &localVariable;
int * pHeap = new int;
*pHeap = 7;
cout << "localVariable: " << localVariable << endl;
cout << "*pLocal: " << *pLocal << endl;
cout << "*pHeap: " << *pHeap << endl;
delete pHeap;
pHeap = new int;
*pHeap = 9;
cout << "*pHeap: " << *pHeap << endl;
delete pHeap;
return 0;
}
^Z
c:\>cl /EHsc free_memory.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
free_memory.cpp
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
/out:free_memory.exe
free_memory.obj
c:\>free_memory.exe
localVariable: 5
*pLocal: 5
*pHeap: 7
*pHeap: 9
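After delete, the pointer variable still holds the old address even though the memory has been returned. A common habit, assumed here rather than taken from the article, is to reset the pointer to null straight away. The helper name release is hypothetical; the sketch relies on the fact that deleting a null pointer does nothing.

```cpp
// Hypothetical helper: frees a heap-allocated int and nulls the caller's
// pointer, so that a later accidental delete on it is harmless.
void release(int *&pointer)
{
    delete pointer;   // deleting a null pointer is a no-op
    pointer = 0;      // 0 is the null pointer value used in this article
}
```

In the listing above, release(pHeap) could stand in for each delete pHeap; the pointer can then be tested against zero before it is used again.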
For anyone who has developed in .NET, the closest counterpart to native code pointers is probably the delegate type. For anyone interested in learning the MFC or ATL frameworks, pointers are a good place to start. Once you have a strong handle on pointers, other more advanced programming topics become much less complicated.