A Beginner's Guide on using Pointers

logicchild

29 Jul 2009

This article is aimed at beginners who would benefit from understanding how pointers work.

Preface

The C and C++ languages make powerful use of pointers, and many programmers find them hard to understand. Java does not let the programmer work with pointers directly; its designers considered explicit pointer manipulation a common source of buggy code. Two basic concepts are essential to understanding pointers: variables and memory. Computer memory is divided into sequentially numbered locations, one per byte, and the number of a location is its address. On a typical 32-bit x86 operating system there are 2^32 (4,294,967,296, or 4 GB) possible addresses. A program does not move these locations around; instead, it refers to the data stored in them by address. Look at the following statement:

C++

int number = 5;

Here an area of memory is set aside to store an integer, and you can access it using the name number. The value 5 is stored in that area, which with a typical 32-bit Windows compiler is 4 bytes (32 bits) wide. The computer refers to the area by its address. The specific address where the data is stored depends on your computer, operating system, and compiler: even though the variable name is fixed in the source program, the address is likely to be different on different systems (even those of the same architecture). In the most basic sense, variables that can store addresses are called pointers, and the address stored in a pointer is usually that of another variable. To get a variable’s memory address, use the address-of operator (&), which returns the address of an object in memory. The following code illustrates the address-of operator:

C++

#include <iostream>
int main()
{
   using namespace std;
   short shortVar = 5;
   long  longVar=65535;
   long sVar = -65535;
   cout << "shortVar:\t" << shortVar;
   cout << "\tAddress of shortVar:\t";
   cout <<  &shortVar  << endl;
   cout << "longVar:\t"  << longVar;
   cout  << "\tAddress of longVar:\t" ;
   cout <<  &longVar  << endl;
   cout << "sVar:\t\t"     << sVar;
   cout << "\tAddress of sVar:\t" ;
   cout <<  &sVar     << endl;
  return 0;
}

Compiled with Microsoft's cl.exe compiler, one run printed the following (the addresses on your system will differ):

shortVar: 5 Address of shortVar: 0012FF34
longVar: 65535 Address of longVar: 0012FF38
sVar: -65535 Address of sVar: 0012FF3C

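Three variables are declared and assigned values. Each one takes three cout statements: the first prints the variable's name and value, the second a label, and the third its address. When you declare a variable, the compiler determines how much memory to allocate based on the data type. Notice that shortVar and longVar are 4 bytes apart even though a short typically occupies only 2 bytes; the compiler is free to add padding between variables, for example to keep them aligned.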

Storing a Variable’s Address in a Pointer

Every variable has an address, and even without knowing the specific address, you can store it in a pointer. One point about operators is worth making here. C++ is based on the C language and inherits its operators. Written in front of a single operand, as in &howOld, & is the unary address-of operator; written between two operands, as in a & b, the same symbol is the binary bitwise-AND operator. The * symbol is similar: in a declaration it marks a pointer type, in front of a pointer it is the unary indirection operator, and between two operands it is binary multiplication. Suppose that howOld is an integer. To declare a pointer called pAge to hold its address (just as you would declare any other variable), you write:

C++

int *pAge = 0;

This declares pAge to be a pointer to an int; that is, pAge is meant to hold the address of an integer. Here pAge is initialized to zero, and a pointer whose value is zero is called a null pointer. Safe programming practice dictates that pointers always be initialized. For a pointer to hold an address, the address must be assigned to it. In this example, you must explicitly assign the address of howOld to pAge, as shown in the following:

C++

int howOld = 60;      // declare and initialize a variable
int *pAge = 0;        // declare a pointer, initialized to null
pAge = &howOld;       // place howOld's address in pAge

The value of howOld is 60, and pAge holds the address of howOld. A pointer, then, is a variable like any other; what sets it apart is the kind of data it holds, namely an address. When * is placed in front of a pointer, C++ calls it the indirection operator (or dereference operator). Dereferencing a pointer retrieves the value stored at the address the pointer holds, so a pointer provides indirect access to the variable whose address it stores. In other words, *pAge means “the value stored at the address in pAge”.

Here is an example using some very basic Visual C++ syntax. Notice the “stdafx.h” header file, which Visual Studio normally uses as a precompiled header; the code below relies on it for printf and _TCHAR. My copy was in the MFC directory. To make the code compile on the command line and recognize this header, I made a copy of it by typing ‘notepad stdafx.h’ and then placed it in the default c:\Program Files\Microsoft Visual Studio 9.0\VC\Include directory.

C++

#include "stdafx.h"   // Visual C++ header; supplies printf and _TCHAR here
int main(int argc, _TCHAR* argv[])
{
   int j = 5;
   printf("   j : %8d  (value)\n", j);              // the value stored at the memory location
   printf("  &j : %8p  (address)\n", (void*)&j);    // the address is that memory location

   int *p = &j;   // p is another variable with its own address;
                  // the value stored in p is the address of j, hence the name "pointer"
   printf("   p : %8p  (value)\n", (void*)p);
   printf("  &p : %8p  (address)\n", (void*)&p);
   printf("  *p : %8d  (indirection)\n", *p);       // indirection: the int that p points to
   return 0;
}

I compiled this code on the command line with the cl.exe compiler and its /EHsc switch. The output was as follows (addresses will differ from system to system):

   j :        5  (value)
  &j :   12FF38  (address)
   p :   12FF38  (value)
  &p :   12FF3C  (address)
  *p :        5  (indirection)   // dereferencing p retrieves the value of j, which is 5

What is a Reference?

A reference is an alias: when you create a reference, you initialize it with the name of another object, the target (in the example below, the variable j). From that point on, the reference functions as an alternative name for the target, and anything you do to the reference is really done to the target. A reference must be initialized when it is declared and cannot later be made to refer to a different object. Examine the code below:

C++

#include "stdafx.h"
int main(int argc, _TCHAR* argv[])
{
   int j = 5;
   printf("   j : %8d  (value)\n", j);
   printf("  &j : %8p  (address)\n", (void*)&j);   // pass &j, not j, to print the address
   int *p = &j;
   printf("   p : %8p  (value)\n", (void*)p);
   printf("  &p : %8p  (address)\n", (void*)&p);
   printf("  *p : %8d  (indirection)\n", *p);
   // now we use r as a reference to the variable j
   int &r = j;
   printf("   r : %8d  (value)\n", r);
   printf("  &r : %8p  (address)\n", (void*)&r);   // same address as j
   getc(stdin);   // wait for Enter so the console window stays open
   return 0;
}

Now let’s examine the output:

   j :        5  (value)
  &j :   12FF34  (address)
   p :   12FF34  (value)
  &p :   12FF3C  (address)
  *p :        5  (indirection)
   r :        5  (value)
  &r :   12FF34  (address)

First, notice that the address of p is different from the address it points to. The two addresses are close together because both j and p are local variables on the stack. The variable j stores 5 at address 12FF34, and the pointer p holds that address; p itself lives at a different address, 12FF3C. Next, we declare r as a reference to j. Because r is an alias for the target object, it has the same value as j, and its address is the same as the value of p: the address of j. Now let’s examine double indirection. In the code below, pp is a pointer to a pointer, so **pp reaches the int through two levels of indirection:

C++

#include "stdafx.h"
class CTest
{
public:
    CTest() { printf("CTest constructor\n"); }
    ~CTest() { printf("CTest destructor\n"); }
};

void RunTest()
{
    CTest test;
    CTest *ptest = new CTest();
    delete ptest;
}

int _tmain(int argc, _TCHAR* argv[])
{
//    RunTest();

    int i = 1;
    printf("   i : %8d (value)\n", i);
    printf("  &i : %8p (address)\n", (void*)&i);

    int *p = &i;
    printf("   p : %8p (value)\n", (void*)p);
    printf("  &p : %8p (address)\n", (void*)&p);
    printf("  *p : %8d (indirection)\n", *p);

    int &r = i;
    printf("   r : %8d (value)\n", r);
    printf("  &r : %8p (address)\n", (void*)&r);

    int **pp = &p;                                          // pp holds the address of p
    printf("  pp : %8p (value)\n", (void*)pp);
    printf(" &pp : %8p (address)\n", (void*)&pp);
    printf(" *pp : %8p (indirection)\n", (void*)*pp);       // *pp is p, the address of i
    printf("**pp : %8d (double indirection)\n", **pp);      // **pp is i itself
    *p = 2;                                                 // change i through p
    printf("   i : %8d (value)\n", i);
    **pp = 3;                                               // change i through pp
    printf("   i : %8d (value)\n", i);

/*  char str[] = "This is a test string";
    char *p = str;
    printf(" str = \"%s\"\n", str);
    printf(" str = %8.X (address)\n", str);
    printf("*str = %8.X (indirection)\n", *str);
    printf("   p = %8.X (value)\n", p);
    printf("  *p = %8.X (indirection)\n", *p);
    p[4] = '1';
    printf(" str = \"%s\"\n", str);
    *(p + 4) = '2';
    printf(" str = \"%s\"\n", str);*/
    getc(stdin);
    return 0;
}

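The declaration of the string of characters is commented out for now. Notice also the CTest class, whose constructor and destructor announce themselves, and the RunTest() function: it constructs one CTest object on the stack and another on the heap with new, then destroys the heap object with delete (the stack object's destructor runs when RunTest() returns). The call to RunTest() in _tmain is commented out, so this output only appears if you uncomment it.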

C++ Code can be Vulnerable to Memory Leaks

Local variables are on the stack, along with function parameters; together they make up a function's stack frame. In C++, just about all of the remaining memory is given to the free store, which is also called the heap. With the usual 32-bit x86 calling conventions, a function's arguments are pushed onto the stack when it is called. The stack is cleaned up automatically when the function returns: execution resumes at the return address saved on the stack, which is loaded into the processor's instruction pointer (the EIP register on 32-bit x86) so that the next instruction can execute. Local variables do not persist; when a function returns, its local variables are destroyed.

This is good because it spares the programmer from managing that memory, but it is also bad because it makes it hard for a function to create objects for use by other functions without the extra overhead of copying them from its stack frame into the caller's destination object. So while the stack is cleaned automatically, the heap is not cleaned until your application ends, and it is your responsibility to free any memory that you have reserved when you are done with it. This is where destructors are important, because they provide a place where any heap memory allocated by a class can be reclaimed.

You allocate memory on the heap in C++ by using the new keyword. new is followed by the type of the object that you want to allocate, so the compiler knows how much memory is required. The return value of new is a memory address, and because memory addresses are stored in pointers, the return value of new should be assigned to a pointer. To create a short int on the heap, you could write:

C++

short int  *pPointer;
pPointer = new short int;

Memory obtained with new is not released automatically. When you are finished with an area of memory, you must give it back to the system by calling delete on the pointer; delete returns the memory to the heap (free store). So, to prevent memory leaks, release every block you allocated with the new keyword by using the delete keyword. For example:

C++

delete pPointer;

Now, keeping the CTest class (which shows class instantiation and object construction) and the RunTest() function from the double-indirection program, and enabling the string code that was commented out there, we get:

C++

#include "stdafx.h"
class CTest
{
public:
    CTest() { printf("CTest constructor\n"); }
    ~CTest() { printf("CTest destructor\n"); }
};

void RunTest()
{
    CTest test;
    CTest *ptest = new CTest();
    delete ptest;
}

int _tmain(int argc, _TCHAR* argv[])
{
    char str[] = "This is a test string";
    char *p = str;

    printf(" str = \"%s\"\n", str);
    printf(" str = %8p (address)\n", (void*)str);
    printf("*str = %8X (indirection)\n", (unsigned)*str);   // 'T' is 0x54
    printf("   p = %8p (value)\n", (void*)p);
    printf("  *p = %8X (indirection)\n", (unsigned)*p);

    p[4] = '1';                          // replaces the first space in the string
    printf(" str = \"%s\"\n", str);

    *(p + 4) = '2';                      // same element as p[4]
    printf(" str = \"%s\"\n", str);

    getc(stdin);
    return 0;
}

The output:

 str = "This is a test string"
 str =   12FF20 (address)
*str =       54 (indirection)
   p =   12FF20 (value)
  *p =       54 (indirection)
 str = "This1is a test string"
 str = "This2is a test string"

In RunTest(), the CTest object allocated from the heap with the new keyword is released with the delete keyword (this program never calls RunTest(), so that code does not run here). Finally, here is more basic code that allocates and then frees memory:

C++

c:\>type con > free_memory.cpp
#include <iostream>
int main()
{
   using namespace std;
   int localVariable = 5;
   int * pLocal= &localVariable;
   int * pHeap = new int;
   *pHeap = 7;
   cout << "localVariable: " << localVariable << endl;
   cout << "*pLocal: " << *pLocal << endl;
   cout << "*pHeap: " << *pHeap << endl;
   delete pHeap;
   pHeap = new int;
   *pHeap = 9;
   cout << "*pHeap: " << *pHeap << endl;
   delete pHeap;
   return 0;
}
^Z
//////////////////////////////////////////////////////////////////////////
c:\>cl /EHsc free_memory.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

free_memory.cpp
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:free_memory.exe
free_memory.obj
//////////////////////////////////////////////////////////////////////////////////////
c:\>free_memory.exe
localVariable: 5
*pLocal: 5
*pHeap: 7
*pHeap: 9

If you have developed in .NET, the closest familiar concept may be delegate types, although a delegate corresponds more closely to a C++ function pointer than to a pointer to data (C# also supports true pointers inside unsafe code). If you are interested in learning the MFC or ATL frameworks, pointers are a good place to start. Once you have a strong handle on pointers, many more advanced programming topics become much less complicated.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)