How to Understand Pointers in C/C++
A typical end-user computer uses a member of the x86 family of microprocessors, made by Intel or AMD. In its 32-bit form, this means a 32-bit memory addressing scheme. Since computers use the base 2 (binary) number system, there are 2 raised to the 32nd power possible memory locations, or 4,294,967,296 (4 GB). These memory locations cannot be physically moved, so they must be referenced by address. A pointer is a variable that stores a memory address, usually the address of another variable.
When a variable is declared, the compiler is informed of its existence and of its data type. When a variable is initialized, it is assigned a value that is stored in accordance with the type of data (integer, character, floating point decimal, etc.) that it represents.
Even though the .NET Framework does not make much use of pointers, the C/C++ languages do, for programming practices like COM and for Windows frameworks like MFC and ATL. Understanding pointers can help in learning frameworks like those, with the caveat that a pointer should not be used if the location it points at is not clear. This article uses both plain standard C++ syntax (using the iostream library) and Windows C++ in the Visual C++ Express Edition to exemplify the use of pointers. It is not intended as a substitute for the other, more advanced articles on pointers on the CodeProject website.
Pointers enable you to manipulate addresses without ever knowing their real value. The following code prints the values of three variables together with the addresses at which they are stored. The ‘\t’ escape sequence inserts a tab. The address-of operator ‘&’ is used to obtain the address at which each value is stored:
#include <iostream>
int main()
{
using namespace std;
unsigned short short_int = 5;
unsigned long long_int = 65535;
long signed_int = -65535;
cout << "short_int:\t" << short_int;
cout << "\tAddress of short int:\t";
cout << &short_int << endl;
cout << "long_int:\t" << long_int;
cout << "\tAddress of long_int:\t" ;
cout << &long_int << endl;
cout << "signed_int:\t\t" << signed_int;
cout << "\tAddress of signed_int:\t" ;
cout << &signed_int << endl;
return 0;
}
The first part of the code block declares and initializes three variables: an unsigned short, an unsigned long int, and a long. If a type is not declared unsigned, we can assume it is signed, meaning it can hold negative as well as positive values. This is the output when using the cl.exe compiler on the command line:
short_int : 5 Address of short int: 0012FF34
long_int : 65535 Address of long_int: 0012FF3C
signed_int: -65535 Address of signed_int: 0012FF38
When a variable is declared and initialized (or assigned a value), the compiler takes care of allocating memory for it and automatically assigns it an address. The above example only shows the values and their corresponding addresses, however, which is not very useful information on its own. So, we will try another example, this time storing an address in a pointer:
#include <iostream>
int main()
{
using namespace std;
unsigned short int mAge = 32, yAge = 40;
unsigned short int *pAge = &mAge;
cout << "mAge:\t" << mAge
<< "\t\tyAge:\t" << yAge << endl;
cout << "&mAge:\t" << &mAge
<< "\t&yAge:\t" << &yAge << endl;
cout << "pAge:\t" << pAge << endl;
cout << "*pAge:\t" << *pAge << endl;
cout << "\nReassigning: pAge = &yAge..."
<< endl << endl;
pAge = &yAge;
cout << "mAge:\t" << mAge
<< "\t\tyAge:\t" << yAge << endl;
cout << "&mAge:\t" << &mAge
<< "\t&yAge:\t" << &yAge << endl;
cout << "pAge:\t" << pAge << endl;
cout << "*pAge:\t" << *pAge << endl;
cout << "\n&pAge:\t" << &pAge << endl;
return 0;
}
Admittedly, this syntax is old-style C++ and a bit confusing, but the underlying principles should become clearer when we look at the Visual C++ examples. For simplicity's sake, we will avoid developing any type of Windows Forms application. This is the output:
mAge: 32 yAge: 40
&mAge: 0012FF34 &yAge: 0012FF38
pAge: 0012FF34
*pAge: 32
Reassigning: pAge = &yAge...
mAge: 32 yAge: 40
&mAge: 0012FF34 &yAge: 0012FF38
pAge: 0012FF38
*pAge: 40
&pAge: 0012FF3C
While C/C++ use pointers, other high-level object-oriented languages like Java do not expose them. Java developers contend that pointers create buggy programs. However, indirection is inherent in object orientation: an object user (an application program) never accesses an object as a whole. An object, or module, is a body of code that contains semantically related functions, and access to it is always through indirection. This is one of the basic tenets of interop between, say, a COM client and a .NET object. A COM Callable Wrapper (CCW) resides at the border between the unmanaged platform of COM and the managed platform of .NET. This CCW is able to read the metadata and form a type library. The regasm.exe tool is able to create Registry entries for a .NET assembly so that it can appear as a COM component. Because we are dealing with the consumption of an object, its life cycle must be tracked. The CCW tracks the reference count held by COM clients, so that when the count reaches zero, it releases its reference to the managed object, allowing that object to be garbage collected.
Examples using Visual C++ Express Edition
Start a new C++ project, choose the Win32 console application, and click Finish rather than selecting “Empty Project”. Be sure to go to the project’s properties, set “Use UNICODE Response Files” to No, and set the Character Set to “Use Multi-Byte Character Set”:
#include "stdafx.h"
int _tmain(int argc, _TCHAR* argv[])
{
int i = 5;
printf(" i : %8.X (value)\n", i );
printf(" &i : %8.X (address)\n", &i);
int *p = &i;
printf (" p : %8.X (value)\n", p);
printf(" &p : %8.X (address)\n", &p);
printf(" *p : %8.X (indirection)\n", *p);
getc(stdin);
return 0;
}
Here is the output:
i : 5 (value)
&i : 2BF774 (address)
p : 2BF774 (value)
&p : 2BF768 (address)
*p : 5 (indirection)
Notice that the value of p is the same as the address of i. The address of p itself is just a short distance away on the stack. Above all, note that *p, or indirection, returns the value of i and not its memory location: *p is 5.
Here is an extension of the last example, adding a reference to the variable i and a pointer to the pointer p:
#include "stdafx.h"
int _tmain(int argc, _TCHAR* argv[])
{
int i = 5;
printf(" i : %8.X (value)\n", i );
printf(" &i : %8.X (address)\n", &i);
int *p = &i;
printf (" p : %8.X (value)\n", p);
printf(" &p : %8.X (address)\n", &p);
printf(" *p : %8.X (indirection)\n", *p);
int &r = i;
printf(" r : %8.X (value)\n", r);
printf (" &r : %8.X (address)\n", &r);
int **pp = &p;
printf(" pp : %8.X (value)\n", pp);
printf(" &pp : %8.X (address)\n", &pp);
printf(" *pp : %8.X (indirection)\n", *pp);
printf(" **pp : %8.X (double indirection)\n", **pp);
getc(stdin);
return 0;
}
Here is the output:
i : 5 (value)
&i : 1DFDDC (address)
p : 1DFDDC (value) // p has the value of the address of i
&p : 1DFDD0 (address) // the address of p is a short distance away on the stack
*p : 5 (indirection) // the unary indirection operator returns the value of i
r : 5 (value) // r as a reference is just an alias for i
&r : 1DFDDC (address) // the address of r is thus the same as the address of i
pp : 1DFDD0 (value) // pp has the value of the address of p
&pp : 1DFDD0 (address) // captured when this line printed &p by mistake;
// with &pp it shows the separate stack address of pp
*pp : 1DFDDC (indirection) // one indirection returns the value of p,
// which is the address of i
**pp: 5 (double indirection) // the second indirection dereferences that address
// to return the value of i
Pointers provide a powerful way to access data by indirection. Every variable has an address, which can be obtained using the address-of operator (&). The address can be stored in a pointer.