I list various array-and-pointer-problems I have heard or encountered. They’re so basic and important that they are worth a detailed discussion.
Can a const be an Array Bound?
It depends. Just like C89, C++ requires that the number of elements of the array be a constant expression. A const
initialized by constant expression is itself a constant expression. Otherwise, it’s not.
const size_t SIZE = 100
int a[10]; int b[SIZE];
void f(const size_t& size)
{
int c[size];
}
Note that in C99, variable length array (VLA) is supported.
Does Array Bound Store Together With Array?
Arrays are not self-describing because the number of elements of an array is not guaranteed to be stored with the array. This implies that to traverse array that does not contain a terminator the way character string
s do, we must somehow supply the number of elements. And most C++ implementations offer no range checking for arrays.
void f(int a[], unsigned int size)
{
for (int i = 0; i < size; ++i) cout << a[i] << endl;
}
Why is a[5] Equal to 5[a]?
Except where it has been declared for a class, the subscript operator []
is interpreted in such a way that E1[E2]
is identical to *((E1)+(E2))
. Because of the conversion rules that apply to +
, if E1
is an array and E2
an integer, then E1[E2]
refer to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.
int main()
{
int a[10];
a[3] = 3;
cout << 3[a] << endl;
}
Are Array and Pointer the Same?
No. The following declarations are NOT the same:
#include <typeinfo>
char ac[10];
char *pc = ac;
cout << typeid(ac).name() << sizeof(ac);
cout << typeid(pc).name() << sizeof(pc);
A picture of ac
and pc
in memory might appear as follows:
address: 1000 1234 1235 1236 1237
pc +--------+--->>>>>------+-----+-----+-----+-----+
| 1234 | ac | a | b | c | d | ...
+--------+ +-----+-----+-----+-----+
ac[0] ac[1] ac[2] ac[3]
*pc *(pc+1) and so on
How Can I Declare An Array?
Remember that array and pointer are NOT the same, so as their declaration forms. Below is a WRONG declaration.
char ac[10];
extern char *ac;
int main()
{
ac[3] = 'c';
return 0;
}
This could easily raise access violation error to you. The steps required to address pc[3]
and ac[3]
are different.
To evaluate pc[3]
, load the value of the pointer pc
from memory and add 3
. Then load the character stored at this location (pc + 3)
into a register. Assuming the small memory model, the appropriate MASM code might look like the following:
MOV BX, pc
MOV AL, [BX + 3]
Because ac
is a constant, it can be stored in the final MOV
instruction, which eliminates two MOV
instructions. The MASM code might look like the following:
MOV AL, [offset ac + 3]
If you declare array as a pointer as above, the compiler generates code to perform pointer-type addressing rather than array-type addressing. The compiler uses the first few bytes of the array as an address (rather than as characters) and accesses the memory stored at that (unintended) location.
The correct way to declare an array is to provide empty first dimension as follows. Declaration of multi-dimension array is quite similar.
extern char ac[];
You may wonder why this empty dimension of array declaration works. In fact, “char []
” and “char [10]
” are two different types! You’re right – they’re different types. However, the Standard doesn’t require them to be exactly the same here, considering that “char[]
” in our declaration is an incomplete type.
"The declared type of an array object might be an array of unknown
size and therefore be incomplete at one point in a translation unit
and complete later on; the array types at those two points ("array
of unknown bound of T" and "array of N T") are different types.
- C++ Standard, 3.9/7
Why Can I Assign an Array Name to a Pointer Then?
This is why some people think an array is equal to a pointer, which is definitely not right. It’s all about the famous “decay convention”. When the name of an array is used in an expression, it’s treated as a pointer to the first element of the array. Nothing more, nothing less.
It is impossible to avoid this implicit conversion. In other words, there is no way of declaring a function so that the array is copied element by element when the function is called. Fortunately, there is no implicit or explicit conversion from a pointer to an array.
During the conversion, the size of the array is lost.
void f(char *x);
void g(int (*x)[5]);
int main()
{
char a[10];
int b[3][5];
char *pa = a;
f(a);
cout << typeid(a + 3).name();
int (*pb)[5] = b;
g(b);
cout << typeid(b + 3).name();
cout << typeid(&*a).name(); cout << typeid(*&a).name(); }
How to Define a Pointer Array?
Array of pointers has a slightly different syntax with a pointer to array. It’s easy to mix them up.
int a[5][3];
int (*p)[3] = a;
int *p[3];
typedef int INTARR[10]; typedef int (*ARRPTR)[10]; typedef int *[10];
How to Specify the Interface if I Need to Pass an Array to a Function?
In C++, arrays are stored row-wise (last subscript varies fast) and that the first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations. That is, you don’t need it to compute the address of an element. That is the reason you don’t have to specify the first dimension in a function that is being passed a multi-dimension array.
void f(int *x); void f(int x[10]); void f(int x[]);
void f(int (*x)[3]);
void f(int x[5][3]);
void f(int x[][3]);;
Since the first dimension is not used, it’s most likely that compiler will optimize the latter two versions to the first one. Although they compile the same, the latter two are more preferable in interfaces as they explicitly tell others that an array is expected.
What Exactly is Needed by the Compiler to Calculate a[i][j]?
The compiler is responsible for generating code to calculate the address of a[i][j]
correctly. The calculation is quite easy, based on the following five numbers, excluding the size of the 1st dimension as mentioned above.
- Array name, i.e., the address of the first element in the array
- Element type, e.g.,
int
- Size of dimensions excluding the 1st
- Row index
i
- Column index
j
T a[nRows][nColumns];
a[i][j] = *(a + i * nColumns * sizeof(T) + j * sizeof(T))
= *(a + i * sizeof(row) + j * sizeof(T))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pointer to T
address(a[i][j]) = a + i * sizeof(row) + j * sizeof(T)
You should now realize why C/C++ array index starts from zero instead of one. The code generated by compiler could have been more complex if it’s not zero.
Why Can’t a Pointer to Pointer be Used as a 2D Array?
It is important to remember that decay convention mustn’t be applied recursively (more than once) to the same array, so a 2D array is NOT equivalent to a double pointer.
int a[10];
int *pa = a;
pa[3] = 3;
int b[10][5];
int **pb = b;
int **pc = (int **)b;
pc[0][0] = 3;
When a double pointer that points to the first element of an array, is used with subscript notation <span style="font-family:Courier New;">pc[0][0]</span>
, it is fully dereferenced twice. After two full dereferencings, the resulting object will have an address equal to whatever value was found INSIDE the first element of the array. Since the first element contains our data, we would have wild memory accesses.
How Can I Pass a 2D Array to a Function?
- No tricks. Pass an array with an empty first dimension.
int a[5][3];
void f(int a[][3], unsigned int nRows)
{
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < 3; ++j)
cout << a[i][j] << endl;
}
f(a, 5);
- Pointer to array. Similar as above.
int a[5][3];
void f(int (*a)[3], unsigned int nRows)
{
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < 3; ++j)
cout << a[i][j] << endl;
}
f(a, 5);
- Use a single pointer. The array is “flattened”. With this method, you can create general-purpose functions. The dimensions don’t appear in any declaration, so you need to add them to the argument list. Note that you need explicitly cast the 2D array to a pointer to integer before passing the pointer in.
int a[5][3];
void f(int *a, unsigned int nRows, unsigned int nColumns)
{
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < nColumns; ++j)
cout << *(a + nColumns * i + j) << endl;
}
f((int *)a, 5, 3);
- Double pointer. Use an auxiliary array of pointers. With this method, you can create general-purpose functions, if you allocate “
index
” at run time.
int a[5][3];
void f(int **a, unsigned int nRows, unsigned int nColumns)
{
int** index = new int*[nRows];
for (unsigned int i = 0; i < nRows; ++i)
index[i] = (int*)a + nColumns * i;
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < nColumns; ++j)
cout << index[i][j] << endl;
delete index;
}
f((int **)a, 5, 3);
- Single pointer. Use an auxiliary array of pointers.
int a[5][3];
void f(int *a, unsigned int nRows, unsigned int nColumns)
{
int** index = new int*[nRows];
for (unsigned int i = 0; i < nRows; ++i)
index[i] = a + nColumns * i;
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < nColumns; ++j)
cout << index[i][j] << endl;
delete index;
}
f((int *)a, 5, 3);
How Can I Allocate a 2D Array Dynamically?
- Allocate array row by row. The number of columns must be known at compile time while the number of rows can be known at run time. The allocated memory is contiguous and the allocated array is a matrix of
nRows * nColumns
.
const unsigned int nRows = 5;
const unsigned int nColumns = 3;
typedef int rowArray[nColumns];
rowArray *a = new rowArray[nRows];
int count = 0;
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < nColumns; ++j)
a[i][j] = ++count;
delete[] a;
- Pointer to array. The number of columns must be known at compile time while the number of rows can be known at run time. The allocated memory is contiguous and the allocated array is a matrix of
nRows * nColumns
.
const unsigned int nRows = 5;
const unsigned int nColumns = 3;
typedef int (*rowPtr)[nColumns];
rowPtr a = (rowPtr) new int[nRows * nColumns];
int count = 0;
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < nColumns; ++j)
a[i][j] = ++count;
delete[] a;
- Array of row points. Allocate 1D array for each row. The number of rows must be known at compile time while the number of columns can be known at run time. The allocated memory is NOT contiguous. Thus, not all the methods in the previous section can be used because some of them assume the memory is contiguous.
const unsigned int nRows = 5;
const unsigned int nColumns = 3;
int *a[nRows];
for (unsigned int i = 0; i < nRows; ++i)
a[i] = new int[nColumns];
int count = 0;
for (unsigned int i = 0; i < nRows; ++i)
{
for (unsigned int j = 0; j < nColumns; ++j)
{
a[i][j] = ++count;
}
}
for (unsigned int i = 0; i < nRows; ++i)
delete a[i];
- Array of row pointers. But both the number of rows and columns are known at run time. The allocated memory is NOT contiguous.
const unsigned int nRows = 5;
const unsigned int nColumns = 3;
int **a = new int*[nRows];
for (unsigned int i = 0; i < nRows; ++i)
a[i] = new int[nColumns];
int count = 0;
for (unsigned int i = 0; i < nRows; ++i)
{
for (unsigned int j = 0; j < nColumns; ++j)
{
a[i][j] = ++count;
}
}
for (unsigned int i = 0; i < nRows; ++i)
delete a[i];
delete[] a;
- There is a way to allocate a contiguous block of memory while the number of rows and columns are both known at run time.
const unsigned int nRows = 5;
const unsigned int nColumns = 3;
int *a = new int[nRows * nColumns];
int **p = new int*[nRows];
for (unsigned int i = 0; i < nRows; ++i)
p[i] = a + nColumns * i;
int count = 0;
for (unsigned int i = 0; i < nRows; ++i)
for (unsigned int j = 0; j < nColumns; ++j)
a[i][j] = ++count;
delete[] p;
delete[] a;
What is a String Literal?
A string
literal is a character sequence enclosed within double quotes:
"this is a string literal"
A string
literal has very interesting properties:
- The compiler automatically allocates a terminating character ”’ to it.
cout << strlen("string"); cout << sizeof("string");
- The type of a
string
literal is “array of the appropriate number of const
characters”.
cout << typeid("string").name();
- A
string
literal can be assigned to a char*
. This is allowed for forward compatibility with old C/C++ code. However, you should NOT modify the contents of a string
literal via this pointer.
char *p = "string";
p[0] = 't';
char q[] = "string";
q[0] = 't';
- A
string
literal is statically allocated so that it is safe to return one from a function.
const char* error_message(int i)
{
return "range error";
}
- Whether two identical
string
literals are allocated as one is implementation-defined.
const char *s = "string";
const char *t = "string";
void f()
{
if (s == t) cout << "one!\n";
}