Array implementation in MFC

Milind Shingade

2.47/5 (8 votes)

7 Aug 2007CPOL12 min read

This article describes the implementation of arrays in MFC and the flow of control in the member functions implemented by CArray.

Arrays

An array is a data structure that consists of group of elements having a single name and are accessed by indexing. All the elements in an array have the same data type, and the array occupies a continuous area of storage. Most programming languages have a built-in array data type. Some programming languages provide operations and functions to work over arrays. Array elements are usually numbered, and individual elements are accessed by their numeric position in the array. Arrays are classified on the type of indexing. When more than one index is used to access the element in the array, then the array is called a multi-dimensional array. Generally, an n dimensional array requires n indexes to access the elements in the array. For instance, a two dimensional array requires two indices to access the elements in the array.

Arrays provide an efficient random access, but have inefficient insertion and deletion of elements except at the end of the array. Arrays provide a constant overhead in storing data. In some languages, this constant overhead is almost zero. Arrays have a fixed size. Although the size of the array can be adjusted, it is an expensive operation. Dynamic arrays are expandable arrays which automatically resize over a long period of time.

Arrays are used to implement other data structures like heaps, hash tables, deques, queues, stacks, strings etc. Associative arrays with integer keys are used to store non-unique data, missing data, or large range of indexes. Associative arrays save memory. Examples of associative arrays are hash, heap, deque, queue, and stack.

An index is used to access an element in an array. Arrays, at the conceptual level, are similar in all programming environments but, the index value for the first element changes. Based on the index value of the first element in the array, arrays are classified as, zero based, one based, and n based arrays. In zero based arrays, the first element has index 0; in one based arrays, the first element has index 1, and in n based arrays, the first element has index n. The n in the n based array is the lower bound. The upper bound can be fixed or variable.

Arrays having more than one integer for indexing are called multi dimensional array. The index in the case of multi dimensional array is a collection of integer. The number of integers in the index is always the same and is referred to as dimension of the array. An array with k integer's in the index is k-dimensional array.

Introduction to arrays in MFC

In MFC, the CArray class implements the concept of arrays. Arrays implemented by the CArray class can grow and shrink dynamically. The array index starts from 0, i.e., the first element in the array is at the zero-th position. Developers can fix the upper bound or allow the array to expand as elements are added at the index past the current bound. Even if some elements are NULL, memory is allocated continuously to the upper bound. The access time for the element in the array is constant, i.e., 1. The access time is independent of the array size.

The state variables in CArray

CArray is the template class which can hold data of the type passed as the template argument.

C++

template<class TYPE, class ARG_TYPE = const TYPE&>
class CArray : public CObject
{

The first template argument (TYPE) is the actual type of the data, and the second template argument (ARG_TYPE) is the data type for passing the arguments to the member functions of the CArray class. The default value for ARG_TYPE is const TYPE&. The member variables of the class have protected access so they are accessible by the derived classes.

C++

protected:
    TYPE* m_pData;   // the actual array of data
    INT_PTR m_nSize;     // # of elements (upperBound - 1)
    INT_PTR m_nMaxSize; // max allocated
    INT_PTR m_nGrowBy;   // grow amount

The member variables make the state of the array. The state of the array consists of the actual data, the number of elements in the array, the maximum allocated size of the array, and the amount of empty space to allocate when the array is required to grow (expand). The upper bound is always 1 less than the number of elements in the array.

The constructor of CArray

C++

template<class TYPE, class ARG_TYPE>
CArray<TYPE, ARG_TYPE>::CArray()
{
    m_pData = NULL;
    m_nSize = m_nMaxSize = m_nGrowBy = 0;
}

The constructor of the array has public access. The constructor initializes the state of the array to the start state. In the start state, the memory for the actual data of the array is initialized to NULL and the values for the number of elements, maximum size allocated, and the amount of space to allocate when the array is required to grow.

Get and Set member functions

CArray has five get/set member functions to get the state and to modify the state variables of the class.

C++

INT_PTR GetSize() const;
INT_PTR GetCount() const;
BOOL IsEmpty() const;
INT_PTR GetUpperBound() const;
void SetSize(INT_PTR nNewSize, INT_PTR nGrowBy = -1);

These member functions reduces content coupling. Also, the member functions except SetSize do not modify the state of CArray. GetSize() should be called to retrieve the size of the array, i.e., the number of elements in the array. Sometimes, GetSize() can be used to return different information related to the size of the array by reusing the CArray class, like the size of memory allocated for the array in bytes. GetCount() should be called to retrieve the number of elements in the array. IsEmpty() returns TRUE when the number of elements in the array is equal to zero. IsEmpty() should be called to determine whether the array is empty. GetUpperBound() should be called to retrieve the current upper bound of this array.

CArray implements a zero based array, so GetUpperBound() returns an integer that is less than the number of elements in the array. GetUpperBound(), GetSize(), GetCount(), and IsEmpty() return the information based on the value of the number of elements of the array. SetSize() accepts two parameters: nNewSize, the number of elements, i.e., new array size, and nGrowBy, the amount of empty space to allocate when the array is required to grow (expand). The default value for nGrowBy is always -1. If the value of nGrowBy is greater or equal to zero, then SetSize() modifies the attribute of the class that is used to allocate when the array is required to grow (expand). If nNewSize is zero, then the array is minimized to nothing by deleting the elements in the actual data array. Also, the maximum size of the array and the number of elements of the array is set to zero. If nNewSize is greater than zero and memory is not allocated for the actual data, then the SetSize() function checks for overflow only in Debug build, and allocates the memory to hold the number of requested elements, i.e., the value of nNewSize or the value of nGrowBy, whichever is greater. If the memory is already allocated for the actual data and nNewSize is greater than the size of the array, SetSize() initializes the actual data to zero and allocates more space for nNewSize - the size of the array - elements. If nNewSize is less than the size of the array, then SetSize() calls the destructor of the excess elements and sets the new size of the array. However, it does not clear the contents of actual data for all other conditions, if the value of nGrowBy is equal to zero.

SetSize() determines the growth to avoid heap fragmentation, as follows:

C++

if (nGrowBy == 0)
{
    nGrowBy = m_nSize / 8;
    nGrowBy = (nGrowBy < 4) ? 4 : ((nGrowBy > 1024) ? 1024 : nGrowBy);
}

Then the value for the maximum size of the array is computed. To compute the new maximum size of the array, SetSize() checks the value of nNewSize. If the value of nNewSize is less than the current maximum size of the array plus the value of nGrowBy, then the new maximum size of the array is the current maximum size of the array plus nGrowBy; otherwise, the new maximum size of the array is nNewSize. In Debug build, the program gives an assertion when the new size of the array computed is greater than the current maximum size of the array plus nGrowBy. After computing the new maximum size of the array, SetSize() throws an invalid argument exception when the newly computed maximum size of the array is less than the current maximum size of the array. SetSize() then checks for overflow in the Debug build. SetSize() then allocates memory for temporary data, sets the data to zero, releases the memory of the actual data without calling the destructor of each element, and assigns the memory location of the temporary data to the actual data. SetSize() also initializes the new size of the array and the maximum size of the array.

Clean-up operations

CArray has two member functions that can be used for clean-up operations:

C++

void FreeExtra();
void RemoveAll();

FreeExtra() cleans the extra memory allocated for the array. FreeExtra() checks the internal state of the object using the ASSERT_VALID macro. If the number of elements in the array is not equal to the maximum size allocated for the array, then FreeExtra() shrinks to desired size by creating temporary data of size equal to the number of elements, deleting the old actual data, and assigning the temporary data to the actual data. Also, the maximum size allocated for the array is set to the number of elements of the array.

RemoveAll() removes all the elements from the array by calling SetSize(0,-1);.

Accessing elements

Elements stored in the array are accessed by using the following functions:

C++

const TYPE& GetAt(INT_PTR nIndex) const;
TYPE& GetAt(INT_PTR nIndex);
void SetAt(INT_PTR nIndex, ARG_TYPE newElement);
const TYPE& ElementAt(INT_PTR nIndex) const;
TYPE& ElementAt(INT_PTR nIndex);

All the above functions validate the index of the array passed as the argument to be within 0 and the number of elements in the array by using the ASSERT macro. GetAt() retrieves the element if the index is within the range; otherwise, it throws an invalid argument exception. SetAt() sets the element at the index after validating the index. ElementAt() is the same as GetAt().

Directly accessing the data within CArray

To directly access the data stored in CArray, the following two functions are used:

C++

const TYPE* GetData() const;
TYPE* GetData();

GetData() just returns the pointer to the actual data.

Growing the array

Arrays can be expanded either by adding more elements, by appending another array, by inserting an element in the array, or by copying an array into another array. The following functions are provided by CArray:

C++

void SetAtGrow(INT_PTR nIndex, ARG_TYPE newElement);
INT_PTR Add(ARG_TYPE newElement);
INT_PTR Append(const CArray& src);
void Copy(const CArray& src);

SetAtGrow sets the element in the array and grows the array if necessary. SetAtGrow() checks the internal state of the object by using the ASSERT_VALID macro, and checks the index to be greater than or equal to zero by using the ASSERT macro. If the index is less than zero, then CArray throws an invalid argument exception. If the index is greater than the number of elements in the array, SetAtGrow() grows the array by calling SetSize(nIndex+1,-1); and assigns the element at nIndex.

Add() adds the element to the end of the array and grows the array if necessary. Add() calls SetAtGrow(m_nSize,newElement); to add the element at the end of the array; it returns the index of the newly added element. Append() appends another array to the existing array and grows the array if necessary. The arrays must be of the same type. Append() first validates the state of the array using the ASSERT_VALID macro, then checks that the array to append is not the same as the existing array by using the ASSERT macro. If the array to append is the same as the existing array, then Append() throws an invalid argument exception. Append() then calls SetSize(m_nSize+src.m_nSize) to make the array as big to hold the elements from the new array, and then calls CopyElements to copy the elements from the old array to the new array.

Overloaded operators

CArray overloads the [] operator for getting the element at the given index. The overloaded [] operator uses GetAt() and ElementAt() to return the elements at the given index.

Insertion and removal of elements from the array

To insert and remove elements from an array, CArray provides the following functions:

C++

void InsertAt(INT_PTR nIndex, ARG_TYPE newElement, INT_PTR nCount = 1);
void RemoveAt(INT_PTR nIndex, INT_PTR nCount = 1);
void InsertAt(INT_PTR nStartIndex, CArray* pNewArray);

The InserAt() member function of the CArray class first checks the validity of the internal state of the array, and throws an invalid argument exception only when the index is negative or the count is either negative or zero. If the index is greater than the number of elements in the array, then InsertAt() calls SetSize to grow the array so that it can contain more elements. Otherwise, the insertion is done in the middle of the array by growing the array to a new size using SetSize(), shifting the old data to make place for new data, and inserting the new data in the gap. Another flavor of InsertAt() is very similar to the one just explained: the only difference is it inserts the elements from the array.

The RemoveAt() member function of CArray first checks the validity of the internal state of the CArray object and throws an exception only when the index is negative or the count is negative, or the index plus count is greater than the number of the elements. RemoveAt() then calls the destructor of the elements from nIndex till nCount and moves all the elements from nIndex+nCount till the end of the array in the gap.

Overridden functions of CObject

CArray overrides the following functions from CObject:

C++

void Serialize(CArchive&);
#ifdef _DEBUG
    void Dump(CDumpContext&) const;
    void AssertValid() const;
#endif

Serialize() should be used to store the internal state of the CArray object to the persistent store. Serialize() first calls the base class Serialize() function, and checks if the archive state is storing the count of elements in the array; otherwise, it reads the number of elements from the persistent store. Serialize() then calls the global function SerializeElements().

Dump() should be used to dump the contents of the internal state to the output window of the debugger. This function should be used for debugging, and should only be available in Debug build. Dump() calls the Dump() of CObject and then dumps the elements using DumpElements only when the depth of the dump context is greater than zero.

AssertValid() checks the validity of the internal state of the object. AssertValid() should only be used in Debug release. In AssertValid(), if memory is not allocated and the number of elements and the maximum size of the array is more than zero, then the program gives an assertion. Otherwise, the program asserts when the number of elements is negative, the maximum size is negative, or the number of elements is more than the maximum size of the array.

History

Initial version: Poor in format. Without code examples.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)