Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C

Heap Memory Manager and Garbage Collector

4.88/5 (26 votes)
16 Apr 2008CPOL5 min read 2   774  
Describes a module to track heap memory allocations and to avoid memory leaks

Contents

Introduction

This article demonstrates a method to try and solve a common problem hated by most C++ programmers: memory leaks and memory overruns. The method used in this article is to track all memory allocated by the program. It also has basic protection checks whether the memory written to the allocated block has overrun the number of bytes actually allocated.

This method also lets you organize and group allocated memory by ID or by name. Giving a name to a memory allocation gives the advantage that you can get the memory allocated by name while not needing to keep a pointer running around. Giving a group ID to an allocation helps the programmer to keep memory allocations grouped and thus can call a single function to deallocate all memory of a certain group.

Why Track Memory?

Tracking memory is a very efficient way to keep an eye on all allocated memory. This gives you the ability to later enumerate all memory blocks allocated and also to deallocate all memory allocated. This is a garbage collection implementation.

Why Garbage Collection?

Garbage collection is used very much in modern programming languages like all .NET languages and Java. Having a garbage collection implementation in C++ does not come without its price, but it is always a safe way to remove memory leaks completely.

The Problem

When allocating memory, it has to be deallocated at some point or another. It is very easy not to deallocate the memory, either by programming errors or by the program logic. Example:

C++
void ServePizzas(int pizzacount)
{
    char* p = new char[pizzacount];

    // ... more code here...

    if (g_remainingpizzas == 0) return;

    // ... more code here...

    delete [] p;
}

In the code above, a memory block is allocated with the char* p = new char[pizzacount]; statement. The conditional statement in the middle has a return statement and causes the function to exit and all the code after it will not be executed. Since the delete [] p; statement will not be executed, the memory allocated will not be freed thus resulting in a memory leak.

Presented Solution

The solution presented in this article involves overriding the C++ new and delete operators. The new operator is normally called like this:

C++
char* p = new char[256];

This allocates a 256 byte array in pointer p. When using this library, the call remains as-is, but internally, the memory has been marked and tracked, allowing control over the allocated memory.

The new operator can also have parameters, and in this library, the new operator has three overloads with parameters. Let's look at some examples:

C++
// Allocate a 256 byte array using normal new operator.
char* p = new char[256];

// Allocate a 256 byte array with group ID 1.
char* p1 = new(1) char[256];

// Allocate a 256 byte array with name "my array"
char* p2 = new("my array") char[256];

// Allocate a 256 byte array with name "his array" with group ID 1.
char* p3 = new(1, "his array") char[256];

These were all the ways to allocate memory. To deallocate the memory allocated in the previous example, we normally do the following:

C++
delete [] p;
delete [] p1;
delete [] p2;
delete [] p3;

Now, let's say that we lost the pointers p, p1, p2, and p3, how can we deallocate the memory? There are many ways with the memory tracker library. Let's find out how.

Method 1: Recover Pointers

C++
// We cannot recover p since it is not named.
char* p = Null;//???

// We cannot recover p1 since it is not named.
char* p1 = Null;//???

// Recover p2:
char* p2 = hmt_getnamed("my array");
delete [] p2;

// Recover p3:
char* p3 = hmt_getnamed("his array");
delete [] p3;

In this method, we could recover most of the pointers but could not recover pointer p and p1 since we did not assign a name to them. There is still a way to deallocate them though. Let's look at Method 2:

Method 2: Deallocating Groups

C++
// Deallocate group 0; p and p2
hmt_garbagecollect(0);
// Deallocate group 1; p1 and p3
hmt_garbagecollect(1);

In this method, we deallocate group 1 since p1 and p3 were assigned to group 1, and group 0 since p and p2 were not assigned to any group and thus they belong to group 0. Let's have a look at a more brute-force approach:

Method 3: Deallocating All

C++
// Deallocate all memory allocated by new.
hmt_garbagecollect();

In this method, we force all memory allocated by the new operator to be deallocated, even p which was allocated using the normal new operator. This makes sure that there are no memory leaks.

More Information

If you need to know how much memory the program has allocated, you can use the following code:

C++
size_t memallocated = hmt_getallocated();
printf("Memory allocated using new: %d", memallocated);

If you need to print debug information on all the memory allocated:

C++
hmt_debugprint();

If you want to print debug information to see if any memory block has been overrun or trashed:

C++
hmt_debugcheck();

Implementation Details

Since we override the new statement, we have control on how much memory is allocated. Some more memory than actually requested is allocated to make space for tracking data.

The overhead for each memory allocation is 32 bytes for 32-bit processors, and 64 bytes for 64-bit processors. The following table represents the internal memory structure of an allocated block when the new statement is called. This also indicates how the memory tracker knows about all the memory allocated.

Number of bytes Field Description
4 bytes Trash UID Used to determine if memory has been overwritten
4 bytes Group ID Group ID of allocated block
4 bytes Hash of name If item is named, this will contain a hash of the name
4 bytes Flags Internal flags to describe memory block
Pointer * Size Size of block including overhead
Pointer * Previous block Linked list pointer to previous allocated block
Pointer * Next block Linked list pointer to next allocated block
n bytes Allocated memory ** Actual memory requested by the new statement
4 bytes Trash UID Used to determine if memory has been overwritten

* 4 bytes for 32-bit processor, 8 bytes for 64-bit processor.

** The pointer returned by the new statement points to this memory block.

With this structure, all memory blocks are known and linked together.

Conclusion

This module is part of a larger, more complex, memory pool module which should be published later on. There is, for sure, room for improvement in this module, one of them being the lookup of names, which is a linear search, and thus slow when there are a large number of memory allocations. In the memory pool module, a binary tree is used to speed up things, but that is for later.

This module is by no means the only method to do this kind of memory tracking, but it is a pretty efficient way to remove memory leaks.

Revision History

  • 16th April, 2008: Original article

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)