Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A Generational Garbage Collector in C++

0.00/5 (No votes)
17 Dec 2001 2  
A Garbage Collector framework that is based upon Generational Copying

Introduction

The inspiration of writing this code came from a paper titled "A Real Time Garbage Collector Based on the Lifetimes of Objects" by H. Liberman and C. Hewitt. In the paper, the authors have described a heap storage framework that makes storage for short-lived objects cheaper than storage for long-lived objects. Although the algorithm was presented for LISP and similar systems, the same concept is implemented in C++. The paper is available at: http://lieber.www.media.mit.edu/people/lieber/Lieberary/GC/Realtime/Realtime.html

Dynamic memory management is often referred to as making memory requests from the operating system during different courses of program execution. All dynamic memory requests are satisfied through an area of memory called heap. In C++ this is done through the new operator. The operator is implemented by a call to malloc, which grants a pointer to the object after allocating its memory on the heap. However this flexibility of acquiring memory dynamically comes at a price: i.e. it becomes the responsibility of the programmer to return dynamically allocated memory to the free pool, by using the delete operator. The delete operator in turn calls free, which reclaims memory allocated on the heap. When delete has been called, the object is destroyed and its destructor has been called. Further access to the pointer data can cause unpredictable results.

The term "Garbage Collection" is an automated process of finding previously allocated memory that is no longer reachable by the program and then regaining that memory for future use. The garbage collector does this by several ways, one of which is traversing all pointers on the heap and finding weak pointers (pointer that allows the object memory to be recovered). In simple terms, use of Garbage Collectors leverages the programmer of worrying about calling delete every time new is called. Automated Garbage Collectors can reduce development cycles for a large-scale software by approximately 30% and additionally reduce the memory leaks, resulting in a more stable system.

Some systems also use reference counting for implementing garbage collection, however they have unnerving disadvantages of their own:

  1. The inability to reclaim circular structures i.e. circular structures can have non-zero counts, even when garbage.
  2. Often results in memory fragmentation.
  3. Its expensive since every allocation/freeing requires addition/subtraction.

Due to the above-mentioned problems, it is not a viable option to use reference counting as a primary answer to memory management problems especially when program code begins to increase. Nevertheless, there have been very few implementations of garbage collectors available in the public domain. One of them is by Silicon Graphics. This is intended to be a general purpose, garbage collecting storage allocator for gcc compiler. The algorithms used are described in:

  1. Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988
  2. Boehm, H., A. Demers, and S. Shenker, "Mostly Parallel Garbage Collection", Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, SIGPLAN Notices 26, 6 (June 1991), pp. 157-164.

Design Approach

The heap that this garbage collector maintains is made up of several generations. The user creates pointers by a wrapper pointer class Pointer<T>. Once memory allocation request is made, the garbage collector returns a pointer to the object (created in its own heap space) and also records its address (remember the pointer is itself created on the stack) in a vector<void**>. Similarly the size of object is also recorded for future generational copying. Once the wrapper pointer runs out of scope, or a pointer assignment is made, the garbage collection algorithm is run, to verify integrity of all pointers. The algorithm of garbage collection involves moving all accessible objects out of current generation, evacuating them to another generation and then iterating the heap for resolving all pointers to the object that has been recently relocated. Finally the memory of current generation is reclaimed. If there is any unreachable data in the generation, it is also recovered.

Garbage Collector Design

The garbage collector functionality is implemented by a class named GC. It has all static member functions. At any time during program execution, the process of garbage collection can be forcefully initiated by a call to GC::Collect(). The maximum number of generations can be queried by a call to GC::GetMaxGeneration(). If you are curious to find out the total number of bytes allocated on the memory managed by the garbage collector, you can call GC::GetTotalBytesAllocated().

class GC
{
private: 
//Array of pointers to pointers that are made on the stack 

	static std::vector< void**	> _PointersOnStack; 
//  Holds the size of objects that are made on the stack 

	static std::vector<unsigned int > _SizeOfObjects;  
//Holds all the generations 

	static std::vector< Generation* >  _Generations;
// Holds total bytes allocated on the heap

	static int BytesAllocated;

public:
// Invokes the GC for all generations

	static void Collect();
// Invokes the GC only upto and including the generation specified

	static void Collect( Generation*& pGeneration );
// Call this to allocate memory from the garbage collector

	static void* operator new( size_t, void** pStackPtr );
// Gets maximum number of generations that have been made

	static int GetMaxGeneration();
// Gets the total memory (bytes) that has been allocated on the heap

	static int GetTotalBytesAllocated() { return BytesAllocated; }
// Returns the total number of generations in the GC

	static int GetGenerationCount() { return _Generations.size(); }	
// Sets the total bytes that have been allocated by the garbage collector

	static void SetTotalBytesAllocated( int Value ) { BytesAllocated = Value; }
};

The pointers that are iterated during the garbage collection process must point to objects of type Pointer<T>. This class implements the functionality of smart pointers. It overloads several operators including assignment operator and automatic conversion operators. Garbage collection process is invoked whenever either the Pointer object runs out of scope or an assignment is made.

template <class T>
class Pointer 
{
private:
// Invoked on assignment and destruction

	void Destroy()
	{
		p = NULL;
		GC::Collect();		
	}
public:
// Wrapped pointer

	T* p;
// Constructor

	Pointer( T* p_ =  NULL ) : p( p_ )	{} 
// Destructor 

	~Pointer() 
		{ GC::SetTotalBytesAllocated( GC::GetTotalBytesAllocated()
		                                               - sizeof( *p )
		);p->~T(); // Explicitely call the destructor

		Destroy();
	}
// Assignment operator 1

	Pointer& operator = ( Pointer<T> & p_ ) { return operator = ( ( T* ) p_); }
// Assignment operator 2

	Pointer& operator = ( T* p_ ) 
	{     
		Destroy();
		p =  
		p_; return *this;
	} 
// Automatic type conversion to

	T* operator T*() { return p;} 
// Dereferencing

	operator T& operator*() { return *p; } 
// Pointer indirection    

	operator T*operator->() { return p; }    
// For automatic type conversion during new call

	operator void**() { return ( void** ) & p; }
};

The various generations of heap are implemented by a class Generation. Each generation wraps a table of contiguous memory locations, so by having newly created objects close together you can have fewer page faults and the objects will also reside in the processor cache. There are many generations and generation with is the highest Generation number contains the objects most recently created. Each generation has certain capacity and when the objects on heap overrun that capacity, a new generation is automatically created. The process of condemning a generation and collecting memory lost as a result of weak references is called scavenging.

class 
Generation {  
private:
// The generation number int

	_GenerationNumber;
// Pointers to the objects in the generation 

	std::vector<void* >  _Pointers;
// Points to the top of memory available in the generation

	void* _pTopOfMemory;
// The generation allocates memory from this location	

	void* _pNextObjPtr;
// Returns maximum size available for one generation

	static int MaxSize;
// Table of memory inside generation

	BYTE MemoryTable[MaxSize];
public:
// Gets the remaining memory of the Generation

	int GetRemainingMemory() const;
// Returns maximum memory that can be allocated for one generation

	int GetTotalMemory() const { return MaxSize; }
// Allocates memory for an object and returns its void*

	void* Allocate( size_t Size );
// Gets the generation number

    int GetGenerationNumber() const { return _GenerationNumber; }
// Performs bitbybit copying 

	static void* CopyBitByBit( const void* pSourceLocation, size_t Size,
	                                      Generation* pTargetGeneration );
// delete operator	

	void operator delete( void* v ) { free( v ); }	
public:
	Generation( int GenNumber = 0 );
};

How to use it

Objects allocated with the built-in "::operator new" are uncollectable. Only objects allocated with overloaded new operator that takes address of pointer as the second argument are collectable. For ease of programming, I have also written automatic conversion function of Pointer to void**:

void* operator new( const size_t sz, void** pVoid )
{
	return GC::operator new( sz , pVoid );
}

For example the following code demonstrates differences in object creation and usage by above garbage collector:

int* pInt = new int; // Traditional approach to create pointers - memory leaks !!!	

Pointer<int> pInt = new(pInt) int; // My approach - no memory leaks

Run the demo program and see memory allocated in the task manager of WinNT/2k. You would observe considerable difference in performance of the system with and without garbage collector. In order to prove the quality of garbage collection, I have also overloaded global new and delete operator to increment and decrement memory allocation counter. This counter can be a valuable indicator for detecting memory leaks in program.

Earlier I had many problems with a call to virtual functions in pointers allocated on GC's heap, but now all those have been solved. Right now memory allocation for arrays of objects has not been implemented, but that would be done in near future. I would love to have any suggestions for improving the design and functionality of the garbage collector.

Revision History

26 Jun 2002 - Resized and reformatted code to prevent scrolling

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here