Introduction
In all modern languages like C# and Java, we gain benefits of garbage collection. What about implementing our own. In this article, I will try to explain how to implement garbage collector for C language.
What is Garbage Collection?
In C language, dynamic memory management operations are done with malloc()
and free()
functions. When a piece of memory area is required, programmer calls the malloc()
function and receives a pointer of this area, and releases this area using free()
function when it is not used anymore. This is really a very easy task, you create memory area using malloc()
and release it using free()
. What if the programmer forgets to call the free()
function or application breaks before the free()
function is executed? If free()
function is not called, operating system cannot use this area and still thinks that it is in use. Large chunks of unreleased memory areas can affect system performance vitally.
Need for an automated garbage collection mechanism is born at this point. Automated garbage collection mechanism guarantees that all allocated memory during program run are released at the end.
There are a lot of garbage collection algorithms such as mark and sweep, copying, generational, reference counting, etc. In this article, I will try to explain mark and sweep algorithm.
What about Conservation?
Garbage collectors (abbreviated GC from now on) should not force developers to tag data or force to use special datatype
as pointers. GC also should work on existing source code. Working on existing code without compilation would be a more elegant solution. GC should not force to change on compilers. Conservative garbage collection approach provides GC solution preserving the above mentioned tasks.
In order to work properly, GC should have knowledge about the following tasks:
- Variables actively in use
- Which variable is a pointer and which is not
- Information of the allocated memory
Information about the allocated memory can be collected while GC allocates memory.
In C language collection about variables in use can be done with a special scanning on heap, stack and static data of the application. This solution is highly hardware dependent.
Also in C language, we do not have knowledge of a type at runtime. This means, at runtime phase it is not an easy task to distinguish pointers from non-pointers. Again we receive no assistance from the compiler. Once we have information about variables actively in use, we can scan this list with a special pointer identification algorithm to distinguish pointers. This step has some shortcomings but efficiency can be provided with elegant algorithms.
Conservative approach allows developers to use GC in their already written codes without any change on it. Developers call malloc()
function and never call free()
again inside the code. The rest is handled by GC as a smart servant.
Stop the World Approach
I mentioned that we scan memory areas of the application. We also need to release unused memory areas. These operations take additional CPU cycles. So when garbage is being collected, we need to use CPU. At this point, there are two main approaches for use of the CPU. These are stop the world and concurrent approaches.
Concurrent approach handles GC cycles on different threads. For this approach, complex locking mechanisms are needed. As a result, it benefits high performance which is desirable by most of the modern architecture. For further information, you can search on Tri-Color Marking Algorithm.
Stop the world approach stops program execution, does garbage collection and resumes program execution. This has a completely big disadvantage, it does not allow the application to use CPU while garbage is collected. This can cause the application to pause while garbage is being collected. Also we cannot use multi processor even if hardware has more than one CPU which can be a big performance gap. Although it has a lot of disadvantages, it is really very easy to implement so in this article we will use this approach.
Mark and Sweep Algorithm
Mark and Sweep Algorithm is the first algorithm which handles cyclic references. This algorithm is one of the most commonly used garbage collectors with combination of some other techniques.
Mark and sweep algorithm is a tracing collector so it traces through all available pointers to distinguish used and unused memory areas. It consists of two phases. The first phase is the marking phase. In marking phase, GC traces through all available variables and finds pointers using pointer identification algorithm. Once pointers are determined, marking phase finds the heap area of the pointers and marks them as used. In the second phase, GC traces through the heap and picks unmarked areas. Unmarked areas are the memory areas which are not currently used. These areas are reclaimed.
As mentioned, mark and sweep can handle cyclic references. Moreover, it includes no overhead on variables.
Root Sets
Conservative GC faces two main difficulties, the first is for identifying where to find root set and the second is how to identify pointers.
Root set can be described as the variables which are in use at time(t). Finding root sets without the assistance of the compiler is a highly system and hardware dependent issue. Root sets can be found on stack, registers and static area of the application. In order to implement our GC, we should find base addresses of these memory areas.
GC should discover the bottom and top of the stack. Stack is the main stack of the application. If we take a closer look into the CPU architecture, we can see that there is a dedicated stack which holds addresses of execution points, passed parameters to functions and local variables. Stack grow direction may change in each architecture. When a new item is pushed into the stack in some architectures, the stack grows downward and in some, it grows upward. GC should be aware of this. Stack bottom and top addresses can be found by combination of EBP, ESP and DS register values of 32bit architecture. Also there are alternative ways.
Static areas are held in the data segment register in 32Bit CPUs and stored in the heap of the running application. Static areas are the memory block where local static and global variables are held. In a realworld application, we can have some global and static local variables which hold pointers. GC should be aware of these variables.
Registers are CPU registers of the hardware. These memory areas are highly system dependent. GC should be aware of the root sets held in the registers before GC takes place. Reclaiming of the memory areas which are in use could cause severe bugs.
Pointer Identification
The second difficulty that Conservative GC faces is identification of the pointers. In C and C++ languages, pointers can be held inside the integer variables. In some cases, it is not an easy task to distinguish a pointer with a 32 bit integer value. As GC has no assistance from the compiler, it has to handle identification of the pointers by itself. In general, the approach for conservative garbage collector is that "GC must treat any word (integer) that it encounters as a potential pointer unless it can prove otherwise"¹.
While in this step, GC should be aware of pointers to pointers. In this project, I implemented depth first search as pointer traversal algorithm. In order to identify pointers, GC should have some test steps to filter pointers with non-pointers. Some of the tests are mentioned below:
- Does a potential pointer refer to the atom pointer.
- Does a potential pointer refer to application heap
- Does a potential pointer refer to root sets. If so, execute pointer traversal algorithm to find which portion of the heap it refers.
- If potential pointer refers to heap, traces through allocated block to find exact block that it points.
Atom pointers are the pointers which are used by GC itself. GC should distinguish these pointers from actual application pointers. Also GC should give the ability to the developer to identify custom atom pointers. Atom pointers are being skipped at pointer identification phase and they are not recognized as pointers by GC. GC never touches the memory areas of these pointers.
If a potential pointer passes these tests, it is treated as a pointer and marked as in use at mark phase. Pointer identification has some deficiencies such as false pointers. False pointers are the integer values which hold heap addresses. Assume that we have an integer i
which holds random 32bit value, also assume that this value is 0x003932e8
. When GC takes place, we have also a pointer p
which points to 0x003932e8
heap address with size in MBs. p
is set to NIL and not used anymore. Application requests new memory block but having less memory GC cannot allocate free space and steps into collection phase. In collection phase, p
should be reclaimed so it is not used anymore but i
can be recognized as a pointer actually which is not. This type of situations can be troublesome. Boehm reports that certain classes of data, such as large compressed bitmaps, introduce false references with an excessively high probability [Boehm, 1993].
Implementation
After a lot of theoretical information, let's take a look at how we can implement that type of automated memory manager.
As mentioned, the GC we will design will be highly system and hardware dependent. We will use IA32 architecture and Windows Operating system.
The first thing GC should do is to find root sets. Stack top can be found by retrieving address of the last created variable. In Windows environment, the address of the last created variable can be used to query active memory block using VirtualQuery
function. This function tells the base address and other properties of the related memory area². After calling VirtualQuery
function with top of the stack, we can retrieve full set of stack roots. This root set gives us the variables currently in use. Defining static
root sets requires another call to VirtualQuery
function. This time we query memory area using a created static
local variable. Register roots can be retrieved using asm
instructions.
When developer calls malloc
function of GC, our code should add additional header information to this memory block. This block is linked to doubly linked list. Using this list, we can store and query which memory areas are allocated by GC. In my implementation, allocation does not invoke collection step which it should do when system gets low on memory, it does only create new memory area using low level malloc
and returns address of this block. In future releases, my implementation of gc_malloc
should work in a smarter and elegant way.
In our implementations, the developer should be able to call collect
function of GC. Calling this function is not recommended but for flexibility we can allow developers to call our collect
function. Collect
function should invoke the following steps. First it should determine root sets, then it should invoke mark and sweep phases respectively.
In mark phase, GC should trace through whole root sets. Code should invoke pointer identification step for each possible pointer in the root set. Once possible pointer is passed identification step code should mark it as in use (in my current implementation, I have two lists. The first list holds used areas, and the second holds free areas. When I mark a pointer as used, I remove it from the used area and link it to a free area which decreases CPU use on sweep phase).
In sweep phase, GC should trace through the whole heap. In this step, only marked areas are not reclaimed and the rest is reclaimed. (In my current implementation, I free the memory area when it is not used anymore. This can cause performance penalties. A more advanced approach can be used at this step.)
The last thing GC should do is reclaiming the whole heap when the application quits. We can use atexit()
function of standard C. In this function, we will trace through the whole heap to reclaim all used memory.
Source Code and Last Words
This project is an open source project. Please feel free to join this project. If you wish to work on this project, please let me know. Source repository of this project can be found here.
Please note that this project is actively in development. Also the current version supports a fully working GC mechanism and it has a lot of deficiencies on performance issues.
References
Further Reading
- Garbage Collection - Algorithms for Automatic Dynamic Memory Management 1996, Richard Jones, Rafael Lins
- An introduction to garbage collection part II, Richard Gillam
- Mark-and-Sweep garbage collection
- Why conservative garbage collectors
- Automatic garbage collection
- Fast multiprocessor memory allocation and garbage collection, Hans-J.Boehm, HP Laboratories
- Composing high-performance memory allocators, Emery D. Berger, Benjamin G.Zorn, Kathryn S. McKinley
- Hoard: a scalable memory allocator for multithreaded applications, Emery D. Berger, Kathryn McKinley, Robert D. Blumofe, Paul R. Wilson
- Managing heap memory in win32
- Heap pleasures and pains
- The measured cost of conservative garbage collection, Bejamin Zorn
- Conservative garbage collection for general memory allocators, Gustavo Rodriguez-Rivera, Charles Fiterman
- Conservative garbage collection for C, Christian Höglinger