Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / desktop / MFC

Portable thread-based garbage collection library, for C++

4.47/5 (6 votes)
19 May 20065 min read 1   332  
a description of LibGC, a portable thread-based garbage collection library for C++

Introduction

We all know how painful memory management in C++ is, don't we? :-) ... Of course, there are solutions like reference counting with smart pointers, but the programmer should exercise great care not to introduce cyclic references. Programmers that use garbage-collected languages laugh at us, don't they? :-).

After some articles I have written the previous years about C++ and garbage collection (see here and here), the problem of memory management came up again in one of my personal projects, so I revisited my code, found out the bugs I had, redesigned it, and the result is in the attached file.

LibGC is a thread-based garbage-collection library with the following features:

  • thread-based garbage collection.
  • precise collection.
  • uses the mark & sweep algorithm.
  • portable; uses no low-level O/S-specific features.
  • no dependencies; headers only.
  • weak pointers.
  • garbage-collected strong and weak arrays.
  • customizable garbage collection with user-defined finalize and mark callbacks.
  • wrapper class for non-garbage-collected classes.

Library Design

The library is created such that it is easy to use, especially by people who are used to using special pointer classes for managing memory.

The library provides thread-based garbage collection. What does that mean? It means that each thread can have its own collector. The reason for this decision is two-fold:

  1. I did not want to introduce dependencies on thread libraries, and
  2. if an application has many threads, then all these threads will compete for the collector, thus slowing the application down.

I think it is better that each thread has its own collector; threads can exchange data with synchronized queues.

The library, as it is provided, only supports one thread, the main thread. If you want to provide thread support, you have to define a global function that returns a different collector for each thread, using thread local storage. It is not too difficult, and I will write an example for that later.

The main classes of the library are:

  • garbage_collector - the class that does garbage collection and maintains the garbage collection context.
  • object - the base class for all garbage collected classes.
  • ptr<T> and weak_ptr<T> - the pointer classes for declaring global/stack pointers.
  • member_ptr<T> and weak_member_ptr<T> - the pointer classes for declaring member pointers.
  • array<T> and weak_array<T> - the pointer classes for creating garbage-collected arrays.

Using the Library

The library does not contain source files; it only contains header files. Unzip it in a directory of your choice, then include the file libgc.hpp in your project. The zip file contains the code documentation.

Declaring Garbage-Collected Classes

Declaring garbage-collected classes is very easy. Here is an example:

#include "libgc.hpp"
using namespace libgc;

class Foo : public object {
public:
};

You have to use the class member_ptr<T> in order to declare garbage-collected pointers. These members must be initialized with an owner object, so that the collector has a precise picture of an object's layout. You can also initialize members:

#include "libgc.hpp"
using namespace libgc;

class Foo : public object {
public:
    member_ptr<Bar> m_next;

    Foo() : m_next(this, new Bar) {
    }
};

Declaring Garbage-Collected Pointers

Global and stack pointers to garbage-collected objects must be declared with the class ptr<T>. This class automatically registers itself to the current garbage collector on construction, and un-registers itself on destruction. Registration happens using a single-linked list as a stack, so it does not take too much code. This is necessary so that the collector knows where the pointers are. Here is an example:

int main() {
    ptr<Foo> foo1 = new Foo;
}

Weak Pointers

Sometimes, it is necessary to have pointers that do not cause objects to be kept around, and that are nullified automatically when the pointed object is collected. These pointers are called weak. The library provides both member and root weak pointers. Usage is exactly the same as strong pointers:

#include "libgc.hpp"
using namespace libgc;

class Foo : public object {
public:
    weak_member_ptr<Bar> m_next;

    Foo() : m_next(this, new Bar) {
    }
};

int main() {
    weak_ptr<Foo> foo1 = new Foo;
}

When garbage collection happens, all root and member weak pointers that point to the collected objects are set to NULL.

Arrays

The library also provides the capability of declaring garbage-collected arrays of garbage-collected objects. The declaration is very simple:

int main() {
    ptr<array<Foo> > array1 = new array<Foo>[10];
}

You can then handle the array as a normal C++ array:

array[0] = new Foo;
array1[0]->action();

Weak arrays are the same as arrays, but they contain weak pointers instead of strong pointers.

Customization

The functionality described above is not hard-coded into the garbage_collector class. In fact, that class only provides the absolute minimum interface for handling finalizing and marking. The actual algorithms for finalization and marking are provided by user-defined classes. The library's classes are coded in the same way. Check out the documentation of the class for how to call the methods malloc, free, mark, and mark_weak in order to find out how to customize certain aspects of the library.

If you want super-fast scanning of member pointers, for example, without using special classes, then you can provide your own mark routine:

class Node {
    Node *m_prev;
    Node *m_next;

    void *operator new(size_t size, temp_ptr &tmp = temp_ptr()) {
        return get_garbage_collector().malloc(size, 
               finalize_node, mark_node, 0, tmp);
    }

    static void finalize_node(void *mem) {
        cout << "finalized!\n";
    }

    static void mark_node(void *mem) {
        Node *node = (Node *)mem;
        get_garbage_collector().mark(node->m_prev);
        get_garbage_collector().mark(node->m_next);
    }
};

Thread support can be provided by specifying a custom 'garbage collector accessor function' which returns a thread-specific collector; for example, under Win32:

//returns a collector specific to thread
garbage_collector &get_thread_collector() {
   //gc_index must have been allocated elsewhere
   return *(garbage_collector *)TlsGetValue(gc_index);
}

int main() {
    //set the gc access function
    set_garbage_collector_access_function(&get_thread_collector);
}

External Classes

The library provides a wrapper class for those classes that are not garbage-collected. For example, you can make an STL list garbage-collected, like this:

new GC<std::list<int> >;

The Garbage Collection Algorithm

The algorithm used internally is the mark & sweep algorithm: first the root set is scanned for live objects, then objects that are unreachable are finalized and destroyed. Upon exit, all remaining objects are finalized and destroyed.

Garbage collection happens automatically from inside the allocation function, when the number of allocated bytes exceeds the collector's limit. The method garbage_collector::limit() can be used to retrieve the number of bytes set as limit, and the method garbage_collector::set_limit(size_t size) can be used to set the limit. The default value for the limit is 64 MB.

The number of bytes that are allocated are retrieved by the method garbage_collector::alloc_size().

Semantics

The library follows the C++ semantics closely; be careful to initialize pointers (global/stack, members, arrays) to meaningful values before using them, otherwise your application will most certainly crash. The rationale is that, in C++, you do not pay for what you do not use, so this principle is followed by this library too.

You can always delete objects and arrays, using the operator delete.

Performance

What I described above is, of course, not the fastest possible collection; I consider it, though, more than enough to solve my memory management problems. Maybe, that's valid for you. If not, there are other collectors around which are industrial strength, like this one.

Restrictions

Of course, there are some restrictions:

  • you must never set a pointer to point to an object that is a member (as a value) of another object. Pointers shall point only to heap-allocated objects.
  • You should never allocate garbage-collected objects on the stack.

Conclusion

Not many words...only happy coding!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here