Introduction
This document describes the design and implementation of a new idea for automating memory management that can solve the problem of cyclic references.
Boost contains some nice memory management classes: shared_ptr
, weak_ptr
, and intrusive_ptr
. Memory management using these classes is very nice, but reference counting has one flaw: it cannot handle cyclic references.
In order to handle cyclic references with reference counting, the programmer is left with the task to break cycles using weak pointers. Such a task may not scale well in time and space: as a project becomes larger and more complicated, cycles are harder to spot, especially when cycles are introduced via subclasses.
The problem gets worse when a project evolves over a long period of time, with different programmers assigned to a project. During transition from one person to another, knowledge of cycles may be lost.
The code in the file 'main.cpp' implements a solution for automating memory management, borrowing some ideas from true garbage collectors. The solution can handle referential cycles and also delete objects in a good order (finalization order is a well known problem for garbage collectors) when the objects are no longer referenced anywhere from the program.
The presented solution is not a panacea that solves all the aspects of memory management. At best, it can be thought of as complementary to existing methods.
Background
In true garbage collectors, the collector scans the object graph, starting from the root set, marking each visited object as reachable. Objects left untouched at the end of the collection are deemed unreachable and therefore collected. For example, if we have three root references R1, R2, and R3, and objects A, B, and C:
R1 ---> A ---> B
|
R2 --------|
R3 ---> C
If the reference R3 is removed, then the object graph becomes:
R1 ---> A ---> B
|
R2 --------|
R3 C
The collection algorithm execution goes like this:
- From R1, A is reachable.
- Mark A as reachable.
- Scan A.
- From A, B is reachable.
- Mark B as reachable.
- Scan B.
- From R2, B is reachable.
- B is already marked, do not mark it again.
- A, B is reachable; C is not reachable.
- Collect C.
Problems of this Approach in C++
In order for this to work, we must know the root set. Unfortunately, in C++, the root set is not known (compiler assistance is required for this), so this algorithm cannot be applied.
One solution would be to register root pointers to a ptr
collection, either explicitly or implicitly in the constructor of a special pointer class. But that does not work with STL containers, because STL containers themselves must use the specified pointer classes. Unfortunately, this is not always true. For example:
ROOT_PTR_COLLECTION
|
|--> R1 ---> A ---> B
| |
R2 --------|
|
R3 ---> C
But, when we have an STL container:
ROOT_PTR_COLLECTION
|
|--> R1 ---> A ---> B
| | |
|--> R2 --------|
| |
|--> R3 ---> C
|
|--> R4 ----> STL CONTAINER
| ~~> D
| ~~> E
In the above diagram, the arrow '~~>' represents a non-garbage collected link (a shared_ptr
perhaps). When the collection runs:
- From the root
ptr
collection, the reference R4 is reached. - From R4, the STL CONTAINER is reached.
- C and D are not reachable.
- Collect C and D.
So, C and D will be collected, even though there is an STL container that refers to these objects.
Solution
In the traditional approach, the collection algorithm can be summarized in the following sentence:
An object is collectable when it is no longer accessible by any path that starts from the root set.
The above sentence can be reversed without altering the essence of the collection algorithm:
An object is collectable when it is no longer accessible by any path that ends at the root set.
Using this reversed approach, a set of smart pointer classes can be created which can be used to check if there is any path to a root pointer. The original example with the three references R1, R2, R3 and objects A, B, and C becomes:
R1 <--- A <--- B
|
R2 <-------|
R3 <--- C
When a reference is removed, we can check if there is another path that leads to a root reference. If there is no root reference that points to the object, the object can be safely deleted. For example, if the reference R3 is removed, our graph becomes:
R1 <--- A <--- B
|
R2 <-------|
R3 C
And now, the object C can be safely deleted.
Design and Implementation
In the file 'main.cpp', the following classes are defined:
The class is a boost::intrusive
list node (using boost::intrusive
) for performance reasons, since a pointer can only point to a single object at a time.
The class also contains a pointer to an owner object, set during the construction of a pointer instance. This pointer is used for knowing when a pointer is root: if a pointer is instantiated without an owner, then it is a root pointer.
- class '
object
', as a base class that contains a list of pointers that point to it. This is necessary so that the object graph can be traversed backwards towards the roots. The object class contains a flag 'm_lock
' which is used to prevent infinite loops during checking of an object for collectability. - class '
ptr_base
', which is a generic pointer class that encapsulates the logic of acquiring and releasing an object:
- The operation '
acquire
' adds the pointer to the list of pointers of the pointed object. - The operation '
release
' removes the pointer from the list of pointers of the previously pointed object. It then calls the object to dispose itself, if it is collectable. This is where the graph is scanned backwards towards the roots.
- A class '
ptr<T>
', which is a simple derived class that adds type checking to the class 'ptr_base
'. The class has normal pointer semantics as well as STL pointer semantics (functions 'get()
' and 'reset()
').
Destruction Order
When an object is disposed, it is checked if it is collectable or not: the graph of objects is traversed backwards to the roots. If the object is collectable, it is deleted.
Deleting the objects causes the member pointers of an object to release their own objects recursively, in natural order (i.e., in C++ destruction order).
Solving the Problem of Cycles
If there is a cycle, then the dispose()
function makes sure that all the pointers pointing to the disposed object are released before the object is deleted, thus resolving the cycle.
Working with STL Containers
STL containers contain root references, i.e., instances of ptr<T>
that do not have an owner. Therefore, as long as an object is referenced from an STL container, it will not be collected, because it will be accessible from the root set of the STL container.
The STL container itself can be managed via shared pointers. The example provided in 'main.cpp' contains an example with an std::list
.
Eager vs. Lazy Disposal
Trying to dispose an object each time a pointer changes value may be a very slow operation, if the object graph is too complex. A solution to this problem is to postpone the checking for a later date, i.e., to have lazy disposal.
The classes 'collector
' and 'collector_object
' work together to provide the functionality for lazy disposal.
Testing
The file 'main.cpp' contains two tests:
- A test with trees (nodes pointing to the parent, previous, and next siblings, first, and last children) with eager disposal
- The same test with lazy disposal
In both cases, all objects are collected as required.
The test has been executed in Visual Studio 2005.
Conclusion
This is a list of previous (and lame :-)) attempts by this author to make a collector, as well as by other authors:
History
- 23rd February, 2009: Initial version