Download source files - 4 Kb
Download demo project - 18 Kb
Introduction
This article deals with refactoring the code originally presented in
A garbage collection framework for C++
in order to allow polymorphic types to be used. If you've not already done so you should
read that article first. This article mostly deals with the process in which the code
was refactored to illustrate to new programmers how code is maintained in real life situations.
If all that you're interested in is using the gc_ptr<> smart pointer in your code you do not
need to read this article. Just download the code from the link above to replace the implementation
found in the previous article. You may still find a few sections in here worth reading, specifically
the section on compiler bugs and on recommended usage.
The Problem
If you've read the original article you'll remember that although I provided a smart pointer
that would give you true garbage collection in C++ the implementation had a serious problem. As
coded it would not handle polymorphic types. To illustrate the problem you'll have to understand
something about how pointers behave when cast to a base type. The following code will help
to illustrate the problem and will be used in the rest of my description here.
struct base1
{
virtual ~base1() { }
int value1;
};
struct base2
{
virtual ~base2() { }
int value2;
};
struct derived : public base1, public base2
{
};
int main()
{
derived obj;
base1* p1 = &obj;
base2* p2 = &obj;
assert(static_cast<void*>(p1) == static_cast<void*>(p2));
}
What will surprise many with the example above is that p1' and p2 will not point at the same
address in memory. This means the assert above will fail. The original implementation relied
on casting to "a pointer to cv void" returning the same value as the start of the
memory block allocated by new(gc)
. Obviously this wouldn't always be the case for polymorphic
types.
Looking for a Solution
While trying to think of a solution for this problem it occured to several people that
the following code would work.
struct base1
{
virtual ~base1() { }
int value1;
};
struct base2
{
virtual ~base2() { }
int value2;
};
struct derived : public base1, public base2
{
};
int main()
{
derived obj;
base1* p1 = &obj;
base2* p2 = &obj;
assert(dynamic_cast<void*>(p1) == dynamic_cast<void*>(p2));
}
The reason this works is that dynamic_cast<void*>(p)
will return the address of the most derived type of
'p' (ISO/IEC 14822, section 5.2.7/7). At first this seems to be the perfect solution
to our problem. However, dynamic_cast<>
can't be used on non-polymorphic types. Modifying
gc_ptr<>
to use dynamic_cast<>
instead of static_cast<> would allow
polymorphic types to be used, but would then no longer allow non-polymorphic types. In
my experience most types are non-polymorphic, so I could not live with this exchange of
problems.
This still seemed to be the obvious solution, however, so I set off to find a way
to cast a fully typed pointer to "a pointer to cv void" referencing the most
derived type regardless of whether or not the original type were polymorphic. One of the
first things I did was to pose this as a question on comp.lang.c++. I made a mistake in
wording my question, however. Instead of saying "to the address of the most derived type"
I said "to the beginning of the object." Someone quickly pointed out that the standard doesn't
dictate how an object is to be layed out and that padding may actually exist in the object before
the address of any pointer to the object. I already knew this, so being reminded of it
shouldn't have been of any help. Surprisingly, however, it was.
You see, I'd been to close to the problem. I'd been grappling with the complexities
of mark-and-sweep, calculating algorithm complexities, addressing corner cases in the
language such as the lifetime of global data, etc. So I had blinders on. I focused on
the "obvious" solution with out considering other possibilities. The little reminder
about object layout concepts for some reason sparked my thinking enough to break out
of this narrow view and the solution dawned on me.
What's really surprising is how simple and obvious the real solution was. In fact, when
I told the solution to Thant Tessman, the author of a template class called circ_ptr on which
I based my original code, he responded with "Duh! I'm embarrassed I didn't think of it." I
know how he felt. After all, if we go back to the origins of this code, specifically to
traditional garbage collection libraries, we'd find that they use the solution themselves.
So, what's the solution? Stop tracking only pointers to the beginning of the allocated
memory block and start tracking pointers that reference any memory within the entire block!
Pointers cast to base types will still point within this block. So modifying the code
that looks up the "node" in the implementation to search all known nodes until we find one
that "contains" the address pointed to takes care of most of this problem. As an added
bonus we can now have gc_ptr<>
's that point at members of objects allocated with new(gc).
I said that modifying the code that searches for registered nodes took care of most
of the problem. What's missing? There are two things missing at this point. The first is
the easiest to solve. The gc_ptr<>::get()
method casts back from "a pointer to cv void
" that references the base of the allocated memory block back to the actual type. This has the
same problem as the cast to "a pointer to cv void", and the solution turns out to
be even easier. I simply added a real pointer to gc_ptr<>
that's set to the pointer value
passed in to the constructor and modified the appropriate assignments to handle this as well.
The second problem left to be addressed is deletion of the object once there's no longer
any "root pointers" to it. Again, the implementation relied on casting the base pointer back
to a real type in order to call operator delete on it. The solution was to add a second pointer
to the node's state. This pointer will point at the object that was used to "register" the
destructor with the node (for details on this please see the implementation... this bit
is next to impossible for me to explain in text here).
With these changes made we've now got an implementation that handles polymorphic types!
Speed Issues
The first time I modified the code to allow polymorphic types everything worked with out a
hitch. However, the code used to find the nodes was not efficient. So another round of refactoring
ensued. Several things were moved from the header file to the implementation file instead to
reduce compile times for clients, since there was no longer a need for them outside of the
implementation. Several functions were broken out into multiple functions. The containers
used to track pointers and nodes were changed. And finally, code that searched for nodes was
broken out into a single function that returned an iterator.
I'm not sure that I've got the fastest possible implementation for find_node
, the function
that searches for a node that contains a given address. The first implementation simply
stepped through all of the nodes until one was found that contained the address. The final
implementation only steps through until either a node is found or the address is greater
then the current node's base pointer. This relies on the fact that std::set<>
is a
sorted container. It would be nice if we could use the set's built in searching capabilities to
find our node, but all of them (find
, lower_bound
, upper_bound
and equal_range
) use an exact
search based on the set's predicate template parameter. So if the address isn't exactly
equal to the node's base pointer then all of them will simply return an iterator to the
container's end()
. The standard algorithms aren't much help either. None of them will
be any more efficient then the hand coded for loop, and in fact are likely to be worse. The
std::binary_search
algorithm won't even help us, because even though the container is sorted,
it doesn't support random access iterators.
Compiler Bugs
This section of the article will be a bit controversial. There are many people who
think that VC++ has adequate support of the standard and aren't too concerned with the fact
that VC++ 7 won't be fully compliant. Well, gc_ptr<>
illustrates nicely some areas in which
VC++ 6 isn't compliant enough to handle simple code. In this case, I expect VC++ 7 will
likely handle the code much better, but this will still illustrate why compliance isn't
just a nice feature, but something we should demand. I know that the team in charge
of the VC++ compiler is concerned with this and that they are working hard on it, so I won't
harp on this subject too much. However, because it's impossible to completely work around
the bugs, at least to my satisfaction, I must tell you about the problems I encountered.
After fixing the polymorphic type problem I could start to address the interface in
more detail than I had with the first version. I wanted gc_ptr<>
to follow the same
general style as std::auto_ptr<>
, both for familiarity for users as well as to insure
I didn't revisit mistakes that the standards comittee addressed along the way. One of the
things that was changed was to provide two different constructors and assignment operators
for use with other gc_ptr<>
instances. In my first implementation there was only a
templated form that allowed conversions from one gc_ptr<>
type to another. The standard
contains versions for assignment and construction from identical gc_ptr<>
types as well.
I'm not sure exactly why this was necessary, but I'm not going to second guess them. I
added the non-templated versions for identical types. Surprisingly, the compiler complained
about duplicate definitions. Turns out that VC++ can only handle this if the templated
forms come first, which is a parsing bug since the standard does not require this.
Another change was to remove assignment to a "real pointer" and to add a reset()
method
instead. Assignment to a "real pointer" can result in some subtle and unexpected results in
some cases, so it should not be allowed. After making this change I recompiled the code
and ran the test harness. Surprisingly an object was being collected prematurely. Or at
least that's what appeared to be happening. Stepping through the code revealed that what
was really happening was much worse. I had failed to correct a line in the test harness
that assigned a gc_ptr<>
to a "real pointer". The compiler should have caught this and
given me an error at compile time, but it did not. Instead, it produced nothing for this
command (the line was skipped while stepping through in the debugger) and continued on
as if nothing had happened. I got no error or warning, but instead got an executable that
behaved differently from what was expected. I tried to distill this into a simpler
example to send to someone in e-mail and found that in the simpler form I received the dreaded
INTERNAL COMPILER ERROR
. The test harness included in the link above includes the
code needed to reproduce both compiler behaviors, as well as a "fix" that if you
absolutely must will prevent this on VC++ 6. I don't like the fix because it produces
a strange compiler error that will confuse anyone who doesn't know why the fix is there.
Since I expect VC++ 7 will fix this, I've left the fix commented out. If you want it
for VC++ 6, uncomment it.
Usage Notes
Now that I've got an implementation that's fully functional I think it's appropriate
to address when you should use this smart pointer. It's not a good drop in replacement
for all pointers, nor even a good replacement for ref-counted smart pointers in all situations.
First, if you want to use garbage collection exclusively, you're probably using the wrong
language. There is no language support for garbage collection, so all available sollutions
have the problem that they don't work with memory that's allocated by other libraries,
including the standard library.
If you don't need to use it everywhere, but would prefer
to use it most of the time, it may be more appropriate for you to use a
traditional C/C++ garbage collection library. They totally replace the allocation logic instead
of building on top of the normal routines. This allows them some memory and speed optimizations
that are impossible here (or at least undesirable). However, remember the drawbacks discussed
with this solution in the previous article. The conservative nature of these libraries may be
undesirable despite the benefits in speed and memory use.
Even after you decide that using it only for cases where manual memory management is too
difficult and error prone you may want to opt for a more traditional ref-counted smart pointer
instead. There is some definate overhead involved with this implementation. There's extra memory
that's going to be allocated for every object you allocate and every pointer that exists. The
allocation will be slower and calls to collect may cause some definate speed problems if not
managed carefully. In contrast, a ref-counted pointer doesn't use much more memory than a standard
pointer and will operate nearly as efficiently. For this reason, if speed and/or memory is important
to you then the gc_ptr<>
class should be used sparingly. Only use it in places where you know
there will be circular references, or where you fear there might be. When you know there won't
be any circular references, use the ref-counted pointer instead.
Now that I've talked about the inefficiencies of gc_ptr<>
I should point out that every
effort was made to insure the best possible performance. When used sparingly I'd expect
that you'll not notice any problems in this area.
I'd love to hear of real world experiences that you have using this class. If you've got the time,
post it in the comments below so that others will know what your experiences are.