
How to Prevent Unnecessary Compilation of Source Files

Luuk Weltevreden


6 Feb 2004

This article shows techniques for designing C++ classes so that source files depend less on one another.

Download demo project - 10.1 KB

Introduction

A while ago I got rather annoyed while working on one of my projects. It was getting large and the compile times went up fast. After making a small adjustment to a single class, I noticed the whole project had to recompile. To illustrate the problem, consider the following example:

class A
{
public:
    void foo();

private:
    AMember m_member;
};

The header that declares this class is included in the precompiled header file. Whether it belongs there is beside the point; what matters are the consequences. If we change class A itself, every .cpp file in the project recompiles. That makes sense: the header is part of the precompiled header, so any .cpp file might be using A. However, if we change AMember, every file is recompiled too! We don't want that, do we? Wouldn't it make more sense if only the source files that actually use AMember were recompiled? I think it would. So this article shows how to structure your classes so that the compiler only recompiles the minimum amount of code necessary to reflect a change.

Why Exactly Does the Compiler Do This?

The problem is that the class specification contains implementation details (namely m_member). The compiler needs to know what type AMember is, so often you see something like #include "AMember.hpp" at the top of the header file. This however creates a compilation dependency. So, what we would like to achieve is to be able to remove the '#include' directive. To do this, we need to effectively separate all implementation details from the specification and include the implementation details in the actual implementation of the class.
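To make the dependency concrete, here is a minimal single-file sketch of when a forward declaration is enough and when the full definition is needed. The names HoldsPointer and readThroughPointer, and the body of AMember, are hypothetical and only serve the illustration.

```cpp
#include <cassert>

// Forward declaration only: AMember is an incomplete type at this point.
class AMember;

// Fine: a pointer to an incomplete type has a known size.
class HoldsPointer
{
public:
    explicit HoldsPointer(AMember * member) : m_member(member) {}
    AMember * get() const { return m_member; }

private:
    AMember * m_member;
};

// Would not compile here, because the compiler must know sizeof(AMember):
// class HoldsValue { AMember m_member; };

// From here on the full definition is visible. In a real project this
// would come from including "AMember.hpp" inside a .cpp file.
class AMember
{
public:
    int value() const { return 42; }
};

// Hypothetical helper: uses AMember only after its full definition.
inline int readThroughPointer()
{
    AMember member;
    HoldsPointer holder(&member);
    assert(holder.get() == &member);
    return holder.get()->value();
}
```

A header that only declares pointers or references to AMember can therefore drop the #include, and with it the recompilation dependency.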

For a detailed explanation, I recommend reading Effective C++ by Scott Meyers. He dedicates quite a few pages to just this problem.

The Two Solutions From Scott Meyers

When I faced this problem, I remembered having read about it before. After a bit of searching, I found that Effective C++ by Scott Meyers covers it and offers two solutions. Both have some minor drawbacks, however. Because my solution is based on both of them, I will first explain how he works around the problem.

Using a Handle Class

The first solution he gives is to use a Handle Class. A Handle Class is nothing more than an interface of a class which has a member variable pointing to the actual implementation of its member functions. How does this work? I'll show you.

// A.hpp
// A forward declaration is enough here: the compiler accepts it
// because class A only holds a pointer to AImpl.
class AImpl;

class A
{
public:
    void foo();

private:
    AImpl * impl;
};

// AImpl.hpp
#include "A.hpp"
#include "AMember.hpp"

// This class contains the actual implementation
class AImpl
{
public:
    void foo();

private:
    AMember m_member;
};

// A.cpp
#include "AImpl.hpp"

void A::foo()
{
    // Forward the call to the implementation object
    impl->foo();
}

Client code only needs to include A.hpp. The constructor of A allocates the impl object (I have left this part out for readability). A.hpp does not need to include the header for AImpl, because it forward-declares AImpl at the top, which is enough when we only use a pointer to the class. The source file of A does have to include the header for AImpl, but that's exactly what we want. The member functions of class A simply forward to the member functions of AImpl. The disadvantages of this implementation are:

We need an extra pointer per object.
Every member function call is forwarded through the pointer at run time, which adds an extra indirection.
We need to dynamically allocate and free memory for the implementation object.
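The constructor and destructor left out of the listing above might look roughly like the following single-file sketch. It makes several assumptions that are not in the article: foo() returns an int so the effect is observable, AMember is given a made-up body, and copying of A is disabled to keep ownership of impl simple.

```cpp
#include <cassert>

// --- what would live in A.hpp ---
class AImpl; // forward declaration; AMember.hpp is not needed here

class A
{
public:
    A();
    ~A();
    int foo(); // returns int in this sketch only, so it can be checked

private:
    A(const A &);             // copying disabled in this sketch
    A & operator=(const A &); // (declared, never defined)

    AImpl * impl;
};

// --- what would live in AMember.hpp / AImpl.hpp ---
class AMember // hypothetical contents
{
public:
    AMember() : m_calls(0) {}
    int touch() { return ++m_calls; }

private:
    int m_calls;
};

class AImpl
{
public:
    int foo() { return m_member.touch(); }

private:
    AMember m_member;
};

// --- what would live in A.cpp ---
A::A() : impl(new AImpl) {} // allocate the implementation object
A::~A() { delete impl; }    // and free it again

int A::foo()
{
    assert(impl != 0);
    return impl->foo();     // forward to the implementation
}
```

Changing AMember now only forces A.cpp (and anything else that includes AImpl.hpp) to recompile; code that includes just A.hpp is unaffected.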

Using a Protocol Class

The second solution he gives is to use a Protocol Class. A Protocol Class is an abstract class, so it just represents the specification of the actual class. The following is an example of how to use a Protocol Class:

// A.hpp
class A
{
public:
    // Objects are deleted through an A pointer, so the destructor
    // must be virtual.
    virtual ~A() {}

    virtual void foo() = 0;

    // We need a way to construct a class of this type. Because
    // it's abstract we can't instantiate it directly.
    static A * makeA();
};

// AImpl.hpp
#include "A.hpp"
#include "AMember.hpp"

class AImpl : public A
{
public:
    void foo();

private:
    AMember m_member;
};

// A.cpp
#include "AImpl.hpp"

A * A::makeA()
{
    return new AImpl;
}

Again, all you need to do is include A.hpp. However, this time we are dealing with an abstract class, which can't be instantiated directly. Because of that, we need a helper function that constructs a subclass of class A (in this case AImpl) and returns a pointer to it. As you can see, the implementation details are completely gone from A.hpp, so it works like a charm. The disadvantages of this implementation are:

We need to use the helper function to construct an object.
We need to manually free the pointer allocated by the helper function.
We are forced to use a pointer, so we need to write '->' instead of '.' (of course, you can create a reference, but that's also added work).
We have virtual functions which means we have a virtual table, which is added overhead.
Virtual function calls are resolved at run time, which limits compiler optimizations such as inlining.
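As a rough illustration of how client code might use the Protocol Class, here is a single-file sketch. The assumptions are mine, not the article's: foo() returns an int, AMember has a made-up body, A is given a virtual destructor (which deleting through an A pointer requires), and std::unique_ptr from C++11 takes care of the manual delete.

```cpp
#include <cassert>
#include <memory>

// --- what would live in A.hpp ---
class A
{
public:
    virtual ~A() {}        // needed because objects are deleted via A*
    virtual int foo() = 0; // returns int in this sketch only

    static A * makeA();
};

// --- what would live in AMember.hpp / AImpl.hpp ---
class AMember // hypothetical contents
{
public:
    int value() const { return 7; }
};

class AImpl : public A
{
public:
    int foo() { return m_member.value(); }

private:
    AMember m_member;
};

// --- what would live in A.cpp ---
A * A::makeA()
{
    return new AImpl;
}

// Hypothetical client code: it only needs the declarations from A.hpp.
inline int callThroughProtocol()
{
    std::unique_ptr<A> a(A::makeA()); // deletes the object automatically
    assert(a.get() != 0);
    return a->foo();                  // virtual call, resolved at run time
}
```

Wrapping the returned pointer in a smart pointer removes the "manually free" drawback, but the pointer syntax and the virtual call remain.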

While both of these examples work perfectly, I wondered whether I could come up with a solution that works just as well but doesn't have all of these disadvantages. This brings us to my favorite subject in C++: templates!

Using Templates

Please note that the solution in this section has a serious bug. The two solutions suggested by Scott Meyers work correctly. For more information about the bug, read the posts below. I have since come up with another solution, which is described further down. I have kept this part in the article for completeness.

I never really liked MFC, and when I was looking for a replacement, I found ATL/WTL. This brought me to the concept of templates, and ever since I have been addicted to using them. So, how do you use a template to separate implementation details from the specification header? Well, I'll show you:

template <class T>
class TA
{
public:
    void foo();
};

// Again we declare an empty implementation class
class AImpl;

// We use this typedef so we can instantiate objects of type TA using A
typedef TA<AImpl> A;

#include "A.hpp"

class AImpl : public TA<AImpl>
{
public:
    void foo();

private:
    AMember m_member;
};

#include "AImpl.hpp"

// Explicit specialization for T = AImpl
template <>
void TA<AImpl>::foo()
{
    (static_cast<AImpl *>(this))->foo();
}

If you don't have much experience with templates, this might seem a bit unusual. I recommend reading one of the great tutorials on this site, which are more than capable of explaining templates to an average person. What I do is make use of the fact that I already know that T will be AImpl. In fact, it's the only type with which this source will compile, as I only provide the implementation for this specific case. This immediately gives an added level of protection: it's impossible to accidentally instantiate, for instance, TA<BImpl>. In the implementation of TA, I cast the this pointer to an AImpl pointer. I use a static_cast on the assumption that this is a legal conversion. Because we now have a pointer to an AImpl object, we can call its member function! Quite simple, isn't it? But did I keep my word about not having the same disadvantages as the solutions offered by Scott Meyers?

The Advantages of the Templates Solution

Let's sum up the disadvantages of the other solutions again and compare them to my solution.

Scott Meyers' first solution
1. Needs an extra pointer per object.
  My solution doesn't need any extra variable.
2. Function calls are redirected at run time.
  In my solution, the compiler knows at compile time which functions need to be called. This means that compiler optimizations can be done!
3. Needs to dynamically allocate and free memory.
  No memory allocation or deallocation is necessary in my solution.

Scott Meyers' second solution
1. Needs a helper function to instantiate an object.
  In my solution, you can use a typedef, or in case you don't like that, you can instantiate by using TA<AImpl>.
2. Object needs to be manually deallocated.
  My solution can be instantiated directly, so depending on how you instantiate it, you don't need to deallocate it manually.
3. Because the helper function returns a pointer, you are forced to use a pointer.
  You can instantiate the template both through a pointer and the 'normal' way, so you are not forced to use a pointer.
4. Has the overhead of a virtual table.
  My implementation uses no virtual functions, so it has no virtual table.
5. Function calls are resolved at run time.
  As I said a few lines above, in my solution, calls are resolved at compile time, meaning optimizations can be done.

So it looks like I actually kept my word. It even works too! If you still don't quite understand how I achieved this, simply take a look at the provided example. In the example, I demonstrate all the above techniques and also the original situation forcing all source files to be recompiled. Note that in the demo, the compilation doesn't take long either way, but imagine the speed improvement on a large project.

The Disadvantages of the Templates Solution

Say goodbye to those coffee breaks while compiling!
It's bugged... :-O
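The article text does not spell out the bug, so the following is only a sketch of one problem the template code as shown certainly has; whether it is the one readers reported is an assumption. An object declared as A (that is, TA<AImpl>) is not an AImpl, so it contains no m_member, and casting its this pointer to AImpl * and using the result is undefined behavior. The body of AMember and the helper name are hypothetical.

```cpp
#include <cassert>

class AMember // hypothetical contents
{
public:
    int data[4];
};

template <class T>
class TA
{
public:
    void foo();
};

class AImpl;
typedef TA<AImpl> A;

class AImpl : public TA<AImpl>
{
public:
    void foo() {}

private:
    AMember m_member;
};

// Explicit specialization, as in the article
template <>
void TA<AImpl>::foo()
{
    // Only valid if *this really is the base part of an AImpl object
    static_cast<AImpl *>(this)->foo();
}

// Hypothetical helper: an A object has no room for m_member.
inline bool baseIsSmallerThanImpl()
{
    assert(sizeof(AImpl) >= sizeof(AMember));
    return sizeof(A) < sizeof(AImpl);
}
```

Calling foo() on an object that was created as AImpl is fine; calling it on an object created as A would make AImpl::foo() operate on memory that was never allocated for m_member.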

A Much Simpler Solution

Because a couple of people pointed out a serious bug in the code (for which I have workarounds by now, though you might dislike them), I had to think of a replacement. This is a solution for a specific, but very common and useful situation (the one which led us to the bug). Suppose you have the following class:

class A
{
public:
    AMember m_member;
};

In my opinion this is bad object-oriented programming, because you shouldn't have public member variables. If you absolutely want it this way, then there simply isn't a solution other than using a pointer, much like the first solution from Scott Meyers.

In case you agree with me and think this is bad class design, you must like the following design better:

class A
{
public:
    const AMember& GetMember();

private:
    AMember m_member;
};

The GetMember() function simply returns a reference to the private member. Well -this- is something we actually can work with. I bring to you my last and also simplest (and most efficient) example:

// This type of relation doesn't require
// the compiler to know the specification of the class
class AMember;

class A
{
public:
    const AMember& GetMember();
};

// A.cpp
#include "A.hpp"
#include "AMember.hpp"

const AMember& A::GetMember()
{
    static AMember member;

    return member;
}

Does that work? Yes, it does. Does this have nasty bugs such as my other sample? It almost can't have any, because it's this simple. :-) If you wish to use the AMember class, you simply have to include the appropriate header and call A::GetMember(). This way, you force yourself to do good object-oriented programming and you create fewer dependencies as well! The disadvantage is that the AMember is a static variable, so every instance of class A shares the same one; in effect you can only have one meaningful instance of A. In a lot of cases this doesn't matter, especially not with the classes for which a dependency problem is common, as those are often global singleton classes.
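The sharing caused by the static variable can be seen in a small single-file sketch. The contents of AMember and the helper function instancesShareMember are hypothetical and not part of the demo project.

```cpp
#include <cassert>

// --- what would live in A.hpp ---
class AMember; // forward declaration is enough for a reference return type

class A
{
public:
    const AMember & GetMember();
};

// --- what would live in AMember.hpp (hypothetical contents) ---
class AMember
{
public:
    AMember() : id(1) {}
    int id;
};

// --- what would live in A.cpp ---
const AMember & A::GetMember()
{
    static AMember member; // one object for the whole program
    return member;
}

// Hypothetical check: two A objects hand out the very same AMember.
inline bool instancesShareMember()
{
    A first;
    A second;
    assert(first.GetMember().id == 1);
    return &first.GetMember() == &second.GetMember();
}
```

If each A needs its own AMember, this approach does not apply and a pointer-based design such as the Handle Class is the safer choice.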

Conclusion

I have shown you a couple of techniques for reducing compilation dependencies between classes. None of these techniques is wrong (except for the template one, which is still bugged), and each has its own disadvantages. You will have to pick the right technique for the right situation. It is also not at all necessary to convert every class in your project to one of these techniques; a couple of changes at key points in your file structure can greatly improve the compilation time.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
