
Comparing different coding approaches - Part 2

8 Dec 2003
Second part: multiple projects, templates and inheritance.

Introduction

In the previous article [^], I described various ways to provide code to other developers.

In this one, I want to compare certain design techniques, in particular:

  1. Near-line coding vs. off-line coding
  2. Design by inheritance vs. design by template wrapping

The example

Let's imagine designing an application that allows the user to place shapes on the screen (edit them, save them, and so on), the shapes being circles and rectangles with border and fill colors.

The frameworks

The idea is that "you" develop the application and "I" provide the classes required to implement it. It is important to produce "comparable code", so a special design architecture is needed.

General considerations

Whatever the coding style, "nearline" or "offline", it is better to distribute the classes across different projects, so that the classes are available to possibly different applications. In particular, it can be convenient to keep the application data classes in a specific library, separated from the application GUI classes: this makes it possible to link that library into other executables for testing, without the need to involve the entire application.

Again, whatever framework you use (WTL, MFC or whatever), you'll probably add some functionality (some new control or some customization) that is most probably not strictly related to your application. These classes are candidates for a library of their own.

And again, you may develop some classes not pertinent to any framework or to any application: just "general deployment classes" (like STL containers or algorithms, for example). These are candidates for yet another library.

The same approach can be taken by me when developing classes for you. To do that, it is possible to organize the project in such a way that "your" application remains distinct from the classes I develop. And because I have to test the classes, I can create a skeleton application that represents a user of the deployed classes. Thus, the sln file will normally contain the following projects:

App: Represents "your" application, using "my" classes and a working framework. Depends on all the following projects.
AppClasses: My classes used to represent the application documents. Depends on all the following projects.
GEWin: WTL, MFC or W32 functionality extensions. Depends on the following project.
GECommon: Common generic C++ classes. Includes some STL and CRT support (not itself related to Windows).
YourWin: Your extensions to MFC, WTL or whatever.
YourCommon: Your common generic classes.

To avoid name conflicts (mine, theirs and yours), namespaces are recommended. In particular, all my code is normally enclosed in a "root" namespace (I usually name it GE_: just my initials) and in a second-level namespace that categorizes the classes.

GE_ may of course conflict with your own names: if that's the case, a global "find and replace" of the string GE_ in all my files is safe to do.

Design models

To avoid rewriting the sources for deployment with the different models, I generally follow these design rules:

  1. Each library has its own stdafx.h (as precompiled header) that includes the headers required by that library (note: not the library's own headers). Those headers will never be included into other library files, so projects depending on those libraries are required to include those stdafx.h files into their own stdafx.h (if any) or headers (if no precompiled header is used).
  2. Each source file includes the required headers (h), that is: the headers not in the precompiled header that are required to compile that source.
  3. Forward inline declarations are in "inline files" (inl), included at the end of their respective headers.
  4. Every file (h and inl) always contains a #pragma once.
  5. Compilable code is in CPP files.
  6. Every instantiation in source files (CPP) is done using macros like GE_INLINE and GE_SELECTANY. These macros are normally defined as "blank", but if you define the symbol _FORCE_GE_INLINE, they are defined as inline and __declspec(selectany) respectively. This makes the CPP generate no code itself, and become "includable" (see the sketch after this list).
  7. If _FORCE_GE_INLINE is defined, the CPP files are included in HPP files, which you can include in your sources or precompiled headers.
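As a minimal sketch of point 6 (the macro names come from the list above; the exact definitions and file layout in my libraries may differ):

// ge_inline.h
#pragma once

#ifdef _FORCE_GE_INLINE
    // "nearline": CPPs become includable, code is emitted where included
    #define GE_INLINE    inline
    #define GE_SELECTANY __declspec(selectany)
#else
    // "offline": CPPs compile normally, code goes into the library
    #define GE_INLINE
    #define GE_SELECTANY
#endif

// someclass.cpp - compiled in the lib, or included by someclass.hpp
// (CSomeClass is a hypothetical example class)
GE_INLINE void CSomeClass::Method() { /* ... */ }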

The following diagram shows these concepts.

Blue arrows indicate inclusions in "offline coding", red ones in "nearline coding". As an effect of this architecture, if "offline" is used, the CPPs compile in their own project, merging their symbols into the lib, which is then linked into the EXE. If "nearline" is used, the CPPs produce no code of their own, the lib remains empty, and everything is compiled in the "exe" project (in the precompiled header or in the sources, depending on how you use the includes).

Naming conventions in coding

A number of styling conventions exist. I have found Hungarian notation sometimes useful and sometimes not.

For example, with all classes starting with "C", we are saying that a class is a class. But we are not saying what it is for or how it was intended to be used.

In general, I find myself comfortable introducing other "letters" to prefix class names, as in the following table.

N*: Namespaces. All namespaces are always inside namespace GE_.
I*: Interfaces. The keyword interface is used as an alias for struct, to make the intent explicit. All functions should be pure virtual (virtual ret_type fn(params)=0), apart from possible static members or functions.
C*: Classes, when containing both private and public data, with data managed by accessor functions.
S*: Structs, or classes, containing only data (public and directly accessible) plus some helper functions (typically: a default constructor initializing the values).
X*: Internal nested classes, or sealed classes. In general, classes you don't need to derive from, or even to know about.
E*: Extender classes: classes providing interface implementations that should act as bases for multiple inheritance. No instances should be created of such a type (no new E.....). Normally virtual inheritance is required.
P*, Q*: Smart pointers, normally defined as typedefs. P for strong (affecting object life) and Q for weak (only "observing") pointers (see smart pointers [^], though in this sample the implementation may differ).
T*, t_*: Typedefs. The capital form is for globally available ones, lower case for internals. May also be "empty classes", assembling extenders.

This convention may seem, at a very first look, quite annoying. I have instead experienced that giving an idea of what a class exists for helps remembering the correct way to interpret the code when reusing it. After all, as stated before, if everything begins with "C", the only "information" we give is that a "class" is a class.
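To give the idea, here's how the prefixes may look in practice (all the names below are purely illustrative):

namespace GE_ { namespace NShapes {    // N*: namespace, inside GE_

    interface IDrawable                // I*: pure virtual functions only
    {
        virtual void Draw(HDC hDC) = 0;
    };

    struct SPoint                      // S*: public data plus helpers
    {
        int x, y;
        SPoint(): x(0), y(0) {}
    };

    class CCircle;                     // C*: private data with accessors

    typedef CCircle* t_pCircle;        // t_*: internal typedef

}}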

Data classes and polymorphism

In our example, Rectangles and Circles are Shapes that must be Drawable with certain Attributes in certain Places. They must be Selectable, Collectable and RefCountable. They must be Sizable, Movable and Serializable.

All of these characteristics will be implemented in specific classes, designed as templates or as interfaces. Then we must mix in, or embed, or derive. In any case, our object must be "castable": maybe via RTTI, maybe via template parameters.

It is clear from the start that our objects (circles and rectangles) are polymorphic implementations of a more generic object (a "shape"), and that the user must be able to create an arbitrary number of those objects.

The implementation of polymorphism may happen in different ways: templates and inheritance are probably the most important.

Here's a brutal comparison:

// classwrap.cpp
// template version: the embedded type is fixed at compile time

#include <tchar.h>
#include <iostream>

template<class T>
class W
{
private:
    T _t;
public:
    void Hallo()
    {
        std::cout << _t.GetMsg();
    }
};

class A
{
public:
    const char* GetMsg() const 
    {return "I'm A\n";}
};

class B
{
public:
    const char* GetMsg() const 
    {return "I'm B\n";}
};

class C
{
public:
    const char* GetMsg() const 
    {return "I'm C\n";}
};


///////////////////////////////////


void wait_enter()
{
    std::cout << "press enter\n";
    char ch; std::cin.get(ch);
}

int _tmain(int argc, _TCHAR* argv[])
{
    W<A> wa;
    W<B> wb;
    W<C> wc;

    wa.Hallo();
    wb.Hallo();
    wc.Hallo();

    wait_enter();
    return 0;
}





// classhinerit.cpp
// inheritance version: the embedded type is chosen at runtime

#include <tchar.h>
#include <iostream>

#ifndef interface
#define interface struct
#endif

interface I
{
    virtual const char* GetMsg() =0;
    virtual ~I() {;}
};

class W
{
private:
    I* _p;
public:
  W(I* p=NULL) { _p = p; }
  ~W() {  if(_p) delete _p; }
  void Set(I* p) 
    { if(_p) delete _p; _p = p; }
  void Hallo() 
    { std::cout << _p->GetMsg();}
};

class A: public I
{
  virtual const char* GetMsg() 
    {return "I'm A\n";}
};

class B: public I
{
  virtual const char* GetMsg() 
    {return "I'm B\n";}
};

class C: public I
{
  virtual const char* GetMsg() 
    {return "I'm C\n";};
};


//////////////////////////////////////


void wait_enter()
{
    std::cout << "press enter\n";
    char ch; std::cin.get(ch);
}

int _tmain(int argc, _TCHAR* argv[])
{
    W wa(new A);
    W wb(new B);
    W wc(new C);
    
    wa.Hallo();
    wb.Hallo();
    wc.Hallo();

    wait_enter();
    return 0;
}

In both listings, W is a class that says "hello" in different ways depending on whether it is working with A, B or C. The difference is in how it works. In the first listing, what W embeds is defined at compile time: W<A>, W<B> and W<C> are different types, and wa, wb and wc are of unrelated types. In the second listing, W is always the same type, and wa, wb and wc are of the same type; the different messages are obtained another way: three classes derived from the same interface, created at runtime, each overriding a virtual function.

If we want - for example - to have an array like W w[3], this is possible with the second coding paradigm, not the first one.
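For example, with the inheritance-based W (using the Set method from the listing above):

W w[3];
w[0].Set(new A);
w[1].Set(new B);
w[2].Set(new C);

for(int i = 0; i < 3; ++i)
    w[i].Hallo();   // three different messages from one array type

With the template-based W, no single array type can hold W<A>, W<B> and W<C> together.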

Coding with templates

There are, in general, various techniques to create template polymorphism. Here are some:

  1. Generic bases
  2. Generic derived
  3. Mix-in classes
  4. Nestable wrappers

Generic base classes

These are typical when there is the need to add a common interface implementation. A typical case is this:

template<class TDerived>
class EGenericBase
{
protected:
    // type conversion

    TDerived* This()
    {   return static_cast<TDerived*>(this);   }

    //overridables

    void AnOverridable() {  std::cout << "base implementations\n"; }

public:
    void CallOverridable() { This()->AnOverridable(); }
};

EGenericBase is supposed to be used as an extensor base, as in this case:

class CMyClass1:
    public EGenericBase<CMyClass1>
{
public:
    void AnOverridable() {  std::cout << "overridden1 implementations\n"; }
};

class CMyClass2:
    public EGenericBase<CMyClass2>
{
public:
    void AnOverridable() {  std::cout << "overridden2 implementations\n"; }
};

If you call CallOverridable(), the static_cast in This() makes the call reach the AnOverridable supplied by the derived class. But ... note that this works just because the type of the derived class is made known to the base at compile time.

In our case, we can define an EDrawable<derived> where derived is CCircle or CRectangle.
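As a minimal sketch of that idea (EDrawable, CCircle and CRectangle are the names used in this article; the member functions are illustrative):

template<class TDerived>
class EDrawable
{
protected:
    TDerived* This()
    {   return static_cast<TDerived*>(this);   }

public:
    // common entry point, dispatched at compile time to the derived Draw
    void Render(HDC hDC) { This()->Draw(hDC); }
};

class CCircle: public EDrawable<CCircle>
{
public:
    void Draw(HDC hDC) { /* draw an ellipse in the bounding rect */ }
};

class CRectangle: public EDrawable<CRectangle>
{
public:
    void Draw(HDC hDC) { /* draw the bounding rect */ }
};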

Generic derived classes

These are common when a given behavior must be added on top of different bases, each of which provides by itself the methods that behavior relies on:

class Base1
{
public:
    void Method() { std::cout <<"Base1\n"; }
};

class Base2
{
public:
    void Method() { std::cout <<"Base2\n"; }
};

template<class TBase>
class CGenericDerived: public TBase
{
public:
    void UniqueMethod() { TBase::Method(); }
};
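Used like this (a trivial illustration):

CGenericDerived<Base1> d1;
CGenericDerived<Base2> d2;

d1.UniqueMethod();  // prints "Base1"
d2.UniqueMethod();  // prints "Base2"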

Mix-in classes

Where multiple base implementations must be combined, here's a common scheme:

class CYourClass:
    public EMyFunctions1<CYourClass>,
    public EMyFunctions2<CYourClass>,
    public EMyFunctions3<CYourClass>
    {
        // ...

    };

Functionality dependency (a method in EMyFunctions3 needing to call a method in EMyFunctions1) is addressed with a dual static_cast:

template<class T>
class EMyFunctions3
{
public:
    void Calling1()
    {
        EMyFunctions1<T>* p1 =
            static_cast<EMyFunctions1<T>* >(static_cast<T*>(this));
        p1->Method1();
    }
};

This works for every T derived from both EMyFunctions1 and EMyFunctions3. This scheme is used heavily, for example, in WTL.
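Putting the fragments together, a complete, runnable version of the scheme may look like this (the method bodies are illustrative):

#include <iostream>

template<class T>
class EMyFunctions1
{
public:
    void Method1() { std::cout << "Method1\n"; }
};

template<class T>
class EMyFunctions3
{
public:
    void Calling1()
    {
        // down to the real type, then across to the sibling base
        EMyFunctions1<T>* p1 =
            static_cast<EMyFunctions1<T>*>(static_cast<T*>(this));
        p1->Method1();
    }
};

class CYourClass:
    public EMyFunctions1<CYourClass>,
    public EMyFunctions3<CYourClass>
{
};

int main()
{
    CYourClass c;
    c.Calling1();   // EMyFunctions3 reaches EMyFunctions1 through CYourClass
    return 0;
}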

Nested wrappers

This technique can simplify the casting when the dependencies between the bases are not "any to any" but only linear: class A calls B methods, class B calls C methods, but neither B nor C needs to call A.

template<class T>
class C: public T
{
    //C methods

};

template<class T>
class B: public T
{
    //B methods (may call C methods)

};


template<class T>
class A: public T
{
    //A methods (may call B and C methods)

};

//here comes the magic


class CYourClass
{
    //this is the class that provides

    //  some data and accessors

};

typedef A<B<C<CYourClass> > > TDecoratedCMyClass;

Here TDecoratedCMyClass has all the functionality of A, B and C applied to CYourClass.

This can be useful when CYourClass is a W32 type (typically a struct or a handle), C is an initializer (sets the memory to zero, maybe also setting a cbSize member where present), B an "invalid value manager" (sets and checks for 0 or -1) and A an "attacher" (manages "Attach" and "Detach").
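For instance, applied to a W32 struct (a minimal sketch of the initializer level only; the wrapper name is illustrative):

#include <windows.h>

template<class T>
class XZeroInit: public T
{
public:
    // the "C" level of the stack: zero the memory and set cbSize
    XZeroInit()
    {
        ZeroMemory(static_cast<T*>(this), sizeof(T));
        this->cbSize = sizeof(T);   // assumes T has a cbSize member
    }
};

typedef XZeroInit<MENUITEMINFO> TMenuItemInfo;

// TMenuItemInfo mii;  // zeroed and sized, ready for GetMenuItemInfo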

Template or not template? That's the question

Now the question: suppose we define CCircle and CRectangle, and suppose we define EDrawable<T>, EPlaceable<T> and so on. Can we collect both classes into the same collection?

No: we cannot. There is no common type between them. Templates are not themselves types: the two instantiations are completely unrelated types. This methodology (heavily used in WTL) is good for adding functionality to a predefined object whose type is known at compile time, not for objects that can have different types in the same runtime context.

We can create a CCollection<CCircle> and a CCollection<CRectangle> but they will be distinct.

In that case, virtual functions and pointers to bases are required. We may have a CCollection<EShape*> (CCircle and CRectangle can derive from EShape). But now the compiler will not know which kind of EShape an EShape* points to. As a consequence, EShape must provide all the differentiable methods in the form of virtual functions.

To manage the EShape lifetime, then, it is probably better to store not an EShape* but a PShape that destroys the "shape" when removed from the collection.

We can still use templates to define those implementations that we can imagine applicable also to types different from EShape. (For example: a smart pointer, though written for more generic use, can keep an EShape alive.)
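A minimal sketch of the PShape idea, assuming EShape exposes AddRef/Release (it was listed as "RefCountable" above; the actual smart pointer implementation may well differ):

class PShape
{
private:
    EShape* _p;
public:
    PShape(EShape* p = NULL): _p(p)    { if(_p) _p->AddRef(); }
    PShape(const PShape& o): _p(o._p)  { if(_p) _p->AddRef(); }
    ~PShape()                          { if(_p) _p->Release(); }

    PShape& operator=(const PShape& o)
    {
        if(o._p) o._p->AddRef();   // addref first: safe for self-assignment
        if(_p) _p->Release();
        _p = o._p;
        return *this;
    }

    EShape* operator->() const { return _p; }
};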

Coding with abstract classes

The concept is plain OOP literature: define abstract methods that perform some actions and group them into abstract classes we name "interfaces". Then we derive our objects from the interfaces we want them to implement.

Two level model

In our example, we will have:

#define interface struct

interface IDrawable
{
    virtual void Draw(HDC hDC)=0;
    virtual void Invalidate()=0;
};

interface IPlaceable
{
    virtual void Place(POINT point)=0;
    virtual void Move(SIZE size)=0;
};

class CCircle:
    public IDrawable,
    public IPlaceable
{
public:
    virtual void Draw(HDC hDC);
    virtual void Invalidate();
    virtual void Place(POINT point);
    virtual void Move(SIZE size);
};

class CRectangle:
    public IDrawable,
    public IPlaceable
{
public:
    virtual void Draw(HDC hDC);
    virtual void Invalidate();
    virtual void Place(POINT point);
    virtual void Move(SIZE size);
};

We can have any number of IDrawable types, while IDrawable* remains one and the same type. In this sense, we have real runtime polymorphism (in contrast with templates, where polymorphism is resolved at compile time).
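For example (assuming <vector> is included and hDC is a valid device context):

std::vector<IDrawable*> shapes;
shapes.push_back(new CCircle);
shapes.push_back(new CRectangle);

for(size_t i = 0; i < shapes.size(); ++i)
    shapes[i]->Draw(hDC);   // resolved through the v-table at runtime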

But this model doesn't offer much reusability: CRectangle and CCircle will probably both store a bounding rectangle, and IPlaceable will probably be implemented in the same manner in both.

Multilevel model: partial implementations

It is possible, at this point, to introduce intermediate levels of implementation between interfaces (no implementation at all) and classes (fully implemented). So we may have an EPlaceable derived from IPlaceable, providing some data and some default implementation (I call these "extensors": apply them as bases to a class and they "extend" the functionality of that class). And we can derive CCircle and CRectangle from EPlaceable instead of IPlaceable.

And what happens when a partial implementation of an interface needs to call a method of another interface? With templates, we could do a dual static_cast, but here ... we have to dynamic_cast. And this requires RTTI.

class EPlaceable:
    public IPlaceable
{
private:
    CRect _rcBounds;
public:
    virtual void Place(POINT point)
    {
        IDrawable* pDrw = dynamic_cast<IDrawable*>(this);
        if(pDrw) pDrw->Invalidate();
        
        // DO THE PLACEMENT

    }
};


class CCircle:
    public IDrawable,
    public EPlaceable
{
    // ....

};

Information hiding

When coding in "offline mode", I can decide what to expose in the headers and what to keep "secret" in the library sources. I can expose only interfaces plus a factory that creates the types and returns interface pointers, or I can expose all the classes.

In the first case, you'll have access to the IDrawable and IPlaceable virtual functions, and to the globals CreateRectangle and CreateCircle, which return an IDrawable*. But you cannot see the CRectangle and CCircle classes themselves.

Things become a bit more complex when the multilevel model is used: suppose you want to add some functionality to my CRectangle. You would probably wish to derive from it, but how can you, if you only have a global CreateRectangle function and a set of interfaces?

The answer is to aggregate by "delegation and embedding".

// Embed.cpp
//

#include <tchar.h>
#include <iostream>

#ifndef interface
#define interface struct
#endif

interface Int
{
    virtual void IfMethod()=0;
    virtual ~Int() {;}
};

namespace {
    
    //this may be secret if in another file

    class A: public Int
    {
        virtual void IfMethod()
        {   std::cout << "A method\n"; }
    };
}

//exposed creator

Int* CreateInt() {  return new A; }


//now the override of A functionality

class B: public Int
{
private:
    Int* _pEmb;
    virtual void IfMethod()
    {
        std::cout << "B calling ...\n";
        _pEmb->IfMethod();
        std::cout << "... with something else\n";
    }
public:
    B() { _pEmb = CreateInt(); }
    ~B() {  delete _pEmb; }
};

Int* CreateB() {return new B; }

/////////////////////////////////////////////////


void wait_enter()
{
    std::cout << "Prss Enter";
    char a; std::cin.get(a);
}

void _tmain()
{
    Int* pI = CreateB();
    pI->IfMethod();
    delete pI;

    wait_enter();
}

In the sample, B extends A's functionality by obtaining an A from a creator function ... knowing nothing about A. This is more or less what the COM aggregation model does. It has a disadvantage: suppose the interface has one hundred methods, implemented by a secret object, and suppose you have to override only a couple of them ... you must re-implement all of them, 98 of which will be simple delegates to the embedded object. Is this real reusability?

Interface inheritance and dominance

Let's suppose we have an interface that inherits from another interface, and suppose I'm providing the implementation for it. It is natural, for me, to derive the implementation of the derived interface from the implementation of the base interface.

Now, suppose you're defining a class that must implement the derived interface. You would probably derive from my implementation. Now the problem: if many interfaces inherit from the same base interface, what happens to the implementations? To avoid multiple implementations of the same interface, we need virtual inheritance.

This leads to this hierarchy:

Hierarchy chart

And this leads to another aspect of virtual inheritance: dominance. Suppose E2 wants to override a method implemented in E1, and E3 doesn't.

CX will receive that method from E1 (via E3) and from E2 (directly), and will use E2's because it is "nearer". This is "dominance". Suppose instead that both E2 and E3 override the same E1 method: now the inheritance in CX becomes ambiguous, and it's up to the CX implementor to decide what to do.
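A minimal sketch of the dominance case (the class names follow the hierarchy chart; the method is illustrative):

#include <iostream>

struct I
{
    virtual void Method() = 0;
    virtual ~I() {;}
};

struct E1: public virtual I
{
    virtual void Method() { std::cout << "E1\n"; }
};

struct E2: public virtual E1
{
    virtual void Method() { std::cout << "E2\n"; }   // overrides E1
};

struct E3: public virtual E1
{
    // no override here
};

struct CX: public E2, public E3 {};

int main()
{
    CX x;
    x.Method();   // prints "E2": the E2 override "dominates" E1's
    return 0;
}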

How will all this affect RTTI, virtual table and virtual base performance?

Comparisons

There are religious battles between inheritance and templates. One aspect I don't like about those battles is a certain absolutism: I like STL so I must hate MFC; I like Linux so I must hate Windows; I like inheritance so I must hate templates, etc. etc.

What template fans always say about virtual functions and virtual bases is that they require "tables" and that this reduces "performance". What they often forget to say is that every time a template is instantiated on another type, the "code" is re-created over that type: std::list<long> and std::list<double>, when compiled, give two distinct sets of routines that are the std::list methods. Or - as some compilers do - the same code is applied to a parametric class whose methods are stored apart. But this is "reverting templates to inheritance" behind the scenes.

Compared in this way, there is still a dualism, as with coding "inline" or "offline". In this case, the "dualism" is that inheritance creates more data, while templates generate more code.

Here's a demonstration:

// templatehack.cpp
//

#include <tchar.h>
#include <iostream>

template<class T>
class A
{   
public:
    void Do(const T& t) {   std::cout << "A::Do on " << t << std::endl; }
};

struct I
{
    virtual void Out()=0;
    virtual ~I() {;}
};

class C: public virtual I
{
    virtual void Out() { std::cout << "C::Out" << std::endl; }
};

void _tmain()
{
    // create two instantiations of A

    A<int> ai;
    A<double> ad;

    // call A's method, to force the compiler to create the code

    ai.Do(1);
    ad.Do(3.14);

    // get the addresses of the A methods

    void(A<int>::*pDoI)(const int&) = &A<int>::Do;
    void(A<double>::*pDoD)(const double&) = &A<double>::Do;

    I* pI = new C;
    pI->Out();

    std::cout << std::endl;
    std::cout << "A<int>::Do address = 
      " << *(unsigned long*)&pDoI << std::endl;
    std::cout << "A<double>::Do address = 
      " << *(unsigned long*)&pDoD << std::endl;
    std::cout << std::endl;
    std::cout << "size of C         = " << sizeof(C) << std::endl;
    std::cout << "size of A(int)    = " << sizeof(ai) << std::endl;
    std::cout << "size of A(double) = " << sizeof(ad) << std::endl;
    
    delete pI;
    char a; std::cin.get(a);
}

Gives the following output:

A::Do on 1
A::Do on 3.14
C::Out

A<int>::Do address    = 4203456
A<double>::Do address = 4203520

size of C         = 8
size of A(int)    = 1
size of A(double) = 1

A is a "no data" class (so it's one byte long), while C has virtual functions and virtual bases: so it's 8 bytes long, to accommodate the v-table and b-table pointers (without virtual bases, its size would be 4). A v-table is also generated for I and C (containing two entries: the method and the destructor).

In contrast, the two instantiations of A have produced two A::Do functions (and indeed there are two distinct addresses).

Now, consider an application running with one thousand objects in memory, each 100 bytes long, of 10 different types (that is, 100 objects per type). Consider each object served by 50 functions, only 10 of which (20%) differ per type, each function 200 bytes long, and let's estimate the amount of memory for code and for data.

 

Data
  • Templates: 1000 objects, 100 bytes each, giving 100 KB
  • Inheritance: 1000 objects, 108 bytes each, giving 108 KB (8% overhead)

Code
  • Templates: 50 functions per 10 types = 500 compiled function bodies (100 KB)
  • Inheritance: 10 functions per 10 types + 40 common functions = 140 function bodies (28 KB)

Total
  • Templates: 200 KB
  • Inheritance: 136 KB

Preference
  • Templates: compile-time polymorphism; small data objects (a few bytes) in very big quantity; simple functions (relatively short bodies)
  • Inheritance: runtime polymorphism; wide data objects (above 100 bytes) in relatively small quantity; complex functions, often common to many types

In general, it's not necessarily a matter of code, but a matter of balance between the amount of data to manage (and how differentiated it is) and the amount of code used. Under the hypothesis of very simple code (short functions), template wrapping of data classes is probably the best choice: this is what makes WTL programs quite short. Most WTL functions are little more than simple W32 API delegators. And that's true also when objects are many and quite small: 8 bytes of overhead may not be negligible when added to a 4-byte object.

But when runtime polymorphism is required (like objects created by the user and collected together), and objects become quite complex in data and also in code, then it is time to move to inheritance.

In formulas, given

  • N: the total number of required objects
  • L: the average size of each object
  • F: the average number of functions in each type
  • C: the average length of a function
  • T: the number of different types
  • P: the "differentiation" (between 0 and 1): how many of the functions in each type are different or overridden, with respect to the total
  • v: the per-object overhead of the virtual function and virtual base table pointers (4 bytes per pointer; 8 in the example above)
  • I: the "balance ratio" (<1 good for inheritance, >1 good for templates)

it is

I = (N·(L + v) + F·C·(1 − P + P·T)) / (N·L + F·C·T)

where the numerator is the total memory required by inheritance, and the denominator the total required by templates.
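As a check against the example above (with v = 8, the two table pointers):

I = (1000·108 + 50·200·(0.8 + 0.2·10)) / (1000·100 + 50·200·10) = 136000 / 200000 = 0.68

which is below 1, matching the table's verdict in favor of inheritance.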

It is interesting to note that with P=1 (every type overriding every function), templates are always preferable: the number of supplied function bodies is the same, and no v-table stuff must be provided. But at that point, it is probably better to code independent objects!

More in general, templates become convenient when

P > 1 − (N·v) / (F·C·(T − 1))

(Note: P is (0...1) by definition; the numerator of the fraction is the overhead of the tables, and the denominator is the overhead of recoding.)

The threshold value of P with respect to N, with T as a parameter, is shown in the following picture, with v=4, C=500 and F=10.

Graph1

In general, the bigger the number of objects, the lower the differentiation required to make templates convenient. But the higher the number of types, the better inheritance looks.

Writing shorter code, or using a lower number of functions (lower C or F), makes the lines steeper (and templates preferable sooner), and vice versa with long functions or a big number of functions.

Conclusions

I compared "inlining" vs. "offlining" and "templates" vs. "inheritance".

I don't pretend that you agree with all these analyses. What I would like is a less religious (or idealistic) and more pragmatic approach to these kinds of choices. Of course, the analyses presented here are not the most rigorous possible, but I didn't want to write a book, just to give anyone some awareness of these aspects and - maybe - a starting point to elaborate one's own analysis for one's own specific cases.
