Introduction
I recently discovered an interesting C++ technique that I've never read about before, so I thought I'd share it here. It isn't a language feature or anything, but it is still interesting and (in my case at least) useful. The technique allows you to change the polymorphic behavior of an object at runtime.
Background
First, a little backstory. I've got a Property class that provides generic access to an object's property value. To do this, the Property class must know the data type of the property that it encapsulates. So, I've also got a DataType class that encapsulates a data type and provides generic access to values of that type. This DataType class uses standard polymorphic class design: the abstract base DataType class is implemented for each data type that we need to support (e.g., DataType_int or DataType_MyClass). So, my Property class has a reference (pointer) to a DataType object, which provides it with generic access to that type's value. This is also an example of the Strategy pattern, which allows the Property class to change its behavior (its DataType) at runtime, and an example of design by composition (Property has a DataType) rather than inheritance (Property is subclassed for each DataType it must support). So far, I think that I'm on the right path.
The problem arises when I make a couple of DataType subclasses and begin trying to assign them to Property. Since Property has a reference to a DataType object, that object must exist somewhere. So, I have a couple of options. I can have Singleton instances of each DataType subclass and let Property objects reference those Singletons. Or, I can dynamically allocate an instance of a DataType class and let the Property class manage that object's memory. The latter would result in many small allocations, which would be slow and could fragment the heap, so it isn't desirable. And I prefer not to keep globals around if at all possible, so the Singleton solution, while not terrible, was not ideal.
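To make the Singleton option concrete, here is a minimal sketch of the conventional arrangement described above. The classes are simplified stand-ins, not the real Property and DataType; the instance() accessors, the name() and type() methods, and exampleUsage() are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for the abstract strategy: describes one data type.
class DataType {
public:
    virtual ~DataType() {}
    virtual int getSizeOfType() const = 0;
    virtual const char *getTypeName() const = 0;
};

class DataType_int : public DataType {
public:
    virtual int getSizeOfType() const { return sizeof(int); }
    virtual const char *getTypeName() const { return "int"; }
    // Hypothetical accessor: one shared instance, created on first use.
    static const DataType &instance() {
        static DataType_int theInstance;
        return theInstance;
    }
};

class DataType_float : public DataType {
public:
    virtual int getSizeOfType() const { return sizeof(float); }
    virtual const char *getTypeName() const { return "float"; }
    static const DataType &instance() {
        static DataType_float theInstance;
        return theInstance;
    }
};

// Composition: Property "has a" DataType through a pointer it does not own.
class Property {
public:
    Property(const char *name, const DataType &type)
        : name_(name), type_(&type) {}
    void setType(const DataType &type) { type_ = &type; }
    const char *name() const { return name_; }
    const DataType &type() const { return *type_; }
private:
    const char *name_;
    const DataType *type_;
};

// Swapping the strategy at runtime only repoints the pointer.
inline void exampleUsage() {
    Property width("width", DataType_int::instance());
    assert(std::strcmp(width.type().getTypeName(), "int") == 0);
    width.setType(DataType_float::instance());
    assert(width.type().getSizeOfType() == static_cast<int>(sizeof(float)));
}
```

This works without any per-Property allocation, but the shared instances are exactly the kind of global state that the paragraph above would rather avoid.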
I started thinking of using a structure of function pointers to hold the many behaviors required to encapsulate a given type. However, I quickly realized that this would result in huge objects, when I really only want a single reference to the class of functionality that the group of functions would define. At this point, I realized (as I'm sure you also have) that what I needed was a class. The class provides each instance of it with a group of functions accessed via a single reference, the v-table. Following this train of thought, I began to think of an object as a reference to a group of functions (methods). If I just copied this reference, then I could change the functionality of my object (exactly the way that my Property class can change its functionality by changing its DataType reference). This is the standard Strategy design pattern.
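The size argument can be sketched in plain, standard C++. All names below (FatType, TypeTable, ThinType, and the free functions) are hypothetical and exist only to contrast the two layouts: a per-object set of function pointers versus a single pointer to a shared table, which is roughly the role a compiler-generated v-table pointer plays.

```cpp
#include <cassert>
#include <cstring>

// Free functions standing in for per-type behavior (hypothetical names).
static int intSize() { return sizeof(int); }
static const char *intName() { return "int"; }
static int floatSize() { return sizeof(float); }
static const char *floatName() { return "float"; }

// Layout 1: every object carries every function pointer itself.
struct FatType {
    int (*getSizeOfType)();
    const char *(*getTypeName)();
};

// Layout 2: the functions live in one shared table per type...
struct TypeTable {
    int (*getSizeOfType)();
    const char *(*getTypeName)();
};
static const TypeTable kIntTable = { intSize, intName };
static const TypeTable kFloatTable = { floatSize, floatName };

// ...and each object carries only a pointer to its table.
struct ThinType {
    const TypeTable *table;
    int getSizeOfType() const { return table->getSizeOfType(); }
    const char *getTypeName() const { return table->getTypeName(); }
};

// Changing behavior means copying one pointer, however many functions exist.
inline void exampleUsage() {
    ThinType t = { &kIntTable };
    assert(std::strcmp(t.getTypeName(), "int") == 0);
    t.table = &kFloatTable;
    assert(std::strcmp(t.getTypeName(), "float") == 0);
}
```

FatType grows with every behavior added, while ThinType stays one pointer wide.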
Using the Code
The solution that I arrived at looks like this (I'll explain below):
#include <cstring> // for memcpy and NULL

class DataType {
public:
    DataType() {}
    DataType(const DataType &newType) { setType(newType); }

    // Copies the raw bytes of newType, v-table pointer included, over this object.
    void setType(const DataType &newType) {
        memcpy(this, &newType, sizeof(DataType));
    }

    // Public non-virtual proxies that forward to the protected virtual methods.
    inline int getSizeOfType() const { return _getSizeOfType(); }
    inline const char *getTypeName() const { return _getTypeName(); }

protected:
    virtual int _getSizeOfType() const { return -1; }
    virtual const char *_getTypeName() const { return NULL; }
};

class DataType_int : public DataType {
public:
    DataType_int() {}
    DataType_int(const DataType &newType) : DataType(newType) {}
protected:
    virtual int _getSizeOfType() const { return sizeof(int); }
    virtual const char *_getTypeName() const { return "int"; }
};

class DataType_float : public DataType {
public:
    DataType_float() {}
    DataType_float(const DataType &newType) : DataType(newType) {}
protected:
    virtual int _getSizeOfType() const { return sizeof(float); }
    virtual const char *_getTypeName() const { return "float"; }
};

// Usage (these statements belong inside a function):
DataType myType = DataType_int();
const char *typeName = myType.getTypeName();
int typeSize = myType.getSizeOfType();
myType.setType(DataType_float()); // switch myType over to DataType_float
typeName = myType.getTypeName();
As you can see, when we set the type, we are simply using memcpy to make the object's v-table pointer point to the v-table of the object that gets passed in. This changes myType's polymorphic behavior to that of the new type! And we no longer need pointers, singletons, or dynamic memory allocations! We have an object that is the size of a v-table pointer, and that is all! If you prefer a bit of a speedup here, you could just use *((void**)this) = *((void**)&newType); to copy the v-table pointer directly, assuming that your DataType class has no members (thanks to Dezhi Zhao for pointing that out in his comments below).
Please keep in mind that this technique is not standards compliant, as the standard doesn't say anything about v-tables or v-ptrs (thank you to all of the commenters below who pointed this out). If a compiler implements virtual methods in a way that doesn't store lookup information within an object's memory space, this technique will fail completely. However, I have never heard of a C++ compiler that doesn't work this way.
Also, you can see that we can easily change the type of myType at any point during runtime. This gives you the flexibility of having an uninitialized array of DataType objects and initializing them whenever you like later. For the performance-minded out there, Dezhi Zhao also pointed out below that this will most likely cause the processor's branch prediction to fail for the getTypeName() call immediately after changing the type. This will only happen for the DataType_float version above, however, as the prediction can only fail if the processor has already made a prediction.
One thing that you may have noticed is the use of public proxy methods (getSizeOfType) that call protected virtual methods (_getSizeOfType). We need to do this because the compiler may skip the v-table lookup when it knows the actual type of an object (as opposed to pointers or references, where it doesn't). This is perfectly reasonable, but breaks our setup. Inside the proxies, though, the v-table lookup always happens. And because they are inline, all they really do is make the compiler look up the correct method in the v-table and call that one. Remember, however, that we are not removing the virtual method lookup. This setup will not speed up virtual method calls in any way. In fact, we depend on the compiler looking up our virtual method for this to work.
One important thing to note about this setup is the absence of any member variables in DataType. Since we are doing a memcpy that expects both objects to have the same size (sizeof(DataType)), none of DataType's subclasses may add any member variables. You could add member variables to DataType itself with no problem, but not to its subclasses. Since I didn't need any member variables for DataType, this didn't present a problem for me. However, it is not impossible to give subclasses their own member data. You just need to store it in memory that the base class provides. For example:
#include <cstring> // for memcpy

class DataType {
public:
    DataType() {}
    DataType(const DataType &newType) { setType(newType); }
    void setType(const DataType &newType) {
        memcpy(this, &newType, sizeof(DataType));
    }
protected:
    enum { kMemberDataBufferSize = 256, kMemberDataSize = 0 };
    // Raw storage that subclasses use in place of their own member variables.
    char memberDataBuffer[kMemberDataBufferSize];
};

class DataType_MyType : public DataType {
public:
    typedef DataType BASECLASS;
    DataType_MyType() {}
    DataType_MyType(const DataType &newType) : DataType(newType) {}
    inline int getExampleMember() const { return _getMemberData().exampleMember; }
    inline void setExampleMember(int newExampleMember)
        { _getMemberData().exampleMember = newExampleMember; }
protected:
    struct SMemberData {
        int exampleMember;
    };
    enum { kMemberDataSize = sizeof(SMemberData) + BASECLASS::kMemberDataSize };

    // Fails to compile (negative array size) when the condition is false.
    #define compileTimeAssert(x) typedef char compileTimeAssertFailed[ (x) ? 1 : -1 ]
    compileTimeAssert(kMemberDataSize <= kMemberDataBufferSize);

    // View the base-class buffer as this subclass's member data.
    inline SMemberData &_getMemberData() {
        return *((SMemberData*) memberDataBuffer);
    }
    inline const SMemberData &_getMemberData() const {
        return *((const SMemberData*) memberDataBuffer);
    }
};
As you can see, the DataType base class simply provides a buffer of data which the subclasses may use to store whatever member data they like. While this setup is a bit messy, it clearly works, and without too many hoops to jump through.
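Because setType copies exactly sizeof(DataType) bytes, it can be worth having the compiler confirm that no subclass has grown. The following is a minimal sketch under some assumptions: it needs a C++11 compiler for static_assert, Base and Derived are hypothetical stand-ins for DataType and one subclass, and it reads and writes the buffer with memcpy instead of the pointer cast shown above, which sidesteps alignment and aliasing questions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the DataType base class with its shared buffer.
class Base {
public:
    virtual ~Base() {}
protected:
    enum { kMemberDataBufferSize = 64 };
    char memberDataBuffer[kMemberDataBufferSize];
};

// Hypothetical subclass that keeps its data in the base-class buffer.
class Derived : public Base {
public:
    Derived() { setExampleMember(0); }

    int getExampleMember() const {
        SMemberData data;
        std::memcpy(&data, memberDataBuffer, sizeof(data));
        return data.exampleMember;
    }
    void setExampleMember(int newExampleMember) {
        SMemberData data;
        data.exampleMember = newExampleMember;
        std::memcpy(memberDataBuffer, &data, sizeof(data));
    }

private:
    struct SMemberData {
        int exampleMember;
    };
    // The subclass data must fit inside the buffer that the base provides.
    static_assert(sizeof(SMemberData) <=
                      static_cast<std::size_t>(kMemberDataBufferSize),
                  "member data must fit in the base-class buffer");
};

// A byte-wise copy of sizeof(Base) only covers a Derived of the same size.
static_assert(sizeof(Derived) == sizeof(Base),
              "subclasses must not add member variables");

inline void exampleUsage() {
    Derived d;
    d.setExampleMember(42);
    assert(d.getExampleMember() == 42);
}
```

If someone later adds a real member variable to Derived, the second static_assert stops the build instead of letting setType silently copy too few bytes.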
Points of Interest
Finally, an excellent complementary technique to use with this one is Type Traits. While implementing my Property class with this new DataType setup, I realized that it was kind of a pain to specify your DataType subclass whenever you register a method or member as a property:
Property propList[] = {
    Property(
        "prop1",
        DataType_Prop1(), &MyClass::getProp1,
        DataType_Prop1(), &MyClass::setProp1
    ),
    Property(
        "prop2",
        DataType_Prop2(), &MyClass::getProp2,
        DataType_Prop2(), &MyClass::setProp2
    ),
};
Also, this setup isn't very type-safe: if I change the return type of MyClass::getProp1, I get no warnings or errors, and the program will (at best) crash and burn when I use that property. Ideally, you would declare properties like this:
Property propList[] = {
    Property("prop1", &MyClass::getProp1, &MyClass::setProp1),
    Property("prop2", &MyClass::getProp2, &MyClass::setProp2),
};
The data type would be pulled from the method declaration and converted into the appropriate DataType subclass. Luckily for me, my Property constructor already looked like this:
template <class Class, typename AccessorReturnType, typename MutatorArgType>
Property(
    const char *propertyName,
    const DataType &accessorDataType, AccessorReturnType (Class::*accessor)(),
    const DataType &mutatorDataType, void (Class::*mutator)(MutatorArgType)
) {
    set(propertyName, accessorDataType, accessor, mutatorDataType, mutator);
}
So, I already had the data types that I would need: AccessorReturnType and MutatorArgType. All I needed was some mechanism to convert those compile-time C++ types into run-time DataType subclass objects. This is actually easy to do with a template trick called Template Specialization. I won't describe it in depth here, but if you don't already know what it does, feel free to look it up and come back. It is really powerful.
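For readers who have not met specialization before, here is a minimal, self-contained sketch. The TypeName template is hypothetical and unrelated to the Property code; it only shows a general case, two full specializations, and one partial specialization.

```cpp
#include <cassert>
#include <cstring>

// General case: used for any type that has no specialization.
template <typename T>
struct TypeName {
    static const char *get() { return "unknown"; }
};

// Full specializations: hand-written versions for particular types.
template <>
struct TypeName<int> {
    static const char *get() { return "int"; }
};

template <>
struct TypeName<float> {
    static const char *get() { return "float"; }
};

// Partial specialization: matches every pointer type at once.
template <typename T>
struct TypeName<T *> {
    static const char *get() { return "pointer"; }
};

// The compiler picks the most specialized match for each argument.
inline void exampleUsage() {
    assert(std::strcmp(TypeName<int>::get(), "int") == 0);
    assert(std::strcmp(TypeName<double>::get(), "unknown") == 0);
    assert(std::strcmp(TypeName<char *>::get(), "pointer") == 0);
}
```

The selection happens entirely at compile time, so there is no runtime lookup involved.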
The basic idea is to have a templated class that is unimplemented, or implemented for the general case. Then, for each special case, we partially or completely specialize our template arguments and implement that as a new class, like this:
// Declared but not defined: an unmapped type produces a compile error.
template <typename CppType> struct MapCppTypeToDataType;

#define MAP_DATA_TYPE(CppType, MappedDataType) \
    template <> struct MapCppTypeToDataType<CppType> { \
        typedef MappedDataType Type; \
    }

template <typename CppType>
inline DataType GetDataType() {
    // 'typename' is required because Type depends on the template parameter.
    return typename MapCppTypeToDataType<CppType>::Type();
}

MAP_DATA_TYPE(int, DataType_int);

DataType myDataType = GetDataType<int>();
You can see how powerful this is. Now, we can add a new Property constructor that computes the correct DataType object for you:
template <class Class, typename AccessorReturnType, typename MutatorArgType>
Property(
    const char *propertyName,
    AccessorReturnType (Class::*accessor)(),
    void (Class::*mutator)(MutatorArgType)
) {
    set(
        propertyName,
        GetDataType<AccessorReturnType>(), accessor,
        GetDataType<MutatorArgType>(), mutator
    );
}
This constructor allows you to declare properties as we would prefer to, like this:
Property propList[] = {
    Property("prop1", &MyClass::getProp1, &MyClass::setProp1),
    Property("prop2", &MyClass::getProp2, &MyClass::setProp2),
};
You can easily see how much simpler and safer this constructor is than the old one. You no longer have to know the type of the method you are registering. The compiler, which already knows it, can simply do the work for you. And this method is safer, because if you change the return type of prop1's accessor now, the compiler will simply change the DataType that gets used. And if there isn't a DataType that supports the new return type, your compiler will give you an error, something along the lines of "Type was not declared in class 'MapCppTypeToDataType' with template parameters ...".
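The deduction step can be tried in isolation. The sketch below is not the Property class; TypeNameOf, accessorTypeName, mutatorTypeName, and this small MyClass are hypothetical names used to show how the compiler pulls the return and argument types out of member function pointers and feeds them to a specialized template.

```cpp
#include <cassert>
#include <cstring>

// Declared but never defined for the general case, so an unmapped type
// fails at compile time rather than at runtime.
template <typename CppType> struct TypeNameOf;

template <> struct TypeNameOf<int> {
    static const char *get() { return "int"; }
};
template <> struct TypeNameOf<float> {
    static const char *get() { return "float"; }
};

// Hypothetical class with two properties.
struct MyClass {
    int getCount() { return 0; }
    void setCount(int) {}
    float getScale() { return 1.0f; }
    void setScale(float) {}
};

// ReturnType is deduced from the accessor's signature.
template <class Class, typename ReturnType>
const char *accessorTypeName(ReturnType (Class::*)()) {
    return TypeNameOf<ReturnType>::get();
}

// ArgType is deduced from the mutator's signature.
template <class Class, typename ArgType>
const char *mutatorTypeName(void (Class::*)(ArgType)) {
    return TypeNameOf<ArgType>::get();
}

inline void exampleUsage() {
    assert(std::strcmp(accessorTypeName(&MyClass::getCount), "int") == 0);
    assert(std::strcmp(mutatorTypeName(&MyClass::setScale), "float") == 0);
}
```

Changing getCount to return double would make accessorTypeName(&MyClass::getCount) fail to compile until a TypeNameOf<double> specialization is added, which mirrors the error behavior described above.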
I hope that you've enjoyed reading about this technique. If you have any comments or questions, I'd love to hear them. Thanks for reading!
P.S.: I'm not sure that the code snippets above compile. They were meant only for illustration, not for compilation. However, if you find any errors, please let me know and I'll correct them.
History
- January 24, 2010 - Original article.
- February 01, 2010 - Fixed a few code errors.