Introduction
In a paper titled "Practical and Verifiable C++ Dynamic Cast for Hard Real-Time Systems" released in 2008, Dechev, Mahapatra, and Stroustrup published an idea for using prime numbers in a type casting system to replace dynamic_cast
. There are limitations of this approach, but when applicable, it can offer a vast improvement in performance. dynamic_cast<>
itself does not offer any time guarantee, whereas priori_cast<>
can. And while the library discussed in this paper does not support utilizing inheritance that crosses among dynamic libraries, priori_cast<>
can overcome this as well.
Realize that this is NOT a replacement for dynamic_cast<>
. priori_cast<>
is specifically tuned to allow for fast casting from a base type to a derived type.
Background
Many the experienced developer will tell you that the right time to optimize your code is when your measurements tell you to do so. Working on a large C++ project recently, I started measuring to discover hot spots. Much to my surprise, floating in the top 25 areas of work was the venerable dynamic_cast<>
. While I understood there was overhead to RTTI and, thus, to dynamic_cast<>
, I had failed to appreciate how copious casting could impact performance to such a degree.
Through some minor re-factoring, the overhead of dynamic_cast<>
started dropping out of the top of my set of hot spots. However, I wanted to find a way to use dynamic_cast<>
without paying such a large price. Enter priori_cast<>
.
Implementation
The software I was working on kept a large number of base-class pointers around which it would attempt to up-cast to various types. When successful, object-specific logic would be engaged. One simple and light weight to this type of problem would be to add a "Type
" property to the base class as a poor-man's RTTI. However, in a dynamic plugin architecture, this can be cumbersome and impractical.
priori_cast<>
utilizes the ideas from "Practical and Verifiable C++ Dynamic Cast for Hard Real-Time Systems" and assigns each derivative of a base class a unique prime number. From there, it is possible to perform all the work that dynamic_cast<>
would do, but without the RTTI overhead.
While the current implementation is not completely optimized, it demonstrates that there are practical, dynamic ways to support up-casting from a base class which are measurably (and significantly) faster than dynamic_cast<>
.
Using the Code
All classes used by Priori must inherit from the priori base class. (This is a limitation of the implementation.)
class Base
{
public:
Base();
Base(Base& other);
Base(Base&& other);
virtual ~Base() throw();
bool priori(const int x) const;
protected:
void priori(Base* x);
private:
int prioriFactor;
};
extern int get(priori::Base* x);
extern int get(const std::type_info& x);
In each derived class's constructor, the derived class calls the base class' protected priori
function to register its own type and get assigned its unique factor.
class Alpha : public priori::Base
{
public:
Alpha()
{
this->priori(this);
}
};
After classes which will utilize Priori inherit from this base class, you can then use the templated priori_cast<>
just as you would dynamic_cast<>
.
template<class T, class V> T priori_cast(V base)
{
if(base != nullptr)
{
const auto factor = priori::get(typeid(std::remove_pointer<T>::type));
if((factor != 0) && (base->priori(factor) == true))
{
return reinterpret_cast<T>(base);
}
}
return nullptr;
}
With this template, it becomes natural to be able to replace dynamic_cast<>
with priori_cast<>
where appropriate. Note that dynamic_cast<>
will still work just fine.
Alpha a;
Beta b;
auto baseClassPointer = priori_cast<priori::Base*>(&a);
auto derivedClassPointer = priori_cast<Alpha*>(checkedBaseClassPointer);
auto nullPointer = priori_cast<priori::Base*>(&b);
Results
The whole reason for attempting to replace something as ubiquitous as dynamic_cast
in our code would be for a measurable performance gain. Using the Celero benchmarking framework, a baseline measurement of the cost of various types of dynamic_cast
calls were performed. An inheritance structure either ten objects wide (where ten individual classes inherit from a single common base class) or ten objects deep (where each of ten classes inherit from the previous class) was created. From there, tests were run casting from the base class to a derived class, from a derived class to a base class, and from a class to itself.
The results of the Celero baseline tests are shown below:
[==========]
[ CELERO ]
[==========]
[ STAGE ] Baselining
[==========]
[ RUN ] priori_deep_fromBase.dynamic_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_deep_fromBase.dynamic_cast (0.749996 sec) [2000000 calls in 749996 usec]
[0.374998 us/call] [2666680.888965 calls/sec]
[ RUN ] priori_wide_fromBase.dynamic_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_wide_fromBase.dynamic_cast (0.741270 sec) [2000000 calls in 741270 usec]
[0.370635 us/call] [2698072.227394 calls/sec]
[ RUN ] priori_deep_toBase.dynamic_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_deep_toBase.dynamic_cast (0.028151 sec) [2000000 calls in 28151 usec]
[0.014075 us/call] [71045433.554758 calls/sec]
[ RUN ] priori_wide_toBase.dynamic_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_wide_toBase.dynamic_cast (0.028013 sec) [2000000 calls in 28013 usec]
[0.014007 us/call] [71395423.553350 calls/sec]
[ RUN ] priori_deep_toSelf.dynamic_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_deep_toSelf.dynamic_cast (0.029868 sec) [2000000 calls in 29868 usec]
[0.014934 us/call] [66961296.370698 calls/sec]
[ RUN ] priori_wide_toSelf.dynamic_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_wide_toSelf.dynamic_cast (0.027973 sec) [2000000 calls in 27973 usec]
[0.013987 us/call] [71497515.461338 calls/sec]
[ RUN ] rttiCosts.typeinfo -- 70 samples, 2000000 calls per run.
[ DONE ] rttiCosts.typeinfo (0.027951 sec) [2000000 calls in 27951 usec]
[0.013976 us/call] [71553790.562055 calls/sec]
[==========]
[ STAGE ] Benchmarking
[==========]
[ RUN ] priori_deep_fromBase.priori_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_deep_fromBase.priori_cast (0.231020 sec) [2000000 calls in 231020 usec]
[0.115510 us/call] [8657259.111765 calls/sec]
[ BASELINE ] priori_deep_fromBase.priori_cast 0.308028
[ RUN ] priori_wide_fromBase.priori_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_wide_fromBase.priori_cast (0.285642 sec) [2000000 calls in 285642 usec]
[0.142821 us/call] [7001771.448176 calls/sec]
[ BASELINE ] priori_wide_fromBase.priori_cast 0.385341
[ RUN ] priori_deep_toBase.priori_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_deep_toBase.priori_cast (0.202565 sec) [2000000 calls in 202565 usec]
[0.101283 us/call] [9873373.978723 calls/sec]
[ BASELINE ] priori_deep_toBase.priori_cast 7.195659
[ RUN ] priori_wide_toBase.priori_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_wide_toBase.priori_cast (0.202305 sec) [2000000 calls in 202305 usec]
[0.101153 us/call] [9886063.122513 calls/sec]
[ BASELINE ] priori_wide_toBase.priori_cast 7.221826
[ RUN ] priori_deep_toSelf.priori_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_deep_toSelf.priori_cast (0.231435 sec) [2000000 calls in 231435 usec]
[0.115718 us/call] [8641735.260440 calls/sec]
[ BASELINE ] priori_deep_toSelf.priori_cast 7.748594
[ RUN ] priori_wide_toSelf.priori_cast -- 70 samples, 2000000 calls per run.
[ DONE ] priori_wide_toSelf.priori_cast (0.286469 sec) [2000000 calls in 286469 usec]
[0.143234 us/call] [6981558.213978 calls/sec]
[ BASELINE ] priori_wide_toSelf.priori_cast 10.240911
[ RUN ] rttiCosts.typeinfoHash -- 70 samples, 2000000 calls per run.
[ DONE ] rttiCosts.typeinfoHash (0.317680 sec) [2000000 calls in 317680 usec]
[0.158840 us/call] [6295643.414757 calls/sec]
[ BASELINE ] rttiCosts.typeinfoHash 11.365604
[ RUN ] rttiCosts.typeinfoName -- 70 samples, 2000000 calls per run.
[ DONE ] rttiCosts.typeinfoName (0.054228 sec) [2000000 calls in 54228 usec]
[0.027114 us/call] [36881315.925352 calls/sec]
[ BASELINE ] rttiCosts.typeinfoName 1.940109
[==========]
[ STAGE ] Completed. 15 tests complete.
[==========]
From these measurements, it is shown that priori_cast
has a runtime cost that is roughly between 31% and 39% that of dynamic_cast
so long as you are casting from a base class to a derived class. That can mean a savings of around 60%. This performance gain is tempered by the fact that priori_cast
cost up to 11.3x more than dynamic_cast<>
for other casting scenarios.
Future Work
The concept of priori_cast<>
is proved with this library insofar as it can dramatically cut down on the cost of casting from a base class to a derived class. While this implementation's performance is fairly consistent across use cases, it is too slow to be a general replacement for dynamic_cast<>
. Work should be done to lower the overall cost of priori_cast<>
in some common special cases (such as casting to self).
Points of Interest
Benchmarks should always be performed on Release builds. Never measure the performance of a Debug build and make changes based on the results. The (optimizing) compiler is your friend with respect to code performance.
Priori has Doxygen documentation of its API.
View the project on GitHub.
History
- 20th June, 2013 - First published
- 28th June, 2013: Source code updated with vastly improved performance