Introduction
We do like benchmarks, don’t we? Developers often seem to be obsessively interested in performance, and it’s not all that uncommon to find that somebody is willing to spend more time on a particular piece of code, than that code will ever spend running.
At other times, we really need to satisfy timing requirements, and then it’s pretty handy to have some sort of stopwatch functionality available.
In this article, we're going to use a class called Stopwatch
to measure the performance characteristics of two classes, Buffer
and Buffer2
- the first one is copy constructable and copy assignable, while the latter is also move constructable and move assignable.
The results demonstrate:
- That using a C++
std::vector<T>
is as efficient as using a raw array of the same type. - That implementing the move constructor and move assignment operator, can boost the performance of your code, but only when your code is written to take advantage of these features.
- That
std::shared_ptr<T>
assignment is very nearly as efficient as raw pointer assignment. Populating the vectors using std::make_shared<T>
is as efficient as populating the vectors with the move constructable/assignable type.
If we replace the compare function implemented in the Stopwatch
example with something much simpler:
int compare(const Buffer<T>& other )
{
return ptr[0] - other.ptr[0];
}
we get some really interesting results:
- Sorting an array of pointers to
Buffer<T>&
took 0.088100 ms - Sorting an array of
std::shared_ptr<>
took 0.098500 ms - Sorting an array of move assignable/constructable elements took 0.122000 ms
- Sorting an array of copy assignable/constructable elements took 4513.762700 ms
As you see, implementing the move assignment operator and the move constructor gives us an immense performance improvement.
Later, we will use full buffer comparison, which is pretty expensive when two buffers contains the same values, since this is what we would normally expect when we compare two buffers.
External Libraries
The code relies on the Boost C++ libraries, which can be downloaded from http://www.boost.org[^], so you need to download and build them, before updating the provided projects with include and library paths matching your installation.
Stopwatch
The Stopwatch
class is located in hwindatetime.h, along with the DateTime
and TimeSpan
classes, with an interface that's intentionally similar to .NETs' System.Diagnostics.Stopwatch
. Keeping things similar to proven and well documented designs is something I can heartily recommend.
class Stopwatch
{
long long elapsedTicks;
long long startedAt;
bool isRunning;
static long long frequency;
public:
HWIN_EXPORT static const bool IsHighResolution;
private:
static double tickFrequency;
static bool InitializeStopwatch();
long long GetElapsedDateTimeTicks() const;
public:
HWIN_EXPORT static long long GetTimestamp();
HWIN_EXPORT static long long Frequency();
HWIN_EXPORT Stopwatch();
HWIN_EXPORT void Start();
HWIN_EXPORT Stopwatch StartNew();
HWIN_EXPORT void Stop();
HWIN_EXPORT void Reset();
HWIN_EXPORT void Restart();
HWIN_EXPORT bool IsRunning() const;
HWIN_EXPORT TimeSpan Elapsed() const;
HWIN_EXPORT long long ElapsedMilliseconds() const;
HWIN_EXPORT long long ElapsedTicks() const;
};
The DateTime
and TimeSpan
classes are also starting to get more or less feature complete.
Internally, the Stopwatch
class uses the Windows API QueryPerformanceFrequency[^] and QueryPerformanceCounter[^] functions to provide highly accurate timing information, and if the installed hardware does not support a high-resolution performance counter, the Stopwatch
class will fall back on the timing information provided by the DateTime
class.
Buffer & Buffer2
The test will populate either vectors, arrays, or arrays of pointers - with objects based on the following two template
classes.
template<typename T>
class Buffer
{
size_t length;
T* ptr;
public:
Buffer();
Buffer(const Buffer& other);
Buffer(const Buffer& first,const Buffer& second);
Buffer( size_t theLength, const T& theValue);
~Buffer();
int compare(const Buffer& other) const;
bool operator < ( const Buffer& other );
bool operator > ( const Buffer& other );
Buffer& operator = ( const Buffer& other );
inline friend Buffer operator + (const Buffer& first,const Buffer& second);
};
The compare
function performs, if required, a full comparison of the arrays of type T
pointed to by ptr
, which is pretty common behaviour for buffer comparisons - and we're going to use this function, either directly, or indirectly through the <
operator to do the sorting performed by the tests.
template<typename T>
class Buffer2
{
size_t length;
T* ptr;
public:
Buffer2();
Buffer2(const Buffer2& other);
Buffer2(Buffer2&& other);
Buffer2(const Buffer2& first,const Buffer2& second);
Buffer2( size_t theLength, const T& theValue);
~Buffer2();
int compare(const Buffer2& other) const;
bool operator < ( const Buffer2& other );
bool operator > ( const Buffer2& other );
Buffer2& operator = ( const Buffer2& other );
Buffer2& operator = ( Buffer2&& other );
inline friend Buffer2 operator + (const Buffer2& first,const Buffer2& second);
};
Using the Stopwatch Class
Using the Stopwatch
class is pretty easy:
void Buffer2TestPushBack(int bufferCount, size_t bufferSize)
{
typedef Buffer2 <char> buffer_t;
Stopwatch stopwatch;
vector< buffer_t > buffers;
buffers.reserve(bufferCount);
You just need to declare an instance, stopwatch
, and you're able to get highly accurate timing information.
To start the stopwatch
, you just need to call Start()
, and then perform the operations you want timing information for:
printf("Test vector push_back: move constructible/move assignable\n");
stopwatch.Start();
for(int i = 0; i < bufferCount; i++)
{
char c = i%24;
buffers.push_back(buffer_t(size_t(bufferSize),c));
}
before calling Stop()
to stop the stopwatch
.
stopwatch.Stop();
printf("\tpush_back of %d elements in %f ms\n", bufferCount ,
stopwatch.Elapsed().TotalMilliseconds() );
Elapsed()
returns a TimeSpan
representing the interval which has elapsed between your call to Start()
and Stop()
. We use the TotalMilliseconds()
function of the TimeSpan
class to retrieve the timing information as a double
.
A Stopwatch
can be restarted using the Restart()
function:
stopwatch.Restart();
sort(buffers.begin(),buffers.end());
stopwatch.Stop();
printf("\tsorted %d elements in %f ms\n\n", bufferCount ,
stopwatch.Elapsed().TotalMilliseconds() );
So, reusing an existing Stopwatch
is easy.
The Tests
Since we want to perform sorting, it is customary to ensure that the data is ordered in the same way, while still causing the sort to actually perform some operations on our vectors and arrays. We ensure this by...
for(int i = 0; i < bufferCount; i++)
{
char c = i%24;
buffers.push_back(buffer_t(size_t(bufferSize),c));
}
...filling the buffer at the same position in the vectors or arrays, with values based on a modulus operation on i
. buffer_t
is either a typedef Buffer <char> buffer_t
or typedef Buffer2 <char> buffer_t
. The size of the buffer will in all cases be 1843200 bytes (640*480*8) and the vectors or arrays will have a length of 500.
The first test uses a vector < Buffer <char> >
Test vector push_back: copy constructible/copy assignable
push_back of 500 elements in 1222.143000 ms
sorted 500 elements in 7062.886100 ms
While the next uses two vector < Buffer <char> >
instances, and copies elements from one to the other.
Test vector copy: copy constructible/copy assignable
copy of 500 elements in 609.315100 ms
sorted 500 elements in 6947.904100 ms
The next test uses a vector < Buffer2 <char> >
and since Buffer2 <char>
is move assignable/constructable, we see a decent improvement in performance since Buffer2 <char>
elements are created by buffers.push_back(buffer_t(size_t(bufferSize),c));
immediately goes out of scope at the end of the expression, making it an rvalue
; which lets the compiler use the move constructor and move assignment operator to forward instance data to the vector.
Test vector push_back: move constructible/move assignable
push_back of 500 elements in 482.005300 ms
sorted 500 elements in 2628.228400 ms
As we can see, sorting also sees a fair performance gain.
It's worth noting that move semantics does not provide us with any performance improvement when we use two vector < Buffer2 <char> >
instances, and copies elements from one to the other. This is because the source element does not go out of scope when we perform the...
for(int i = 0; i < bufferCount; i++)
{
destinationBuffers[i] = sourceBuffers[i];
}
...assignment from source to destination.
Test vector move constructible/move assignable
copy of 500 elements in 610.708300 ms
sorted 500 elements in 2624.473200 ms
For the next test, we use a vector < Buffer <char>* >
, and the timing for the sort really tells us something valuable about move semantics as the value is nearly identical to the two previous tests. Sorting a vector of a type that implements the move constructor and the move assignment operator, is, in our case, as efficient as sorting a vector of pointers.
Test vector of raw pointers
push_back of 500 pointers in 509.466600 ms
sorted 500 pointers in 2618.940200 ms
Assigning from one vector < Buffer <char>* >
to another is fast.
Test vector of raw pointers
copy of 500 pointers in 0.001900 ms
sorted 500 pointers in 2630.240200 ms
So what about std::shared_ptr
?
Test vector: push_back of shared_ptrs'
push_back of 500 shared_ptrs' in 480.064000 ms
sorted 500 shared_ptrs' in 2617.170900 ms
Using vector< shared_ptr<Buffer<char> > >
is about as fast as using a vector < Buffer2 <char> >
- which shows that std::shared_ptr
is blazingly fast, particularly when we assign one std::shared_ptr
to another:
Test vector: Copy of shared_ptrs'
copied 500 shared_ptrs' in 0.009600 ms
sorted 500 shared_ptrs' in 2629.669700 ms
What if we use two arrays of Buffer<char>*
? Turns out we got the same performance with vector < Buffer <char>* >
.
Test array of raw pointers
copy of 500 pointers in 0.001900 ms
sorted 500 pointers in 2626.685100 ms
And if we use two arrays of Buffer<char>
, we get results similar to the performance we got with vector < Buffer <char> >
.
Test array: copy constructible/copy assignable
copy of 500 elements in 613.808700 ms
sorted 500 elements in 7110.636500 ms
Did using a lambda...
sort(first,last, [] (const buffer_t& a, const buffer_t& b) { return a.compare(b) < 0;});
...impact one the result? Nope ..., not particularly surprising, but well worth knowing.
Test array: Copy constructible/copy assignable (no lambda)
copy of 500 elements in 623.144300 ms
sorted 500 elements in 7121.363300 ms
Test array: Copy of move constructible/move assignable (no lambda)
copy of 500 elements in 611.294200 ms
sorted 500 elements in 2621.753100 ms
Neither does there seem to be any difference when we use an array of std::shared_ptr
.
Test array: shared_ptrs'
copy of 500 shared_ptrs' in 0.009200 ms
sorted 500 shared_ptrs' in 2617.208300 ms
Conclusion
There doesn't seem to be any benefit to using raw arrays in place of std::vector
, which is something that's good to know, and once you have created a std::shared_ptr
, passing it around is very efficient too.
History
- 16th October, 2012 - Initial posting
- 20th October, 2012 - A few bugfixes combined with extensive changes to the library
- 20th October, 2012 - Library updates
- 20th August, 2014 - More than a few updates and bug-fixes
- 3rd January, 2015 - A few new classes, some updates and a number of bug-fixes