Introduction
When you’re doing a code profiling session, it’s great to have advanced and easy to use tools. But what if we want to do some simple test/benchmark? Maybe a custom code would do the job?
Let’s have a look at simple performance timer for C++ apps.
A task might sound simple: detect what part of the code in the ABC module takes most of the time to execute. Or another case: compare execution time between the Xyz algorithm and Zyx.
Sometimes, instead of using and setting up advanced profiling tools, I just use my custom profiling code. Most of the time, I only need a good timer and a method to print something on the screen/output. That’s all. Usually, it can be enough for most of the cases… or at least a good start for a more deep and advanced profiling session.
Little Spec
What do we want?
- I’d like to measure execution time of any function in my code and even part of a routine.
- The profiling code that needs to be added to routines must be very simple, ideally just one line of additional code.
- There should be a flag that will disable/enable profiling globally
Other existing libraries:
The libraries listed above are much more powerful, equipped with some statictical features can give you a lot of information about the code that you're testing. The approach described in the article suits rather small projects or quick experiments, with the little amount of extra work.
Timer
Good timer is the core of our mechanism.
Here is a brief summary of available options:
- RDTSC instruction - It returns number of CPU cycles since the reset, 64 bit variable. Using this instruction is very low-level, but probably this is not what we need. Cpu cycles aren’t steady time events: power saving, context switching… See an interesting read from RandomAscii: rdtsc in the Age of Sandybridge.
- High performance timer on Windows - see Acquiring high-resolution time stamps. It gives highest possible level of precision (<1us).
GetTickCount
- 10 to 16 milliseconds of resolution timeGetTime
- Uses system clock (so the same resolution as GetTickCount
), but resolution can be increased up to even 1ms (via timeBeginPeriod
). See full comparison between GetTickCount vs timeGetTime at RandomASCII blog std::chrono
- Finally, there are timers from STL library!
system_clock
- System time steady_clock
- Monotonic clock, see the diff between system_clock at this SO question high_resolution_clock
- Highest possible resolution, multiplatform! Warning: it might be alias for system or steady clock… depending on the system capabilities.
Obviously, we should generally use std::high_resolution_clock
, unfortunately, it’s not working as expected in VS2013 (where I developed the original solution).
This is fixed in VS 2015: see this blog post from vs team.
In general, if you’re using the latest compilers/libraries, then std::chrono
will work as expected. If you have some older tools, then it’s better to double-check.
Output
Where do we want to get the results? In simple scenarios, we might just use printf
/cout
. The other option is to log directly to some log file or use Debug View.
Measuring some effect can alter the results. How much elapsed time is affected by our profiling code? If it takes proportionally quite a long time (against the code that we measure), we might need to defer the process somehow.
For example, if I want to measure execution time for just a small method that runs in a few microseconds, writing output to a file (each time the method is called) might be longer then the whole function!
So we can measure just the elapsed time (assuming that it’s very fast) and defer the process of writing the data to output.
Solution
Simple as it is:
void longFunction()
{
SIMPLEPERF_FUNCSTART;
SIMPLEPERF_START("loop ");
for (int i = 0; i < 10; ++i)
{
SIMPLEPERF_SCOPED("inside loop ");
internalCall();
}
SIMPLEPERF_END;
}
which shows at the end of the program:
main : 14837.797000
longFunction : 0.120000
loop : 0.109000
inside loop : 0.018000
internalCall : 0.008000
inside loop : 0.011000
internalCall : 0.009000
...
inside loop : 0.005000
internalCall : 0.002000
shortMethod : 15.226000
loop : 15.222000
We have 3 basic macros that can be used:
* SIMPLEPERF_FUNCSTART
- just put it at the beginning of the function/method. It will show the name of the function and print how long it took to execute
* SIMPLEPERF_SCOPED(str)
- place it at the beginning of a scope
* SIMPLEPERF_START(str)
- place it inside a function, as a custom marker, where you don’t have a scope opened.
* SIMPLEPERF_END
- need to close SIMPLEPERF_START
* Plus:
* add #include "SimplePerfTimer.h
* enable it by setting #define ENABLE_SIMPLEPERF
(also in SimplePerfTimer.h for simplicity)
Additionally, the code supports two modes:
- Immediate: will print just after the elapsed time is obtained. Printing might affect some performance.
- Retained: will collect the data so that it can be shown at the end of the program.
In retained mode, we can call:
SIMPLEPERF_REPORTALL
- show the current data SIMPLEPERF_REPORTALL_ATEXIT
- will show the data but after main()
is done. Can be called any time in the program actually.
The flag #define SIMPLEPERF_SHOWIMMEDIATE true
needs to be set to true
to use retained mode.
Problems
The whole timer might not work in multicore, multithreaded code since it does not use any critical sections to protected shared data, or doesn’t care about thread that the code is running. If you need a more advanced timer, then you will be interested in the article at Preshing on Programming: A C++ Profiling Module for Multithreaded APIs.
Implementation Details
The core idea for the timer is to use destructor to gather the data. This way, when some timer
object goes out of the scope, we’ll get the data. This is handy especially for whole functions/explicit scopes.
{ my_perf_timer t;
}
In a basic immediate form, the timer just saves time (using QueryPerformanceCounter
) in the constructor and then in the destructor measures end time and prints it to the output.
In the retained mode, we also need to store that data for future use. I simply create a static
vector that adds a new entry in the constructor and then fills the final time in the destructor. I also take care about indents, so that the output looks nice.
In the repo, there is also a header only version (a bit simplified, using only immediate mode): see SimplePerfTimerHeaderOnly.h.
Here is a picture showing timer
results in Debug view:
Todo
- Add file/line information when printing the data?
- Use
std::chrono
for VS2015/GCC version
Summary
This post described a handy performance timer. If you just need to check execution time of some code/system, just include a header (+and add related .cpp file) and use SIMPLEPERF_FUNCSTART
or SIMPLEPERF_START(str)/END
in analyzed places. The final output should help you find hotspots… all without using advanced tools/machinery.
Resources
History