Simple Performance Timer

Bartlomiej Filipek

4.21/5 (5 votes)

11 Jan 2016CPOL6 min read

22.4K

387

Description of a simple performance timer that can be used to profile cpp code in your app

Download source - 4.7 KB

Introduction

When you’re doing a code profiling session, it’s great to have advanced and easy to use tools. But what if we want to do some simple test/benchmark? Maybe a custom code would do the job?

Let’s have a look at simple performance timer for C++ apps.

A task might sound simple: detect what part of the code in the ABC module takes most of the time to execute. Or another case: compare execution time between the Xyz algorithm and Zyx.

Sometimes, instead of using and setting up advanced profiling tools, I just use my custom profiling code. Most of the time, I only need a good timer and a method to print something on the screen/output. That’s all. Usually, it can be enough for most of the cases… or at least a good start for a more deep and advanced profiling session.

Little Spec

What do we want?

I’d like to measure execution time of any function in my code and even part of a routine.
The profiling code that needs to be added to routines must be very simple, ideally just one line of additional code.
There should be a flag that will disable/enable profiling globally

Other existing libraries:

google/benchmark - https://github.com/google/benchmark -
Celero - article at Codeproject
Nonius -https://nonius.io/

The libraries listed above are much more powerful, equipped with some statictical features can give you a lot of information about the code that you're testing. The approach described in the article suits rather small projects or quick experiments, with the little amount of extra work.

Timer

Good timer is the core of our mechanism.

Here is a brief summary of available options:

RDTSC instruction - It returns number of CPU cycles since the reset, 64 bit variable. Using this instruction is very low-level, but probably this is not what we need. Cpu cycles aren’t steady time events: power saving, context switching… See an interesting read from RandomAscii: rdtsc in the Age of Sandybridge.
High performance timer on Windows - see Acquiring high-resolution time stamps. It gives highest possible level of precision (<1us).
GetTickCount - 10 to 16 milliseconds of resolution
timeGetTime - Uses system clock (so the same resolution as GetTickCount), but resolution can be increased up to even 1ms (via timeBeginPeriod). See full comparison between GetTickCount vs timeGetTime at RandomASCII blog
std::chrono - Finally, there are timers from STL library!
- system_clock - System time
- steady_clock - Monotonic clock, see the diff between system_clock at this SO question
- high_resolution_clock - Highest possible resolution, multiplatform! Warning: it might be alias for system or steady clock… depending on the system capabilities.

Obviously, we should generally use std::high_resolution_clock, unfortunately, it’s not working as expected in VS2013 (where I developed the original solution).
This is fixed in VS 2015: see this blog post from vs team.

In general, if you’re using the latest compilers/libraries, then std::chrono will work as expected. If you have some older tools, then it’s better to double-check.

Output

Where do we want to get the results? In simple scenarios, we might just use printf/cout. The other option is to log directly to some log file or use Debug View.

Performance Cost

Measuring some effect can alter the results. How much elapsed time is affected by our profiling code? If it takes proportionally quite a long time (against the code that we measure), we might need to defer the process somehow.

For example, if I want to measure execution time for just a small method that runs in a few microseconds, writing output to a file (each time the method is called) might be longer then the whole function!

So we can measure just the elapsed time (assuming that it’s very fast) and defer the process of writing the data to output.

Solution

Simple as it is:

C++

void longFunction()
{
    SIMPLEPERF_FUNCSTART;

    SIMPLEPERF_START("loop ");
    for (int i = 0; i < 10; ++i)
    {
        SIMPLEPERF_SCOPED("inside loop ");
        //::Sleep(10);
        internalCall();
    }
    SIMPLEPERF_END;
}

which shows at the end of the program:

main : 14837.797000
  longFunction : 0.120000
    loop  : 0.109000
      inside loop  : 0.018000
        internalCall : 0.008000
      inside loop  : 0.011000
        internalCall : 0.009000
      ...
      inside loop  : 0.005000
        internalCall : 0.002000
  shortMethod : 15.226000
    loop  : 15.222000

We have 3 basic macros that can be used:

* SIMPLEPERF_FUNCSTART - just put it at the beginning of the function/method. It will show the name of the function and print how long it took to execute
* SIMPLEPERF_SCOPED(str) - place it at the beginning of a scope
* SIMPLEPERF_START(str) - place it inside a function, as a custom marker, where you don’t have a scope opened.
* SIMPLEPERF_END - need to close SIMPLEPERF_START
* Plus:
* add #include "SimplePerfTimer.h
* enable it by setting #define ENABLE_SIMPLEPERF (also in SimplePerfTimer.h for simplicity)

Additionally, the code supports two modes:

Immediate: will print just after the elapsed time is obtained. Printing might affect some performance.
Retained: will collect the data so that it can be shown at the end of the program.

In retained mode, we can call:

SIMPLEPERF_REPORTALL - show the current data
SIMPLEPERF_REPORTALL_ATEXIT - will show the data but after main() is done. Can be called any time in the program actually.

The flag #define SIMPLEPERF_SHOWIMMEDIATE true needs to be set to true to use retained mode.

Problems

The whole timer might not work in multicore, multithreaded code since it does not use any critical sections to protected shared data, or doesn’t care about thread that the code is running. If you need a more advanced timer, then you will be interested in the article at Preshing on Programming: A C++ Profiling Module for Multithreaded APIs.

Implementation Details

The core idea for the timer is to use destructor to gather the data. This way, when some timer object goes out of the scope, we’ll get the data. This is handy especially for whole functions/explicit scopes.

C++

{ // scope start
   my_perf_timer t;
}

In a basic immediate form, the timer just saves time (using QueryPerformanceCounter) in the constructor and then in the destructor measures end time and prints it to the output.

In the retained mode, we also need to store that data for future use. I simply create a static vector that adds a new entry in the constructor and then fills the final time in the destructor. I also take care about indents, so that the output looks nice.

In the repo, there is also a header only version (a bit simplified, using only immediate mode): see SimplePerfTimerHeaderOnly.h.

Here is a picture showing timer results in Debug view:

Todo

Add file/line information when printing the data?
Use std::chrono for VS2015/GCC version

Summary

This post described a handy performance timer. If you just need to check execution time of some code/system, just include a header (+and add related .cpp file) and use SIMPLEPERF_FUNCSTART or SIMPLEPERF_START(str)/END in analyzed places. The final output should help you find hotspots… all without using advanced tools/machinery.

Resources

MSDN: Acquiring high-resolution time stamps
MSDN: Game Timing and Multicore Processors
Preshing on Programming: A C++ Profiling Module for Multithreaded APIs
Codeproject: Timers Tutorial
StackOverflow: resolution of std::chrono::high_resolution_clock doesn’t correspond to measurements

History

11^th January, 2016 - Initial version, based on my blog post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)