The C++17 standard introduced execution policies for the standard algorithms; these allow parallel and SIMD (vectorized) execution. I wanted to see how much faster the Parallel STL could be on my quad-core system, but none of my compilers currently supports it. Luckily, Intel has implemented it and made it available to the world.
On a side note, this post's example uses a couple of extra libraries: TBB, which is needed to compile the Parallel STL, and Catch2, which I use to create the benchmark. All are freely available on GitHub. BTW, thanks to Benjamin from Thoughts on CPP for pointing me toward the Catch2 library. It's great for creating unit tests and benchmarks.
Let's benchmark the following operations using STL and PSTL: generating random numbers, sorting them, and finally verifying that they're sorted. On my quad-core 2012 MacBook Pro with a 2.3GHz i7, the PSTL version is about 5x faster (roughly 10.6 s versus 2.0 s for 100 million integers). Nice!
Program output:
Benchmark name    Iters      Elapsed ns      Average
-----------------------------------------------------
STL                   1     10623612832    10.6236 s
PSTL                  1      1967239761    1.96724 s
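For what it's worth, the "about 5x" figure is just the ratio of the two elapsed times above. Here is a minimal sketch of that arithmetic; the `speedup` helper is a hypothetical name of my own, not something provided by Catch2 or the Parallel STL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many times faster the parallel run was,
// computed from the two elapsed times in nanoseconds.
double speedup(std::uint64_t serial_ns, std::uint64_t parallel_ns)
{
    assert(parallel_ns > 0); // guard against division by zero
    return static_cast<double>(serial_ns) / static_cast<double>(parallel_ns);
}
```

Calling `speedup(10623612832ULL, 1967239761ULL)` with the numbers from the table gives a ratio of roughly 5.4.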
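// CATCH_CONFIG_MAIN asks Catch2 to generate main() for us.
#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>
#include <vector>
#include <random>
#include <algorithm>
// Intel Parallel STL headers (built on top of TBB).
#include <pstl/execution>
#include <pstl/algorithm>

using namespace std;
using namespace pstl;

// Number of random integers to generate and sort.
const unsigned long long COUNT = 100'000'000;

TEST_CASE("STL vs PSTL", "[benchmark]")
{
    // One seed shared by both benchmarks.
    auto seed = random_device{}();
    vector<int> data(COUNT);

    // Baseline: the plain sequential algorithms.
    BENCHMARK("STL")
    {
        generate(data.begin(), data.end(), mt19937{seed});
        sort(data.begin(), data.end());
        is_sorted(data.begin(), data.end()); // result discarded; only the time matters here
    }

    // Same steps with the parallel + vectorized execution policy.
    // Caveat: mt19937 is stateful, so a generator invoked from several
    // threads is not guaranteed to produce the same values as the STL run.
    BENCHMARK("PSTL")
    {
        generate(pstl::execution::par_unseq, data.begin(), data.end(), mt19937{seed});
        sort(pstl::execution::par_unseq, data.begin(), data.end());
        is_sorted(pstl::execution::par_unseq, data.begin(), data.end());
    }
}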
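The benchmark above throws away the result of `is_sorted`, since only the timing matters there. If you want to sanity-check the generate-and-sort steps themselves, a scaled-down sequential sketch like the one below may help. It uses only the standard library (no PSTL or TBB), and `generate_sorted` is a hypothetical helper name of mine, not part of any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill `count` ints from a seeded mt19937,
// sort them sequentially, and return the result.
std::vector<int> generate_sorted(std::size_t count, unsigned seed)
{
    std::vector<int> data(count);
    std::mt19937 engine{seed};
    // Capture the engine by reference so a single engine state is advanced.
    std::generate(data.begin(), data.end(),
                  [&engine] { return static_cast<int>(engine()); });
    std::sort(data.begin(), data.end());
    return data;
}
```

Passing the returned vector to `std::is_sorted` and actually checking the return value confirms that the pipeline does what the benchmark assumes. Because the engine is seeded explicitly, two calls with the same seed return identical data.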