Let’s look at two standard data structures: std::vector
and std::list
and what happens to the performance of traveling them sequentially once they are sorted.
In this example, we will generate a std::vector
and std::list
of 10,000,000 randomly generated int
values. We will then travel them sequentially using std::accumulate
and measure the time it took. Finally, we will sort both data structures and repeat the sequential access.
Below are the durations captured on my 2012 MacBook Pro 2.3 GHz Intel Core i7. The code was compiled with maximum optimizations using latest GCC, Apple Clang, and LLVM compilers available for Mac OS:
GCC -Ofast -march=native -lstdc++
List duration 31.795 ms
Sorted list duration 970.679 ms
Vector duration 0.011 ms
Sorted vector duration 0.005 ms
Apple CLANG -Ofast -march=native -lc++
List duration 29.112 ms
Sorted list duration 996.429 ms
Vector duration 0.004 ms
Sorted vector duration 0.004 ms
LLVM CLANG -Ofast -march=native -lc++
List duration 27.358 ms
Sorted list duration 1014.641 ms
Vector duration 0.005 ms
Sorted vector duration 0.005 ms
Let’s talk about the std::vector
first. The time to run std::accumulate
on unsorted and sorted vector is virtually the same. Why? Because std::vector
is a data structure laid out continuously in memory. Regardless of what we do to it, the locality of elements is always present. By accessing Nth element, the CPU brings into L1 cache N+1,…,N+M elements; up to a cache line size (64 bytes on Intel and AMD x64 chips).
What about the std::list
? Why such a performance penalty after sorting?
Remember how a doubly-linked list (usually how std::list
is implemented) is laid out in memory: nodes containing data elements with pointers to previous and next node. By sorting it, I moved the nodes all over the heap; effectively randomizing its layout on the heap and making each node’s next and previous pointers point in random places. The process of sorting destroyed whatever locality may have been there when the list was first created.
The lesson here is to carefully choose your containers. If you are going to sort them and later access them sequentially, then std::vector
is the clear winner.
Complete Listing
#include <iostream>
#include <iomanip>
#include <chrono>
#include <vector>
#include <list>
#include <algorithm>
#include <numeric>
#include <cstdlib>
using namespace std;
using namespace chrono;
const int ELEMS = 10'000'000;
int main(int argc, char** argv)
{
srand((unsigned int)time(NULL));
using Container1 = list<int>;
Container1 c;
c.resize(ELEMS);
generate(begin(c), end(c), rand);
auto start_time = high_resolution_clock::now();
accumulate(begin(c), end(c), 0);
cout << fixed << setprecision(3);
cout << "List duration " << duration_cast<microseconds>
(high_resolution_clock::now() - start_time).count() / 1000.f << " ms" << endl;
c.sort();
start_time = high_resolution_clock::now();
accumulate(begin(c), end(c), 0);
cout << "Sorted list duration " << duration_cast<microseconds>
(high_resolution_clock::now() - start_time).count() / 1000.f << " ms" << endl;
c.clear();
using Container2 = vector<int>;
Container2 c2;
c2.resize(ELEMS);
generate(begin(c2), end(c2), rand);
start_time = high_resolution_clock::now();
accumulate(begin(c2), end(c2), 0);
cout << "Vector duration " << duration_cast<microseconds>
(high_resolution_clock::now() - start_time).count() / 1000.f << " ms" << endl;
sort(begin(c2), end(c2));
start_time = high_resolution_clock::now();
accumulate(begin(c2), end(c2), 0);
cout << "Sorted vector duration " << duration_cast<microseconds>
(high_resolution_clock::now() - start_time).count() / 1000.f << " ms" << endl << endl;
}