(untagged)

Stable Iterators for C++ Vectors and Why You Need Them

Noah L

0.00/5 (No votes)

21 Mar 2016

A vector iterator that facilitates the replacement of slow lists with fast vectors, as Bjarne decreed.

Download source - 27.8 KB

Introduction

In his talk at GoingNative'12, Bjarne told us to essentially substitute all uses of std::lists with std::vectors because std::vectors are (perhaps unintuitively) almost always faster than std::lists on modern hardware due to cache coherency. Okay. The only problem is that when insertion or removal operations are involved, std::vector::iterators behave differently from std::list::iterators and can't be used as a direct substitute. Indeed, upon an insertion operation (including push_back()), a "reallocation" event may occur causing all iterators to the std::vector to become invalid. To be blunt, when insertion or removal operations are involved, std::vector::iterators can be generously considered not very useful, and more prudently considered downright dangerous. It would be nice if we had std::vector::iterators that behaved like std::list::iterators. You could consider a container like boost::container::stable_vector, but despite its name, it is not exactly a vector and has significantly different performance characteristics, as will be discussed later.

mse::msevector, on the other hand, is a vector and it has iterators that behave like std::list::iterators. mse::msevector is very much compatible with std::vector, and can be thought of as an enhanced or extended version of std::vector. This compatibility means that mse::msevector::iterator is just a regular vector iterator that does not behave like an std::list::iterator, but mse::msevector has an additional iterator type called mse::msevector::ipointer that does. That is, iterators of this type point to items in the container (as opposed to positions in the container), they don't get invalidated when other elements are inserted or removed from the container, and they can be used with algorithms that operate on std::list::iterators.

Using the Code

To use mse::msevector, just copy two files, mseprimitives.h and msemsevector.h, into your project (or "include" directory). There are no other dependencies.

#include "msemsevector.h"

int main() {
    mse::msevector<int> v = { 1, 2, 3, 4 };
    mse::msevector<int>::ipointer ip1 = v.ibegin();
    ip1 += 2;
    assert(3 == (*ip1));
    auto ip2 = v.ibegin(); /* ibegin() returns an ipointer */
    v.erase(ip2); /* remove the first item */
    assert(3 == (*ip1)); /* ip1 continues to point to the same item, not the same position */
    ip1--;
    assert(2 == (*ip1));
    for (mse::msevector<int>::cipointer cip = v.cibegin(); v.ciend() != cip; cip++) {
        /* You might imagine what would happen if cip were a regular vector iterator. */
        v.insert(v.ibegin(), (*cip));
    }

    /* Btw, ipointers are compatible with stl algorithms, like any other stl iterators. */
    std::sort(v.ibegin(), v.iend());

    /* And just to be clear, mse::msevector also has the (high performance) std::vector iterators and
    can be used as a drop-in substitue for std::vector. */
    std::sort(v.begin(), v.end());
}

Technical Considerations

It's probably most instructive to compare mse::msevector to boost::container::stable_vector. The stable_vector maintains iterator validity by avoiding relocation of any elements during insertion and removal operations. msevector, on the other hand, maintains iterator validity by having the iterators actively follow their target elements when they are relocated. So, while msevector is generally significantly faster than the stable_vector, you could imagine a pathological case with relatively few container elements and thousands or millions of iterators pointing to them, that would slow down msevector's insertion and removal operations.

The original author of the stable_vector provides a benchmark for it. We'll just add a couple of msevector scenarios to the benchmark. One without any ipointers, and one with three ipointers configured to point to elements that are relocated on every insert() and push_back() operation. (The only element that is relocated during push_back() operations is the "end marker", unless a "reallocation" event occurs, in which case every element is relocated.) The original benchmark considered small elements (ints) and big elements. Here I do not consider big elements. My reasoning being that if vector performance is being hurt by the girth or copy cost of its elements, you can always substitute those elements with smaller elements that contain only a pointer to the large object, and maybe a search/ordering key.

The following table shows execution times relative to the fastest.

	std::vector	msevector	msevector with 3 ipointers	boost stable_vector	std::deque
`push_back`	1.00	1.14	4.65	10.68	2.03
`insert`	1.00	1.02	1.01	13.77	5.19
`operator[]`	1.00	1.42	1.38	9.83	16.70

Benchmark environment: msvc2015/x64/Window 7/Haswell

The benchmark timings are a bit noisy, but the gist is clear. The ipointer "tracking" overhead shows up in the "push_back" timings. Not so much because the ipointer overhead is particularly high as it is because push_back() is just an intrinsically fast operation. Certainly compared to insert(), where the ipointer overhead is negligible in comparison. The overhead in "operator[]" is due to msevector's built-in bounds-checking. There is no ipointer involvement in that operation.

So there you have it, everything you need to replace your slow lists with cache friendly vectors.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here