This article reviews performance of C++ standard library sequence containers on frequently occurring operations, and examines performance of another container class, the ring, or circular buffer, that is a candidate for inclusion in a future edition of C++.
Background
Memory access dominates the cost of execution in PCs and phone handsets. Memory accesses arise from fetching instructions for execution, and especially from data reads and writes to memory. Calls into the C++ memory manager to allocate and free dynamic variables execute many instructions, and make scattered access to memory, which makes these calls expensive. Since the C++ standard library container classes allocate dynamic variables, understanding their behavior is important when turning performance.
Measuring Performance of Sequence Containers
Chapter 10 of my book, Optimized C++, discusses efficient use of C++ standard library container classes and compares their performance. In the book, I reported the result of experiments to measure aspects of the performance of the sequence containers std::vector
, std::deque
, and std::list
. (and std::forward_list
whose performance is indistinguishable from std::list
). The book contains many details of this experiment. In summary, I measured the time cost of insertion and deletion, traversal, searching, and sorting, on containers of 100,000 items. Each item contained a random null-terminated char[]
key and an int
value.
Of course, no simple test captures all aspects of performance. The absolute performance numbers depend obviously upon test structure, compiler version, processor, and other factors. However, I discovered that performance of one container relative to another container given the same test was consistent across processors, compiler versions, and operating systems. Thus, it was possible to conclude that one container performed a task more efficiently than another, and even to say how much more efficiently.
For this article, I re-ran these experiments on Microsoft's Visual Studio 2017. I also performed new experiments to measure the performance of sequence containers when implementing queue-like behavior. The performance of standard containers std::deque
and std::list
, which can be used as queues, is limited by their extensive use of dynamically allocated variables. I performed additional experiments using another container class, boost::circular_buffer
, which I expected to have good performance when implementing queues. As expected, circular_buffer
performed far better than list or deque on many operations, and was generally comparable to std::vector
. Subcommittee sg14 of the C++ standard committee is actively debating a proposal to add a circular buffer container to the next version of the C++ standard. This makes the circular buffer a data structure worth getting to know.
All the tests reported in this article were performed on an i7-4650 tablet, running Windows 10, and compiled as a 32-bit release build in Visual Studio 2017, though I have experience with Visual Studio versions going back to 2010.
Queues as an Application of Standard Data Structures
Queues are important, frequently used data structures. A queue is a container for data. It has two ends, labeled the front and back. Data is inserted onto the back of the queue, maintained in sequence within the queue, and removed from the front of the queue in the same sequence as it was inserted. Among many other uses, queues buffer record-oriented data when a producer and consumer of that data operate asynchronously. Queues order the search when a tree or graph data structure is traversed breadth-first.
A container class suitable for implementing a queue must keep inserted items in sequence. (It must be a sequence container in C++ parlance). It must have efficient, preferably constant-time functions to insert items on one end and to remove them from the other. Standard C++ container classes std::deque
, std::list
, and std::forward_list
have these properties. However, perhaps surprisingly given how frequently queues occur in code, none are suited to perform this role efficiently. Another type of container, the ring, or circular buffer, not currently in the C++ standard library, offers a more efficient queue implementation.
std::deque
(deque is short for double-ended queue), given its name, seems like an obvious choice for implementing a queue. Deque meets the requirements above for implementing queues. Deque is a sequence container. Insertion and deletion at either end are constant-time operations. Deque has the additional useful property that any item may be visited in constant time using the subscript operator, as in q[i]
. Deque iterators are random-access iterators, which allows a deque to be sorted efficiently using std::sort()
, and a sorted deque to be efficiently binary-searched using algorithms in the C++ standard library.
Unfortunately, much of std::deque
's behavior relies on hidden implementation magic. Although subscripting is constant-time, this property is not strong enough to imply that consecutive deque items are in adjacent memory locations. Deque is implemented as a variable-length array of fixed-length arrays of items; the subscript operator does some arithmetic to find the offset within these arrays at which to find the subscripted deque item. The C++ standard contains very precise language about iterator invalidation in section 23.3.3.4 that enshrines the array-of-arrays implementation in the standard, without actually describing it. Thus, each time an item is inserted onto the end of a deque, there may be zero, one, or even two calls to the memory manager to allocate memory. When an item is removed, there may be a call into the memory manager to free an array of nodes.
Worse yet, some implementations limit the size of the fixed-length arrays of items. For sufficiently large items, the item array may contain only a single item. If this happens, each item insertion calls into the memory manager, making deque about as slow as list when implementing queues. Finally, deque member functions that change capacity affect the back of the deque. Deque thus isn't truly double-ended; inserting at the back is more efficient.
std::list
(and std::forward_list
since C++11) meet the minimum requirements for implementing queues. They are sequence containers with constant-time insertion and removal from either end. Beyond the minimum requirements, a list can be sorted efficiently, however it cannot be efficiently searched, and list items cannot be visited using the subscript notation.
std::list
and std::forward_list
share a significant weakness. They are implemented as linked lists of dynamically allocated nodes. Each time an item is inserted, list gets a block of memory from the memory manager, constructs a new dynamic list node variable containing the inserted item in this memory, and links it into the list. Whenever an item is removed, list destroys the removed dynamic node variable and returns the variable's storage to the memory manager. Queues based on list and forward_list are slow because every insertion and deletion calls into the memory manager.
C++ also provides the sequence container std::vector
, touted by such notable experts as Bjarne Stroustrup as the most efficient C++ container class. Items in a vector are stored in consecutive memory locations, improving cache locality when a vector is traversed or an item is inserted. Like std::deque
, any item in a vector may be visited by index in constant time. Vectors may be efficiently sorted, and sorted vectors may be efficiently searched. Although a vector must be reallocated as it grows, this happens in amortized constant time. Sadly, vector is not suitable for implementing queues because the time cost of inserting or removing items is only constant at the back. It is O(n) at the front; far too slow for efficiently implementing queues of many items, though there are partial workarounds.
There is another data structure from which queues may be implemented. It's called the ring, or circular buffer. There is no circular buffer in C++17, though a proposal is being actively developed within subcommittee sg14 of the C++ standard committee. Right now there is an experimental version in boost. The circular buffer has performance competitive with std::vector
, and much better than std::deque
or std::list
.
Circular Buffer C++ Container Class
A sort of spec sheet
for a circular buffer container class like boost::circular_buffer
might look like this:
-
Insert or remove: from front or back O(1), from elsewhere O(n)
-
Index time: by position, O(1)
-
Sort: O(n log2 n).
-
Search: O(log2 n) if sorted, else O(n)
-
Iterators and references to an item are invalidated when the item is removed. Iterators and references to the item at one end of the circular buffer are invalidated when an item is inserted on the other end of the circular buffer, and the circular buffer is already full. All iterators and references are invalidated when the capacity is increased.
-
Iterators are random-access iterators.
-
Iterators produce items front-to-back or back-to-front.
-
Fixed-capacity data structure
The circular buffer class interface looks a lot like std::deque
. It is a sequence container, retaining the order in which items are added to or removed from the circular buffer. It has a front and a back, like deque and std::list
. Items may be inserted or removed from either end in constant time. Like deque or std::vector
, items may be referenced by position, using the subscript notation as in q[i]
, in constant time, in addition to iterating from either end to the other. Like vector, list, or deque, the circular buffer has a size, the number of items it currently contains. Like vector, its capacity is the maximum number of items it can contain without calling into the memory manager to increase its capacity.
The circular buffer behaves differently from std::deque
, std::list
, or std::vector
when its capacity is reached. When a new item is inserted on one end of a circular buffer, and size = capacity, the circular buffer first removes the item at its other end, then inserts the new item, so that the circular buffer's size and capacity are unchanged. In contrast, list, deque, and vector always increase their size when an element is inserted, making an expensive call into the memory manager to allocate storage to increase the container's capacity if needed.
This difference in behavior explains the performance advantage of the circular buffer. Once initially constructed, the circular buffer never reallocates its storage or copies elements. When an item is inserted at either end, the new element is constructed in storage already owned by the circular buffer, and some pointers are updated. When an item is removed from either end, the item is destroyed, and a couple of pointers are updated. The circular buffer provided by boost does permit the user to explicitly command a change in its capacity, which can result in calls to the memory manager, copying items, and invalidation of references and iterators.
boost::circular_buffer
's implementation resembles that of std::vector
, consisting of a dynamic array of elements. But unlike vector, the front of the circular buffer is not the same as the base of its dynamic array. The front and back can point to any position in the array.
The circular buffer earns its name from the behavior of its front and back pointers, and of valid iterators. When the front or back pointer, or a valid iterator into the circular buffer is incremented, if it already points to the last entry of the dynamic array, it is reset to point to the base of the dynamic array, as if the end of the dynamic array were adjacent to the base. When the front or back pointer, or a valid iterator is decremented, if it points to the base of the dynamic array, it is reset to point to the last item in the dynamic array.
It is possible to visualize the circular buffer's implementation as a ring of storage locations with a front and back pointer into any locations in the ring, with the items currently in the circular buffer extending clockwise from the front to the back. The items in the circular buffer are numbered from the front to the back as indices 0, 1, 2, ... n-1, for a circular buffer of size n. If size ? capacity, there are unused storage locations.
The iterators into a circular buffer are random-access iterators, like those of std::deque
and std::vector
. This means that any valid item in the buffer can be accessed in constant time by index, and that the distance (that is, number of items) between two iterators can be computed in constant time. This property is sufficient to permit a circular buffer to be sorted in O(n log2 n) time, and to permit a sorted circular buffer to be binary-searched in O(log2 n) time. While all items between two iterators in a vector are in contiguous storage locations, the items between two iterators into a circular buffer are only almost contiguous, with at most one gap. This occurs when the end of the circular buffer has incremented past the end of the circular buffer's internal dynamic array, and been reset to the base of the array.
Inserting and Removing in boost::circular_buffer
The time-complexity of inserting items into a container depends on whether or not space must be made in the container before the item is inserted. Thus, for instance, inserting an item on the end of a std::vector
can be performed in O(1), that is, constant time. However, to insert an item in the middle requires all subsequent entries to be copied to make space for the item to be inserted, and is thus O(n).
Assignment to boost::circular_buffer
There are several ways that data may be inserted into a container. The simplest method, and usually the fastest, is to assign one instance of the container to another. The simple code fragment below illustrates assignment.
size_t size = ...
boost::circular_buffer<kvstruct> random_container(size),
test_container(size);
...
test_container = random_container;
An experiment that repeatedly assigned a 100,000-entry circular buffer to another circular buffer took 1,525 milliseconds. This time was divided into 1,383 milliseconds to repeatedly perform the assignment, plus 151 milliseconds to repeatedly clear the assigned-to circular buffer prior to performing the next iteration.
The following table summarizes the performance of the circular buffer (labeled as "ring"), versus std::vector
, std::deque
, and std::list
on assignment.
| ring | vector | deque | list |
assign | 1534 | 1162 | 10237 | 9389 |
assign part | 1383 | 1050 | 7312 | 6724 |
delete part | 151 | 112 | 2925 | 2665 |
On the assignment test, boost::circular_buffer
is 40% slower than std::vector
, but as shown next, these times are close. circular_buffer
is 5.3 times as fast as std::deque
, and 4.9 times as fast as std::list
. It is instructive to look at the deletion times. circular_buffer
is 19.4 times as fast at deleting its 100,000 entries as is deque, 17.7 times as fast as list. This implies that deque and list are freeing many more blocks of memory than circular_buffer
.
Inserting a Range of Entries into boost::circular_buffer
All sequence containers define an overload of the insert()
member function that copies data from a sequence delimited by two iterators to a position in the container given by an iterator. The code for inserting onto the back of a circular buffer looks like this:
std::vector<kvstruct> random_vector;
...
auto size = random_vector.size();
boost::circular_buffer<kvstruct> test_container(size);
...
test_container.insert(test_container.end(), random_vector.begin(), random_vector.end());
I performed an experiment that repeatedly inserted a 100,000-entry vector onto the back of an empty boost::circular_buffer
. The test ran in 1,155 milliseconds. If the previously measured time to clear the circular buffer is subtracted, the time spent inserting onto the back was 1,004 milliseconds.
The following table summarizes the performance of the circular buffer (labeled as "ring"), versus std::vector
, std::deque
, and std::list
for inserting onto the end of the container.
| ring | vector | deque | list |
.insert(end()) from vector iterator pair | 1004 | 1074 | 7234 | 6493 |
same, but with reserve() | | 1032 | | |
boost::circular_buffer
was seven percent faster than std::vector
on this test, but this is on the very edge of significance given the test setup. I would say the two had very competitive times. circular_buffer
is 7 times as fast as std::deque
, and 6.5 times as fast as std::list
.
Inserting a range of items onto the end of a boost::circular_buffer
is O(n), because there is no need to move other items to make room for the inserted item. The difference in running times for the containers can be neatly ascribed to allocating dynamic variables. circular_buffer
never allocates. std::vector
allocates a multiple (typically between 1.5 and 2) of the current length when it needs to grow. std::deque
and std::list
allocate a number of nodes proportional to the total number of items inserted.
Inserting at any other position may be more expensive. The containers boost::circular_buffer
, std::vector
, and std::deque
are built out of dynamic arrays. When an item is inserted in the middle of such an array-based container, all items in the tail of the array must be copied (or moved) to new locations to make room for the item being inserted. If inserting must be done one item at a time, the cost of inserting a range is O(n2), which is a big problem for big containers.
There is an optimization that can make inserting a range of items in the middle of an array-based container O(n). If the iterators for the range are random-access, as the iterators for the vector used in my test are, the distance between these iterators gives the number of items to be inserted in constant time. This value can be used to make room for all the items in a single operation. If the iterators are forward- or bidirectional-iterators, the distance can be computed in O(n) time by counting. This may still be more efficient than inserting one item at a time. If the iterators are input-iterators, no optimization is possible, and the cost remains O(n2).
The cost of inserting a range at the front of an std::deque
is O(n), because individual items can be pushed onto a deque in constant time. This optimization is possible for boost::circular_buffer
, but is not implemented in boost version 1.62.
Inserting into an std::list
is O(n) everywhere, because each node occupies its own dynamic variable and thus doesn't have to be copied. This is one of the few bright spots of list's performance.
Inserting Single Items into boost::circular_buffer
Another overload of insert()
inserts a single item into a container. This overload of insert()
is O(1) when inserting on the back, for every sequence container, including boost::circular_buffer
, and has the same behavior as push_back()
.
Inserting at any other position is more efficient for some containers than for others. The insert()
member function for std::vector
must copy all items after the insertion point to make room for the new item being inserted, making this function O(n) for inserting an item at the front. Inserting at the front of std::deque
is O(1), but inserting in the middle is O(n). Inserting anywhere in an std::list
is O(1), but finding the location at which to insert is often O(n). Inserting in boost::circular_buffer
is defined to move all items after the insertion point to make room, so it is O(n). However, it would be possible to improve this performance by special-casing insertion at the front.
The code for inserting items on the back of a boost::circular_buffer
looks like this. The code is similar to the code above for inserting on the back, and is not reproduced here.
std::vector<kvstruct> random_vector;
...
auto size = random_vector.size();
boost::circular_buffer<kvstruct> test_container(size);
...
for (auto it=random_vector.begin(); it!=random_vector.end(); ++it)
test_container.insert(test_container.end(), *it);
A test that repeatedly inserted 100,000 items from a vector iterator at the back of a boost::circular_buffer
took 1,826 milliseconds. Previous experience and a suspicious nature led me to also test inserting from a subscripted vector. While these two tests had significantly different performance for some other containers, inserting into a circular buffer from a subscripted vector took 1,778 milliseconds, a difference of less than three percent.
I also performed a test to insert items onto the front of the four sequence containers.
The following table summarizes the performance of the circular buffer (labeled as "ring"), versus std::vector
, std::deque
, and std::list
when inserting at the front and back. I subtracted out the cost of deleting the data structures at the end of each loop iteration.
| ring | vector | deque | list |
.insert(end()) from vector[] | 1842 | 4662 | 11823 | 6566 |
.insert(end()) from vector iterator | 1523 | 4647 | 10945 | 6696 |
same, but with reserve() | | 1614 | | |
.insert(begin()) from vector[] | 8562849 | 4025888 | 11764 | 6890 |
.insert(begin()) from vector iterator | 9147849 | 4084888 | 13195 | 6745 |
std::vector
did poorly. It is thousands of times as expensive to insert 100,000 items one-by-one on the front of a vector as on the back. std::deque
did much better. Items can be inserted on the front of a deque in constant time. This is due to the array-of-arrays implementation of std::deque
. std::list
, which is node based and supports inserts at any position in constant time, did well in this test too. It was a little surprising that boost::circular_buffer
did as poorly as std::vector
, even though the push_front()
member function can push onto the front in constant time. Clearly deque implements an optimization that boost::circular_buffer
version 1.62 does not.
push_front() and push_back()
All of the sequence containers, including boost::circular_buffer
, provide a member function push_back()
that efficiently (that is, in constant time) puts items onto the end of the container. When it can be provided efficiently, push_front()
puts items onto the front. boost::circular_buffer
can push_back()
and push_front()
in constant time. The code to test circular buffer's push_back()
is reproduced below. The code to push onto the front of a circular buffer is similar, and is not reproduced here.
std::vector<kvstruct> random_vector;
...
auto size = random_vector.size();
boost::circular_buffer<kvstruct> test_container(size);
...
for (auto it=random_vector.begin(); it!=random_vector.end(); ++it)
test_container.push_back(*it);
A test of this code repeatedly pushed a 100,000-entry vector onto the end of a circular buffer in 1,225 milliseconds. For push_front()
the corresponding time was 1,286 milliseconds.
The following table summarizes the performance of the circular buffer (labeled as "ring"), versus std::vector
, std::deque
, and std::list
on push_back()
and push_front()
.
| ring | vector | deque | list |
.push_back() from vector[] | 1218 | 4539 | 6711 | 6759 |
.push_back() from vector iterator | 997 | 4724 | 6814 | 6635 |
same, but with reserve() | | 1412 | | |
.push_front() from vector[] | 1265 | | 6588 | 6878 |
.push_front() from vector iterator | 1212 | | 6738 | 6641 |
Experience has taught me not to make assumptions when doing optimization, so I tried two versions of the push_back()
test: one using a vector iterator, and one using a subscripted vector. For boost::circular_buffer
, the loop using a subscripted vector took 22% longer. By contrast, in the push_back()
test for std::vector
, the loop using the vector iterator took 4% longer.
boost::circular_buffer
completed the push_back()
test between 3.7 and 4.7 times as fast as std::vector
, due almost certainly to circular_buffer not having to allocate memory to increase capacity. A program can improve insertion performance of vector by reserving the capacity of the vector in advance, so its dynamic array is allocated only once. In fairness, this is what the size argument to circular_buffer's constructor does. When I tested this optimization, circular_buffer
was about 50% faster than vector. circular_buffer
is between 5.5 and 6.5 times as fast as std::deque
and std::list
on pushing items onto the back or front. Only vector's performance was comparable, and only if the vector could be pre-allocated.
std::vector
does not have a push_front()
member function, because pushing onto the front of a vector is not efficient. Inserting at the beginning, as shown in the program fragment below, produces the same result as push_front()
, but is much slower.
std::vector<kvstruct> random_vector, test_container;;
...
for (auto it=random_vector.begin(); it!=random_vector.end(); ++it)
test_container.insert(test_container.begin(), *it);
Unfortunately, this operation must copy the entire vector on each insertion to make space for the inserted item. it is O(n) for each pushed item, making the full test O(n2). Inserting 100,000 items at the front of a vector was thousands of times slower than inserting them on the back.
A developer who has heard Bjarne Stroustrup's advice to prefer std::vector
for its superior performance may seek to use vector in spite of this weakness. If many items can be removed from the front at once, there is a range erase()
member function, which is O(n). If many items can be inserted on the front at once, vector's range insert()
member function is O(n). Of course, this only solves half the problem. The other half of the problem is that pushing items on the front does not produce the same result as inserting a range. If a program inserted the range [1, 2, 3, 4, 5] onto an empty vector, the contents of the vector would be [1, 2, 3, 4, 5]. If the items 1, 2, 3, 4, and 5 were pushed onto the front one-at-a-time, the resulting vector would be [5, 4, 3, 2, 1]. This problem may be solved using reverse iterators for the range to be inserted. The resulting program fragment looks like this:
std::vector<kvstruct> random_vector, test_container;;
...
test_container.insert(test_container.begin(),
random_vector.rbegin(),
random_vector.rend());
This code fragment ran 14% slower than the push_front()
test for boost::circular_buffer
. This is a best-case result, that is probably not achievable in real world applications. It might thus be possible to force-fit std::vector
as a queue, but why? circular_buffer
is still faster, more flexible, and easier to use.
Iterating in boost::circular_buffer
A program may iterate through the elements of a circular buffer in two ways; by indexing it as if it was a primitive array, or by incrementing an iterator through each element. Intuition suggests these methods should have similar performance, but experience caused me to test both.
boost::circular_buffer<kvstruct> test_container(size);
...
unsigned sum = 0;
for (auto it=test_container.begin(); it!=test_container.end(); ++it)
sum += it->value;
boost::circular_buffer<kvstruct> test_container(size);
...
unsigned sum = 0;
for (unsigned i = 0; i < nelts; ++i)
sum += test_container[i].value;
The subscripting version of this loop, run repeatedly to make the total time measurable, took 209 milliseconds. The iterator version took 154 milliseconds. The iterator-based loop is about 36% faster in Visual Studio 2017.
The following table summarizes the performance of the circular buffer (labeled as "ring"), versus std::vector
, std::deque
, and std::list
on the iteration test.
| ring | vector | deque | list |
traverse container (subscript) | 226 | 215 | 1167 | |
traverse container (iterator) | 154 | 158 | 554 | 388 |
Iteration is efficient in all sequence containers. Still, boost::circular_buffer
is competitive with std::vector
, 3.6 times as fast as std::deque
, and 2.5 times faster than std::list
.
Sorting boost::circular_buffer
The C++ standard library sorting algorithms std::sort()
and std::stable_sort()
can sort containers in O(n log2 n) time if their iterator arguments are random-access iterators. Iterators to boost::circular_buffer
are random-access iterators. Both algorithms run somewhat faster on data that is already sorted. Sorting is accomplished by this program fragment:
size_t nelts = ...;
boost::circular_buffer<kvstruct> random_container(nelts),
sorted_container(nelts);
...
sorted_container = random_container;
std::sort(sorted_container.begin(), sorted_container.end());
The following table summarizes the sorting performance of boost::circular_buffer
(labeled as "ring"), versus std::vector
, std::deque
, and std::list
.
| ring | vector | deque | list |
sort() container | 3337.9 | 2335.8 | 2968.5 | 3333.5 |
sort() sorted container | 954.9 | 420.8 | 460.5 | 1301.5 |
stable_sort() container | 2630.9 | 1981.8 | 2524.5 | |
stable_sort() sorted container | 1231.9 | 978.8 | 970.5 | |
The iterators of std::list
are only bidirectional iterators. std::sort()
takes O(n2) time in this case. Fortunately list has a sort()
member function that is O(n log2 n).
Searching in boost::circular_buffer
The C++ standard library contains algorithms that can search a sorted container in O(log2 n) time, if it supports random-access iterators, as all of the sequence containers do. The following program fragment looks for every key from random_vector
in sorted_container
:
std::vector<kvstruct> random_vector;
...
auto size = random_vector.size();
boost::circular_buffer<kvstruct> sorted_container(size);
...
for (auto it=random_vector.begin(); it!=random_vector.end(); ++it) {
kp = std::lower_bound(
sorted_container.begin(),
sorted_container.end(),
*it);
if (kp != sorted_container.end() && *it < *kp)
kp = sorted_container.end();
}
I performed an experiment that repeatedly looked up all 100,000 keys in the sorted boost::circular_buffer
. It ran in 3,264 milliseconds.
The following table summarizes the performance of the circular buffer (labeled as "ring"), versus std::vector
and std::deque
. There is no fast algorithm for searching a std::list
.
| ring | vector | deque | list |
search container | 3580 | 2452 | 3729 | |
boost::circular_buffer as a Queue
My tests show that std::vector
is significantly more efficient that std::deque
or std::list
on insertion and traversal. However, as noted previously, std::vector
is not well suited for use as a queue, because insertion and removal is only O(1) at the back. Inserting or removing at the front of a vector is thousands of times more expensive due to the cost of copying every item into new storage locations.
I performed an additional test of boost::circular_buffer
, std::deque
, and std::list
, to see if circular buffer's performance advantages over deque and list held up when it was used as a queue. For the test, I created a circular buffer of 20,000 entries. I pre-filled it by pushing 10,000 items onto the back, then began to pop one item off the front for every item I pushed, maintaining a queue of 10,000 items,. This required the circular buffer to cycle around multiple times, exercising all aspects of its behavior. When all 100,000 items had been pushed, I popped the remaining items off the queue. I repeated the same test with deque and list, except that there is no mechanism for reserving capacity of a list or deque. The following code fragment implements this test for circular_buffer
.
std::vector<kvstruct> random_vector;
boost::circular_buffer<kvstruct> test_container(20000);
...
kvstruct popped(0);
auto nelts = random_vector.size();
auto it = random_vector.begin();
for (unsigned i = 0; i < nelts / 10; ++i) {
test_container.push_back(*it++);
}
while (it != random_vector.end()) {
popped = test_container.front();
test_container.pop_front();
test_container.push_back(*it++);
}
while (!test_container.empty()) {
popped = test_container.front();
test_container.pop_front();
}
The following table summarizes the performance of the circular buffer (labeled as "ring") as a queue, versus std::deque
and std::queue
. std::vector
is not suitable for use as a queue.
| ring | vector | deque | list |
push /pop as queue | 1155 | | 1848 | 8463 |
The results were slightly surprising, but still in line with previous experiments. The circular buffer test took 1,155 milliseconds. The test for std::list
took 8,463 milliseconds, or 7.3 times as long as boost::circular buffer
, in line with previous tests.
The test for std::deque
took 1,848 milliseconds, approximately 60% longer than boost::circular buffer
. This was a little surprising given the relative performance of circular buffer versus deque on other insertion/deletion tasks.
Analysis and Conclusion
Performance of boost::circular_buffer
on the standard tests in my book is competitive with std::vector
, and at least five times as fast as std::deque
or std::list
on the same tests. This is not surprising given the similarity between the implementation of circular_buffer and vector. I also performed tests to assess the performance of list, deque, and circular_buffer
when used as a queue. As expected, the circular buffer outperformed list and deque.
The circular buffer is an example of how relaxing constraints in a design can lead to improved performance. std::vector
has a very strict constraint, that the program can obtain, in constant time, a pointer to an array storing the items in order. This constraint is stronger than what is needed for constant-time push and pop from both ends, constant-time subscripting, constant-time distance computation, or efficient searching and sorting.
The design of boost::cicrular_buffer
relaxes this constraint only a little bit, making the cost of producing the primitive array O(n), as the buffer entries may have to be moved to produce them in contiguous storage locations. Relaxing this constraint allows the circular buffer to provide efficient insertion and deletion at both ends, while retaining efficient searching, sorting, and indexing. It may be more expensive to get a naked pointer to the storage and use this pointer like a primitive array, but this feature is not always (perhaps not even frequently) needed. Relaxing the constraint of automatic growth of the container, versus vector, is a straightforward trade-off between flexibility and performance. The result is that circular buffer can do most of what deque can do, but with the efficient performance of vector, and most of what vector can do, with only a small loss of functionality and performance.
History
- 27th April, 2017: Original version
- 12th May, 2017: Small formatting tweaks