This article discusses double-ended vector, which allows to push front as well as push back, in contrast to std::vector, where only the push back is available. It looks at Boost devector. It shows an area where the implementation can be improved (insertion inside the vector), provides an alternative implementation, called DeVector. In this article, the performance of both implementations is compared with std::deque, which demonstrates that double-ended vector is usually faster than std::deque. The source code is provided that can be compiled in any C++17 compiler.
Code: DeVectorTests.zip
1. Introduction
Double-ended vector (devector) has been discussed a few times. Some implementations are done by Lars Greger Nordland Hagen [1], Valasiadis Fotios [2] and there is also a Boost implementation [3].
The double-ended vector idea is rather simple: if in a vector the data starts with the beginning of the pre-allocated space, in a double-ended vector the data is positioned relatively in the middle, which means that for a while new items can be pushed at both ends without re-allocating space or moving the elements.
In Figure 1, the std::vector
memory reservation is shown.
Figure 1. std::vector memory reservation. The data space always starts with the start of the reserved memory, the size of which is called capacity. As the data grows, the capacity should increase.
If the data needs more space than the capacity, the new space with increased capacity should be allocated. The newly allocated space is usually greater than required (by a factor between 1.5 or 2.0). That allows not to allocate memory too often, as the data increases. The issue with the vector is that it is easy to append data to the end of a vector (push_back
operation), but not to the beginning.
If data is positioned roughly in the middle of the reserved space, so that it can grow in both directions, we end up with double-ended vector or devector
, which is shown in Figure 2.
Figure 2. Devector memory reservation. The data can grow in both directions, so that push_front as well as push_back can be easily performed.
The headroom is used for push_front
and the tail capacity for push_back
; if there is not enough room for either of them, usually a new memory slot can be reserved, which can be, for instance, three times the size of the required data, which will be positioned in the middle of the reserved slot.
2. Special Consideration: Balanced Insertion
One issue that should be considered is insertion. When an element is inserted into a vector
, there is only one option -- to move the rest of the elements forward, by increasing their addresses in memory. In devector
,
there are two options:
- to move the rest of elements forward;
- to move the elements in front backwards.
Which option should be chosen? it depends on the position of the element. If it is closer to the end of the data, then option 1 (forward). If it is closer to the beginning, then option 2 (backwards). If it is in the middle, either option will do. In Figure 3, the basic idea of balanced insertion is illustrated.
Figure 3. Balanced Insertion into a devector.
Is this important? Yes, it is. Imagine if a devector is build using random insertion of elements. By using balanced insertion, we will cut the time by half! If a devector
used to create a flat map
, which contains "key-value" pairs as elements and binary search is used to find the right element. By using balanced devector
, the insertion time into a flat map
can be cut by half in comparison with time used by vector
.
Unfortunately, in the Boost implementation (Boost devector
), the insertion is not balanced and the benchmarks show it. I have included my own implementation, which I call DeVector
, where both insertion and deletion are balanced.
3. Benchmarks
3.1. General Observation
At first, I tested devector in Visual C++ 2022 and saw major advantage in using devector
. Then I tested in GCC 11.2 with C++17 and the results were not so pronounced. I am presenting results for both compilers.
3.2. System Used
The code was executed on a PC with Intel i7-4790, 3.6GHz processor, 16GB RAM.
When compiled in Visual C++ 2022, the following options were used:
In GCC 11.2, the following command line was used:
g++ -O3 -std=c++17 -I"/boost_1_80_0" -m64 DeVectorTests.cpp -o DeVectorTests.exe
3.3. Random Insertion
3.3.1. Results for Visual C++ 2022
In Figure 4a, the random insertion results are shown.
Figure 4a. Visual C++ 2022. Random insert of double floating-point values (microseconds per element). The data is in logarithmic scale.
It shows that random insertion into a vector
or Boost devector
is about twice as slow as into a DeVector
, which uses balanced insertion. At the same time, insertion into a deque
is on average 7 times slower than into a DeVector
, which increases to 14 times for 5 million elements. What it means in practice for creating a structure of 5 million elements:
- for
std::deque
, it takes 8 hours; - for
std::vector
and Boost devector
, it takes 1 hour and 5 minutes; - for
DeVector
, it takes 34 minutes.
For 100,000 elements:
- for
std::deque
, it takes 3 seconds; - for
std::vector
it takes about 0.7s; - and
Boost devector
, it takes about 0.85s; - for
DeVector
, it takes 0.5s.
For random insertion, the advantage double-ended vector over std::deque
is obvious.
3.3.2. Results for GCC 11.2 C++
In Figure 4b, the random insertion results are shown.
Figure 4b. GCC 11.2 C++. Random insert of double floating-point values (microseconds per element).
There is some diffrence, but it is not as dramatic as in Visual C++: for the number of elements 5 million and more, deque
takes over twice as much time as DeVector
.
3.4. Random Access
3.4.1. Results for Visual C++ 2022
The random access results are shown in Figure 5a.
Figure 5a. Visual C++ 2022. Random access of double float-point values (nanoseconds per element).
You can see that there is no much difference in the results of up to 100,000 elements, but than the access to the deque elements slows down in contrast to all other structures.
3.4.2. Results for GCC 11.2 C++
The random access results are shown in Figure 5b.
Figure 5b. GCC 11.2 C++. Random access of double float-point values (nanoseconds per element).
There is not much diffrence. In practice, it takes 38 seconds to access 109 elements in a deque
, and it takes 32 seconds to access the same number of elements in a devector
.
3.5. Insertion at the Beginning
3.5.1. Results for Visual C++ 2022
In all the structures, except for vector
, this opertation is called push_front
.
As for vector
, insert
with index 0 is used. The results are shown in Figure 6a.
Figure 6a. Visual C++ 2022. Insertion at the beginning (nanoseconds/element).
For deque
, it takes about 27 nanosecond per element; for double-ended vector about 10-12 ns/element; for vector
, it is very inefficient. As a result, it takes over 2.5 hours to insert 5 million elements at the front of a vector, as opposed to a fraction of a second for all the other structures.
3.5.2. Results for GCC 11.2 C++
The results for push_front
are shown in Figure 6b.
Figure 6b. GCC 11.2 C++. Results for push_front (nanoseconds per element).
There is no major difference. But you can see that deque
is faster. But still we are taking about fractions of a second: for pushing to the front 10 million elements it takes 0.05s for a deque
and about 0.097s for a devector
. One billion elements take 4.4 seconds for deque
and about 7.7 for both devectors.
3.6. The push_back Operation
3.6.1. Results for Visual C++ 2022
There are nothing special about push_back
: it is a very quick operation and even for one million elements takes a fraction of a second. The results are shown in Figure 7a.
Figure 7a. push_back (nanoseconds/element).
The performance of deque
is about 2.5 times slower than that of devectors
.
3.6.2. GCC 11.2 C++
The results are shown in Figure 7b.
Figure 7b. The push_back operation (nanoseconds/element).
The results deque
is slightly faster. In general the results are similar to push_front
.
4. Conclusion
The results clearly demonstrate that deque
is slower than double-ended vector, particularly in random construction with one-million elements or more. In random access, deque
is slower as well. In Visual C++ deque
is particularly inefficient. I was suprised to see how the difference
between Visual C++ and GCC implementation: deque
implementation is much better in the latter, where push operations are rather efficient.
Perhaps, C++ libraries should provide devector
as one of the standard structures, like std::deque
, std::vector
and std::list
. The only advantage of using deque
instead of devector
may be systems with small memory space, where it is difficult to allocate a lump memory as fragmented allocation could give some advantage.
In practice, for storing 10000 floating-point elements, deque
is good enough. With more elements, special attention should be paid to the performance and devector
should be preferred, where random insertion and access are more efficient.
References
1. http://larshagencpp.github.io/blog/2016/05/22/devector
2. https://github.com/fvalasiad/devector
3. https://www.boost.org/