Contents
1. Introduction
In this article, we will first discuss the efficiency of the most popular implementations of the search_n
STL algorithm. Afterwards, I will demonstrate a new search_n
specialization for random access iterators, which outruns by far the
most commonly used implementations. This is, in fact, a substantially
improved search_n
algorithm, which has significantly lower time complexity.
My intention is to move straight to the point, without lengthy
introductions about the C++ templates or the STL itself. In order to
understand this article, you need to have a good understanding of the
C++ templates, the standard template library (STL) and, of course, the
C++ language. Finally, I have to admit that search_n
is
not the most notable algorithm in STL, and improving it is not going to
make a big difference to the majority of the C++ developers. On the
other hand, speeding-up an algorithm 2 - 8 times can be sometimes
useful...
2. The algorithm usage and behavior
The search_n
algorithm is already sufficiently documented. [2,4,6]
Hence, in order to describe the usage and behavior of the algorithm, I
have borrowed the following three paragraphs (2.1, 2.2 & 2.3) from
the excellent SGI STL documentation. [1,2].
2.1 Prototype
Search_n
is an overloaded name; there are actually two search_n
functions.
template <class ForwardIterator, class Integer, class T>
ForwardIterator search_n(ForwardIterator first, ForwardIterator last,
Integer count, const T& value);
template <class ForwardIterator, class Integer,
class T, class BinaryPredicate>
ForwardIterator search_n(ForwardIterator first, ForwardIterator last,
Integer count, const T& value,
BinaryPredicate binary_pred);
2.2 Description
search_n
searches for a subsequence of count
consecutive elements in the range [first, last)
, all of which are equal to value
. It returns an iterator pointing to the beginning of that subsequence, or else last
if no such subsequence exists. The two versions of search_n
differ in how they determine whether two elements are the same: the first uses operator==
, and the second uses the user-supplied function object binary_pred
.
The first version of search_n
returns the first iterator i
in the range [first, last-count)
such that, for every iterator j
in the range [i, i+count)
, *j == value
. The second version returns the first iterator i
in the range [first, last-count)
such that, for every iterator j
in the range [i, i+count)
, binary_pred(*j, value)
is true
.
2.3 Complexity
Linear. search_n
performs at most last-first
comparisons. (The C++ standard permits the complexity to be O(n(last-first))
, but this is unnecessarily lax. There is no reason for search_n
to examine any element more than once.)
3. The usual implementations
All the search_n
implementations that I know are quite
similar to each other and, in all known cases, the same algorithm is
being used for both forward and random access iterators. The following flow chart 1 illustrates in some detail such a typical search_n
implementation. Among all these search_n
versions, we can easily distinguish the VC++ and SGI implementations as the dominant ones. We are going to discuss both these versions in the following subsections [s3.1.] and [s3.2.] respectively. For simplicity, we are discussing only the first variant of search_n
, which is testing equality by using the operator==
, instead of the user-supplied function object binary_pred
. [s2] Apart from the use of the binary_pred
,
both the design and behavior of the second variant are identical to the
design and behavior of its counterpart. Hence, everything that we
discuss about the first variant applies to the second as well.
3.1. The VC++ implementation
This search_n
implementation is part of the STL that is
accompanying the popular Microsoft Visual C++ 6.x-7.0 compilers.
Moreover, it is also very similar to the implementation that the STL of
Metrowerks CodeWarrior 8.x-9.x provides (although, Metrowerks version
has some advantages discussed in the point of interest 4 below). In
other words, this is one of the most frequently used search_n
implementations.
The source code
The source code of the VC++ search_n
implementation is not presented here, in order to avoid a potential
copyright violation. Nevertheless, anyone who owns a Visual C++ 6.x-7.0
compiler can very easily find this piece of code inside the algorithm
header file.
Points of interest
- According to the VC++
search_n
documentation in the MSDN site, [4] its time complexity is: linear with respect to the size of the searched (see also performance tests B1, B2 & C in section 5 [s5]).
- The VC++
search_n
first calculates the breadth of the search range [_First1, _Last1)
and it is henceforth using this breadth value, in order to form the
logical expressions that control the loops. The breadth calculation has
obviously constant time complexity when _FwdIt1
is a
random access iterator but linear time complexity otherwise. In other
words, we will experience significant overhead when using this search_n
implementation with lists and other sequences that don't provide random access iterators (see also performance test E in section 5 [s5]).
- The VC++
search_n
frequently examines many of the search range elements more than once. Like mentioned above, search_n
is searching for a subsequence of _Count
consecutive elements in the range [_First1, _Last1)
, all of which are equal to _Val
. In case that the VC++ version meets a subsequence of only M (M < _Count)
consecutive elements equal to _Val
,
it will decide that this particular subsequence is not long enough and
it will continue its search, starting from the second element of the
current subsequence (instead of the element that is next to the last
mismatch). Then it will immediately meet M-1
consecutive elements equal to _Val
, which are in fact the M-1
last elements of the initial subsequence of M
elements. Afterwards, it will also revisit the last M-2
elements of the initial subsequence and then again the last M-3
elements, and so on.
Namely, when meeting M (M<_Count)
consecutive elements equal to _Val
, VC++ search_n
will eventually make M(M+1)/2
redundant comparisons! Hence, the VC++ search_n
implementation is rather inefficient in cases when it's probable that many elements in the search range [_First1, _Last1)
are equal to _Val
(see also performance test D in section 5 [s5]).
- The Metrowerks version of
search_n
is very similar to its VC++
counterpart, but it has the significant advantage to never examine any
element of its search range more than once. Hence the above point of
interest 2 does apply to the Metrowerks implementation as well, whereas
the point of interest 3 doesn't.
3.2. The SGI implementation
The following search_n
implementation is part of the excellent SGI STL (version 3.3). [1,2]
This implementation is mainly used in all the modern versions of the
GCC compiler (including GCC for Darwin and MinGW). Thus, this is also
one of the most popular search_n
implementations.
The source code
template <class _ForwardIter, class _Integer, class _Tp>
_ForwardIter search_n(_ForwardIter __first, _ForwardIter __last,
_Integer __count, const _Tp& __val) {
__STL_REQUIRES(_ForwardIter, _ForwardIterator);
__STL_REQUIRES(typename iterator_traits<_ForwardIter>::value_type,
_EqualityComparable);
__STL_REQUIRES(_Tp, _EqualityComparable);
if (__count <= 0)
return __first;
else {
__first = find(__first, __last, __val);
while (__first != __last) {
_Integer __n = __count - 1;
_ForwardIter __i = __first;
++__i;
while (__i != __last && __n != 0 && *__i == __val) {
++__i;
--__n;
}
if (__n == 0)
return __first;
else
__first = find(__i, __last, __val);
}
return __last;
}
}
Points of interest
- According to the SGI
search_n
documentation [2], its time complexity is: Linear. search_n
performs at most last-first
comparisons (see also performance tests B1, B2 & C in section 5 [s5]).
- The SGI
search_n
is using the find
algorithm internally, whenever appropriate. Hence, it can utilize potential specializations of the find
algorithm and, therefore, increase its performance significantly. (For instance, if _ForwardIter
is of type (char *)
, then find
is probably using the memchr
function internally.) A very good design choice indeed!
- The iterator advancements inside the SGI
search_n
are significantly restrained, compared to the VC++ implementation. This implementation does not need to calculate the breadth of the search range [__first, __last)
and is much more suitable for use with lists and other sequences that
don't provide random access iterators (see also performance test E in section 5 [s5]).
- The SGI documentation correctly states that there is no reason for
search_n
to examine any element more than once, [2] but unfortunately the SGI implementation of search_n
does not fully comply with this notion. Again, search_n
is searching for a subsequence of __count
consecutive elements in the range [__first, __last)
, all of which are equal to __val
. In case that the SGI version meets a subsequence of only M (M < __count)
consecutive elements equal to __val
,
it will decide that the particular subsequence is not long enough and
it will continue its search, starting from the element of last mismatch
(instead of the element that is next to the last mismatch). Namely, the
__first = find(__i, __last, __val)
expression in the above code always examines the *__i
element, while *__i
could be the element that caused the mismatch (and therefore has been already examined) in the *__i == __val
logical expression of the previous while
loop.
In other words, the SGI search_n
is making one redundant comparison every time it meets a subsequence of elements equal to __val
. Practically, this shouldn't be a big problem in most of the real world situations, but it's for sure a flaw in the SGI implementation of search_n
(see also performance test D in section 5 [s5]).
4. An enhanced implementation for random access iterators
In this section, I am introducing an enhanced search_n
implementation, which can only be used with random access iterators.
The strength of this implementation lies in the way that its main loop
works [flow chart 2: loop 1]. When searching for N
consecutive elements that meet the search_n
criteria, this main loop examines only one of every N
elements of the search range, whereas the VC++ and SGI versions [s3] examine the search range elements one by one [flow chart 1: loop 1]. Of course, when the main loop finds a match, this search_n
version has to backtrack, because the matched element could be in the middle of a subsequence that meets the search_n
criteria. Therefore, it will first move backward [flow chart 2: loop 3] and then again forward [flow chart 2: loop 2]
in order to accurately estimate the start and length of the particular
subsequence. Nevertheless, the backtracking happens rather infrequently
and then again no element is ever examined more than once.
Consequently, it is expected that this search_n
implementation will be always faster than the VC++ and SGI versions. For convenience (and for brevity's sake), I will henceforth use the name "DPX search_n
" when I refer to this particular algorithm.
Below follows a demonstration of the DPX search_n
implementation in both the forms of the flow chart and the source code.
Afterwards, we will also discuss some points of interest regarding this
implementation.
The source code
template <class _RandomAccessIter, class _Integer, class _Tp> inline
_RandomAccessIter search_n(_RandomAccessIter __first,
_RandomAccessIter __last, _Integer __count, const _Tp& __val)
{
if (__count <= 0)
return __first;
if (__count == 1)
return std::find(__first, __last, __val);
typedef std::iterator_traits<_RandomAccessIter>::difference_type iter_diff;
iter_diff __tailSize = __last - __first;
const iter_diff __pattSize = __count;
if (__tailSize >= __pattSize)
{
_RandomAccessIter __backTrack;
iter_diff __remainder, __prevRemainder;
const iter_diff __skipOffset = __pattSize - 1;
_RandomAccessIter __lookAhead = __first + __skipOffset;
for ( ; ; __lookAhead += __pattSize )
{
assert( __tailSize >= __pattSize );
__tailSize -= __pattSize;
for ( ; ; )
{
if (*__lookAhead == __val)
break;
if (__tailSize < __pattSize)
return __last;
__lookAhead += __pattSize;
__tailSize -= __pattSize;
}
assert( __tailSize == (__last - __lookAhead) - 1 );
__remainder = __skipOffset;
for ( __backTrack = __lookAhead - 1; *__backTrack ==
__val; --__backTrack )
{
if (--__remainder == 0)
return (__lookAhead - __skipOffset);
}
for ( ; ; )
{
if (__remainder > __tailSize)
return __last;
__lookAhead += __remainder;
__tailSize -= __remainder;
if (*__lookAhead == __val)
{
__prevRemainder = __remainder;
__backTrack = __lookAhead;
do
{
if (--__remainder == 0)
return (__lookAhead - __skipOffset);
} while ( *--__backTrack == __val);
__remainder += __pattSize - __prevRemainder;
}
else
break;
}
if (__tailSize < __pattSize)
return __last;
}
}
return __last;
}
Points of interest
- DPX
search_n
performs at most last-first
comparisons, hence its worst case complexity is linear, similar to the average complexity of the VC++ and the SGI
implementations. But this worst-case is actually extremely rare to
happen with this algorithm. On the other hand, its average time
complexity will be O((last-first)/count)
(see also performance tests A, B1, B2 & C in section 5 [s5]).
- Apart from the fact that the DPX
search_n
actually examines only few of the elements that meets, I have also done
my best to ensure that no element of the search range is ever examined
more than once (see also performance test D in section 5 [s5]). Consequently, this search_n
version is ideal for use in cases where the comparison of the elements is a rather slow procedure.
- For the above reasons, I believe that this
search_n
implementation outperforms both the VC++ and the SGI
implementations in every possible case. Its only disadvantage is the
fact that it requires random access iterators, thus it cannot be used
with lists and other sequences which only provide forward iterators
(see also performance tests in section 5 [s5] and conclusions in section 6 [s6]).
5. Performance tests
So far we have met three different implementations of the search_n
algorithm, and we have also discussed in theory their expected runtime
performance. In this section, we are going to observe what really
happens when we run these algorithms on a computer. In particular, I
have carried out a series of performance tests, using the code that is
included at the top of this article. In the next subsections, follow a
description of each test and a brief discussion about its results. All
the results of the performance tests have been visualized as graphs at
the end of the section.
Some new symbols, which I am going to use frequently in the next paragraphs, are:
- The symbol
N
, which denotes the number of the consecutive elements that search_n
is searching for.
- The symbol
M
, which denotes the number of the sequence elements that search_n
overtakes during the search.
The objective of this test is to observe the behavior of the DPX search_n
algorithm [s4], when the number of the overtaken elements M
is growing. The test has been repeated three times, for three different values of N
. The searched elements in this test are stored in a vector
, and the probability of meeting a matching element is 1%.
The results of the test A have been visualized in the corresponding graph A. The vertical axis of the graph represents the elapsed time and the horizontal axis represents the overtaken elements M
. This graph shows clearly that the elapsed times grow linearly with respect to M
, but it is also clear that DPX search_n
performs much better as N
is being increased. This is a strong indication that the particular search_n
implementation has time complexity O(M/N)
, whereas the most popular implementations usually have time complexity O(M)
.
The objective of these tests is to observe and compare the behavior of all three search_n
implementations discussed so far, when the number of the overtaken elements M
is growing. The only difference between test B1 and test B2 is the value of N
, which is equal to 4 and 32 respectively. The searched elements in both tests are stored in a vector
and the probability of meeting a matching element is 1%.
The results of the tests B1 and B2 have been visualized in the corresponding graphs B1 and B2. The vertical axes of both these graphs represent the elapsed time and the horizontal axes represent the overtaken elements M
. Both graphs show that, in all three search_n
implementations, the elapsed times grow linearly with respect to M
. The SGI version seems to always perform better than its VC++ counterpart and the DPX version outperforms all its competitors. In the test B2, in which N
value is 32 instead of 4, the performance of the SGI and VC++ versions is identical to their corresponding performance in test B1, whereas the performance of the DPX
version is significantly improved. This is exactly what we would
expect, knowing the time complexities of the three algorithms. (DPX search_n
has time complexity O(M/N)
, whereas the other two versions have time complexity O(M)
).
The objective of this test is to observe and compare the behavior of all three search_n
implementations discussed so far, when the number of the overtaken elements M
is kept constant (one million elements), whereas the value of N
is growing. The searched elements are stored in a vector
and the probability of meeting a matching element is 1%.
The results of the test C have been visualized in the corresponding graph C. The vertical axis of the graph represents the elapsed time and the horizontal axis represents N
. The graph shows that the SGI and VC++ versions of search_n
are not affected when N
is changing, whereas the DPX version performs much better when N
is increased. This is another proof that DPX search_n
has time complexity O(M/N)
.
The objective of this test is to observe and compare the behavior of all three search_n
implementations discussed so far, when M
and N
are kept constant (one million elements and 32 respectively), but the
density of the matching elements is growing. Namely, in this test, we
increase the probability of meeting a matching element. The searched
elements are stored in a vector
.
The results of the test D have been visualized in the corresponding graph D.
The vertical axis of the graph represents the elapsed time and the
horizontal axis represents the density of the matching elements. The
graph shows that the performance of the SGI and VC++ search_n
versions becomes significantly lower when the density of the matching elements becomes higher, whereas the DPX version remains almost unaffected. That is because the SGI and VC++ versions are making many redundant comparisons (worst case in test D: +100%) when the density of the matching elements is high. In the same circumstances, the DPX version examines only few (worst case in test D: 9%) of the elements that meets.
The objective of this test is to observe and compare the behavior of the SGI and VC++ search_n
implementations, when the number of the overtaken elements M
is growing and the searched elements are stored in a list
. Note that DPX search_n
cannot be used in this test, because it requires random access iterators, whereas list
only provides forward iterators. N
is constantly equal to 16 and the probability of meeting a matching element is 1%.
The results of the test E have been visualized in the corresponding graph E. The vertical axis of the graph represents the elapsed time and the horizontal axis represents the overtaken elements M
. When compared to the previous graphs B1 and B2, this graph shows that the performance advantage, which SGI search_n
has over the VC++ version when using random access iterators, gets even bigger in case that forward iterators are being used.
More details
The vertical axes of all graphs represent the elapsed time and you
may notice that these axes does not provide any information regarding
the exact values of the elapsed time. That is intentional, because we
are not interested in the individual elapsed times, but instead we need
to pay attention to the growth of the times when a problem factor is
being changed. Moreover, the actual elapsed times are only meaningful
on the computer where I have run these tests, with the particular
hardware and software configuration that this computer had at that time.
The code used to carry out these tests is included at the top of
this article. It is a small and simple piece of code, fairly documented
with comments inside the source files, which I believe can be easily
used both for verification of my test results, and also for further
experimentation regarding the runtime behavior of the search_n
algorithms.
6. Conclusions
In this article, we have discussed the efficiency of the search_n
STL algorithm. In particular, we have discussed the very popular VC++ and SGI versions [s3] and I have also introduced the new DPX search_n
[s4], which is an efficient specialization of the search_n
algorithm for random access iterators. Furthermore, I have carried out a series of performance tests [s5]
in order to demonstrate what really happens when we run these
algorithms on a computer. The conclusions of all these discussions and
tests can be summarized as follows:
- When it comes to random access iterators, DPX
search_n
is by far the best choice. Its average time complexity is much better (see the point of interest 1 in section 4 [s4]) and the performance tests have verified this fact (see tests A, B1, B2, C in section 5 [s5]).
- DPX
search_n
performs even better when the
number of the consecutive elements that is searching for gets bigger
(see tests A & C in section 5 [s5]) or the probability of meeting a matching element is increased (see test D in section 5 [s5]).
- For forward access iterators, SGI
search_n
performs much better than the VC++ version (see test E in section 5 [s5]).
7. More about this article
7.1. Acknowledgments
Portions of the article's text and code are derived from the SGI/HP implementation and documentation of STL.
Copyright � 1996-1999
Silicon Graphics Computer Systems, Inc.
Permission to use, copy, modify, distribute and sell this
software and its documentation for any purpose is hereby granted
without fee, provided that the above copyright notice appears in all
copies and that both that copyright notice and this permission notice
appear in supporting documentation. Silicon Graphics makes no
representations about the suitability of this software for any purpose.
It is provided "as is" without express or implied warranty.
Copyright � 1994
Hewlett-Packard Company
Permission to use, copy, modify, distribute and sell this
software and its documentation for any purpose is hereby granted
without fee, provided that the above copyright notice appears in all
copies and that both that copyright notice and this permission notice
appear in supporting documentation. Hewlett-Packard Company makes no
representations about the suitability of this software for any purpose.
It is provided "as is" without express or implied warranty.
7.2. Related news
- 2005-09-12
- A
search_n
specialization for random access iterators has been incorporated in the libstdc++ library of the GCC package.
(Changelog,
Release-changes)
- 2005-02-27
- A
search_n
specialization for random access iterators has been incorporated in the STLPort library.
(Changelog,
Release-changes)
7.3. Revision History
- 2007-03-23
- A bug-fix and performance improvements in the
search_n
specialization for random access iterators.
- 2004-04-19
- Potential construction and destruction of objects moved outside the loops (proposed by Simon Hughes).
- 2004-04-12
8. References
- The SGI Standard Template Library.
- The search_n algorithm of the SGI STL.
- The VC++ Standard Template Library.
- The search_n algorithm of the VC++ STL.
- The Rogue Wave Standard Template Library.
- The search and search_n algorithms of the Rogue Wave STL.
- The STLport Standard Template Library.