Contents
This article discusses the efficiency problems of the find_first_of
generic STL algorithm in the most notable STL implementations [1,2,3,4]. The following paragraphs will prove that, in these STL implementations, find_first_of
is virtually a performance pitfall among the STL algorithms and must be avoided in performance-critical applications. Furthermore, this article also presents solutions that can be used by the STL implementers in order to improve the performance of the find_first_of
and alternatives which the C++ developers can utilize instead of using this inefficient generic algorithm.
My intention is to move straight to the point, without lengthy introductions about the C++ templates or the STL itself. In order to understand this article, you need to have a good understanding of the C++ templates, the standard template library (STL) and, of course, the C++ language.
In order to sufficiently describe the usage and behavior of the algorithm, I have borrowed the following three subsections from the excellent SGI STL documentation. [5] Please see the original documentation for more.
find_first_of
is an overloaded name; there are actually two
find_first_of
functions.
template <class InputIter, class ForwardIter>
InputIter find_first_of(InputIter first1, InputIter last1,
ForwardIter first2, ForwardIter last2);
template <class InputIter, class ForwardIter, class BinaryPredicate>
InputIter find_first_of(InputIter first1, InputIter last1,
ForwardIter first2, ForwardIter last2, BinaryPredicate comp);
find_first_of
is similar to find
in that it performs a linear search through a range of Input Iterators. The difference is that while find
searches for one particular value, find_first_of
searches for any of several values. Specifically, find_first_of
searches for the first occurrence in the range [first1, last1)
of any of the elements in [first2, last2)
. Note that this behavior is reminiscent of the function strpbrk
from the standard C library.
The two versions of find_first_of
differ in how they compare elements for equality. The first uses operator==
and the second uses an arbitrary user-supplied function object, comp
. The first version returns the first iterator i
in [first1, last1)
such that for some iterator j
in [first2, last2)
, *i == *j
. The second returns the first iterator i
in [first1, last1)
such that for some iterator j
in [first2, last2)
, comp(*i, *j)
is true. As usual, both versions return last1
if no such iterator i
exists.
At most, there are
(last1 - first1) * (last2 - first2)
comparisons.
The find_first_of
STL algorithm is a successor of the older strpbrk
C library function. The main advantage of the find_first_of
algorithm is its generality and flexibility. It can easily work with a variety of containers and arbitrary types of elements, while strpbrk
only works with null-terminated char
strings. But what if we just need to work with char
strings or with std::string
objects, which can be easily converted to null-terminated char
strings? Should we prefer the newer find_first_of
algorithm or just use the older strpbrk
function?
One of the most remarkable qualities of STL is the fact that it aims both towards generality and efficiency at the same time! More specifically, in 1995 the designer of STL, Alexander Stepanov, affirmed in his article in BYTE magazine [7] that generic STL algorithms should not impose any performance penalty compared to their non-generic versions written in C/C++ code. In the same year, Stepanov said in an interview, "Efficiency is a fundamental concern of mine. It is silly to abstract an algorithm in such a way that when you instantiate it back it becomes inefficient." [8] Consequently, there is virtually no reason for any developer to suspect that the use of the find_first_of
STL algorithm may introduce any performance problems.
Furthermore, find_first_of
is more secure than strpbrk
because it performs boundary checks, reducing the possibility of having buffer overruns. Last but not least, most C++ developers have read and heard repeatedly that it is preferable to use the available STL algorithms, like find_first_of
, instead of the equivalent C library functions like strpbrk
. According to the above facts, the most reasonable choice for C++ developers is to always favor the generic, flexible and safer find_first_of
STL algorithm, even in cases where the older strpbrk
C library function is also applicable.
Fortunately, in an article like this we can afford to spend time in performance tests, even in the least suspicious situations. Therefore we have done a performance comparison between the find_first_of
STL algorithm and the strpbrk
function. [s5] The results of these performance tests in two different configurations [s5] have proved that the find_first_of
algorithm performs miserably when the number of the values we are searching for (M=last2-first2
) starts to increase! Not only is the find_first_of
STL algorithm much slower than the strpbrk
C library function, but it also seems to have time complexity no better than O(N*M)
. In contrast, strpbrk
has a superior O(N)
time complexity (N=last1-first1
).
Having inferior complexity is the worst kind of inefficiency, since it practically means that the performance will become constantly worse when the size of a particular problem factor increases. Unfortunately, the developer who prefers to use the find_first_of
STL algorithm will have to suffer a severe performance penalty, even though he has made a very reasonable choice. Hence, find_first_of
turns out to be a dangerous performance pitfall among the STL algorithms.
It would be unfair not to mention that in some modern compilers the strpbrk
C library function is highly optimized and tuned for maximum CPU exploitation, or even implemented using assembly instructions. However, these advantages in the strpbrk
implementations cannot justify its superior complexity compared to the find_first_of
implementations. The time complexity cannot be affected by the level of code optimization or even by switching between C/C++ and assembly. It is practically hardware independent, as well. Only design faults or unfortunate design decisions in the core algorithm can actually damage the time complexity. Consequently, it would be very interesting to proceed taking a closer look at the most notable find_first_of
implementations in order to determine the source of the performance problems. Afterwards, we will also try to improve the efficiency of this generic algorithm.
The following find_first_of
implementation is part of the SGI STL version 3.3. [1,5] This is a rather typical implementation, very similar to the implementations currently used by the very popular MS-VC++ and GCC-C++ compilers [2,4] and also identical to the STLPort STL implementation. [3]
template<typename InputIter, typename ForwardIter>
InputIter find_first_of(InputIter first1,
InputIter last1, ForwardIter first2, ForwardIter last2)
{
for ( ; first1 != last1; ++first1)
{
for (ForwardIter iter = first2; iter != last2; ++iter)
{
if (*first1 == *iter)
return first1;
}
}
return last1;
}
Apparently this typical find_first_of
implementation consists of two nested for-loops. The outer loop picks one-by-one the elements of the search range [first1, last1)
and executes the inner loop, which is displayed bold in the above code snippet. The inner loop runs for each search range element; its sole task is to discover if the element in question is included in the range [first2, last2)
. In case an inclusion is detected, find_first_of
immediately returns the current element (first1
). Otherwise, it continues until the outer loop is exhausted and finally returns last1
.
Given that the [first2, last2)
range virtually defines the set of elements we are searching for, the inner for-loop of the above find_first_of
implementation actually tests for set-membership. However, loops are very inefficient when used for set-membership testing and this technique is evidently the source of the find_first_of
performance problems. Namely, the inner-loop of the above find_first_of
implementation carries out (last2 - first2)
iterations in the case of a mismatch. The worst case scenario is to have a mismatch for all elements of the search range [first1, last1)
, performing the total of (last1 - first1) * (last2 - first2)
iterations, exactly like the SGI STL documentation [5] mentions. Nevertheless, the performance comparison between find_first_of
and strpbrk
[s5] has already proved that this number of iterations is rather excessive. Under the same circumstances, the complexity of the strpbrk
C library function is no worse than O(last1 - first1)
, regardless of the [last2 - first2)
range size.
Since the source of the find_first_of
performance problems has been identified, it is now very important to search for techniques that can be used by the STL implementers in order to improve the efficiency of this generic algorithm. Interestingly enough, in the case that the input ranges of the find_first_of
contain signed
or unsigned chars
, we can take advantage of the fact that one-byte integers can only have up to 256 distinct values. Namely, a find_first_of
implementation specialized for signed
or unsigned chars
can easily prepare a 256 bytes
array named MapArray
in such a way that, for every one-byte integer ch
it will be MapArray[ch]=1
if the value of ch
is included in the range [first2, last2)
and MapArray[ch]=0
otherwise.
After having this array prepared, the whole inner for-loop of the typical find_first_of
implementation presented above [s4.1] can be effectively replaced by the expression: if (MapArray[*first1] != 0) return first1
. This replacement will reduce the main iterations to the amount of (last1 - first1)
, while adding (last2 - first2)+256
operations because of the MapArray
preparation. Consequently, the complexity of this specialized find_first_of
implementation will be (last1 - first1) + (last2 - first2)
. This outruns by far the most commonly used find_first_of
implementations [1,2,3,4] while being comparable to the time complexity of the strpbrk
C library function.
The entire code of this enhanced find_first_of
specialization is presented in the following snippet. Displayed in bold are the portions that have been modified compared to the typical find_first_of
implementation of the previous subsection. [s4.1] The same code has been benchmarked in two different test configurations [s5] and has been proved to be very efficient in practice, exactly like its theoretical complexity indicates.
template<typename InputIter, typename ForwardIter>
InputIter find_first_of(InputIter first1,
InputIter last1, ForwardIter first2, ForwardIter last2)
{
unsigned char MapArray[256];
memset(MapArray, 0, 256);
for (ForwardIter iter = first2; iter != last2; ++iter)
MapArray[(unsigned char) *iter] = 1;
for ( ; first1 != last1; ++first1)
{
if (MapArray[(unsigned char) *first1] != 0)
return first1;
}
return last1;
}
Although the dpx::find_first_of
specialization is only applicable with one-byte integers, this fact does not reduce its significance at all. The specialization is still more generic than the strpbrk
C library function. Hence, it can effectively moderate the drawbacks of the find_first_of
algorithm compared to its successor. Furthermore, it will eliminate the frustration that C++ developers will feel when they discover the performance penalty that they have to pay each time they prefer find_first_of
instead of strpbrk
. In addition, if a specialization like this is included in an STL implementation, the performance of find_first_of
will be improved automatically, without requiring any special handing by the developer.
Moreover, I could add to the above reasoning that many C++ developers are still very hesitant when they are about to use a generic STL algorithm, even though most of these algorithms are quite elegant, very versatile and STL guarantees their performance. Having STL algorithms that perform much worse than their C library counterparts -- i.e. find_first_of
vs strpbrk
[s3.2] -- will make programmers even more reluctant to exploit these excellent programming tools. In addition, since Alexander Stepanov affirmed a long time ago [7] that the generic STL algorithms should not impose any performance penalty compared to their non-generic versions written in C/C++ code, it would be important for the reputation of the STL to eliminate any severe performance drawbacks that may still exist among its generic algorithms. For all these reasons, the use of a find_first_of
specialization for one-byte integers could be quite beneficial.
However, there is also a price to pay when using the dpx::find_first_of
specialization. Namely, the byte-array which is used by this specialization to dramatically speed up the set-membership testing consumes 256 bytes of RAM when the algorithm is running. This fact may be prohibitive in cases when RAM is limited, such as embedded systems. A possible workaround is to replace the array of 256 bytes with an array of 256 bits. The bit array will consume only 32 bytes of RAM -- 8 times less -- while reducing the performance of the algorithm only slightly and without affecting its complexity. The final choice clearly depends on RAM availability.
Apart from the fact that the find_first_of
algorithm can be significantly improved, in the case of the one-byte integers, C++ developers can also utilize some other efficient alternatives that C++ and STL offer, provided that they are aware of the find_first_of
performance problems. For instance, they have at their disposal associative containers like set
[9] and hash_set
[10], which are specifically designed for efficient set-membership testing. Consequently, if the elements of the [first2, last2)
range are loaded in an associative container, then an adequate functor in conjunction with the find_if
[11] STL algorithm can effectively imitate the find_first_of
behavior in a much more efficient way. The following code snippet presents two examples that demonstrate how to use the set
and hash_set
containers respectively in order to efficiently replicate the find_first_of
algorithm.
template<typename Set>
struct SetMembershipFunctor : public std::unary_function<
typename Set::value_type, bool>
{
public:
SetMembershipFunctor(Set *setPtr)
{
mSetPtr = setPtr;
}
inline bool operator() (const typename Set::value_type val) const
{
return mSetPtr->find(val) != mSetPtr->end();
}
protected:
Set *mSetPtr;
};
typedef std::set<element_type> set_of_elements;
set_of_elements s1(search_elements.begin(), search_elements.end());
SetMembershipFunctor<set_of_elements> memberOfSet1(&s1);
search_range_type::const_iterator it1 =
std::find_if(search_range.begin(), search_range.end(), memberOfSet1);
typedef hash_set<element_type> hash_set_of_elements;
hash_set_of_elements s2(search_elements.begin(), search_elements.end());
SetMembershipFunctor<set_of_elements> memberOfSet2(&s2);
search_range_type::const_iterator it2 =
std::find_if(search_range.begin(), search_range.end(), memberOfSet2);
Furthermore, we can also effectively imitate the behaviour of the find_first_of
algorithm by just using sorted ranges. Namely, if we transform the [first2, last2)
elements into a sorted range, we can afterwards use an adequate functor in conjunction with the find_if
[11] STL algorithm, like the following code snippet demonstrates, and outrun by far the most notable find_first_of
implementations.
template<typename ForwardIter>
struct SortedRangeMembershipFunctor
: public std::unary_function<
typename std::iterator_traits<ForwardIter>::value_type, bool>
{
public:
SortedRangeMembershipFunctor(ForwardIter first, ForwardIter last)
: mFirst(first), mLast(last) {}
inline bool operator() (
const typename std::iterator_traits<ForwardIter>::value_type val) const
{
return std::binary_search(mFirst, mLast, val);
}
protected:
ForwardIter mFirst, mLast;
};
typedef std::vector<test_string::value_type> sorted_range;
sorted_range elementsRange(search_elements.begin(), search_elements.end());
std::sort(elementsRange.begin(), elementsRange.end());
SortedRangeMembershipFunctor<sorted_range::const_iterator>
memberOfRange(elementsRange.begin(), elementsRange.end());
search_range_type::const_iterator it =
std::find_if(search_range.begin(), search_range.end(), memberOfRange);
Theoretically, the set-membership testing has only constant time complexity on the hash_set
container, while the complexity of the same procedure on the std::set
container and on sorted ranges is logarithmic. Consequently, the expected time complexities of the find_first_of
alternatives are O(last1-first1)
for the hash_set
based solution and O((last1-first1)*log(last2-first2))
for the other two approaches. The performance tests [s5] have actually confirmed the above complexity estimations, proving in practice the superiority of these approaches against the most notable find_first_of
implementations. [2,4]
Unfortunately, the above workarounds are only compatible with a subset of the data types which the find_first_of
algorithm supports. Namely, they can only work with items which are either sortable or good candidates for hash_set
elements. Consequently, these alternatives cannot completely and unconditionally replace the more generic find_first_of
STL algorithm.
In the previous sections, this article discussed the performance problems found in the most notable find_first_of
implementations [s4.1]. This article has also presented an efficient find_first_of
specialization [s4.2] and some other options. [s4.3] In this next section, we are going to measure and compare the speed of these techniques in practice, by actually running and timing them. The performance tests have been based upon the following rules:
- The test scenario is rather simple. Two notable
find_first_of
implementations [2,4], the strpbrk
C library function [6], the dpx::find_first_of
specialization [s4.2] and some other find_first_of
alternatives presented in this article [s4.3] have been timed while processing large amounts of data. Namely, every tested algorithm has scanned and overtaken exactly N
items, looking for the first occurrence of any one out of M
different values. In order to ensure accurate time measurements, the input data have been fabricated in such a way that the search criteria are only met at the N+1
item. N=last1-first1
& M=last2-first2
in find_first_of
[s2.1].
- The number
N
of the items that are overtaken during the searching procedure has been kept the same in all tests. Thus, this factor can be safely excluded from the reasons that can cause timing variations.
- The only factor that actually varies during the tests is the number
M
of the values we are searching for. The primary mission of these performance tests is to demonstrate how the increment of the M
factor affects the elapsed times in each tested algorithm.
- One-byte integers are the only item type that has been used in these tests. This was actually a rather imperative decision, since both the
strpbrk
C library function and the dpx::find_first_of
specialization are only applicable with one-byte integers.
- Two distinct configurations have been used in the tests. In each configuration the hardware, operating system, compiler,
find_first_of
(STL) implementation and strpbrk
(C library) implementation are intentionally different. All tests have been performed twice: once in each configuration, using exactly the same test scenario and code.
- The code used for the tests is included at the top of this article and can be used both for verification of the test results and also for further experimentation.
The test results have been visualized in the following two graphs: one graph for each of the two test configurations and one graph-line for each tested algorithm. The vertical axes in both graphs represent the elapsed time, while the horizontal axes represent the number M
of the values we are searching for. The exact values of the elapsed times have been omitted, since they are configuration-dependent and not particularly useful. What matters the most is the variation of the elapsed times when M
is being increased.
Performance test A
|
Configuration
HW |
Intel Pentium III - 800 Mhz |
OS |
Fedora Core 5 Linux |
SW |
GCC 4.1.0 |
Description
|
Performance test B
|
Configuration
HW |
Intel(R) Core(TM)2 6400 - 2.13 GHz |
OS |
Windows XP Pro |
SW |
Microsoft Visual C++ 2005 |
Description
|
Points of interest
Evidently, the elapsed times of all the tested algorithms depend on the number N
of the items that are overtaken during the searching procedure. On the other hand, the number M
of the values we are searching for has an impact on the performance of the above algorithms. This can vary from insignificant to severe, depending on the design of each algorithm. These facts can be visually observed in the graphs. Thus, the six tested algorithms can actually be divided into three basic categories according to their time complexities:
- Insignificant impact of M, time complexity O(N). Both graphs demonstrate that the performance of the
strpbrk
(line B) function, the dpx::find_first_of
specialization (line C) and the find_first_of
alternative that is based on the hash_set
(line E) remain practically unaffected when M
is increased. However, the hash_set
based alternative is significantly slower than the other two counterparts.
- Mild impact of M, time complexity O(N*log(M)). The
find_first_of
alternatives which are based on std::set
(line D) and on sorted ranges (line F) are slowing down a little bit when M
is increased.
- Severe impact of M, time complexity O(N*M). The
find_first_of
implementations (line A) of both the test configurations are slowing down very much when M
is increased. These two implementations are by far the most inefficient algorithms in these performance tests.
Undoubtedly the find_first_of
generic STL algorithm is a generalization of the older strpbrk
C library function, more versatile and superior in theory. [s3.1] Unfortunately, in the most notable STL implementations [1,2,3,4] the find_first_of
algorithm has been abstracted disregarding its efficiency drawbacks [s4.1]. This makes it quite dangerous in performance-critical applications! [s3.2]
The worst part of the problem is that only a few C++ developers are likely to suspect the performance drawbacks of the find_first_of
generic algorithm compared to the strpbrk
C library function. [s3] The average programmer will endure a completely unexpected performance penalty, falling practically into a performance pitfall. Fortunately, there is a way out of this pitfall, provided that the STL implementers are willing to incorporate in their libraries some efficient find_first_of
specializations for one-byte integers, like the one presented in this article. [s4.2]
However, things can be much better in the rare case that the programmer is fully aware of the find_first_of
efficiency problems. Many efficient alternatives and workarounds [s4.3] can be utilized by C++ developers in order to avoid using this hopelessly inefficient generic STL algorithm. It turns out that find_first_of
is probably an STL algorithm that we should learn how to avoid!
My general conclusion, after studying many notable STL implementations, is that although the generic STL algorithms can theoretically compete with the performance of their non-generic counterparts, there is still much to be done before this can actually happen in practice. Unfortunately, I am not particularly pleased with the relatively slow rate that the STL generic algorithms evolve regarding their efficiency. This is ultimately why I write articles like this one and like my previous article about the inefficiency of the search_n
implementations. [12] As an epilogue, I would like to repeat once more the words of Alexander Stepanov [8], "It is silly to abstract an algorithm in such a way that when you instantiate it back it becomes inefficient."
Portions of the article's text and code are derived from the SGI/HP implementation and documentation of STL.
Copyright � 1996-1999
Silicon Graphics Computer Systems, Inc.
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Silicon Graphics makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
Copyright � 1994
Hewlett-Packard Company
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Hewlett-Packard Company makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
- 19 April 2007
- 18 June 2007
- Article edited and moved to the main CodeProject.com article base.
- The SGI STL implementation
- The STL shipped with the GCC-C++ compiler
- The STLport STL implementation
- The STL shipped with the MS-VC++ compiler
- The find_first_of generic algorithm
- Search Functions: strpbrk
- Alexander Stepanov writes about the STL in the BYTE magazine
- Alexander Stepanov talks about the STL in the Dr.Dobb's Journal
- The set associative container
- The hash_set associative container
- The find_if generic algorithm
- Can search_n be more efficient?