Introduction
In C++, traditionally we have been writing loops to iterate over a range of values as follows:
for (int i = 0; i < 100; i++)
{
…
}
It works well most of the time if we use signed integers. But for unsigned values, there may be a problem. For example, if we would like to iterate through unsigned values in descending order from 5 down to 0, we can easily make a mistake:
for (std::size_t j = 5; j >= 0; j--) {
. . .
}
We have to be inventive:
for (std::size_t j = 6; j-- != 0;)
{
. . .
}
We had to start with 6, not 5.
Another issue is if we would like to look through the whole range of values: say, for the unsigned char
type. We have to write something like this:
for (unsigned char c = std::numeric_limits<unsigned char>::min(), stop = 0;
stop == 0;
(c == std::numeric_limits<unsigned char>::max() ? stop = 1 : c++))
{
std::cout << static_cast<unsigned>(c) << std::endl;
}
The problem is more serious with enumerated types. If we want to iterate in descending order through the whole range of values for the following type:
enum CountThree
{
One,
Two,
Three
};
We have to be even more inventive:
for (CountThree x = Three; x >= One; x = static_cast<CountThree>(static_cast<int>(x)-1))
{
std::cout << x << std::endl;
}
It would be nice to be able to provide a proper range for a loop, which allows to easily specify the left and the right bounds and, optionally, the step.
Implementation of the range Template
In C++11, there is a range-based for
-loop, which allows to iterate over values if an iterator is available. Usually, it is used to iterate through the contents of a container. For example:
std::vector<int> a = { 1, 2, 3 };
for (auto x : a)
{
std::cout << x << std::endl;
}
This loop will print:
1
2
3
In order to implement a range template, we have to create an iterator, which will step through the given range of values. Here is a possible implementation:
template<class T, class StepType = int>
class range
{
public:
class range_iterator
{
StepType step;
StepType i;
public:
range_iterator(StepType a, StepType step1) :i(a), step(step1) {}
range_iterator& operator++()
{
i += step;
return *this;
}
T operator*() const
{
return static_cast<T>(i);
}
operator StepType() const { return i; }
};
private:
range_iterator start;
range_iterator finish;
inline static StepType finish_value(StepType left, StepType range, StepType step)
{
if (step == 0) return left;
StepType ratio = (range + step)/ step;
if (ratio >= 1) return left + ratio * step;
return left;
}
public:
range(T left, T right, StepType step = 1) :
start(static_cast<StepType>(left), step),
finish(finish_value(static_cast<StepType>(left),static_cast<StepType>(right)-static_cast<StepType>(left),step), step) {}
range_iterator& begin() { return start; }
range_iterator& end() { return finish; }
};
Let's look at the function finish_value
, which returns either the left bound or the value beyond the right bound. If it returns the left bound, the loop will not perform any iterations at all. If the step is zero, no iterations will be performed, no matter what the values of the loop range bounds are.
There is a conversion function to StepType
, which makes it possible to compare values of the objects of the range_iterator
class. Instead of this conversion function, it would be possible just to define operator=
and operator!=
.
The problem with the above definition of the range
template is that the StepType
is int
by default. What if we want to iterate through a wider range using unsigned
or long long int type
: for example, from 0 to 4,000,000,000. In this case, we have to use, for example, the long long int
type. That means that the user has to make a decision of the StepType
depending on the range of the values, which is inconvenient.
Another approach could be just to define StepType
as long long int
by default. It would be all right on 64-bit systems, but on 32-bit systems, the code will run more slowly than it could: we will lose speed.
The right solution would be to select the StepType
automatically, depending on the type T
. The way to do it is to write a template with specialization:
template <class T>
struct range_step_type
{
typedef long int type;
};
template <>
struct range_step_type<int>
{
typedef std::conditional<sizeof(long long int) ==
sizeof(std::ptrdiff_t), long long int, long int>::type type;
};
template <>
struct range_step_type<long>
{
typedef std::conditional<sizeof(long long int) ==
sizeof(std::ptrdiff_t), long long int, long int>::type type;
};
template <>
struct range_step_type<unsigned>
{
typedef std::conditional<sizeof(long long int) ==
sizeof(std::ptrdiff_t), long long int, long int>::type type;
};
template <>
struct range_step_type<long unsigned>
{
typedef std::conditional<sizeof(long long int) ==
sizeof(std::ptrdiff_t), long long int, long int>::type type;
};
template <>
struct range_step_type<long long int>
{
typedef long long int type;
};
template <>
struct range_step_type<long long unsigned>
{
typedef long long int type;
};
Here we expect that, as in most systems, that the long int
size is the same as the int
size. The std::ptrdiff_t
size depends on the system. On the 64-bit system, it is usually equal to the size of long long int
, but on the 32-bit system its size is the same as that of int
.
On the 64-bit system, it is efficient to use long long int
for a loop counter. That's why, for such a system, the range_step_type<T>
is defined as long long int
for the int
, unsigned
, long
and unsigned long
types. On the other hand, for other systems it is defined as long int
.
Now we can modify the range
template as follows:
template<class T>
class range
{
typedef typename range_step_type<T>::type StepType;
public:
class range_iterator
{
StepType step;
StepType i;
public:
...
};
...
};
Some Examples
Let's look at some examples. First of all, the following three loops will not execute their bodies at all:
(1)
for (unsigned x : range<unsigned>(0, 10, -3)) {
std::cout << x << std::endl;
}
(2)
for (unsigned x : range<unsigned>(10, 10, 0)) {
std::cout << x << std::endl;
}
(3)
for (unsigned x : range<unsigned>(10, 1) {
std::cout << x << std::endl;
}
The following loop will correctly produce 4 values as its output:
for (auto x : range<unsigned>(0, 100, 27))
{
std::cout << x << std::endl;
}
It will print out these values:
0
27
54
81
This code will correctly print the line "65536 iterations
":
count = 0;
for (auto x : range<short>(-32768, 32767))
{
count++;
}
std::cout << count << " iterations" << std::endl;
And this loop will output the line "256 iterations
":
count = 0;
for (unsigned char x : range<unsigned char>(0, 255))
{
count++;
}
std::cout << count << " iterations" << std::endl;
But when it comes to 32-bit int
(or long int
) ranges, there are limitations. On a 32-bit system, you will not get the correct result for the following two loops:
(1)
long long unsigned counter = 0;
for (auto x : range<int>(0, 2147483647)) {
counter++;
}
std::cout << counter << " iterations" << std::endl;
(2)
counter = 0;
for (auto x : range<int>(-2147483647,0)) {
counter++;
}
std::cout << counter << " iterations" << std::endl;
In fact, there are two major limitations if you use range<int>
:
- The range of value (the absolute difference between the right and the left bounds) plus the abs value of the step should be less than 2147483647 (=231-1);
- Absolute value of the sum of the right bound and the step value should be less than 2147483647.
If one of these limitations is violated, you could get an unpredictable result.
If you would like to apply the ranges shown in the last two examples, use range<long long int>
:
counter = 0;
for (auto x : range<long long int>(-2147483648LL,0))
{
counter++;
}
std::cout << counter << " iterations" << std::endl;
This code will output the expected result:
2147483649 iterations
The range template works with enumeration types (unscoped and scoped). Here is an example with a scoped enumeration:
enum class MyEnum
{
A,
B,
C
};
int count = 0;
for (MyEnum e : range<MyEnum>(MyEnum::C, MyEnum::A, -1))
{
std::cout << static_cast<unsigned>(e) << std::endl;
}
This code will print:
2
1
0
Benchmarks for the 64-bit Code
Several tests were used to compare the performance of loops with range<>
against standard for
-loops. The tests used two levels of loops and simple arithmetic expressions, which, however, allowed to avoid aggressive optimization that could distort the results. 64-bit compilers were used on a 64-bit system.
For GCC 4.9 C++ the following command line was used:
g++ -std==c++1y -m64 -O3 range_test.cpp -o range_test
Clang 3.4 C++ compiler was called with this command:
clang++ -std=c++1y -m64 -O3 range_test.cpp -o range_test
In VC++ 2013, the code was compiled using available optimizations with the options "Maximized for Speed" and "Favor Fast Code".
The GCC 4.9 C++ compiler produced code, which showed, in one case, lower speed -- about 9%, in other cases 7% higher than using standard loops; on average the speed was slightly higher using ranges.
The Clang 3.4 compiler generated the code, which showed, in few tests that the range<>
speed was about 4% lower than using standard loops; in other tests the speed was the same or higher (about 2% higher). On average, the deviation was about 2%.
The code compiled with Visual C++ 2013 showed, in some tests, higher speed in using ranges (about 2%); in other tests slower speed (the worst case was 2% slower than using standard loops). On average, the test showed about 2% deviation between the two types of loops.
Benchmarks for the 32-bit Code
Tests were also carried out for the 32-bit code, produced by GCC 4.9 and Visual C++2013 compilers.
The worst case for GCC 4.9 code was 7% slower speed for range<>
than for standard loops; the average deviation was less than 2%. For Visual C++ 2013, the worst case was 4%, with average deviation about 3%.
Conclusion
The worst cases in benchmarks were in few tests that contain high number of iterations in the outer loop (over 1,000,000) and few iterations in the inner loop (less than 200). The relatively poor performance (7%-9% slower) in those cases was due to the fact that initialization of the range template takes more time due to using multiplication and division.
Overall, the results of the tests have shown that the range template can be used for writing efficient loops.