The second part of the iterators series. Iterators might be a dangerous thing sometimes. What is the risk they bring with them, and how to prepare for / deal with it?
Iterators can make your code maintainable and generic, but they also require an additional attention when used in a dynamic collection. This time, the story is based on a real project I worked on, and I hope it’ll help you solve an issue you’re currently working on, or avoid this issue in the first place.
Previous article on the series: Maintain Your Iterations – The Iterators Secret – Part 1
Next article on the series: Maintain Your Iterations – Iterators Customization – Part 3
The Phenomenon
It all started approximately two years ago. My team and I were about to add a significant ability to our system. At first, we thought that this change should not have a significant effect on our code architecture, and that it’ll sum up to a change of a single shared pointer to a vector of shared pointers. It looked like that:
class my_system {
public:
explicit my_system(int val) : value(val) {};
~my_system() = default;
my_system(const my_system&) = default;
my_system(my_system&&) = default;
my_system& operator=(const my_system &ref) = default;
my_system& operator=(my_system &&ref) = default;
void func() {
std::cout << value << std::endl; std::this_thread::sleep_for(std::chrono::seconds(1));
std::cout << value << std::endl; }
private:
int value; };
class my_collection {
public:
void add_system() {
std::vector<my_system>::iterator it;
{
std::lock_guard g(guard);
my_systems.emplace_back(my_systems.size() + 500);
it = my_systems.end() - 1;
}
it->func();
}
private:
std::vector<my_system> my_systems;
std::mutex guard;
};
But when we did it, some strange phenomena appeared, and it didn’t matter what we did to solve it, the phenomena still occurred. For example: sometimes, the system crashed after it initialized two systems in my_collection
, and erased the first one of them.
At this point, the team divided into two groups: The first continued to try new ways of solving the issue and to achieve the desired feature. The second group (myself) tried to revert the changes and to find the latest stable version of the system.
After some digging, my conclusion was: there is no stable version that doesn’t have this bug, which means that we just increased the bug chances to appear with our change and didn’t create it.
In order to understand the crash, which in our system occurred in a different place each time (more complex than the above example), we had two directions. The first was threads collisions and the second was the behavior of our collection behind the scenes.
It took some time but the conclusion was that although we had a threads collisions issue, they weren’t the cause for this specific crash. On the other hand, the second lead (which researched by a team member who preferred to stayed anonymous), raised an interesting conclusion.
For example, let’s take the following, relatively simple, case:
int main() {
my_collection mc;
std::thread t([&] { mc.add_system(); });
std::thread t1([&] { mc.add_system(); });
t.join();
t1.join();
return EXIT_SUCCESS;
}
A possible result:
500
501
A garbage value
501
Take a moment and think about the way std::vector
works: contiguous allocation.
Behind the Scenes – Vectors
Well, something bad happened there, and when we replace the single int
to a vector, it won’t print that garbage value, it’ll crash. So what happens behind the scenes of std::vector
that makes this phenomenon?
std::vector
is a dynamic array, which means that its elements will always be contiguous in the memory and it can increase & decrease its size. Assume the vector placed itself on a place in the memory where it has only one place for a contiguous allocation available. The next time, we’ll ask him nicely to add one more element to the collection, it’ll have to change the entire array location in the memory and allocate everything all over again in another place.
Now pay attention to our my_system::func
. There is a delayed time between the first print and the second print, and both of them read from the pointer this
. this
is a const
pointer, which means that as long as it is alive, it’ll point to the exact same address. Now, have a look at the possible events order:
- Create
my_system
inside my_systems
vector. - Start: Call
my_systems[0].func()
. - Try: Create
my_system
inside my_systems
vector.
- Not enough contiguous memory allocation place.
- Move the first element to a new memory location.
- Create
my_system
inside my_systems
vector.
- Start: Call
my_systems[1].func()
. - End: Call
my_systems[0].func()
. - End: Call
my_systems[1].func()
.
Pay attention to the fact that the event of calling my_systems[0].func()
might start when the location is X
and to end when the location is different. This will lead to the invalidation of this
pointer, and any usage of it is a UB.
Iterators Invalidation
This special case is called “Iterators Invalidation”. There are several collections, given by std
, that will cause iterators invalidation when performing specific methods on them, e.g., increase / decrease collection size.
A simple example to iterators validation is the following loop:
std::vector<int> vec = {1, 2, 3, 4};
for (auto &curr : vec) {
if (curr == 2) vec.emplace_back(4);
std::cout << curr << std::endl;
}
In the above loop, everything that happens after the insertion, is a UB.
How to Fix / Avoid
There are several ways for fixing/avoiding this issue:
- Use
std::vector::reserve(size_t n)
. This function makes sure to reserve the vector other n
contiguous values places in memory (without allocating them), so whenever you create another allocation, as long as it inside the range you specified, it won’t re-allocate the vector’s elements all over again.
Be aware that if you need more places than you specified, you will probably meet your old friend again. - Go around the problem with smart pointers. Smart pointers will make you care less about the vector’s elements location, because from now on, it doesn’t corrupt your data.
Now let’s point to the big elephant in the room: Do you really want to move everything you work with to your heap memory? Watchout! The elephant almost broke another door! - And my last offered solution: Move to a different container that doesn’t have this Iterators Invalidation issue. My team and I decided to move to
std::list
. This collection has slower access to a specific element inside it, but the other options were worse (For the first: we don’t want to grab unneeded memory ahead and we don’t want to limit our collection size, and for the second: The elephant is too big!).
Conclusion
In C++, you can’t take things, even from the standard library, without a full consideration before. The things which happen behind the scenes might, and usually will, have a dramatic effect on your code performances and issues.
The std
library does offer you shortcuts and does implement a lot of features and structures for you, but it usually doesn’t take responsibility for your usage. For example, the collections usually won’t be thread protected and some of them will have cases of iterators invalidation.
Please share similar issues in the comments section below and feel free to share if this article helped you to solve an issue in your code.
Posts in this Series