Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C++

Maintain Your Iterations – Insecure Iterations – Part 2

0.00/5 (No votes)
22 Apr 2023CPOL5 min read 3.1K  
Risk Iterators bring with them, and how to prepare for / deal with it
The second part of the iterators series. Iterators might be a dangerous thing sometimes. What is the risk they bring with them, and how to prepare for / deal with it?

Iterators can make your code maintainable and generic, but they also require an additional attention when used in a dynamic collection. This time, the story is based on a real project I worked on, and I hope it’ll help you solve an issue you’re currently working on, or avoid this issue in the first place.

Previous article on the series: Maintain Your Iterations – The Iterators Secret – Part 1

Next article on the series: Maintain Your Iterations – Iterators Customization – Part 3

The Phenomenon

It all started approximately two years ago. My team and I were about to add a significant ability to our system. At first, we thought that this change should not have a significant effect on our code architecture, and that it’ll sum up to a change of a single shared pointer to a vector of shared pointers. It looked like that:

C++
class my_system {
public:
    explicit my_system(int val) : value(val) {};
    ~my_system() = default;
    my_system(const my_system&) = default;
    my_system(my_system&&) = default;
    my_system& operator=(const my_system &ref) = default;
    my_system& operator=(my_system &&ref) = default;

    void func() {
        std::cout << value << std::endl; // 500 as expected
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << value << std::endl; // A garbage value
    }

private:
    int value; // Make it a vector
};

class my_collection {
public:
    void add_system() {
        std::vector<my_system>::iterator it;
        {
            std::lock_guard g(guard);
            my_systems.emplace_back(my_systems.size() + 500);
            it = my_systems.end() - 1;
        }
        it->func();
    }

private:
    std::vector<my_system> my_systems;
    std::mutex guard;
};

But when we did it, some strange phenomena appeared, and it didn’t matter what we did to solve it, the phenomena still occurred. For example: sometimes, the system crashed after it initialized two systems in my_collection, and erased the first one of them.

At this point, the team divided into two groups: The first continued to try new ways of solving the issue and to achieve the desired feature. The second group (myself) tried to revert the changes and to find the latest stable version of the system.

After some digging, my conclusion was: there is no stable version that doesn’t have this bug, which means that we just increased the bug chances to appear with our change and didn’t create it.

In order to understand the crash, which in our system occurred in a different place each time (more complex than the above example), we had two directions. The first was threads collisions and the second was the behavior of our collection behind the scenes.

It took some time but the conclusion was that although we had a threads collisions issue, they weren’t the cause for this specific crash. On the other hand, the second lead (which researched by a team member who preferred to stayed anonymous), raised an interesting conclusion.

For example, let’s take the following, relatively simple, case:

C++
int main() {
    my_collection mc;
    std::thread t([&] { mc.add_system(); });
    std::thread t1([&] { mc.add_system(); });

    t.join();
    t1.join();
    return EXIT_SUCCESS;
}

A possible result:

500
501
A garbage value
501

Take a moment and think about the way std::vector works: contiguous allocation.

Behind the Scenes – Vectors

Well, something bad happened there, and when we replace the single int to a vector, it won’t print that garbage value, it’ll crash. So what happens behind the scenes of std::vector that makes this phenomenon?

std::vector is a dynamic array, which means that its elements will always be contiguous in the memory and it can increase & decrease its size. Assume the vector placed itself on a place in the memory where it has only one place for a contiguous allocation available. The next time, we’ll ask him nicely to add one more element to the collection, it’ll have to change the entire array location in the memory and allocate everything all over again in another place.

Now pay attention to our my_system::func. There is a delayed time between the first print and the second print, and both of them read from the pointer this. this is a const pointer, which means that as long as it is alive, it’ll point to the exact same address. Now, have a look at the possible events order:

  1. Create my_system inside my_systems vector.
  2. Start: Call my_systems[0].func().
  3. Try: Create my_system inside my_systems vector.
    1. Not enough contiguous memory allocation place.
    2. Move the first element to a new memory location.
    3. Create my_system inside my_systems vector.
  4. Start: Call my_systems[1].func().
  5. End: Call my_systems[0].func().
  6. End: Call my_systems[1].func().

Pay attention to the fact that the event of calling my_systems[0].func() might start when the location is X and to end when the location is different. This will lead to the invalidation of this pointer, and any usage of it is a UB.

Iterators Invalidation

This special case is called “Iterators Invalidation”. There are several collections, given by std, that will cause iterators invalidation when performing specific methods on them, e.g., increase / decrease collection size.

A simple example to iterators validation is the following loop:

C++
std::vector<int> vec = {1, 2, 3, 4};
for (auto &curr : vec) {
    if (curr == 2) vec.emplace_back(4);
    std::cout << curr << std::endl;
}

In the above loop, everything that happens after the insertion, is a UB.

How to Fix / Avoid

There are several ways for fixing/avoiding this issue:

  1. Use std::vector::reserve(size_t n). This function makes sure to reserve the vector other n contiguous values places in memory (without allocating them), so whenever you create another allocation, as long as it inside the range you specified, it won’t re-allocate the vector’s elements all over again.
    Be aware that if you need more places than you specified, you will probably meet your old friend again.
  2. Go around the problem with smart pointers. Smart pointers will make you care less about the vector’s elements location, because from now on, it doesn’t corrupt your data.
    Now let’s point to the big elephant in the room: Do you really want to move everything you work with to your heap memory? Watchout! The elephant almost broke another door!
  3. And my last offered solution: Move to a different container that doesn’t have this Iterators Invalidation issue. My team and I decided to move to std::list. This collection has slower access to a specific element inside it, but the other options were worse (For the first: we don’t want to grab unneeded memory ahead and we don’t want to limit our collection size, and for the second: The elephant is too big!).

Conclusion

In C++, you can’t take things, even from the standard library, without a full consideration before. The things which happen behind the scenes might, and usually will, have a dramatic effect on your code performances and issues.

The std library does offer you shortcuts and does implement a lot of features and structures for you, but it usually doesn’t take responsibility for your usage. For example, the collections usually won’t be thread protected and some of them will have cases of iterators invalidation.

Please share similar issues in the comments section below and feel free to share if this article helped you to solve an issue in your code.

Posts in this Series

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)