How to Write Abstract Iterators in C++

ocoudert

4.60/5 (5 votes)

7 Jul 2010CPOL4 min read

14.3K

How to write abstract iterators in C++

When developing in C++, an impeccable API is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator.

An iterator is used to visit the elements of a container without exposing how the container is implemented (e.g., a vector, a list, a red-black tree, a hash set, a queue, etc.). Iterators are central to generic programming because they are an interface between containers and applications. Applications need access to the elements of containers, but they usually do not need to know how elements are stored in containers. Iterators make it possible to write generic algorithms that operate on different kinds of containers.

For example, the following code snippet exposes the nature of the container –a vector.

C++

void process(const std::vector<E>& v)
{
    for (unsigned i = 0; i < v.size(); ++i) {
        process(v[i]);
    }
}

If we want to have the same function operating on a list, we have to write a separate function. Or if we later decide that a list or a hash set is more appropriate as a container, we need to rewrite the code everywhere we access the vector. This may require a lot of changes in many files. Contrast this container-specific visitation scheme to the following:

C++

template <typename Container>
void process(const Container& c)
{
    typename Container::const_iterator itr = c.begin();
    typename Container::const_iterator end = c.end();
    for (; itr != end; ++itr) {
        process(*itr);
    }
}

Using the notion of iterator, we have a generic processing of a container ‘c’, whether it is a vector, a list, a hash set, or any data structure that provides iterators in its API. Even better, we can write a generic process function that only takes an iterator range, without assuming that the container has a begin() and end() method:

C++

template <typename Iterator>
void process(Iterator begin, Iterator end)
{
    for (; itr != end; ++itr) {
        process(*itr);
    }
}

An STL iterator is a commodity that behaves as a scalar type:

It can be allocated on the heap
It can be copied
It can be passed by value
It can be assigned to

The essence of an iterator is captured by the following API.

C++

template <typename T>
class Itr {
public:
    Itr();
    ~Itr();
    Itr(const Itr& o);                   // Copy constructor
    Itr& operator=(const Itr& o);        // Assignment operator
    Itr& operator++();                   // Next element
    T&   operator*();                    // Dereference
    bool operator==(const Itr& o) const; // Comparison
    bool operator!=(const Itr& o) const { return !(*this == o); }
}

Usually the container will provide a begin() and end() method, which build the iterators that denote the container’s range. Writing these begin/end methods is an easy task if the container is derived from an STL container, if the container has a data member that is an STL container, or if the iterator is a scalar type, like a pointer or an index.

It is more complicated if we want iterators that dereference to the same type of object, but that must visit several containers, possibly of different types, or iterators that visit containers in different manners. For instance, let us assume that we have objects with some property (say, a color) stored in several containers, some of them of different types. We would like to visit all the objects, independently of the number of containers and their type, or we would like to visit objects of a given color, or we would like to visit objects that satisfy some predicate:

C++

Itr<E> begin();          // This give the range to visit
Itr<E> end();            // all the elements of type E

Itr<E> begin(const Color& color);    // Same as above but only for the
Itr<E> end(const Coir& color);       // elements of the given color

class Predicate {
public:
    bool operator()(const E& e);
};

Itr<E> begin(Predicate& p);  // Same as above but only for the
Itr<E> end(Predicate& p);    // elements that satisfy the predicate

In this case, the iterator is more complex than a scalar type like a pointer or an index: it needs to keep track of which container it is currently visiting, or which color or predicate it needs to check. In general, the iterator may have data members so that it can fulfill its task. Also, we want to factorize the code and reuse general purpose iterators’ methods when writing more targeted iterators –e.g., visiting elements of a specific color should make use of the next-element method Itr<E>::operator++(). This can be done by having Itr<E> be a virtual class, and having derived classes to implement the different iterators. For example:

C++

class E {
public:
    Color& color() const;
};

template <typename E>
class ColoredItr<E> : public Itr<E> {
private:
    typedef Itr<E> _Super;
public:
    ColoredItr<E>(const Color& color) : Itr<E>(), color_(color) {}
    virtual ~ColoredItr<E>;
    virtual ColoredItr<E>& Operator++() {
       for (; _Super::operator*().color() != color_; _Super::operator++());
       return *this;
    }
private:
    Color color_;
};

We would like a generic iterator that meets all the requirements described above:

It can be allocated on the heap
It can be copied
It can be passed by value
It can be assigned to
It dereferences to the same type
It can visit several containers
It can visit containers of different types
It can visit containers in an arbitrary manner

This can be implemented as follows:

C++

template<typename E>
class ItrBase {
public:
    ItrBase() {}
    virtual ~ItrBase() {}
    virtual void  operator++() {}
    virtual E&    operator*() const { return E(); }
    virtual ItrBase* clone() const { return new ItrBase(*this); }
    // The == operator is non-virtual. It checks that the
    // derived objects have compatible types, then calls the
    // virtual comparison function equal.
    bool operator==(const ItrBase& o) const {
        return typeid(*this) == typeid(o) && equal(o);
    }
protected:
    virtual bool equal(const ItrBase& o) const { return true; }
};

template<typename E>
class Itr {
public:
    Itr() : itr_(0) {}
    ~Itr() { delete itr_; }
    Itr(const Itr& o) : itr_(o.itr_->clone()) {}
    Itr& operator=(const Itr& o) {
        delete itr_; itr_ = o.itr_->clone(); return *this;
    }
    Itr&  operator++() { ++(*itr_); return *this; }
    E&    operator*() const { return *(*itr_); }
    bool  operator==(const IDItr& o) const {
        return (itr_ == o.itr_) || (*itr_ == *o.itr_);
    }
    bool  operator!=(const IDItr& o) const { return !(*this == o); }

protected:
    ItrBase<T>* itr_;
};

The ItrBase class is the top class of the hierarchy. Itr is simply a wrapper on a pointer to an ItrBase, so that it can be allocated on the heap – the actual implementation of the class deriving from ItrBase can have an arbitrary size. Note how the Itr copy and assignment operators are implemented via the ItrBase::clone() method, so that Itr behaves as a scalar type. Last but not least, the (non-virtual) ItrBase::operator== equality operator first checks for type equality before calling the (virtual) equality method equal on the virtual subclass.

Iterators on containers of elements of type E just need to derive from ItrBase<E>, and a factory providing the begin() and end() methods for any specialized iterator returns object of type Itr<E>.

No related posts.

CodeProject

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)