Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C++

How to write abstract iterators in C++

4.75/5 (17 votes)
14 Jul 2010CPOL4 min read 55.4K  
How to write generic STL-like iterators in C++.

Introduction

When developing in C++, an impeccable API is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator.

An iterator is used to visit the elements of a container without exposing how the container is implemented (e.g., a vector, a list, a red-black tree, a hash set, a queue, etc.). Iterators are central to generic programming because they are an interface between containers and applications. Applications need access to the elements of containers, but they usually do not need to know how elements are stored in containers. Iterators make it possible to write generic algorithms that operate on different kinds of containers.

For example, the following code snippet exposes the nature of the container - a vector.

C++
void process(const std::vector<E>& v)
{
    for (unsigned i = 0; i < v.size(); ++i) {
        process(v[i]);
    }
}

If we want to have the same function operating on a list, we have to write a separate function. Or, if we later decide that a list or a hash set is more appropriate as a container, we need to rewrite the code everywhere we access the vector. This may require a lot of changes in many files. Contrast this container-specific visitation scheme to the following:

C++
template <typename Container>
void process(const Container& c)
{
     typename Container::const_iterator itr = c.begin();
     typename Container::const_iterator end = c.end();
     for (; itr != end; ++itr) {
         process(*itr);
     }
}

Using the notion of iterator, we have a generic processing of a container 'c', whether it is a vector, a list, a hash set, or any data structure that provides iterators in its API. Even better, we can write a generic process function that only takes an iterator range, without assuming that the container has a begin() and an end() method:

C++
template <typename Iterator>
void process(Iterator begin, Iterator end)
{
     for (; itr != end; ++itr) {
         process(*itr);
     }
}

An STL iterator is a commodity that behaves as a scalar type:

  • It can be allocated on the heap
  • It can be copied
  • It can be passed by value
  • It can be assigned to

The essence of an iterator is captured by the following API:

C++
template <typename T>
class Itr {
public:
     Itr();
     ~Itr();
     Itr(const Itr& o);                   // Copy constructor
     Itr& operator=(const Itr& o);        // Assignment operator
     Itr& operator++();                   // Next element
     T&   operator*();                    // Dereference
     bool operator==(const Itr& o) const; // Comparison
     bool operator!=(const Itr& o) const { return !(*this == o); }
}

Usually, the container will provide a begin() and an end() method, which build the iterators that denote the container's range. Writing these begin/end methods is an easy task if the container is derived from an STL container, if the container has a data member that is an STL container, or if the iterator is a scalar type, like a pointer or an index.

It is more complicated if we want iterators that dereference to the same type of object, but that must visit several containers, possibly of different types, or iterators that visit containers in different manners. For instance, let us assume that we have objects with some property (say, a color) stored in several containers, some of them of different types. We would like to visit all the objects, independently of the number of containers and their type, or we would like to visit objects of a given color, or we would like to visit objects that satisfy some predicate:

C++
Itr<E> begin(); // This give the range to visit
Itr<E> end();   // all the elements of type E      

Itr<E> begin(const Color& color); // Same as above but only for the
Itr<E> end(const Coir& color);    // elements of the given color      

class Predicate {
public:
    bool operator()(const E& e);
};      

Itr<E> begin(Predicate& p); // Same as above but only for the
Itr<E> end(Predicate& p);   // elements that satisfy the predicate

In this case, the iterator is more complex than a scalar type like a pointer or an index: it needs to keep track of which container it is currently visiting, or which color or predicate it needs to check. In general, the iterator may have data members so that it can fulfill its task. Also, we want to factorize the code and reuse general purpose iterator methods when writing more targeted iterators, e.g., visiting elements of a specific color should make use of the next-element method Itr<E>::operator++(). This can be done by having Itr<E> be a virtual class, and having derived classes to implement the different iterators. For example:

C++
class E {
public:
     Color& color() const;
};      

template <typename E>
class ColoredItr<E> : public Itr<E> {
private:
     typedef Itr<E> _Super;
public:
     ColoredItr<E>(const Color& color) : Itr<E>(), color_(color) {}
     virtual ~ColoredItr<E>;
     virtual ColoredItr<E>& Operator++() {
        for (; _Super::operator*().color() != color_; _Super::operator++());
        return *this;
     }
private:
     Color color_;
};

We would like a generic iterator that meets all the requirements described above:

  • It can be allocated on the heap
  • It can be copied
  • It can be passed by value
  • It can be assigned to
  • It dereferences to the same type
  • It can visit several containers
  • It can visit containers of different types
  • It can visit containers in arbitrary manners

This can be implemented as follows:

C++
template<typename E>
class ItrBase {
public:
     ItrBase() {}
     virtual ~ItrBase() {}
     virtual void  operator++() {}
     virtual E&    operator*() const { return E(); }
     virtual ItrBase* clone() const { return new ItrBase(*this); }
     // The == operator is non-virtual. It checks that the
     // derived objects have compatible types, then calls the
     // virtual comparison function equal.
     bool operator==(const ItrBase& o) const {
         return typeid(*this) == typeid(o) && equal(o);
     }
protected:
     virtual bool equal(const ItrBase& o) const { return true; }
};      

template<typename E>
class Itr {
public:
     Itr() : itr_(0) {}
     ~Itr() { delete itr_; }
     Itr(const Itr& o) : itr_(o.itr_->clone()) {}
     Itr& operator=(const Itr& o) {
         delete itr_; itr_ = o.itr_->clone(); return *this;
     }
     Itr&  operator++() { ++(*itr_); return *this; }
     E&    operator*() const { return *(*itr_); }
     bool  operator==(const Itr& o) const {
         return (itr_ == o.itr_) || (*itr_ == *o.itr_);
     }
     bool  operator!=(const Itr& o) const { return !(*this == o); }      

protected:
     ItrBase<E>* itr_;
};

The ItrBase class is the top class of the hierarchy. Itr is simply a wrapper on a pointer to an ItrBase, so that it can be allocated on the heap - the actual implementation of the class deriving from ItrBase can have an arbitrary size. Note how the Itr copy and assignment operators are implemented via the ItrBase::clone() method so that Itr behaves as a scalar type. Last but not least, the (non-virtual) ItrBase::operator== equality operator first checks for type equality before calling the (virtual) equality method equal on the virtual subclass. The reason ItrBase is not a pure virtual is that it can conveniently be used to denote an empty range, i.e., the range (ItrBase(), ItrBase()) is empty.

Iterators on containers of elements of type E just need to derive from ItrBase<E>, and a factory providing the begin() and end() methods for any specialized iterator returns object of type Itr<E>.

For example, let us assume that we have a container c of Es, and that we want an iterator to visit:

  1. all the elements of c, possibly with repetition;
  2. all the elements of c without repetition.

This can be done as follows:

C++
class E; // Let's assume it's defined somewhere
class ItrAll : public ItrBase<E> {
private:
    typedef ItrAll     _Self;
    typedef ItrBase<E> _Super;
public:
    ItrAll(Container& c) : _Super(), c_(c) {}
    virtual ~ItrAll() {}
    virtual void  operator++() { ++itr_; }
    virtual E&    operator*() const { return *itr_; }
    virtual ItrBase<E>* clone() const { return new _Self(*this); }
protected:
    virtual bool equal(const ItrBase<E>& o) const {
        // Casting is safe since types have been checked by _Super::operator==
        const _Self& o2 = static_cast<const _Self&>(o);
        return &c_ == &o2.c_ && itr_ == o2.itr_;
    }
protected:
    Container&          c_;
    Container::iterator itr_;
};     

class ItrNoRepeat : public ItrAll {
private:
    typedef ItrNoRepeat _Self;
    typedef ItrAll      _Super;
public:
    ItrNoRepeat (Container& c) : _Super(c) {}
    virtual ~ItrNoRepeat () {}
    virtual void  operator++() {
        _Super::operator++(); // Go to the next element then
        // look for an element that has not been visited yet.
        for (; itr_ != c_.end(); _Super::operator++()) {
            E& e = _Super::operator*();
            if (visited_.find(e) == visited_.end()) {
                visited_.insert(e);
                return;
            }
        }
    }
    virtual E&    operator*() const { return _Super::operator*(); }
    virtual ItrBase<E>* clone() const { return new _Self(*this); }
protected:
    virtual bool equal(const ItrBase<E>& o) const { return _Super::equal(o); }
protected:
    set<E> visited_;
};

// Build the container's range w/ and w/o repetition
Itr<E> begin(Container& c, bool noRepeat = false)
{
    Itr<E> o;
    if (noRepeat) {
        o.itr_ = new ItrNoRepeat(c);
    } else {
        o.itr_ = new ItrAll(c);
    }
    o.itr_->itr_ = c.begin();
    return o;
}     

Itr<E> end(Container& c, bool noRepeat = false)
{
    Itr<E> o;
    if (noRepeat) {
        o.itr_ = new ItrNoRepeat(c);
    } else {
        o.itr_ = new ItrAll(c);
    }
    o.itr_->itr_ = c.end();
    return o;
}

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)