Introduction
When developing in C++, an impeccable API is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator.
An iterator is used to visit the elements of a container without exposing how the container is implemented (e.g., a vector, a list, a red-black tree, a hash set, a queue, etc.). Iterators are central to generic programming because they are an interface between containers and applications. Applications need access to the elements of containers, but they usually do not need to know how elements are stored in containers. Iterators make it possible to write generic algorithms that operate on different kinds of containers.
For example, the following code snippet exposes the nature of the container - a vector.
void process(const std::vector<E>& v)
{
for (unsigned i = 0; i < v.size(); ++i) {
process(v[i]);
}
}
If we want to have the same function operating on a list, we have to write a separate function. Or, if we later decide that a list or a hash set is more appropriate as a container, we need to rewrite the code everywhere we access the vector. This may require a lot of changes in many files. Contrast this container-specific visitation scheme to the following:
template <typename Container>
void process(const Container& c)
{
typename Container::const_iterator itr = c.begin();
typename Container::const_iterator end = c.end();
for (; itr != end; ++itr) {
process(*itr);
}
}
Using the notion of iterator, we have a generic processing of a container 'c
', whether it is a vector, a list, a hash set, or any data structure that provides iterators in its API. Even better, we can write a generic process function that only takes an iterator range, without assuming that the container has a begin()
and an end()
method:
template <typename Iterator>
void process(Iterator begin, Iterator end)
{
for (; itr != end; ++itr) {
process(*itr);
}
}
An STL iterator is a commodity that behaves as a scalar type:
- It can be allocated on the heap
- It can be copied
- It can be passed by value
- It can be assigned to
The essence of an iterator is captured by the following API:
template <typename T>
class Itr {
public:
Itr();
~Itr();
Itr(const Itr& o); Itr& operator=(const Itr& o); Itr& operator++(); T& operator*(); bool operator==(const Itr& o) const; bool operator!=(const Itr& o) const { return !(*this == o); }
}
Usually, the container will provide a begin()
and an end()
method, which build the iterators that denote the container's range. Writing these begin/end methods is an easy task if the container is derived from an STL container, if the container has a data member that is an STL container, or if the iterator is a scalar type, like a pointer or an index.
It is more complicated if we want iterators that dereference to the same type of object, but that must visit several containers, possibly of different types, or iterators that visit containers in different manners. For instance, let us assume that we have objects with some property (say, a color) stored in several containers, some of them of different types. We would like to visit all the objects, independently of the number of containers and their type, or we would like to visit objects of a given color, or we would like to visit objects that satisfy some predicate:
Itr<E> begin(); Itr<E> end();
Itr<E> begin(const Color& color); Itr<E> end(const Coir& color);
class Predicate {
public:
bool operator()(const E& e);
};
Itr<E> begin(Predicate& p); Itr<E> end(Predicate& p);
In this case, the iterator is more complex than a scalar type like a pointer or an index: it needs to keep track of which container it is currently visiting, or which color or predicate it needs to check. In general, the iterator may have data members so that it can fulfill its task. Also, we want to factorize the code and reuse general purpose iterator methods when writing more targeted iterators, e.g., visiting elements of a specific color should make use of the next-element method Itr<E>::operator++()
. This can be done by having Itr<E>
be a virtual class, and having derived classes to implement the different iterators. For example:
class E {
public:
Color& color() const;
};
template <typename E>
class ColoredItr<E> : public Itr<E> {
private:
typedef Itr<E> _Super;
public:
ColoredItr<E>(const Color& color) : Itr<E>(), color_(color) {}
virtual ~ColoredItr<E>;
virtual ColoredItr<E>& Operator++() {
for (; _Super::operator*().color() != color_; _Super::operator++());
return *this;
}
private:
Color color_;
};
We would like a generic iterator that meets all the requirements described above:
- It can be allocated on the heap
- It can be copied
- It can be passed by value
- It can be assigned to
- It dereferences to the same type
- It can visit several containers
- It can visit containers of different types
- It can visit containers in arbitrary manners
This can be implemented as follows:
template<typename E>
class ItrBase {
public:
ItrBase() {}
virtual ~ItrBase() {}
virtual void operator++() {}
virtual E& operator*() const { return E(); }
virtual ItrBase* clone() const { return new ItrBase(*this); }
bool operator==(const ItrBase& o) const {
return typeid(*this) == typeid(o) && equal(o);
}
protected:
virtual bool equal(const ItrBase& o) const { return true; }
};
template<typename E>
class Itr {
public:
Itr() : itr_(0) {}
~Itr() { delete itr_; }
Itr(const Itr& o) : itr_(o.itr_->clone()) {}
Itr& operator=(const Itr& o) {
delete itr_; itr_ = o.itr_->clone(); return *this;
}
Itr& operator++() { ++(*itr_); return *this; }
E& operator*() const { return *(*itr_); }
bool operator==(const Itr& o) const {
return (itr_ == o.itr_) || (*itr_ == *o.itr_);
}
bool operator!=(const Itr& o) const { return !(*this == o); }
protected:
ItrBase<E>* itr_;
};
The ItrBase
class is the top class of the hierarchy. Itr
is simply a wrapper on a pointer to an ItrBase
, so that it can be allocated on the heap - the actual implementation of the class deriving from ItrBase
can have an arbitrary size. Note how the Itr
copy and assignment operators are implemented via the ItrBase::clone()
method so that Itr
behaves as a scalar type. Last but not least, the (non-virtual) ItrBase::operator==
equality operator first checks for type equality before calling the (virtual) equality method equal on the virtual subclass. The reason ItrBase
is not a pure virtual is that it can conveniently be used to denote an empty range, i.e., the range (ItrBase(), ItrBase())
is empty.
Iterators on containers of elements of type E
just need to derive from ItrBase<E>
, and a factory providing the begin()
and end()
methods for any specialized iterator returns object of type Itr<E>
.
For example, let us assume that we have a container c
of E
s, and that we want an iterator to visit:
- all the elements of
c
, possibly with repetition; - all the elements of
c
without repetition.
This can be done as follows:
class E; class ItrAll : public ItrBase<E> {
private:
typedef ItrAll _Self;
typedef ItrBase<E> _Super;
public:
ItrAll(Container& c) : _Super(), c_(c) {}
virtual ~ItrAll() {}
virtual void operator++() { ++itr_; }
virtual E& operator*() const { return *itr_; }
virtual ItrBase<E>* clone() const { return new _Self(*this); }
protected:
virtual bool equal(const ItrBase<E>& o) const {
const _Self& o2 = static_cast<const _Self&>(o);
return &c_ == &o2.c_ && itr_ == o2.itr_;
}
protected:
Container& c_;
Container::iterator itr_;
};
class ItrNoRepeat : public ItrAll {
private:
typedef ItrNoRepeat _Self;
typedef ItrAll _Super;
public:
ItrNoRepeat (Container& c) : _Super(c) {}
virtual ~ItrNoRepeat () {}
virtual void operator++() {
_Super::operator++(); for (; itr_ != c_.end(); _Super::operator++()) {
E& e = _Super::operator*();
if (visited_.find(e) == visited_.end()) {
visited_.insert(e);
return;
}
}
}
virtual E& operator*() const { return _Super::operator*(); }
virtual ItrBase<E>* clone() const { return new _Self(*this); }
protected:
virtual bool equal(const ItrBase<E>& o) const { return _Super::equal(o); }
protected:
set<E> visited_;
};
Itr<E> begin(Container& c, bool noRepeat = false)
{
Itr<E> o;
if (noRepeat) {
o.itr_ = new ItrNoRepeat(c);
} else {
o.itr_ = new ItrAll(c);
}
o.itr_->itr_ = c.begin();
return o;
}
Itr<E> end(Container& c, bool noRepeat = false)
{
Itr<E> o;
if (noRepeat) {
o.itr_ = new ItrNoRepeat(c);
} else {
o.itr_ = new ItrAll(c);
}
o.itr_->itr_ = c.end();
return o;
}