Introduction
As a part of a larger project I had to write some basic string utility functions and classes. One of the things needed was a flexible way of splitting strings into separate tokens.
As is often the case in programming, there are several ways to handle a problem like this. After reviewing my options I decided that an iterator-based solution would be flexible enough for my needs.
Solutions that are not iterator-based often have the disadvantage of tying the user to a certain container type. With an iterator-based tokenizer the programmer is free to choose any type of container (or no container at all). Many STL containers, such as std::list and std::vector, offer constructors that can populate the container from a pair of iterators. This feature makes the tokenizer very easy to use.
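The iterator-range constructors mentioned above are the same ones that work with the standard stream iterators, so a familiar point of comparison may help. The following is only a minimal sketch of that pattern using std::istream_iterator; the helper names words_to_list and words_to_vector are made up for illustration and are not part of the tokenizer.

```cpp
#include <iterator>
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: gathers whitespace-separated words into a std::list
// through the container's iterator-range constructor.
std::list<std::string> words_to_list(const std::string & text)
{
    std::istringstream in(text);
    // The extra parentheses around the first argument keep the compiler
    // from parsing this line as a function declaration.
    std::list<std::string> result((std::istream_iterator<std::string>(in)),
                                  std::istream_iterator<std::string>());
    return result;
}

// Same iterator pair, different container: only the declared type changes.
std::vector<std::string> words_to_vector(const std::string & text)
{
    std::istringstream in(text);
    std::vector<std::string> result((std::istream_iterator<std::string>(in)),
                                    std::istream_iterator<std::string>());
    return result;
}
```

Switching containers does not touch the code that produces the words, which is the property the tokenizer aims for as well.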
Example usage
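// Fill a vector through its iterator-range constructor, then print it.
std::vector<std::string> s(string_token_iterator("one two three"),
                           string_token_iterator());
std::copy(s.begin(),
          s.end(),
          std::ostream_iterator<std::string>(std::cout, "\n"));

// No container at all: stream the tokens straight to std::cout,
// treating both ',' and '.' as separators.
std::copy(string_token_iterator("one,two..,..three", ",."),
          string_token_iterator(),
          std::ostream_iterator<std::string>(std::cout, "\n"));

// Both calls print one, two and three, each on its own line.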
The code has been tested with Visual C++.NET and GCC 3.
The Code
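#include <cstddef>
#include <iterator>
#include <string>

// Input iterator that walks the separator-delimited tokens of a std::string.
struct string_token_iterator
{
    // Member typedefs take the place of the std::iterator base class,
    // which is deprecated since C++17.
    typedef std::input_iterator_tag iterator_category;
    typedef std::string value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const std::string * pointer;
    typedef std::string reference; // operator* returns a token by value

public: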
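    // A default-constructed iterator is the end-of-sequence marker.
    string_token_iterator() : separator(0), str(0), start(0), end(0) {}

    // Stores pointers to str_ and separator_ rather than copies, so both
    // must outlive the iterator.
    string_token_iterator(const std::string & str_, const char * separator_ = " ") :
        separator(separator_),
        str(&str_),
        start(0),
        end(0)
    {
        find_next();
    }
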
    string_token_iterator(const string_token_iterator & rhs) :
        separator(rhs.separator),
        str(rhs.str),
        start(rhs.start),
        end(rhs.end)
    {
    }

    string_token_iterator & operator++()
    {
        find_next();
        return *this;
    }

    string_token_iterator operator++(int)
    {
        string_token_iterator temp(*this);
        ++(*this);
        return temp;
    }

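    // Returns a copy of the current token. When end is npos the length
    // argument is clamped, so the token runs to the end of the string.
    std::string operator*() const
    {
        return std::string(*str, start, end - start);
    }

    // Two iterators are equal when they refer to the same token of the same
    // string; every exhausted iterator equals a default-constructed one.
    bool operator==(const string_token_iterator & rhs) const
    {
        return (rhs.str == str && rhs.start == start && rhs.end == end);
    }

    bool operator!=(const string_token_iterator & rhs) const
    {
        return !(rhs == *this);
    }
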
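private:
    // Moves [start, end) to the next token, or turns this iterator into the
    // end-of-sequence marker when no token is left.
    void find_next()
    {
        // Skip any separators that follow the previous token.
        start = str->find_first_not_of(separator, end);
        if(start == std::string::npos)
        {
            // No more tokens: become equal to a default-constructed iterator.
            start = end = 0;
            str = 0;
            return;
        }
        // The token ends at the next separator; npos means it runs to the
        // end of the string.
        end = str->find_first_of(separator, start);
    }

    const char * separator;        // set of separator characters
    const std::string * str;       // string being tokenized; 0 marks the end iterator
    std::string::size_type start;  // index of the first character of the current token
    std::string::size_type end;    // index one past the current token, or npos
};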
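
The two-step search in find_next() (skip separators with find_first_not_of, then locate the end of the token with find_first_of) may be easier to follow in isolation. The function below is a minimal sketch of that same scan written as a plain loop. The name scan_tokens is hypothetical, and the function deliberately returns a std::vector only to keep the illustration short; that is exactly the container coupling the iterator avoids.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the tokenizer: performs the same scan as
// find_next(), but gathers every token eagerly into a vector.
std::vector<std::string> scan_tokens(const std::string & str,
                                     const char * separator = " ")
{
    std::vector<std::string> tokens;
    std::string::size_type end = 0;
    for (;;)
    {
        // Step 1: skip separator characters to find where the next token begins.
        std::string::size_type start = str.find_first_not_of(separator, end);
        if (start == std::string::npos)
            break; // only separators (or nothing) remain

        // Step 2: find the first separator after the token. A result of npos
        // means the token runs to the end of the string.
        end = str.find_first_of(separator, start);

        // substr clamps the length, so end == npos needs no special case.
        tokens.push_back(str.substr(start, end - start));
    }
    return tokens;
}
```

Calling scan_tokens("one,two..,..three", ",.") yields the three tokens one, two and three. Runs of consecutive separators are skipped as a group, so no empty tokens are produced.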