Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

String Tokenizer Iterator Class

0.00/5 (No votes)
26 Jun 2002 1  
A string tokenizer iterator class that works with std::string

Introduction

As a part of a larger project I had to write some basic string utility functions and classes. One of the things needed was a flexible way of splitting strings into separate tokens.

As is often the case when it comes to programming, there are different ways to handle a problem like this. After reviewing my options I decided that an iterator based solution would be flexible enough for my needs.

Non-iterator based solutions to this particular problem often have the disadvantage of tying the user to a certain container type. With an iterator based tokenizer the programmer is free to chose any type of container (or no container at all). Many STL containers such as std::list and std::vector offer constructors that can populate the container from a set of iterators. This feature makes it very easy to use the tokenizer.

Example usage

    
std::vector<std::string> s(string_token_iterator("one two three"),
                             string_token_iterator());
std::copy(s.begin(),
          s.end(),
          std::ostream_iterator<std::string>(std::cout,"\n"));
// output:

// one

// two

// three


std::copy(string_token_iterator("one,two..,..three",",."),
          string_token_iterator(),
          std::ostream_iterator<std::string>(std::cout,"\n"));
// same output as above

The code has been tested with Visual C++.NET and GCC 3.

The Code

#include <string>

#include <iterator>


struct string_token_iterator 
  : public std::iterator<std::input_iterator_tag, std::string>
{
public:
  string_token_iterator() : str(0), start(0), end(0) {}
  string_token_iterator(const std::string & str_, const char * separator_ = " ") :
    separator(separator_),
    str(&str_),
    end(0)
  {
    find_next();
  }
  string_token_iterator(const string_token_iterator & rhs) :
    separator(rhs.separator),
    str(rhs.str),
    start(rhs.start),
    end(rhs.end)
  {
  }

  string_token_iterator & operator++()
  {
    find_next();
    return *this;
  }

  string_token_iterator operator++(int)
  {
    string_token_iterator temp(*this);
    ++(*this);
    return temp;
  }

  std::string operator*() const
  {
    return std::string(*str, start, end - start);
  }

  bool operator==(const string_token_iterator & rhs) const
  {
    return (rhs.str == str && rhs.start == start && rhs.end == end);
  }

  bool operator!=(const string_token_iterator & rhs) const
  {
    return !(rhs == *this);
  }

private:

  void find_next(void)
  {
    start = str->find_first_not_of(separator, end);
    if(start == std::string::npos)
    {
      start = end = 0;
      str = 0;
      return;
    }

    end = str->find_first_of(separator, start);
  }

  const char * separator;
  const std::string * str;
  std::string::size_type start;
  std::string::size_type end;
};

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here