
Simple string parsing in nested loops

14 Dec 2004
Fast string parsing in nested loops.

Introduction

Splitting a string into tokens is a common operation, and the C run-time library function strtok handles the simple cases quickly and with very little code:

#define DELIMITERS    " \r\n\t!@#$%^&*()_+-={}|\\:\"'?/.,<>"

char string[] = "A string\tof ,,tokens\nand some  more tokens";
char* token = strtok(string, DELIMITERS);
while(token != NULL)
{
    // While there are tokens in "string"
    // ... do something with token ...

    // Get next token
    token = strtok(NULL, DELIMITERS);
}

Problems

This function has several limitations, however:

  1. You can't tell which delimiter character ended a token, because strtok overwrites it with '\0' at the end of the token. The input string is modified as a side effect.
  2. You can't use this function in nested loops, because strtok keeps its parsing position in a static variable, as the help note explains:

    Note: Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings, and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

  3. You can't split on a delimiter sequence, that is, a delimiter made of several characters that must appear together and in order (such as "\r\n").
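The second problem is easy to reproduce. The following is a minimal sketch, not taken from the article's download: the function name rows_seen_with_nested_strtok and the newline/comma input format are assumptions made only for illustration. It uses strtok for both an outer row loop and an inner column loop and records which rows the outer loop actually reaches.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical demo: splits `text` into rows on '\n' and, inside the row
// loop, splits each row into columns on ',' with strtok at both levels.
// Returns the rows that the outer loop managed to visit.
std::vector<std::string> rows_seen_with_nested_strtok(std::string text)
{
    assert(!text.empty());
    std::vector<std::string> rows;

    char* row = std::strtok(&text[0], "\n");
    while (row != nullptr) {
        rows.push_back(row);

        // Inner loop: this call replaces strtok's hidden parsing position.
        char* col = std::strtok(row, ",");
        while (col != nullptr) {
            col = std::strtok(nullptr, ",");
        }

        // Resumes from the exhausted inner state, not from the next row.
        row = std::strtok(nullptr, "\n");
    }
    return rows;
}
```

Called with "a,b\nc,d", the function reports only the row "a,b". The inner loop leaves the shared state at the end of that row, so the outer loop never reaches "c,d".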

Solution

So I built the class CStrTok to solve all of these problems, especially the second one: instead of living in a static variable, the parsing state is encapsulated in a class like this:

class CStrTok
{
public:
    CStrTok();
    ~CStrTok();
public:
    LPSTR m_lpszNext;
    char m_chDelimiter;
    // ... some attributes

public:
    LPSTR GetFirst(LPSTR lpsz, LPCSTR lpcszDelimiters);
    LPSTR GetNext(LPCSTR lpcszDelimiters);
    // ... some functions

};

The variable m_lpszNext holds the position of the next token to be parsed, and the variable m_chDelimiter holds the delimiter that ended the current token, so that it can be restored on the next call to GetNext. Because each object keeps its own state, the class can be used in nested loops without any problems.
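To make the role of these two members concrete, here is a minimal sketch of the idea. It is not the CStrTok source: the class TokSketch, its members next and delimiter, and the helper collect_cells are hypothetical names, and the detail that the saved delimiter is written back into the buffer on the following call is an assumption about how such a class might behave.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical illustration only; this is not the CStrTok implementation.
class TokSketch
{
public:
    char* next = nullptr;    // where the following token search starts
    char  delimiter = '\0';  // delimiter that ended the current token

    char* GetFirst(char* text, const char* delimiters)
    {
        assert(text != nullptr && delimiters != nullptr);
        next = text;
        delimiter = '\0';
        return GetNext(delimiters);
    }

    char* GetNext(const char* delimiters)
    {
        if (next == nullptr)
            return nullptr;
        // Put back the delimiter that the previous call overwrote.
        if (delimiter != '\0')
            next[-1] = delimiter;
        // Skip leading delimiters, as strtok does.
        char* token = next + std::strspn(next, delimiters);
        if (*token == '\0') {
            next = nullptr;
            delimiter = '\0';
            return nullptr;
        }
        char* end = token + std::strcspn(token, delimiters);
        delimiter = *end;            // remember which character ended the token
        if (*end != '\0') {
            *end = '\0';             // terminate the token in place
            next = end + 1;
        } else {
            next = nullptr;          // reached the end of the buffer
        }
        return token;
    }
};

// Collects every cell of a newline/comma separated buffer using two
// independent tokenizers, one per loop level.
std::vector<std::string> collect_cells(std::string& text)
{
    std::vector<std::string> cells;
    TokSketch rows, cols;
    for (char* row = rows.GetFirst(&text[0], "\n"); row; row = rows.GetNext("\n"))
        for (char* col = cols.GetFirst(row, ","); col; col = cols.GetNext(","))
            cells.push_back(col);
    return cells;
}
```

Since every tokenizer object carries its own position, the inner loop cannot disturb the outer one. In this sketch the buffer also ends up unchanged once both loops have run to completion, because each overwritten delimiter is written back on the next call.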

CStrTok Usage

// code to parse tab delimited text files

CStrTok StrTok[3];
StrTok[0].m_bDelimitersInSequence = true; // for "\r\n"

// parse file buffer for rows and columns

char* pRow = StrTok[0].GetFirst(pFileBuffer, "\r\n");
while(pRow)
{
    // parse the row

    char* pCol = StrTok[1].GetFirst(pRow, "\t");
    while(pCol)
    {
        // parse the col

        char* pToken = StrTok[2].GetFirst(pCol, " ,;");
        while(pToken)
        {
            // ... using pToken

            pToken = StrTok[2].GetNext(" ,;");
        }
        // get next column

        pCol = StrTok[1].GetNext("\t");
    }
    // get next row

    pRow = StrTok[0].GetNext("\r\n");
}
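The m_bDelimitersInSequence flag set on StrTok[0] above asks for "\r\n" to be treated as one delimiter rather than as a set of two characters. How CStrTok implements this is not shown in the article, so the following is only a sketch of the general technique using strstr. The function split_on_sequence is a hypothetical helper, and skipping empty pieces is an assumption modelled on strtok's behaviour.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not part of CStrTok: splits `text` wherever the whole
// `sequence` occurs, skipping empty pieces. A lone character from the
// sequence (for example '\r' without '\n') does not split the text.
std::vector<std::string> split_on_sequence(const std::string& text,
                                           const char* sequence)
{
    assert(sequence != nullptr && *sequence != '\0');
    std::vector<std::string> pieces;
    const std::size_t len = std::strlen(sequence);

    const char* start = text.c_str();
    while (*start != '\0') {
        const char* hit = std::strstr(start, sequence);
        if (hit == nullptr) {
            pieces.push_back(start);         // last piece, no trailing sequence
            break;
        }
        if (hit != start)
            pieces.emplace_back(start, hit); // copy the range [start, hit)
        start = hit + len;                   // continue after the whole sequence
    }
    return pieces;
}
```

With the delimiter "\r\n", the input "r1c1\tr1c2\r\nr2\rc1\r\n" yields two rows, and the stray '\r' inside the second row is kept as ordinary data.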

I think you will find it easy to use.

Source code files

StrTok.cpp, StrTok.h

Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
