Introduction
Parsing strings is a simple operation that can be done with strtok, a function of the C run-time library. It finds the tokens in a string quickly and simply, like this:
#include <stdio.h>
#include <string.h>

#define DELIMITERS " \r\n\t!@#$%^&*()_+-={}|\\:\"'?/.,<>"

char string[] = "A string\tof ,,tokens\nand some more tokens";
char* token = strtok(string, DELIMITERS);
while(token != NULL)
{
    printf("%s\n", token); // use the current token here
    token = strtok(NULL, DELIMITERS);
}
Problems
However, strtok has several problems:
- You can't find out which delimiter character ended the current token, because strtok overwrites it with '\0' to terminate the token, so the input string is modified.
- You can't use strtok in nested loops, because it keeps its parsing position in a static variable, as the help note explains:
Note: Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings, and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
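- You can't split a string on a sequence of delimiters, that is, a delimiter made of several characters that must appear together and in order (such as "\r\n"); strtok treats every character of the delimiter string as a separate delimiter.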
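To see these problems concretely, here is a small C++ sketch that reproduces them with the standard std::strtok, so it can be checked on any compiler. The helper names (WritableCopy, BufferAfterFirstStrtok, CountRowsWithNestedStrtok, SplitWithStrtok) are hypothetical and exist only for this illustration; each helper copies its input into a writable buffer because strtok modifies the string it parses.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// strtok needs a writable, null-terminated buffer, so each helper copies its input.
static std::vector<char> WritableCopy(const std::string& text)
{
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');
    return buf;
}

// Problem 1: the delimiter that ends a token is overwritten with '\0'.
std::string BufferAfterFirstStrtok(const std::string& text, const char* delims)
{
    std::vector<char> buf = WritableCopy(text);
    std::strtok(buf.data(), delims);
    return std::string(buf.begin(), buf.end() - 1);  // keeps the embedded '\0'
}

// Problem 2: nesting two strtok loops breaks the outer loop, because both
// loops share strtok's single hidden position.
int CountRowsWithNestedStrtok(const std::string& text)
{
    std::vector<char> buf = WritableCopy(text);
    int rows = 0;
    for (char* row = std::strtok(buf.data(), "\n"); row != nullptr;
         row = std::strtok(nullptr, "\n"))  // resumes where the INNER loop stopped
    {
        ++rows;
        for (char* word = std::strtok(row, " "); word != nullptr;
             word = std::strtok(nullptr, " "))
        {
            // a real program would process each word here
        }
    }
    return rows;
}

// Problem 3: the delimiter argument is a set of characters, not a sequence.
std::vector<std::string> SplitWithStrtok(const std::string& text, const char* delims)
{
    std::vector<char> buf = WritableCopy(text);
    std::vector<std::string> tokens;
    for (char* t = std::strtok(buf.data(), delims); t != nullptr;
         t = std::strtok(nullptr, delims))
        tokens.emplace_back(t);
    return tokens;
}
```

For example, CountRowsWithNestedStrtok("a b\nc d") returns 1 rather than 2: the inner loop ends at the end of the first row, and the C standard says that once a token runs to the end of the string, later strtok(NULL, ...) calls return NULL, so the outer loop stops too. Likewise, SplitWithStrtok("a<br>bar", "<br>") yields "a" and "a" instead of "a" and "bar", because 'b' and 'r' inside "bar" are treated as delimiters.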
Solution
So I built the class CStrTok to solve all of these problems, especially the second one, the static variable: the parsing state is simply encapsulated in a class, like this:
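#include <windows.h> // LPSTR, LPCSTR

class CStrTok
{
public:
    CStrTok();
    ~CStrTok();
public:
    LPSTR m_lpszNext;             // position where parsing of the next token starts
    char m_chDelimiter;           // delimiter that ended the current token
    bool m_bDelimitersInSequence; // true: the delimiter string is one multi-character delimiter
public:
    LPSTR GetFirst(LPSTR lpsz, LPCSTR lpcszDelimiters);
    LPSTR GetNext(LPCSTR lpcszDelimiters);
};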
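The variable m_lpszNext holds the position of the next token to be parsed, and m_chDelimiter holds the delimiter character that ended the current token, to be returned after the next call to GetNext. Setting m_bDelimitersInSequence to true makes the whole delimiter string act as a single delimiter, which solves the third problem. Because this state lives in each object instead of in a static variable, several CStrTok objects can be used in nested loops without interfering with each other, as you can see: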
CStrTok Usage
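// pFileBuffer: a writable, null-terminated buffer holding the whole text,
// for example a file read into memory
CStrTok StrTok[3];                        // one tokenizer per nesting level
StrTok[0].m_bDelimitersInSequence = true; // treat "\r\n" as a single delimiter
char* pRow = StrTok[0].GetFirst(pFileBuffer, "\r\n");
while(pRow)
{
    // split the row into tab-separated columns
    char* pCol = StrTok[1].GetFirst(pRow, "\t");
    while(pCol)
    {
        // split the column into tokens separated by space, comma or semicolon
        char* pToken = StrTok[2].GetFirst(pCol, " ,;");
        while(pToken)
        {
            // process pToken here
            pToken = StrTok[2].GetNext(" ,;");
        }
        pCol = StrTok[1].GetNext("\t");
    }
    pRow = StrTok[0].GetNext("\r\n");
}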
I think you will find it very easy to use.
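The implementation in StrTok.cpp is not shown in the article text, so the following is only a minimal, portable sketch of how GetFirst and GetNext could behave, not the author's code. Its assumptions: char* replaces LPSTR/LPCSTR so it compiles without windows.h; the class name StrTokSketch and the private member m_pTerminator are hypothetical; and "returned after the next call to GetNext" is read as "the saved delimiter is written back into the buffer on the next call", so the string is unmodified once parsing finishes. The real implementation may differ.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical portable sketch of CStrTok (char* instead of LPSTR).
// All parsing state lives in the object, not in a static variable.
class StrTokSketch
{
public:
    char* m_lpszNext = nullptr;            // where the next search starts (nullptr: done)
    char  m_chDelimiter = '\0';            // (first char of) the delimiter that ended the token, '\0' if none
    bool  m_bDelimitersInSequence = false; // true: the delimiter string is one multi-char delimiter

    char* GetFirst(char* lpsz, const char* lpcszDelimiters)
    {
        m_lpszNext = lpsz;
        m_chDelimiter = '\0';
        m_pTerminator = nullptr;
        return GetNext(lpcszDelimiters);
    }

    char* GetNext(const char* lpcszDelimiters)
    {
        assert(lpcszDelimiters != nullptr);

        // Assumption: put back the delimiter that the previous call overwrote.
        if (m_pTerminator != nullptr)
        {
            *m_pTerminator = m_chDelimiter;
            m_pTerminator = nullptr;
        }
        m_chDelimiter = '\0';
        if (m_lpszNext == nullptr)
            return nullptr;

        char* token = m_lpszNext;
        char* end = nullptr;   // delimiter that follows the token, if any
        std::size_t skip = 1;  // characters to step over after the token

        if (m_bDelimitersInSequence)
        {
            const std::size_t len = std::strlen(lpcszDelimiters);
            // Skip leading copies of the whole sequence (empty tokens).
            while (len > 0 && std::strncmp(token, lpcszDelimiters, len) == 0)
                token += len;
            end = (len > 0) ? std::strstr(token, lpcszDelimiters) : nullptr;
            skip = len;
        }
        else
        {
            token += std::strspn(token, lpcszDelimiters);  // skip leading delimiters
            char* stop = token + std::strcspn(token, lpcszDelimiters);
            end = (*stop != '\0') ? stop : nullptr;
        }

        if (*token == '\0')  // only delimiters were left
        {
            m_lpszNext = nullptr;
            return nullptr;
        }

        if (end == nullptr)  // the last token runs to the end of the string
        {
            m_lpszNext = nullptr;
        }
        else
        {
            m_chDelimiter = *end;  // remember the delimiter (strtok would lose it)
            *end = '\0';           // then terminate the token in place
            m_pTerminator = end;
            m_lpszNext = end + skip;
        }
        return token;
    }

private:
    char* m_pTerminator = nullptr;  // hypothetical: where the '\0' was written
};
```

Because each object keeps its own state, the three nested loops of the usage example each need their own instance, just like StrTok[3]. One consequence of the restore-on-next-call assumption is that a returned token stays null-terminated only until the next GetNext call on the same object, so copy it if you need it longer.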
Source code files
StrTok.cpp, StrTok.h
Thanks to...
I owe a lot to my colleagues for helping me implement and test this code. (JAK)