A Regular Expression Wrapper Using ATL in C++

10 Dec 2008 | CPOL
An article on an easy-to-use regular expression wrapper.

Introduction

Regular expressions are widely used in data parsing and analysis. For example, a regular expression can be used to extract all the links from a web page.

There are many regular expression libraries for C++. The one I used is CAtlRegExp, provided by ATL with Microsoft Visual Studio 2005.

Attention: CAtlRegExp is defined in atlrx.h, which ships only with Visual Studio 2005. However, you can still use it in Visual Studio 2008 by copying atlrx.h to C:\Program Files\Microsoft Visual Studio 9.0\VC\atlmfc\include\ or to the project folder.

Background

An STL vector is used for the output because it is easy to use and fast enough for this purpose. If you are not familiar with STL vector, take a look at it first.

Using the Code

You can get the regular expression syntax from the CAtlRegExp Class documentation. There is only one function in my code. It parses the source and pushes the results into a vector.

C++
/*
 * Parameters
 *  [in] regExp: Value of type string which is the input regular expression.
 *  [in] caseSensitive: Value of type bool which indicates whether the parse is case
 *                      sensitive.
 *  [in] groupCount: Value of type int which is the group count of the regular expression.
 *  [in] source: Value of type string reference which is the source to parse.
 *  [out] result: Value of type vector of strings which is the output of the parse.
 *  [in] allowDuplicate: Value of type bool which indicates whether duplicate items
 *                       are added to the output result.
 *
 * Return Value
 *  Returns true if the function succeeds, or false otherwise.
 *
 * Remarks
 *  The output result is divided into groups.  The caller should read the groups
 *  according to the group count.  For example:
 *  1. RegExp = L"{ab}", source = L"abcabe", then result = L"ab", L"ab".
 *  2. RegExp = L"{ab}{cd}", source = L"abcdeabcd", then result = L"ab", L"cd", L"ab",
 *              L"cd".
*/
bool ParseRegExp(const wstring &regExp,
                 bool caseSensitive,
                 int groupCount,
                 const wstring &source,
                 vector<wstring> &result,
                 bool allowDuplicate = false);
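
The declaration alone does not show how the flat result vector gets filled. The following is a minimal sketch of the same contract, not the implementation from the download. It swaps CAtlRegExp for std::wregex so that it builds without atlrx.h, which means groups are written with ( ) instead of { }. The name ParseRegExpStd is hypothetical, and treating a duplicate as a whole match whose groups all repeat an earlier match is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for ParseRegExp built on std::wregex (ECMAScript
// syntax), NOT the CAtlRegExp-based code from the article.
bool ParseRegExpStd(const std::wstring &regExp,
                    bool caseSensitive,
                    int groupCount,
                    const std::wstring &source,
                    std::vector<std::wstring> &result,
                    bool allowDuplicate = false)
{
    if (groupCount < 1)
        return false;

    std::wregex re;
    try
    {
        std::regex_constants::syntax_option_type flags =
            std::regex_constants::ECMAScript;
        if (!caseSensitive)
            flags = flags | std::regex_constants::icase;
        re.assign(regExp, flags);
    }
    catch (const std::regex_error &)
    {
        return false; // the pattern could not be compiled
    }

    // The pattern must define at least groupCount capture groups.
    if (re.mark_count() < static_cast<unsigned>(groupCount))
        return false;

    const std::size_t width = static_cast<std::size_t>(groupCount);
    for (std::wsregex_iterator it(source.begin(), source.end(), re), end;
         it != end; ++it)
    {
        // Collect the captured groups of this match, in order.
        std::vector<std::wstring> groups;
        for (std::size_t g = 1; g <= width; ++g)
            groups.push_back((*it)[g].str());

        // Assumption: a duplicate is a match whose groups all equal the
        // groups of a match that is already stored in result.
        bool seen = false;
        if (!allowDuplicate)
        {
            for (std::size_t i = 0; i + width <= result.size() && !seen; i += width)
                seen = std::equal(groups.begin(), groups.end(), result.begin() + i);
        }
        if (!seen)
            result.insert(result.end(), groups.begin(), groups.end());
    }
    return true;
}
```

Called as ParseRegExpStd(L"(ab)(cd)", true, 2, L"abcdeabcd", result, true), the sketch appends ab, cd, ab, cd to result, one group after another, so the caller reads the vector in steps of groupCount.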

I think the comments explain the usage clearly, so let's go to some examples.

  1. Get the product name from the string "product: Bowling ball; price: $199;".
C++
wstring source = L"product: Bowling ball; price: $199; ";
wstring regExp = L"product: {.*?};";
vector<wstring> result;
if (ParseRegExp(regExp, false, 1, source, result)
    && result.size() > 0)
{
    wprintf(L"product name: %s\n", result[0].c_str());
}

Pretty simple, right?

  2. Let's look at a more complex one. Sometimes we need to parse a select element in a web page. The HTML code is as follows.
HTML
<select name="imagesize" style="margin:2px 0" onchange="_isr_load(this)">
    <option value="/images?q=test&imgsz=" selected>All image sizes</option>
    <option value="/images?q=test&imgsz=huge" >Extra Large images</option>
    <option value="/images?q=test&imgsz=xxlarge" >Large images</option>
    <option value="/images?q=test&imgsz=small|medium|large|xlarge" >
        Medium images</option>

    <option value="/images?q=test&imgsz=icon" >Small images</option>
</select>

The source code is as follows.

C++
wstring source = L"..."; // the HTML shown above
wstring regExp = L"<select.*?>{.*?}</select>";
vector<wstring> optionsAllResult;
if (ParseRegExp(regExp, false, 1, source, optionsAllResult, false)
    && optionsAllResult.size() == 1)
{
    regExp = L"<option value=\"{.*?}\".*?>[\r\t\n ]*{.*?}[\r\t\n ]*</option>";
    vector<wstring> optionsResult;
    if (ParseRegExp(regExp, false, 2, optionsAllResult[0], optionsResult)
        && optionsResult.size() > 0
        && optionsResult.size() % 2 == 0)
    {
        for (vector<wstring>::size_type index = 0; index < optionsResult.size(); index += 2)
        {
            wprintf(L"Option: %s\n", optionsResult[index + 1].c_str());
            wprintf(L"Value: %s\n", optionsResult[index].c_str());
            wprintf(L"\n");
        }
    }
}

The output is:
    Option: All image sizes
    Value: /images?q=test&imgsz=

    Option: Extra Large images
    Value: /images?q=test&imgsz=huge

    Option: Large images
    Value: /images?q=test&imgsz=xxlarge

    Option: Medium images
    Value: /images?q=test&imgsz=small|medium|large|xlarge

    Option: Small images
    Value: /images?q=test&imgsz=icon
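
The loop above walks the flat vector in steps of two and has to remember which offset holds which group. As an alternative, here is a small sketch of a helper that regroups the flat output into one row per match. GroupResults is a hypothetical name and is not part of the downloadable code. It only assumes the layout described in the remarks, where each match contributes groupCount consecutive strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the article's code: splits the flat
// result vector into one row per match, each row holding groupCount strings.
// Returns an empty vector if the input cannot be split evenly.
std::vector<std::vector<std::wstring> > GroupResults(
    const std::vector<std::wstring> &flat, int groupCount)
{
    std::vector<std::vector<std::wstring> > rows;
    if (groupCount < 1)
        return rows;

    const std::size_t width = static_cast<std::size_t>(groupCount);
    if (flat.size() % width != 0)
        return rows;

    for (std::size_t i = 0; i < flat.size(); i += width)
    {
        // Copy the groupCount strings that belong to this match.
        rows.push_back(std::vector<std::wstring>(flat.begin() + i,
                                                 flat.begin() + i + width));
    }
    return rows;
}
```

With the option example, GroupResults(optionsResult, 2) would give one row per option, with the value in rows[n][0] and the text in rows[n][1].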

Points of Interest

I set the compiler warning level to Level 4 and turned on the "Treat warnings as errors" option. It really helps me.

History

Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)