Introduction
Regular expressions are widely used for parsing and analyzing data. For example, a regular expression can extract all the links from a web page.
There are many regular expression libraries for C++. The one I used is CAtlRegExp, which is provided by ATL in Microsoft Visual Studio 2005.
Attention: CAtlRegExp is defined in atlrx.h, which ships only with Visual Studio 2005. However, you can also use it in Visual Studio 2008 by copying atlrx.h to C:\Program Files\Microsoft Visual Studio 9.0\VC\atlmfc\include\ or to the project folder.
Background
An STL vector is used for the output because it is easy to use and fast enough for this purpose. If you are not familiar with STL vector, you may want to take a look at it first.
Using the Code
The regular expression syntax is described in the documentation of the CAtlRegExp class. There is only one function in my code. It parses the source string and pushes the matched groups into a vector.
bool ParseRegExp(const wstring &regExp,
bool caseSensitive,
int groupCount,
const wstring &source,
vector<wstring> &result,
bool allowDuplicate = false);
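For readers without atlrx.h, the following is a minimal sketch of a function with the same parameter list, built on std::wregex (C++11) instead of CAtlRegExp. It is not the article's implementation. The name ParseRegExpSketch is hypothetical, and the return value and the handling of allowDuplicate are assumptions inferred from the examples. Note that std::wregex uses ECMAScript syntax, so groups are written as ( ) rather than { }.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for ParseRegExp, using std::wregex rather than CAtlRegExp.
// Assumption: returns false only when the arguments or the pattern are unusable;
// "no match" still returns true with an empty result.
bool ParseRegExpSketch(const std::wstring &regExp,
                       bool caseSensitive,
                       int groupCount,
                       const std::wstring &source,
                       std::vector<std::wstring> &result,
                       bool allowDuplicate = false)
{
    result.clear();
    if (regExp.empty() || groupCount <= 0)
        return false;

    std::wregex re;
    try
    {
        std::regex_constants::syntax_option_type flags =
            std::regex_constants::ECMAScript;
        if (!caseSensitive)
            flags = flags | std::regex_constants::icase;
        re.assign(regExp, flags);
    }
    catch (const std::regex_error &)
    {
        return false; // the pattern could not be compiled
    }

    // The pattern must contain at least groupCount capture groups.
    if (re.mark_count() < static_cast<unsigned>(groupCount))
        return false;

    const std::wsregex_iterator end;
    for (std::wsregex_iterator it(source.begin(), source.end(), re); it != end; ++it)
    {
        // Collect the first groupCount groups of this match.
        std::vector<std::wstring> groups;
        for (int g = 1; g <= groupCount; ++g)
            groups.push_back((*it)[g].str());

        // Assumption: a "duplicate" is a match whose groups all equal those
        // of an earlier match already stored in result.
        bool seen = false;
        if (!allowDuplicate)
        {
            for (std::size_t i = 0; i + groups.size() <= result.size(); i += groups.size())
            {
                if (std::equal(groups.begin(), groups.end(), result.begin() + i))
                {
                    seen = true;
                    break;
                }
            }
        }
        if (!seen)
            result.insert(result.end(), groups.begin(), groups.end());
    }
    return true;
}
```

With this sketch, the result vector is flat: each match contributes groupCount consecutive strings, which is the layout the examples in this article rely on.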
The comments in the source code explain the usage, so let's go straight to some examples.
- Get the product name from the string "product: Bowling ball; price: $199;"
wstring source = L"product: Bowling ball; price: $199; ";
wstring regExp = L"product: {.*?};";
vector<wstring> result;
if (ParseRegExp(regExp, false, 1, source, result)
&& result.size() > 0)
{
wprintf(L"products name: %s\n", result[0].c_str());
}
Pretty simple, right?
- Let's look at a more complex one. Sometimes we need to parse a select element in a web page. The HTML code is as follows.
<select name="imagesize" style="margin:2px 0" onchange="_isr_load(this)">
<option value="/images?q=test&imgsz=" selected>All image sizes</option>
<option value="/images?q=test&imgsz=huge" >Extra Large images</option>
<option value="/images?q=test&imgsz=xxlarge" >Large images</option>
<option value="/images?q=test&imgsz=small|medium|large|xlarge" >
Medium images</option>
<option value="/images?q=test&imgsz=icon" >Small images</option>
</select>
The source code is as follows.
wstring source = L"..."; // the HTML shown above
wstring regExp = L"<select.*?>{.*?}</select>";
vector<wstring> optionsAllResult;
if (ParseRegExp(regExp, false, 1, source, optionsAllResult, false)
&& optionsAllResult.size() == 1)
{
regExp = L"<option value=\"{.*?}\".*?>[\r\t\n ]*{.*?}[\r\t\n ]*</option>";
vector<wstring> optionsResult;
if (ParseRegExp(regExp, false, 2, optionsAllResult[0], optionsResult)
&& optionsResult.size() > 0
&& optionsResult.size() % 2 == 0)
{
for (vector<wstring>::size_type index = 0; index < optionsResult.size(); index += 2)
{
wprintf(L"Option: %s\n", optionsResult[index + 1].c_str());
wprintf(L"Value: %s\n", optionsResult[index].c_str());
wprintf(L"\n");
}
}
}
The output is:
Option: All image sizes
Value: /images?q=test&imgsz=
Option: Extra Large images
Value: /images?q=test&imgsz=huge
Option: Large images
Value: /images?q=test&imgsz=xxlarge
Option: Medium images
Value: /images?q=test&imgsz=small|medium|large|xlarge
Option: Small images
Value: /images?q=test&imgsz=icon
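Because the result vector is flat, the loop above has to step through it two entries at a time. As an illustration only, the same layout can be regrouped into one row per match. The helper name GroupMatches is hypothetical and not part of the article's code, and the sketch assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a flat result vector into rows of groupCount
// strings, one row per match. Returns an empty vector when groupCount is 0
// or the flat size is not a multiple of groupCount.
std::vector<std::vector<std::wstring>> GroupMatches(
    const std::vector<std::wstring> &flat, std::size_t groupCount)
{
    std::vector<std::vector<std::wstring>> rows;
    if (groupCount == 0 || flat.size() % groupCount != 0)
        return rows;

    for (std::size_t i = 0; i < flat.size(); i += groupCount)
    {
        // Copy the groupCount strings that belong to this match into a new row.
        rows.emplace_back(flat.begin() + i, flat.begin() + i + groupCount);
    }
    return rows;
}
```

For the select example, GroupMatches(optionsResult, 2) would give rows in which element 0 is the option value and element 1 is the option text.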
Points of Interest
I set the compiler warning level to Level 4 (/W4) and turned on the option "Treat warnings as errors" (/WX). It really helps me catch mistakes early.
History
Initial version.