Introduction
Regular expression syntax is fairly similar across many environments. However, the way you use regular expressions varies greatly. For example, once you've crafted your regular expression, how do you use it to find a match or replace text? It's easy to find detailed API documentation, once you know what API to look up. Figuring out where to start is often the hardest part.
This article assumes you're familiar with regular expressions and want to work with regular expressions in C++ using the Technical Report 1 (TR1) proposed extensions to the C++ Standard Library. It's a quick start guide, briefly answering some of the first questions you're likely to ask. For more details, see Getting started with C++ TR1 regular expressions or dive into the documentation that comes with your implementation.
Quick Start Questions
Q: Where Can I Get TR1?
A: Visual Studio 2008 adds support for the TR1 extensions through a feature pack. Other implementations include Boost and Dinkumware. The GNU compiler gcc added support for TR1 regular expressions in version 4.3.0.
Q: What Regular Expression Flavors are Supported?
A: It depends on your implementation. Visual Studio 2008 supports these options: basic, extended, ECMAScript, awk, grep, egrep.
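As a rough illustration of selecting a flavor, the sketch below passes a grammar flag as the second argument to the regex constructor. It assumes a C++11 compiler, where the TR1 regex components were adopted into namespace std; with a TR1 implementation, point the alias at std::tr1 instead. The helper name has_digits_extended is hypothetical, and which grammars are actually available still depends on your implementation.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: true if text contains a run of digits.
// The pattern is compiled with the POSIX extended grammar.
bool has_digits_extended(const std::string& text)
{
    re::regex rx("[0-9]+", re::regex_constants::extended);
    return re::regex_search(text.begin(), text.end(), rx);
}
```

Calling has_digits_extended("abc 123") should return true, and has_digits_extended("abc") should return false.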
Q: What Header Do I Include?
A: <regex>
Q: What Namespace are Things In?
A: std::tr1
This is the namespace for the regex class and functions such as regex_search. Flags are contained in the nested namespace std::tr1::regex_constants.
Q: How Do I Do a Match?
A: Construct a regex object and pass it to regex_search.
For example:
std::string str = "Hello world";
std::tr1::regex rx("ello");
assert( regex_search(str.begin(), str.end(), rx) );
The function regex_search returns true because str contains the pattern ello. Note that regex_match would return false in the example above because it tests whether the entire string matches the regular expression. regex_search behaves more like most people expect when testing for a match.
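The difference between the two functions can be sketched side by side. This assumes a C++11 compiler, where the TR1 names appear in namespace std; on a TR1 implementation, change the alias to std::tr1. The helper names contains and matches_entirely are hypothetical and exist only for illustration.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: true if the pattern matches anywhere inside text.
bool contains(const std::string& text, const std::string& pattern)
{
    re::regex rx(pattern);
    return re::regex_search(text.begin(), text.end(), rx);
}

// Hypothetical helper: true only if the pattern matches the whole of text.
bool matches_entirely(const std::string& text, const std::string& pattern)
{
    re::regex rx(pattern);
    return re::regex_match(text.begin(), text.end(), rx);
}
```

With these helpers, contains("Hello world", "ello") is true while matches_entirely("Hello world", "ello") is false. A pattern that spans the whole string, such as "H.*d", satisfies both.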
Q: How Do I Retrieve a Match?
A: Use a form of regex_search that takes a match_results object as a parameter.
For example, the following code searches for <h> tags and prints the level and tag contents.
std::tr1::cmatch res;
std::string str = "<h2>Egg prices</h2>";
std::tr1::regex rx("<h(.)>([^<]+)");
std::tr1::regex_search(str.c_str(), res, rx);
std::cout << res[1] << ". " << res[2] << "\n";
This code would print 2. Egg prices. The example uses cmatch, a typedef provided by the library for match_results<const char*>.
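A variation on the example above works on a std::string directly and checks whether the search succeeded before reading the captures. It uses smatch, the library typedef for match_results<std::string::const_iterator>. The sketch assumes a C++11 compiler, where the TR1 names appear in namespace std; on a TR1 implementation, change the alias to std::tr1. The helper name describe_heading is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: returns "<level>. <contents>" for the first heading
// tag found in html, or an empty string if there is no match.
std::string describe_heading(const std::string& html)
{
    re::regex rx("<h(.)>([^<]+)");
    re::smatch res;  // match_results<std::string::const_iterator>
    if (!re::regex_search(html, res, rx))
        return std::string();
    // res[0] is the whole match; res[1] and res[2] are the capture groups.
    return res[1].str() + ". " + res[2].str();
}
```

Checking the return value of regex_search matters here because the sub-matches are only meaningful after a successful search.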
Q: How Do I Do a Replace?
A: Use regex_replace.
The following code will replace “world” in the string “Hello world” with “planet”. The string str2 will contain “Hello planet” and the string str will remain unchanged.
std::string str = "Hello world";
std::tr1::regex rx("world");
std::string replacement = "planet";
std::string str2 = std::tr1::regex_replace(str, rx, replacement);
Note that regex_replace does not change its arguments, unlike the Perl command s/world/planet/. Note also that the third argument to regex_replace must be a string object and not a string literal.
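One way to satisfy the string-object requirement when the replacement text starts out as a literal is to wrap it in std::string at the call site, as sketched below. This assumes a C++11 compiler, where the TR1 names appear in namespace std; on a TR1 implementation, change the alias to std::tr1. The helper name world_to_planet is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: returns a copy of text with "world" replaced by
// "planet". The input is taken by const reference and is never modified.
std::string world_to_planet(const std::string& text)
{
    re::regex rx("world");
    // Wrap the literal so the format argument is a string object.
    return re::regex_replace(text, rx, std::string("planet"));
}
```

Because the result is returned as a new string, the caller decides whether to overwrite the original or keep both.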
Q: How Do I Do a Global Replace?
A: The function regex_replace does global replacements by default.
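A minimal sketch of the default behavior, in which every match in the input is replaced, might look like this. It assumes a C++11 compiler, where the TR1 names appear in namespace std; on a TR1 implementation, change the alias to std::tr1. The helper name mask_digits is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: replaces every digit in text with '#'.
// No flag is passed, so regex_replace handles all matches.
std::string mask_digits(const std::string& text)
{
    re::regex rx("[0-9]");
    std::string replacement = "#";
    return re::regex_replace(text, rx, replacement);
}
```

For the input "a1b22c", all three digits are replaced, giving "a#b##c".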
Q: How Do I Keep From Doing a Global Replace?
A: Use the format_first_only flag with regex_replace.
The fully qualified name for the flag is std::tr1::regex_constants::format_first_only and it would be the fourth argument to regex_replace.
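Passing the flag as the fourth argument could look like the sketch below. It assumes a C++11 compiler, where the TR1 names appear in namespace std; on a TR1 implementation, change the alias to std::tr1. The helper name replace_first is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: replaces only the first match of pattern in text.
std::string replace_first(const std::string& text,
                          const std::string& pattern,
                          const std::string& replacement)
{
    re::regex rx(pattern);
    return re::regex_replace(text, rx, replacement,
                             re::regex_constants::format_first_only);
}
```

For example, replace_first("one two one", "one", "1") leaves the second occurrence alone and returns "1 two one".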
Q: How Do I Make a Regular Expression Case-insensitive?
A: Use the icase flag as a parameter to the regex constructor.
The fully qualified name of the flag is std::tr1::regex_constants::icase.
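A case-insensitive search might be sketched as follows. The example combines icase with an explicit ECMAScript grammar flag, which is a choice made here for clarity rather than something the text above requires. It assumes a C++11 compiler, where the TR1 names appear in namespace std; on a TR1 implementation, change the alias to std::tr1. The helper name contains_ignoring_case is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11, where the TR1 regex components live in namespace std.
// With a TR1 implementation, write "namespace re = std::tr1;" instead.
namespace re = std;

// Hypothetical helper: true if pattern matches somewhere in text,
// ignoring differences in letter case.
bool contains_ignoring_case(const std::string& text, const std::string& pattern)
{
    re::regex rx(pattern,
                 re::regex_constants::ECMAScript | re::regex_constants::icase);
    return re::regex_search(text.begin(), text.end(), rx);
}
```

With this helper, contains_ignoring_case("Hello World", "hello world") is true even though the capitalization differs.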
History
- 22nd May, 2008: Initial post
- 23rd May, 2008: Added examples