Using Regular Expressions in C# .NET

13 Jul 2010
Explains regular expressions, and some of the common ways to use them.

Abstract

Developers often need to compare and search strings in their code. Regular expressions are an older but valuable technique at the developer's disposal. Although the .NET Framework has built-in methods to search and sort strings, regular expressions should still be understood to ensure that code development is efficient. In this article, Anupam Banerji explains regular expressions, and some of the common ways to use them.

Introduction

The older computer languages were developed under severe hardware restrictions. Memory usage was a key factor between competing algorithms: the algorithm with the smallest memory requirement was often considered optimal, even if it was slower. Regular expressions come from this era. They grew out of formal language theory and early Unix tools such as ed and grep, and Perl later popularized them as a way to compare, search and modify strings. Regular expressions are a powerful, albeit complex, method of manipulating large quantities of text.

The memory usage problem has slowly disappeared. Memory these days is cheap and abundant, and capacity grows rapidly every few years. The speed of an algorithm often matters more today than the amount of memory it consumes. Regular expressions are therefore still a valuable (although complex) tool for writing good code.

Creating Regular Expressions

Regular expressions are an efficient way to process text. The following regular expression looks complicated to a beginner:

^\w+$ 

The Perl developer would smile. All this regular expression does is match a string that consists of a single word. The symbols look difficult to understand at first, but each has a simple meaning. The ^ symbol refers to the start of the string. The $ refers to the end of the string (in .NET it also matches just before a final newline). The \w refers to a single word character: A-Z, a-z, 0-9 and underscore (by default .NET also accepts other Unicode letters and digits). The + means one or more repetitions of the preceding element. The regular expression would match:

test
testtest
test1
1test
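As a quick check of the symbols described above, the following sketch runs the expression against a few inputs. The class and method names (WordPatternDemo, IsSingleWord) are made up for this illustration; only Regex.IsMatch comes from the .NET Framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    // Hypothetical helper: true when the whole input is one run of word characters.
    public static bool IsSingleWord(string input)
    {
        return Regex.IsMatch(input, @"^\w+$");
    }

    public static void Main()
    {
        string[] samples = { "test", "test1", "1test", "", "two words", "semi;colon" };

        foreach (string sample in samples)
        {
            // Print each sample next to the result of the comparison.
            Console.WriteLine("\"{0}\" -> {1}", sample, IsSingleWord(sample));
        }
    }
}
```

The first three samples match. The empty string does not, because + requires at least one character, and the last two fail because a space and a semicolon are not word characters.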

Using Regular Expressions in C# .NET

The System.Text.RegularExpressions namespace contains the Regex class, which is used to form and evaluate regular expressions. The Regex class provides static methods that compare regular expressions against strings. The static IsMatch() method reports whether a regular expression finds a match in a string:

bool match = Regex.IsMatch(string input, string pattern);

In C# code, the expression from the previous section would be tested like this:

if (Regex.IsMatch("testtest", @"^\w+$"))
{
     // Do something here
}
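One detail worth seeing in code is that IsMatch() looks for a match anywhere in the input, so the ^ and $ anchors are what force the whole string to be compared. A minimal sketch, with AnchorDemo being a name invented for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    public static void Main()
    {
        string input = "two words";

        // Without anchors, IsMatch succeeds if any substring matches.
        Console.WriteLine(Regex.IsMatch(input, @"\w+"));    // True: "two" matches

        // With ^ and $, the whole input must match.
        Console.WriteLine(Regex.IsMatch(input, @"^\w+$"));  // False: the space is not a word character
    }
}
```

When an expression is used for validation, as in the email example later in this article, leaving out the anchors is an easy way to accept input that only partly matches.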

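Another useful static method is Match(), which returns a Match object describing the first match in the input string. (To retrieve every match, use Matches(), which returns a MatchCollection.) A Match object also exposes the expression groups captured by the pattern. The following code produces a single match containing several groups: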

string text = "first second";
string reg = @"^([\w]+) ([\w]+)$";

Match m = Regex.Match(text, reg, RegexOptions.CultureInvariant);

foreach (Group g in m.Groups)
{
    Console.WriteLine(g.Value);
}

The expression groups are entered in parentheses. The example above returns three groups: the entire matched text as group 0, then the first word, and then the second word. Expression groups are useful when text needs to be broken down into several pieces of related text for storage or further manipulation.
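When the input really does contain more than one match, Regex.Matches() returns all of them. The sketch below shows that, followed by named groups, which are an optional way to label the pieces captured by Match(). The class name MatchesDemo, the helper FindWords and the group names left and right are chosen for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // Hypothetical helper: returns every run of word characters in the input.
    public static List<string> FindWords(string input)
    {
        List<string> words = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\w+"))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        // Three separate matches: first, second, third.
        foreach (string word in FindWords("first second third"))
        {
            Console.WriteLine(word);
        }

        // One match with two named groups.
        Match pair = Regex.Match("first second", @"^(?<left>\w+) (?<right>\w+)$");
        if (pair.Success)
        {
            Console.WriteLine(pair.Groups["left"].Value);
            Console.WriteLine(pair.Groups["right"].Value);
        }
    }
}
```

Checking pair.Success before reading the groups avoids working with empty values when the input does not match.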

A Quick Example

In this example, we validate an email address using regular expressions. The following expression works for many common addresses:

^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$

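However, this isn't the only expression used to validate email addresses. I have come across at least two others, and there are many more. Note that this expression rejects some well-formed addresses, for example those whose top-level domain is longer than three letters (such as .info) or whose domain contains a hyphen.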

We write a small C# console application that takes some text as an input, and determines if the text is an email address.

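using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read the candidate address; ReadLine() returns null at end of input.
        string text = Console.ReadLine() ?? string.Empty;
        string reg = @"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$";

        // IsMatch() returns true only if the whole line fits the expression.
        if (Regex.IsMatch(text, reg))
        {
            Console.WriteLine("Email.");
        }
        else
        {
            Console.WriteLine("Not email.");
        }
    }
}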

Try this with a few real and fake email addresses and see if it works. Let me know if you find an error.
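One way to try several addresses at once is a small harness like the sketch below. The expression is the one from this article, unchanged. The class name EmailPatternCheck, the helper LooksLikeEmail and the sample addresses are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternCheck
{
    // The expression from the article, unchanged.
    public const string Pattern =
        @"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$";

    // Hypothetical helper wrapping the comparison.
    public static bool LooksLikeEmail(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "user@example.com",
            "first.last@example.co.uk",
            "user@example.info",
            "user@my-domain.com",
            "not an email"
        };

        foreach (string sample in samples)
        {
            // Left-align the address and print the verdict beside it.
            Console.WriteLine("{0,-28} {1}", sample,
                LooksLikeEmail(sample) ? "Email." : "Not email.");
        }
    }
}
```

The first two samples are accepted. The third and fourth are well-formed but rejected, because the final group allows at most three letters and \w does not include the hyphen. The last sample is rejected because it has no @.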

Documentation

Developers write regular expressions differently. The same task can be accomplished using many different expressions, and an expression created by one developer may be undecipherable to another.

This is why documenting regular expressions is a very important part of the development process. The comments for an expression often span several lines, but they are worth the effort in case your expression has unintended effects, or if another developer takes over your code. Enforcing good documentation standards for regular expressions helps keep maintenance issues to a minimum.

For example, if we document the regular expression for validating email addresses above, we would write comments like these:

// Validating email addresses
//
// @"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$"
//
// The expression has three top-level
// expression groups. A literal @
// separates groups 1 and 2.
//
// 1. ((([\w]+\.[\w]+)+)|([\w]+))
//
// The LHS of the or clause states
// that there may be one or more
// sequences of two words with a .
// between them.
//
// The RHS of the or clause states
// that there may be a single word.
//
// 2. (([\w]+\.)+)
//
// This expression states that there
// may be one or more words, each
// followed by a .
//
// 3. ([A-Za-z]{1,3})
//
// This expression states that the
// last set of characters may be upper
// or lowercase letters. There must be
// a minimum of 1 and a maximum of 3.

This may be considered a long set of comments by many development standards, but the expression has been broken down into expression groups. A new developer would have very little difficulty in understanding the function and motivation behind the expression. This practice should be consistently enforced to avoid headaches when upgrading or debugging software.
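.NET also allows comments inside the expression itself when RegexOptions.IgnorePatternWhitespace is set: unescaped white space in the pattern is ignored, and a # starts a comment that runs to the end of the line. The sketch below writes the same email expression that way, as one possible alternative to a separate comment block. The class name DocumentedEmailPattern and the field name Email are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DocumentedEmailPattern
{
    // The same expression as in the article, spread over several lines.
    public static readonly Regex Email = new Regex(@"
        ^
        (                          # 1. the part before the at sign
            (([\w]+\.[\w]+)+)      #    one or more word.word sequences
            |                      #    or
            ([\w]+)                #    a single word
        )
        @
        (([\w]+\.)+)               # 2. one or more words, each followed by a dot
        ([A-Za-z]{1,3})            # 3. one to three letters
        $",
        RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        Console.WriteLine(Email.IsMatch("user@example.com"));   // True
        Console.WriteLine(Email.IsMatch("user@example.info"));  // False: four-letter ending
    }
}
```

Because the comments live in the pattern string, they cannot drift apart from the expression as easily as a separate comment block can. The trade-off is that a literal space or # in such a pattern has to be escaped.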

Useful Regex Software

If you've used shell scripts on *NIX, you have probably used grep. On Windows, PowerGREP is a similar tool. PowerShell is another tool that is built on the .NET regular expression engine and has command line scripting utilities. Expresso by Ultrapico (www.ultrapico.com) is a free regular expression editor which you can use to build and test your regular expressions.

Conclusion

Regular expressions are an efficient way to search, identify and validate large quantities of text without having to write the comparisons by hand. Although they may be complicated, writing and documenting regular expressions allows the developer to concentrate on more important parts of the implementation process. The availability of several free and open source regular expression tools makes understanding and building regular expressions a worthwhile task.

To download this technical article in PDF format, go to the Coactum Solutions website at http://www.coactumsolutions.com/Articles.aspx.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
