Introduction
Regular expressions are very useful for validating input. They are not a new technology; they originated in the UNIX environment and are commonly used with the Perl language. They are also supported by a number of .NET classes in the namespace System.Text.RegularExpressions.
The patterns used in this article correspond directly to finite automata, so each one can be drawn as a state diagram. Information on the main special characters and escape sequences that you can use is available in MSDN.
Regular Expressions for Email Checking
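The basic things to understand in RegEx are:
- “*” matches zero or more repetitions of the preceding element.
- “+” matches one or more repetitions of the preceding element.
- “?” makes the preceding element optional (zero or one occurrence).
- “^” anchors the match at the start of the input and “$” anchors it at the end; as the first character inside “[]”, “^” negates the set.
- “[]” matches any single character from the listed set or range, such as [a-z].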
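As a quick illustration of how these operators behave in .NET, here is a minimal sketch. The class name MetacharacterDemo and the helper FullMatch are hypothetical names introduced only for this example; FullMatch wraps a pattern in ^ and $ so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

static class MetacharacterDemo
{
    // Hypothetical helper: true only when the entire input matches the pattern.
    public static bool FullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Run()
    {
        Console.WriteLine(FullMatch("", "a*"));            // True: "*" allows zero repetitions
        Console.WriteLine(FullMatch("", "a+"));            // False: "+" needs at least one
        Console.WriteLine(FullMatch("color", "colou?r"));  // True: "?" makes the "u" optional
        Console.WriteLine(FullMatch("colour", "colou?r")); // True
        Console.WriteLine(FullMatch("x7", "[a-z][0-9]"));  // True: one letter, then one digit
        Console.WriteLine(FullMatch("7", "[^0-9]"));       // False: "^" inside [] negates the set
    }
}
```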
The rules for validating email IDs, and some valid and invalid examples are mentioned here:
- An email address must start with a letter. Any number of letters, digits, or underscores (_) may follow, and a single dot (.) is allowed; other symbols and white space are not allowed.
- The name field of the address must end with either a letter or a digit.
- An underscore or a dot is valid only when it is preceded by a letter or a digit.
- A dot can be used only once, but an underscore can be used multiple times.
Some examples:
miltoncse00@yahoo.com valid
2milton00@yahoo.com invalid (starts with a digit)
milton cse00@yahoo.com invalid (white space)
milton_cse@yahoo.com valid
milton_cse_00@yahoo.com valid
milton_cse_00_@yahoo.com invalid (_ before @)
milton.case_00_00@yahoo.com valid
milton.cas.e_00_00@yahoo.com invalid (double dot)
milton.case_00_00@yahoo.co.in valid
milton.case_00_00.@yahoo.co.in invalid (dot before @)
In the address miltoncse00@yahoo.com, the part before the @ (miltoncse00) is the name portion.
According to these rules and valid examples, we can draw a state diagram for valid name checking of email addresses:
Fig: state diagram for the naming portion
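To show how such a diagram maps to code, here is a hand-written state machine for the name portion. This is only a sketch derived from the rules listed above: the state names are illustrative assumptions and are not taken from the figure, and the class name NameStateMachine is hypothetical. Only ASCII letters and digits are accepted.

```csharp
using System;

static class NameStateMachine
{
    // Illustrative states; "Plain" and "AfterDotPlain" are the accepting ones.
    enum State { Start, Plain, AfterUnderscore, AfterDot, AfterDotPlain, AfterDotUnderscore, Reject }

    static bool IsLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    static bool IsDigit(char c) => c >= '0' && c <= '9';

    public static bool IsValidName(string name)
    {
        if (name == null) return false;
        State state = State.Start;
        foreach (char c in name)
        {
            bool alnum = IsLetter(c) || IsDigit(c);
            switch (state)
            {
                case State.Start:              // first character must be a letter
                    state = IsLetter(c) ? State.Plain : State.Reject;
                    break;
                case State.Plain:              // letters/digits seen, no dot yet
                    state = alnum ? State.Plain
                          : c == '_' ? State.AfterUnderscore
                          : c == '.' ? State.AfterDot
                          : State.Reject;
                    break;
                case State.AfterUnderscore:    // "_" must be followed by a letter or digit
                    state = alnum ? State.Plain : State.Reject;
                    break;
                case State.AfterDot:           // "." must be followed by a letter or digit
                    state = alnum ? State.AfterDotPlain : State.Reject;
                    break;
                case State.AfterDotPlain:      // a second dot is not allowed here
                    state = alnum ? State.AfterDotPlain
                          : c == '_' ? State.AfterDotUnderscore
                          : State.Reject;
                    break;
                case State.AfterDotUnderscore:
                    state = alnum ? State.AfterDotPlain : State.Reject;
                    break;
            }
            if (state == State.Reject) return false;
        }
        // The name must end on a letter or digit.
        return state == State.Plain || state == State.AfterDotPlain;
    }
}
```

For example, NameStateMachine.IsValidName("milton_cse_00") returns true, while "milton_cse_00_" ends in the AfterUnderscore state and is rejected.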
From the state diagram, the regular expression for the naming part is:
[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?
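One detail is worth checking in code: inside square brackets, | is not an "or" operator but a literal character, so a class such as [a-z|0-9] would also accept the pipe symbol. The sketch below therefore writes the classes as [a-z0-9]. The class name NamePattern is a hypothetical name used only for this example, and the pattern is anchored with ^ and $ so that the whole name must match.

```csharp
using System.Text.RegularExpressions;

static class NamePattern
{
    // Name-part pattern, with no "|" inside the character classes.
    const string Pattern =
        @"^[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?$";

    public static bool IsValidName(string name)
    {
        if (name == null) return false;
        // IgnoreCase lets the lowercase ranges match uppercase letters too.
        return Regex.IsMatch(name, Pattern, RegexOptions.IgnoreCase);
    }
}
```

With this version, NamePattern.IsValidName("milton|cse") returns false, whereas a class that contains | would let it through.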
The domain name portion (after the @ and before the first dot) must start with a letter. Any number of letters or digits may follow, and other symbols are not allowed.
So the regular expression for that part is:
[a-z][a-z0-9]*
The portion after the dot (such as com or net) must start with a letter, and any number of letters or digits may follow. One further dot-separated portion (as in co.in) is allowed, and it follows the same rule.
So the regular expression for that part is:
([a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?)
Combining all these regular expressions, the regular expression for email checking that satisfies the Yahoo! email rules is as follows (split over three lines here for readability; the actual pattern is a single string with no line breaks):
^[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?
@[a-z][a-z0-9]*
\.([a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?)$
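The same combination can also be built from named parts, which keeps each rule readable on its own. This is a sketch only: the class EmailPattern and the constant names Name, Domain, and Suffix are hypothetical, and the character classes are written without | because that character is a literal inside square brackets.

```csharp
using System.Text.RegularExpressions;

static class EmailPattern
{
    // The three parts described in the text.
    const string Name   = @"[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?";
    const string Domain = @"[a-z][a-z0-9]*";
    const string Suffix = @"[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?";

    // "^" and "$" anchor the match so that no extra characters are accepted.
    public const string Full = "^" + Name + "@" + Domain + @"\." + Suffix + "$";

    // Built once and reused for every check.
    public static readonly Regex Compiled = new Regex(Full, RegexOptions.IgnoreCase);
}
```

A call such as EmailPattern.Compiled.IsMatch("miltoncse00@yahoo.com") then returns true.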
The C# code that performs the matching is very simple, as illustrated below:
// Requires: using System.Text.RegularExpressions;
// txtEmail is the TextBox on the form that holds the address to check.
string pattern = @"^[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+" +
                 @"([_][a-z0-9]+)*)?@[a-z][a-z0-9]*\.([a-z]" +
                 @"[a-z0-9]*(\.[a-z][a-z0-9]*)?)$";
// IgnoreCase lets the lowercase ranges match uppercase letters as well.
Match match = Regex.Match(txtEmail.Text.Trim(), pattern, RegexOptions.IgnoreCase);
if (match.Success)
    MessageBox.Show("Success");
else
    MessageBox.Show("Fail");
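To check the pattern against the example addresses without a form, a small console-style sketch can help. The class EmailExamples and the method IsValidEmail are hypothetical names introduced here, and the pattern is written without | inside the character classes, where it would be a literal character.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmailExamples
{
    const string Pattern =
        @"^[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?" +
        @"@[a-z][a-z0-9]*\.([a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?)$";

    public static bool IsValidEmail(string email)
    {
        if (email == null) return false;
        // Trim only removes leading and trailing white space, as in the form code.
        return Regex.IsMatch(email.Trim(), Pattern, RegexOptions.IgnoreCase);
    }

    public static void Run()
    {
        string[] samples =
        {
            "miltoncse00@yahoo.com",        // valid
            "2milton00@yahoo.com",          // invalid: starts with a digit
            "milton cse00@yahoo.com",       // invalid: white space
            "milton_cse_00@yahoo.com",      // valid
            "milton_cse_00_@yahoo.com",     // invalid: _ before @
            "milton.cas.e_00_00@yahoo.com"  // invalid: two dots in the name
        };
        foreach (string s in samples)
            Console.WriteLine(s + " -> " + (IsValidEmail(s) ? "valid" : "invalid"));
    }
}
```

Keeping the check in one method like this makes it easy to reuse the same rule from a form, a service, or a test.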
So, we conclude that validation problems involving repetition, optional parts, and restrictions are easier to solve with regular expressions than by other means (such as if-elseif-else chains or while loops). Such rules can be represented as a state diagram, which is easy to draw and translates directly into a pattern.
My next article will be on auto ID generation for any table using stored procedures.