Email ID Validation

Syed BASHAR

8 May 2006 | CPOL | 2 min read

Email ID validation using regular expressions (Finite Automata example).

Introduction

Regular expressions are very useful for validation. They are not a new technology: they originated in the UNIX environment and are commonly used with the Perl language. They are also supported by a number of .NET classes in the System.Text.RegularExpressions namespace.

Regular expressions describe exactly the languages that finite automata recognize, so a validation rule drawn as a state diagram can be translated into a pattern. Information on the main special characters and escape sequences is available on MSDN.

Regular Expressions for Email Checking

Basic things to be understood in RegEx are:

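“*” matches zero or more occurrences of the preceding element.
“+” matches one or more occurrences of the preceding element.
“?” matches zero or one occurrence of the preceding element, which makes it optional.
“^” anchors the match at the start of the input, and “$” anchors it at the end. As the first character inside “[]”, “^” negates the set instead.
“[]” matches any one character from a set or range, such as [a-z] or [0-9]. Inside the brackets, “|” has no special meaning and stands for a literal pipe character.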
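As a small illustration of how these metacharacters behave in .NET, here is a minimal C# sketch. The class name RegexBasics and the helper FullMatch are hypothetical names introduced only for this example; the helper wraps a pattern in ^ and $ so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for trying out single metacharacters.
public static class RegexBasics
{
    // True when the entire input matches the pattern.
    // The pattern is wrapped in a non-capturing group and anchored at both ends.
    public static bool FullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    // True when the input merely starts with the literal text "abc".
    public static bool StartsWithAbc(string input)
    {
        return Regex.IsMatch(input, "^abc");
    }
}
```

With this helper, FullMatch("aaa", "a*") and FullMatch("", "a*") both succeed because “*” allows zero repetitions, FullMatch("ac", "ab?c") succeeds because “?” makes the b optional, and FullMatch("|", "[a|b]") succeeds because the pipe is an ordinary character inside brackets.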

The rules for validating email IDs, and some valid and invalid examples are mentioned here:

An email address must start with a letter. Any number of letters, digits, or underscores (_) may follow, and a single dot (.) is allowed; other symbols and white space are not allowed.
The name field of the address must end with either a letter or a digit.
An underscore or dot must be immediately preceded by a letter or digit.
A dot can be used only once, but an underscore can be used multiple times.

Some examples:

miltoncse00@yahoo.com            valid
2milton00@yahoo.com              invalid (starts with a digit)
milton cse00@yahoo.com           invalid (white space)
milton_cse@yahoo.com             valid
milton_cse_00@yahoo.com          valid
milton_cse_00_@yahoo.com         invalid (_ before @)
milton.case_00_00@yahoo.com      valid

milton.cas.e_00_00@yahoo.com     invalid (two dots in the name)
milton.cas.e_00_00@yahoo.co.in   invalid (two dots in the name)
milton.cas.e_00_00.@yahoo.co.in  invalid (dot before @)

For example, in miltoncse00@yahoo.com the name portion is miltoncse00.

According to these rules and valid examples, we can draw a state diagram for valid name checking of email addresses:

Fig: state diagram for the naming portion
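The figure is not reproduced here, so the following C# sketch shows only one automaton that is consistent with the naming rules listed earlier; it is not necessarily the same diagram. The class name NamePartAutomaton and the state names (Start, Pre, PreUnderscore, Dot, Post, PostUnderscore, Reject) are assumptions made for this example. Uppercase letters are accepted because the article later matches with RegexOptions.IgnoreCase.

```csharp
using System;

// Hypothetical hand-coded finite automaton for the name portion (before @).
public static class NamePartAutomaton
{
    // Pre/Post: inside a run of letters or digits before/after the single dot.
    private enum State { Start, Pre, PreUnderscore, Dot, Post, PostUnderscore, Reject }

    private static bool IsLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static bool IsDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Transition function: current state plus input character gives next state.
    private static State Next(State s, char c)
    {
        bool alnum = IsLetter(c) || IsDigit(c);
        switch (s)
        {
            case State.Start:
                return IsLetter(c) ? State.Pre : State.Reject;   // must start with a letter
            case State.Pre:
                if (alnum) return State.Pre;
                if (c == '_') return State.PreUnderscore;
                if (c == '.') return State.Dot;                  // the only dot allowed
                return State.Reject;
            case State.PreUnderscore:
                return alnum ? State.Pre : State.Reject;         // "_" needs a letter or digit after it
            case State.Dot:
                return alnum ? State.Post : State.Reject;        // "." needs a letter or digit after it
            case State.Post:
                if (alnum) return State.Post;
                if (c == '_') return State.PostUnderscore;
                return State.Reject;                             // a second dot is rejected
            case State.PostUnderscore:
                return alnum ? State.Post : State.Reject;
            default:
                return State.Reject;
        }
    }

    // Runs the automaton; only Pre and Post are accepting states,
    // so the name must end with a letter or digit.
    public static bool Accepts(string name)
    {
        State s = State.Start;
        foreach (char c in name)
        {
            s = Next(s, c);
            if (s == State.Reject) return false;
        }
        return s == State.Pre || s == State.Post;
    }
}
```

For instance, NamePartAutomaton.Accepts("milton.case_00_00") ends in the accepting Post state, while "milton_cse_00_" stops in PreUnderscore and is rejected.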

From the state diagram, the regular expression for the naming part is:

[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?

The domain portion (after @ and before the first dot) must start with a letter. Any number of letters or digits can follow, and other symbols are not allowed.

So the regular expression for that part is:

[a-z][a-z0-9]*

The portion after the dot (such as .com or .net) must start with a letter, and any number of letters or digits can follow. One further dot portion is allowed (as in .co.in), and it follows the same rule.

So the regular expression for that part is:

([a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?)
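To see how the two domain rules fit together, here is a minimal C# sketch that checks only the text after the @ sign. The class name DomainPartCheck, the constant Label, and the method IsValidDomain are hypothetical names for this example. The character classes are written as [a-z0-9], because a “|” inside brackets would be matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical check for the part of the address after "@".
public static class DomainPartCheck
{
    // One label: a letter followed by any number of letters or digits.
    private const string Label = "[a-z][a-z0-9]*";

    // Host label, a dot, a suffix label, and optionally one more dot and label.
    private static readonly Regex DomainPattern = new Regex(
        "^" + Label + @"\." + Label + @"(\." + Label + ")?$",
        RegexOptions.IgnoreCase);

    public static bool IsValidDomain(string domain)
    {
        return DomainPattern.IsMatch(domain);
    }
}
```

Under these assumptions, yahoo.com and yahoo.co.in are accepted, while yahoo (no dot portion) and a.b.c.d (three dot portions) are rejected.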

Combining these regular expressions, the regular expression for email checking that satisfies the Yahoo! email rules is the following (it is a single pattern, split over three lines here for readability):

^[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?
@[a-z][a-z0-9]*
\.([a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?)$

The C# code that performs the match is very simple, as illustrated below:

// Requires: using System.Text.RegularExpressions;
// Inside a character class "|" is a literal, so the classes are written [a-z0-9].
string pattern = @"^[a-z][a-z0-9]*([_][a-z0-9]+)*([.][a-z0-9]+([_][a-z0-9]+)*)?" +
                 @"@[a-z][a-z0-9]*\.([a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?)$";

// IgnoreCase lets the lowercase classes accept uppercase letters as well.
Match match = Regex.Match(txtEmail.Text.Trim(), pattern, RegexOptions.IgnoreCase);

if (match.Success)
    MessageBox.Show("Success");
else
    MessageBox.Show("Fail");
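Outside a form, the same check can be wrapped in a reusable method and tried against the sample addresses. The following is a sketch under stated assumptions rather than the article's downloadable source: the class name EmailIdValidator, the constants Name, Host and Suffix, and the method IsValid are hypothetical, and the character classes are written as [a-z0-9] so that a literal “|” is not accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the combined pattern.
public static class EmailIdValidator
{
    // Name portion: a letter, then letters/digits, "_"-separated groups,
    // and at most one "."-introduced tail.
    private const string Name =
        @"[a-z][a-z0-9]*(_[a-z0-9]+)*(\.[a-z0-9]+(_[a-z0-9]+)*)?";

    // Domain portion between "@" and the first dot.
    private const string Host = @"[a-z][a-z0-9]*";

    // One or two dot portions such as "com" or "co.in".
    private const string Suffix = @"[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?";

    private static readonly Regex Pattern = new Regex(
        "^" + Name + "@" + Host + @"\." + Suffix + "$",
        RegexOptions.IgnoreCase);

    // Trims surrounding white space, as the form code does, then matches.
    public static bool IsValid(string email)
    {
        if (email == null) return false;
        return Pattern.IsMatch(email.Trim());
    }
}
```

A caller would use it as EmailIdValidator.IsValid("milton_cse@yahoo.com"). Building the Regex once in a static field avoids re-parsing the pattern on every call, which is a design choice of this sketch and not something the article prescribes.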

So, we conclude that validation problems that involve repetition, optional parts, and limits on what may appear where are easier to solve with regular expressions than with hand-written control flow (such as if/else-if/else chains or while loops). Such rules can be drawn as a state diagram, which is easy to read and translates directly into a pattern.

My next article will be on auto ID generation for any table using stored procedures.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)