Contents
- So, what’s the agenda?
- Just in case you are a newcomer, what is regex?
- 3 important regex commands
- Check if the user has entered shivkoirala?
- Let’s start with the first validation: enter a character which exists between a-g?
- Enter characters between [a-g] with a length of 3?
- Enter characters between [a-g] with a maximum of 3 characters and a minimum of 1 character?
- How can I validate data with a fixed 8-digit numeric format like 91230456, 01237648 etc.?
- How to validate numeric data with a minimum length of 3 and a maximum of 7, e.g. 123, 1274667, 87654?
- Validate invoice numbers with formats like lji10203040, where the first 3 characters are alphabets and the remaining 8 characters are digits?
- Check for formats like INV19020312 or inv82083012, with the first 3 characters alphabets (case insensitive) and the remaining 8 characters numeric?
- Can we see a simple validation for website URLs?
- Let’s see if your BCD works for email validation?
- Validate that numbers are between 0 and 25
- Validate a date with MM/DD/YYYY, YYYY/MM/DD and DD/MM/YYYY
- Shortcuts
- Quick references for regex
- Regex timeout feature in .NET 4.5
So, what’s the agenda?
Regex has long been one of the most popular and compact ways of writing validations. Its one big problem has been the cryptic syntax. Developers working on projects with complicated validations often refer to some kind of cheat sheet to remember the syntax and commands.
In this article we will try to understand what regex is and how to remember its cryptic syntax easily.
FYI :- This article uses C# to demonstrate regex implementation, so if you are using another language, the syntax may change accordingly.
Just in case you are a newcomer, what is regex?
A regex, or regular expression, helps us describe complex patterns in text. Once you have described these patterns, you can use them to search, replace, extract and modify text data.
Below is a simple regex example. The first step is to import the namespace for regular expressions.
using System.Text.RegularExpressions;
Next, create the regex object with the pattern. The pattern below searches for 10 consecutive lowercase letters between a and z.
Regex obj = new Regex("[a-z]{10}");
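Finally, run the pattern against the data to see whether there is a match. If the pattern matches, IsMatch returns true. Here it returns true even though “shivkoirala” has 11 letters, because an unanchored pattern only needs to find 10 consecutive letters somewhere in the text.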
MessageBox.Show(obj.IsMatch("shivkoirala").ToString());
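For readers who want to try the three steps outside a WinForms app, here is a minimal sketch that wraps the same pattern in a helper class and returns the result instead of calling MessageBox.Show. The class and method names (FirstRegexSample, HasTenLetters) are hypothetical, chosen only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the same [a-z]{10} check as above, without WinForms.
public static class FirstRegexSample
{
    // Unanchored pattern: true if the text contains 10 consecutive
    // lowercase letters anywhere, not only if the whole text is 10 letters.
    private static readonly Regex TenLetters = new Regex("[a-z]{10}");

    public static bool HasTenLetters(string text) => TenLetters.IsMatch(text);
}
```

Calling FirstRegexSample.HasTenLetters("shivkoirala") gives the same answer as the MessageBox version, while "shiv koirala" fails because the space splits the letters into runs of 4 and 7.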
3 important regex commands
The best way to remember regex syntax is by remembering three things: Bracket, Caret and Dollar.
B | There are 3 types of brackets used in regular expressions: square brackets “[”, curly brackets “{” and round brackets “(”. Square brackets specify the characters which need to be matched, curly brackets specify how many characters, and round brackets are used for grouping. We will understand these as we move ahead in this article. |
C | Caret “^” marks the start of a pattern. ^ may appear at the beginning of a pattern to require the match to occur at the very beginning of the string (or of a line, when RegexOptions.Multiline is used). For example, ^xyz matches xyz123 but not 123xyz. |
D | Dollar “$” marks the end of a pattern. $ may appear at the end of a pattern to require the match to occur at the very end of the string (or of a line, when RegexOptions.Multiline is used). For example, pqr$ matches 123pqr but not pqr123. |
Caret (^) and dollar sign ($) anchor the pattern to the beginning or end of the string being searched. The two anchors may be combined. For example, ^pqr$ matches only pqr; any characters before or after it make the match fail.
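The anchor examples above can be checked directly with the static Regex.IsMatch method. This is a small sketch; the helper class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper showing how ^ and $ anchor a pattern.
public static class AnchorExamples
{
    // ^xyz: the text must start with "xyz".
    public static bool StartsWithXyz(string text) => Regex.IsMatch(text, "^xyz");

    // pqr$: the text must end with "pqr".
    public static bool EndsWithPqr(string text) => Regex.IsMatch(text, "pqr$");

    // ^pqr$: both anchors together, so the whole text must be "pqr".
    public static bool IsExactlyPqr(string text) => Regex.IsMatch(text, "^pqr$");
}
```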
Once you know the above three syntaxes, you are ready to write most validations. For instance, consider a pattern of the form ^[a-z]{m,n}$, where m and n stand for the minimum and maximum length; it shows how the three entities fit together.
- The pattern only accepts characters that lie between ‘a’ and ‘z’. The square brackets define this range.
- The curly brackets define the minimum and maximum length.
- Finally, the caret at the start and the dollar at the end anchor the pattern to the start and end of the input, which makes the validation stricter.
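As a sketch of bracket, caret and dollar working together, assume illustrative lengths of 1 to 10 (the text above does not fix them). The anchored version validates the whole input, while the unanchored one only looks for a matching run somewhere inside it. One .NET detail worth knowing: $ also matches just before a final newline, so "shiv\n" passes the anchored check; \z is the stricter end-of-string anchor if that matters. Class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the lengths 1..10 are an assumption for this sketch.
public static class BracketCaretDollar
{
    // Bracket + curly bracket + caret + dollar: whole input must be 1-10 lowercase letters.
    private static readonly Regex Anchored = new Regex("^[a-z]{1,10}$");

    // Same character rule without anchors: only needs a matching run somewhere.
    private static readonly Regex Unanchored = new Regex("[a-z]{1,10}");

    // \z matches only at the very end, so a trailing newline is rejected.
    private static readonly Regex Strict = new Regex(@"^[a-z]{1,10}\z");

    public static bool IsValid(string text) => Anchored.IsMatch(text);
    public static bool ContainsRun(string text) => Unanchored.IsMatch(text);
    public static bool IsValidStrict(string text) => Strict.IsMatch(text);
}
```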
So now, using the above 3 commands, let’s implement some regex validations.
Check if the user has entered shivkoirala?
^shivkoirala$
Let’s start with the first validation: enter a character which exists between a-g?
^[a-g]$
Enter characters between [a-g] with a length of 3?
^[a-g]{3}$
Enter characters between [a-g] with a maximum of 3 characters and a minimum of 1 character?
^[a-g]{1,3}$
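These first validations can be wrapped in helper methods like the sketch below. It uses the anchored forms (^...$) so that the whole input is checked rather than just a substring; without anchors, [a-g]{1,3} would also accept a longer string such as "abcdefg". The method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for the first four validations.
public static class SimpleValidations
{
    // Exactly the text "shivkoirala".
    public static bool IsShivkoirala(string s) => Regex.IsMatch(s, "^shivkoirala$");

    // A single character between a and g.
    public static bool IsOneCharAtoG(string s) => Regex.IsMatch(s, "^[a-g]$");

    // Exactly 3 characters, each between a and g.
    public static bool IsThreeCharsAtoG(string s) => Regex.IsMatch(s, "^[a-g]{3}$");

    // Between 1 and 3 characters, each between a and g.
    public static bool IsOneToThreeCharsAtoG(string s) => Regex.IsMatch(s, "^[a-g]{1,3}$");
}
```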
How can I validate data with a fixed 8-digit numeric format like 91230456, 01237648 etc.?
^[0-9]{8}$
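How to validate numeric data with a minimum length of 3 and a maximum of 7, e.g. 123, 1274667, 87654?
We just need to tweak the previous validation by adding a comma and defining the minimum and maximum length inside the curly brackets.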
^[0-9]{3,7}$
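Both numeric rules can be sketched as helper methods. Note that [0-9] accepts only the ASCII digits 0 to 9 and that leading zeros, as in 01237648, are allowed. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for the two numeric validations above.
public static class NumericValidations
{
    // Exactly 8 ASCII digits, leading zeros allowed (e.g. 01237648).
    public static bool IsEightDigits(string s) => Regex.IsMatch(s, "^[0-9]{8}$");

    // Between 3 and 7 ASCII digits (e.g. 123, 87654, 1274667).
    public static bool IsThreeToSevenDigits(string s) => Regex.IsMatch(s, "^[0-9]{3,7}$");
}
```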
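Validate invoice numbers with formats like lji10203040, where the first 3 characters are alphabets and the remaining 8 characters are digits?
Steps | Regex |
First 3 character validation | ^[a-z]{3} |
8 length number validation | [0-9]{8} |
Now putting the whole thing together.
^[a-z]{3}[0-9]{8}$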
Check for formats like INV19020312 or inv82083012, with the first 3 characters alphabets (case insensitive) and the remaining 8 characters numeric?
In the previous question, the regex only accepts the first 3 characters of the invoice number if they are lowercase letters; capital letters are reported as invalid. To make the first 3 letters case insensitive, use ^[a-zA-Z]{3} for the character validation.
Below is the complete regex validation.
^[a-zA-Z]{3}[0-9]{8}$
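A minimal sketch of the invoice check, assuming the 8-digit rule stated in the steps. It shows the lowercase-only pattern, the explicit [a-zA-Z] class, and an alternative that keeps [a-z] and passes RegexOptions.IgnoreCase, combined with RegexOptions.CultureInvariant so that culture-specific casing rules (such as the Turkish dotted and dotless i) do not interfere. Method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers; the 8-digit length follows the steps above.
public static class InvoiceValidation
{
    // Lowercase prefix only, e.g. "inv19020312".
    public static bool IsLowercaseInvoice(string s) =>
        Regex.IsMatch(s, "^[a-z]{3}[0-9]{8}$");

    // Explicit upper- and lowercase ranges in the character class.
    public static bool IsInvoice(string s) =>
        Regex.IsMatch(s, "^[a-zA-Z]{3}[0-9]{8}$");

    // Same rule expressed with options instead of a wider character class.
    public static bool IsInvoiceIgnoreCase(string s) =>
        Regex.IsMatch(s, "^[a-z]{3}[0-9]{8}$",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}
```

The explicit [a-zA-Z] class keeps the rule visible in the pattern itself, while the option-based version is handy when a pattern has many letter ranges.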
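Can we see a simple validation for website URLs?
Steps | Regex |
Step 1 :- Check that the URL starts with www followed by a dot. A bare . matches any character, so the literal dot is written as [.] | ^www[.] |
Step 2 :- The domain name should be at least 1 character and at most 15 characters | [a-z]{1,15} |
Step 3 :- Finally, it should end with .com or .org | [.](com|org)$ |
Putting the steps together:
^www[.][a-z]{1,15}[.](com|org)$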
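To see why the literal dot is written as [.], this sketch compares the URL pattern with a variant that uses bare dots, which match any character. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers comparing [.] with a bare dot.
public static class UrlValidation
{
    // [.] matches only a literal dot.
    public static bool IsSimpleUrl(string s) =>
        Regex.IsMatch(s, "^www[.][a-z]{1,15}[.](com|org)$");

    // Bare dots match any character, so this version is too permissive.
    public static bool IsSimpleUrlBareDot(string s) =>
        Regex.IsMatch(s, "^www.[a-z]{1,15}.(com|org)$");
}
```

With bare dots, "wwwxgooglexcom" is accepted because each . happily matches the letter x.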
Let’s see if your BCD works for email validation?
Steps | Regex |
Step 1 :- An email can start with alphanumeric characters, minimum 1 and maximum 10, followed by the at sign (@) | ^[a-zA-Z0-9]{1,10}@ |
Step 2 :- The domain name after the @ can be alphanumeric, minimum 1 and maximum 10 characters | [a-zA-Z0-9]{1,10} |
Step 3 :- Finally, it should end with .com or .org, with the literal dot written as [.] | [.](com|org)$ |
Putting the steps together:
^[a-zA-Z0-9]{1,10}@[a-zA-Z0-9]{1,10}[.](com|org)$
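A sketch of the simplified email rule, with the literal dot written as [.] and digits allowed in the domain as step 2 describes. Real-world addresses allow much more (dots, plus signs, hyphens, subdomains, longer top-level domains), so treat this as a teaching example rather than production validation. The name IsSimpleEmail is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper implementing only the simplified rules from the steps.
public static class EmailValidation
{
    private static readonly Regex SimpleEmail =
        new Regex("^[a-zA-Z0-9]{1,10}@[a-zA-Z0-9]{1,10}[.](com|org)$");

    public static bool IsSimpleEmail(string s) => SimpleEmail.IsMatch(s);
}
```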
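Validate that numbers are between 0 and 25
Here [0-9] covers 0 to 9, [0-1][0-9] covers 00 to 19 and [0-2][0-5] adds 20 to 25, so the pattern also accepts a leading zero such as 05.
^(([0-9])|([0-1][0-9])|([0-2][0-5]))$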
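The range check can be tried with the sketch below. The second method is a hypothetical variant, assuming you want to reject leading zeros: it lists the single digits, the teens and 20 to 25 explicitly. Both method names are illustrative.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for the 0 to 25 range.
public static class RangeValidation
{
    // The pattern from the article: accepts 0-25, plus 00-09 with a leading zero.
    public static bool IsZeroToTwentyFive(string s) =>
        Regex.IsMatch(s, "^(([0-9])|([0-1][0-9])|([0-2][0-5]))$");

    // Assumed variant that rejects leading zeros such as "05".
    public static bool IsZeroToTwentyFiveNoLeadingZero(string s) =>
        Regex.IsMatch(s, "^([0-9]|1[0-9]|2[0-5])$");
}
```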
Validate a date with MM/DD/YYYY, YYYY/MM/DD and DD/MM/YYYY
Steps | Regex | Description |
Let’s check DD first. DD has a range of 1-29 (February), 1-30 (short months) and 1-31 (long months). So for DD, allow 1-9 or 01-09 | ([1-9]|0[1-9]) | This allows the user to enter a value between 1 and 9 or 01 and 09. |
Now also add the DD check for 10 to 19 | ([1-9]|0[1-9]|1[0-9]) | This allows the user to enter a value between 01 and 19. |
Now add the DD check for 20 to 29 | ([1-9]|0[1-9]|1[0-9]|2[0-9]) | This allows the user to enter a value between 01 and 29. |
Now add the DD check for 30 to 31 | ([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1]) | Finally, the user can enter a value between 01 and 31. |
Now for the separator, which can be /, . or - | [-/.] | This allows the user to separate the date parts with any of these separators. The hyphen comes first so it is read as a literal character, not as a range. |
Now apply the same idea to MM | ([1-9]|0[1-9]|1[0-2]) | This allows the user to enter a month value between 01 and 12. |
Then for YYYY | (1[9][0-9][0-9]|2[0][0-9][0-9]) | This allows the user to enter a year value between 1900 and 2099. |
For "DD/MM/YYYY", use the following regex pattern.
^([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])[-/.]([1-9]|0[1-9]|1[0-2])[-/.](1[9][0-9][0-9]|2[0][0-9][0-9])$
To get MM/DD/YYYY, use the following regex pattern.
^([1-9]|0[1-9]|1[0-2])[-/.]([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])[-/.](1[9][0-9][0-9]|2[0][0-9][0-9])$
And finally, to get YYYY/MM/DD, use the following regex pattern.
^(1[9][0-9][0-9]|2[0][0-9][0-9])[-/.]([1-9]|0[1-9]|1[0-2])[-/.]([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])$
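Building the three date patterns from named pieces makes them easier to read and keeps the day, month and year rules in one place. This is a sketch with hypothetical names, using [-/.] as the separator class. Note that a regex only checks the shape of the date: it accepts mixed separators such as 25/12.2020 and impossible dates such as 31/02/2000, so a calendar check (for example with DateTime.TryParseExact) is still needed if real dates matter.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper composing the date patterns from reusable parts.
public static class DateValidation
{
    private const string Day = "([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])";  // 1-31, optional leading zero
    private const string Month = "([1-9]|0[1-9]|1[0-2])";               // 1-12, optional leading zero
    private const string Year = "(1[9][0-9][0-9]|2[0][0-9][0-9])";      // 1900-2099
    private const string Sep = "[-/.]";                                 // '-', '/' or '.'

    private static readonly Regex DdMmYyyy = new Regex("^" + Day + Sep + Month + Sep + Year + "$");
    private static readonly Regex MmDdYyyy = new Regex("^" + Month + Sep + Day + Sep + Year + "$");
    private static readonly Regex YyyyMmDd = new Regex("^" + Year + Sep + Month + Sep + Day + "$");

    public static bool IsDdMmYyyy(string s) => DdMmYyyy.IsMatch(s);
    public static bool IsMmDdYyyy(string s) => MmDdYyyy.IsMatch(s);
    public static bool IsYyyyMmDd(string s) => YyyyMmDd.IsMatch(s);
}
```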
Shortcuts
You can also use the common shortcut commands below to shorten your regex validation.
Actual commands | Shortcuts |
[0-9] | \d |
[a-zA-Z0-9_] | \w |
0 or more occurrences | * |
1 or more occurrences | + |
0 or 1 occurrence | ? |
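The shortcuts can be tried with the sketch below. One .NET-specific caveat: \d matches any Unicode decimal digit (for example the Arabic-Indic digits ١٢٣), not only 0 to 9, so [0-9] is the safer choice when only ASCII digits are acceptable; \w is similarly broader than [a-zA-Z0-9_] for non-English letters. The method names and sample patterns (colou?r, ab*c) are illustrative.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers demonstrating \d, \w, *, + and ?.
public static class ShortcutDemo
{
    // \d{3,7}: 3 to 7 digits; in .NET this includes non-ASCII Unicode digits.
    public static bool IsDigitsShortcut(string s) => Regex.IsMatch(s, @"^\d{3,7}$");

    // [0-9]{3,7}: 3 to 7 ASCII digits only.
    public static bool IsDigitsAscii(string s) => Regex.IsMatch(s, "^[0-9]{3,7}$");

    // \w+: one or more word characters (letters, digits, underscore).
    public static bool IsWord(string s) => Regex.IsMatch(s, @"^\w+$");

    // u?: the 'u' may appear 0 or 1 time.
    public static bool IsColour(string s) => Regex.IsMatch(s, "^colou?r$");

    // b*: zero or more 'b' characters between 'a' and 'c'.
    public static bool IsAbStarC(string s) => Regex.IsMatch(s, "^ab*c$");
}
```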
Quick references for regex
A great, concise cheat sheet: http://www.dijksterhuis.org/csharp-regular-expression-operator-cheat-sheet/
Regex timeout feature in .NET 4.5
The time taken to evaluate a regular expression depends on both the pattern and the input. .NET uses a backtracking engine, and for badly written patterns (for example nested quantifiers such as (a+)+) the evaluation time can grow exponentially with the length of certain inputs.
This behavior can be exploited by attackers, who send crafted input that ties up your server: a regex denial-of-service (ReDoS) attack.
To overcome this problem, .NET 4.5 introduced a regex timeout feature.
By setting a timeout, you limit how long a single match may run: when the limit is exceeded, .NET throws a RegexMatchTimeoutException instead of hanging, which protects you against a regex DoS attack.
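A minimal sketch of the timeout feature, using the Regex constructor overload that takes a TimeSpan. The pattern ^(a+)+$ is a classic example of nested quantifiers that can backtrack catastrophically on inputs such as a long run of a characters followed by !; whether a particular input actually hits the limit depends on the runtime version and its regex optimizations. When evaluation exceeds the timeout, .NET throws RegexMatchTimeoutException, which the sketch turns into a null result. The class and member names and the 100 ms limit are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing a match timeout.
public static class TimeoutDemo
{
    // Nested quantifiers like (a+)+ are prone to catastrophic backtracking.
    // The third constructor argument caps how long a single match attempt may run.
    private static readonly Regex Risky =
        new Regex("^(a+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(100));

    // The timeout configured on the regex (100 ms in this sketch).
    public static TimeSpan MatchTimeout => Risky.MatchTimeout;

    // Returns true/false for a completed match, or null if the match timed out.
    public static bool? TryIsMatch(string input)
    {
        try
        {
            return Risky.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

Returning null keeps the decision with the caller, which can treat a timed-out input as invalid rather than letting the exception escape.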