Verifying Special Digits Against Regular Expressions
Digits in Regular Expressions
Developers commonly use regular expressions to validate numeric input.
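There are two common ways to match a digit in a regular expression:
- [0-9] matches an ASCII digit (Western Arabic numeral), i.e. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9;
- \d matches, in Unicode-aware engines such as .NET, any Unicode decimal digit (general category Nd), not only 0-9.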
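The difference can be seen with a minimal C# sketch using System.Text.RegularExpressions.Regex; the helper AllMatch is hypothetical and exists only for this illustration. It checks the Arabic-Indic digits for "123" against both patterns, and also with RegexOptions.ECMAScript, under which .NET restricts \d to [0-9].

```csharp
using System;
using System.Text.RegularExpressions;

// Arabic-Indic digits for "123" (U+0661, U+0662, U+0663).
const string arabicIndic = "\u0661\u0662\u0663";

// Hypothetical helper: true if the whole input consists of characters matched by digitPattern.
static bool AllMatch(string input, string digitPattern, RegexOptions options = RegexOptions.None) =>
    Regex.IsMatch(input, "^" + digitPattern + "+$", options);

Console.WriteLine(AllMatch("123", "[0-9]"));       // True
Console.WriteLine(AllMatch(arabicIndic, "[0-9]")); // False: [0-9] is ASCII-only
Console.WriteLine(AllMatch(arabicIndic, @"\d"));   // True: \d matches any Unicode decimal digit
// False: in ECMAScript mode .NET treats \d as [0-9]
Console.WriteLine(AllMatch(arabicIndic, @"\d", RegexOptions.ECMAScript));
```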
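In addition to the ASCII digits, Unicode contains more than 300 decimal digits from other scripts. The list below groups them by numeric value; the exact set depends on the Unicode version supported by the runtime: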
0: 0,٠,۰,߀,०,০,੦,૦,୦,௦,౦,೦,൦,๐,໐,༠,၀,႐,០,᠐,᥆,᧐,᭐,᮰,᱀,᱐,꘠,꣐,꤀,꩐,0
1: 1,١,۱,߁,१,১,੧,૧,୧,௧,౧,೧,൧,๑,໑,༡,၁,႑,១,᠑,᥇,᧑,᭑,᮱,᱁,᱑,꘡,꣑,꤁,꩑,1
2: 2,٢,۲,߂,२,২,੨,૨,୨,௨,౨,೨,൨,๒,໒,༢,၂,႒,២,᠒,᥈,᧒,᭒,᮲,᱂,᱒,꘢,꣒,꤂,꩒,2
3: 3,٣,۳,߃,३,৩,੩,૩,୩,௩,౩,೩,൩,๓,໓,༣,၃,႓,៣,᠓,᥉,᧓,᭓,᮳,᱃,᱓,꘣,꣓,꤃,꩓,3
4: 4,٤,۴,߄,४,৪,੪,૪,୪,௪,౪,೪,൪,๔,໔,༤,၄,႔,៤,᠔,᥊,᧔,᭔,᮴,᱄,᱔,꘤,꣔,꤄,꩔,4
5: 5,٥,۵,߅,५,৫,੫,૫,୫,௫,౫,೫,൫,๕,໕,༥,၅,႕,៥,᠕,᥋,᧕,᭕,᮵,᱅,᱕,꘥,꣕,꤅,꩕,5
6: 6,٦,۶,߆,६,৬,੬,૬,୬,௬,౬,೬,൬,๖,໖,༦,၆,႖,៦,᠖,᥌,᧖,᭖,᮶,᱆,᱖,꘦,꣖,꤆,꩖,6
7: 7,٧,۷,߇,७,৭,੭,૭,୭,௭,౭,೭,൭,๗,໗,༧,၇,႗,៧,᠗,᥍,᧗,᭗,᮷,᱇,᱗,꘧,꣗,꤇,꩗,7
8: 8,٨,۸,߈,८,৮,੮,૮,୮,௮,౮,೮,൮,๘,໘,༨,၈,႘,៨,᠘,᥎,᧘,᭘,᮸,᱈,᱘,꘨,꣘,꤈,꩘,8
9: 9,٩,۹,߉,९,৯,੯,૯,୯,௯,౯,೯,൯,๙,໙,༩,၉,႙,៩,᠙,᥏,᧙,᭙,᮹,᱉,᱙,꘩,꣙,꤉,꩙,9
These Unicode digits are useful for testing whether numeric input is validated correctly.
For example, a phone number is usually expected to contain only ASCII digits. This is easy to test by entering the same number in another script, for example "١٢٣" (Arabic-Indic digits) instead of "123", or Devanagari digits such as ० (0), १ (1), २ (2), and so on.
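A check of this kind can be sketched in C#. This is a minimal illustration, not Microsoft's implementation: the helper IsValidPhoneDigits and its rule (7 to 15 ASCII digits) are hypothetical. It uses [0-9] rather than \d so that only ASCII digits pass, and anchors with \A and \z because $ in .NET also matches before a trailing newline. The sample inputs write the same number in ASCII, Arabic-Indic, Devanagari and fullwidth digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical rule for illustration: a phone number is 7 to 15 ASCII digits.
// [0-9] keeps the check ASCII-only; \z (unlike $) rejects a trailing newline.
static bool IsValidPhoneDigits(string input) =>
    Regex.IsMatch(input, @"\A[0-9]{7,15}\z");

// The same number written with digits from different scripts.
var samples = new (string Label, string Value)[]
{
    ("ASCII", "5551234567"),
    ("Arabic-Indic", "\u0665\u0665\u0665\u0661\u0662\u0663\u0664\u0665\u0666\u0667"),
    ("Devanagari", "\u096B\u096B\u096B\u0967\u0968\u0969\u096A\u096B\u096C\u096D"),
    ("Fullwidth", "\uFF15\uFF15\uFF15\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16\uFF17"),
};

foreach (var (label, value) in samples)
{
    // Only the ASCII sample is expected to be accepted.
    Console.WriteLine($"{label}: {IsValidPhoneDigits(value)}");
}
```

Replacing [0-9] with \d in this sketch would make the non-ASCII samples pass as well, which is exactly the behavior such tests are meant to detect.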
Below is an example of correct validation by Microsoft of an Azure account recovery phone number:
Code to generate special digit symbols
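.NET treats [0-9] and \d as different patterns: by default, \d matches any Unicode decimal digit. The C# script below collects every character that \d matches and groups them by numeric value. A C# char is a single UTF-16 code unit, so enumerating all char values covers everything .NET's \d can match: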
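using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
var stringBuilder = new StringBuilder();
var digitRegex = new Regex(@"\d");
// Check every UTF-16 code unit (U+0000..U+FFFF), keep the ones \d matches,
// and group them by the digit value they represent.
var charDigitGroups = Enumerable.Range(char.MinValue, char.MaxValue + 1)
    .Select(i => (char)i)
    .Where(ch => digitRegex.IsMatch(ch.ToString()))
    .GroupBy(ch => char.GetNumericValue(ch));
foreach (var charGroup in charDigitGroups)
{
    string joinedValues = string.Join(",", charGroup);
    string rowResult = string.Concat(charGroup.Key.ToString(), ": ", joinedValues);
    stringBuilder.AppendLine(rowResult);
}
// Prints one line per digit value, e.g. "0: 0,٠,۰,..."
Console.Write(stringBuilder.ToString());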
Other regex engines behave differently. In JavaScript, for example, \d is always equivalent to [0-9], even with the u (Unicode) flag; matching other decimal digits requires \p{Nd} together with the u flag.
Nevertheless, it is worth testing applications with Unicode digit input regardless of how the validation is implemented.
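Such test inputs can also be generated instead of typed by hand. The following is a minimal sketch, not the author's method: the helper DigitVariants is hypothetical. It rewrites an ASCII digit string in every decimal-digit script of the Basic Multilingual Plane, assuming the input contains only 0-9. It relies on the Unicode stability guarantee that decimal digits (category Nd) are encoded in contiguous runs from 0 to 9, so each script can be reached through its zero character. char.IsDigit tests for the same Nd category that .NET's \d matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: yields asciiDigits rewritten in every BMP decimal-digit script,
// including ASCII itself. Assumes asciiDigits contains only the characters 0-9.
static IEnumerable<string> DigitVariants(string asciiDigits)
{
    if (!asciiDigits.All(c => c >= '0' && c <= '9'))
        throw new ArgumentException("Expected ASCII digits only.", nameof(asciiDigits));

    // Zero character of each decimal-digit run (char.IsDigit checks for category Nd).
    var zeros = Enumerable.Range(char.MinValue, char.MaxValue + 1)
        .Select(i => (char)i)
        .Where(ch => char.IsDigit(ch) && char.GetNumericValue(ch) == 0);

    foreach (char zero in zeros)
    {
        // Nd digits occupy contiguous runs 0..9, so offset each digit from the run's zero.
        yield return new string(asciiDigits.Select(d => (char)(zero + (d - '0'))).ToArray());
    }
}

// Feed every variant to the validator under test; usually only the ASCII one should pass.
foreach (string variant in DigitVariants("123"))
    Console.WriteLine(variant);
```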
Related Links