Validating email addresses is difficult, because the specification of what constitutes a valid email address allows for an immense variety. Trying to translate this specification into a regular expression leads to huge and complicated expressions which still might not cover all cases. This article describes how email address validation can reasonably be done and provides a production-quality WPF TextBox for it. The control lets the user enter only valid keystrokes, lets you specify whether an email address must be entered (required field), and alerts the user if he tries to close the window without saving the changes made.
Writing a WPF control which validates email addresses is a challenge, because nearly all Unicode characters and many formats are allowed. The specifications allow for a great variety, but in reality only a few formats are used, because the exotic ones might get rejected by some email software (clients like Outlook, servers like Exchange). The control must be flexible enough to meet your requirements, whether you need strict checking with precise formatting or just want to alert the user when he keys in a strange-looking email address. Ideally, you prevent invalid input altogether by controlling which keystrokes the user can make. Since this control is part of WpfWindowsLib, it doesn't let the user save the data if the email address is required and missing. It also informs the user if he tries to close the window without saving the data.
The relevant specification can be found at RFC 5322: Internet Message Format: Address Specification.
It defines a valid email address in 2 steps:
1. Address
This higher level specifies, among other things, that an Address can consist of a display-name and an addr-spec, which is the part we usually mean when we talk about email addresses. The Address could look like this:
John Doe<John.Doe@example.com>
- "John Doe" is the display name. It is not used for the routing of the email, but allows the email address to be shown in a more user friendly form.
- "<John.Doe@example.com>" is called angle-addr in the specification and contains the addr-spec (the email address used for routing the email) in angle brackets.
The display name is optional. If there is no display name, the angle brackets are not needed either.
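The split between display name and addr-spec can be tried out with the .NET class System.Net.Mail.MailAddress. The following is only a minimal sketch: the class and method names AddressParts and TrySplit are made up for this illustration, and it is not how EmailTextBox works internally.

```csharp
using System;
using System.Net.Mail;

public static class AddressParts {
  // Splits an RFC 5322 "Address" into display name and addr-spec.
  // Returns false if MailAddress cannot parse the input.
  public static bool TrySplit(string address, out string displayName, out string addrSpec) {
    displayName = "";
    addrSpec = "";
    try {
      var parsed = new MailAddress(address);
      displayName = parsed.DisplayName; // empty string when there is no display name
      addrSpec = parsed.Address;        // local-part@domain-part, without angle brackets
      return true;
    } catch (FormatException) {
      return false;                     // not parsable as an address
    } catch (ArgumentException) {
      return false;                     // null or empty input
    }
  }
}
```

Note that MailAddress has its own idea of what is acceptable, which does not necessarily match RFC 5322 in every corner case.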
2. addr-spec
This part describes what usually is called an email address:
John.Doe@example.com
Basically, the address has two parts:
local-part @ domain-part
The domain-part is the Internet DNS address of the email server, while the local-part is the name of the "mailbox" within that email server. The domain-part is quite well defined, because everyone involved in routing an email needs to understand it. The exact meaning of the local-part, however, is defined by the receiving email server software, and the sender does not necessarily need to understand its structure. The specification wants to give the email server as much freedom as possible, which makes it hard to validate whether an email address is actually correct.
The only commonly agreed requirement for a valid email address is:
There must be two parts separated by exactly one '@'.
But even this simple specification is not always correct, because the following is also a valid email address:
"John@Doe"@example.com
The first '@' is in a quoted string. All visible ASCII characters (i.e., from 0x21 to 0x7E) are allowed in the local-part when they are quoted. They can be placed between 2 double quotes '"', or a single special character can be preceded by a backslash '\':
John\@Doe@example.com
(This is the same address as the one above with the quoted string.)
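To make the quoting rules concrete, here is a small sketch that finds the '@' separating local-part from domain-part while skipping any '@' inside double quotes or after a backslash. The names AddrSpecSplitter and FindSeparator are invented for this example. It only covers the two quoting forms shown above and is not a full RFC 5322 parser.

```csharp
using System;

public static class AddrSpecSplitter {
  // Returns the index of the single '@' that separates local-part from
  // domain-part, ignoring any '@' inside double quotes or after a backslash.
  // Returns -1 if there is no such '@', more than one, or an unclosed quote.
  public static int FindSeparator(string addrSpec) {
    int separator = -1;
    bool inQuotes = false;
    for (int i = 0; i < addrSpec.Length; i++) {
      char c = addrSpec[i];
      if (c == '\\') {
        i++;                            // skip the escaped character, whatever it is
      } else if (c == '"') {
        inQuotes = !inQuotes;           // entering or leaving a quoted string
      } else if (c == '@' && !inQuotes) {
        if (separator >= 0) return -1;  // second unquoted '@'
        separator = i;
      }
    }
    return inQuotes ? -1 : separator;
  }
}
```

With the index in hand, Substring(0, index) gives the local-part and Substring(index + 1) the domain-part.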
(Inspired by https://en.wikipedia.org/wiki/Email_address#Examples and RFC 3696 Application Techniques for Checking and Transformation of Names: Restrictions on email addresses)
simple@example.com
The domain should contain a '.', because the root domain cannot be the address of the email server. Of course, as with most rules, there is an exception: example@localhost. This domain address is not for the Internet but for the company's internal network.
x@example.com
A one-letter local-part is OK.
John.Doe@example.com
The period '.' character follows some special rules: it cannot be the first or the last character of the local-part or domain-part, and there cannot be 2 consecutive dots like '..'.
Gmail treats 'John.Doe' and 'JohnDoe' as the same address. As mentioned above, it is up to the receiving email server how it wants to interpret the addresses of its mailboxes. But this poses a problem when the email address is used to identify individuals, for example on a login page. Should it assume that John.Doe@example.com and JohnDoe@example.com are 2 different people?
-minus-sign-@example.com
_under_score_@example.com
Hyphens '-' and underscores '_' are accepted everywhere.
John.Doe+Filter@example.com
This is legal, but does all email software understand it? Some email servers will use "John.Doe" as the actual mailbox name and ignore the '+' and whatever follows it until the '@'.
Actually, all of these characters are legal in the local-part:
! # $ % & ' * + - / = ? ^ _ ` . { | } ~
But not all email software might accept them. Poor O'Leary@example.com: some email clients just send his mail to OLeary@example.com.
Display Names
The '<' and '>' characters are not in the above list, because they have a special meaning: they separate a display name from the actual email address:
My Name <MyName@Example.com>
A display name can only appear before the email address in angle brackets. It can contain the same characters as an email address, plus blanks.
Comments
The parentheses '(' and ')' are also missing from the list above. They enclose comments:
Name(Comment1)<(Comment2)Name(Comment3)@(Comment4)Example.com(Comment5)>(Comment6)
A comment can contain any printable ASCII character except '(', ')' and '\'. This is strictly according to RFC 5322, but I guess a lot of email software will not interpret it properly. Some even use the comment as display name, which is wrong. If possible, don't use comments.
Quoting
" "@example.org
The space between the quotes is the name of the mailbox. Between 2 double quotes, nearly anything goes, which makes validating difficult.
"john..doe"@example.org
A quoted double dot, which is not allowed without quotes.
Some\@saple@example.com
The first '@' should be treated as an ordinary character and not as the control character separating local-part from domain-part.
John\ Doe@example.com
Spaces are only allowed when quoted (in RFC parlance, quoted also covers a leading backslash '\').
John.\\Doe@example.com
The first backslash '\' makes the second backslash '\' an ordinary character.
Some Strange Looking Valid Addresses
"very.(),:;<>[]\".VERY.\"very@\ \"very\".unusual"@strange.example.com
mailhost!username@example.org
Bangified host route used for uucp mailers
user%example.com@example.org
% escaped mail route to user@example.com via example.org
Domain-Part Requirements
The domain-part is a DNS hostname, consisting of letters, digits, hyphens and dots. In rare cases, an IP address can be used instead, enclosed in square brackets:
John.Doe@[192.168.0.0]
user@[IPv6:2001:db8::1]
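A rough check of these domain-part rules could look like the sketch below. DomainPartCheck and LooksLikeDomain are hypothetical names. The hostname branch only tests the character set named above plus empty labels, and the IP branch relies on System.Net.IPAddress.TryParse, so treat it as an approximation rather than a complete DNS or RFC check.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class DomainPartCheck {
  public static bool LooksLikeDomain(string domain) {
    if (string.IsNullOrEmpty(domain)) return false;

    // Address literal such as [192.168.0.0] or [IPv6:2001:db8::1]
    if (domain[0] == '[') {
      if (domain.Length < 3 || domain[domain.Length - 1] != ']') return false;
      string literal = domain.Substring(1, domain.Length - 2);
      const string ipv6Tag = "IPv6:";
      if (literal.StartsWith(ipv6Tag, StringComparison.OrdinalIgnoreCase)) {
        return IPAddress.TryParse(literal.Substring(ipv6Tag.Length), out var v6)
            && v6.AddressFamily == AddressFamily.InterNetworkV6;
      }
      // Insist on 4 dot-separated parts, because TryParse also accepts shorter forms.
      return literal.Split('.').Length == 4
          && IPAddress.TryParse(literal, out var v4)
          && v4.AddressFamily == AddressFamily.InterNetwork;
    }

    // Hostname: labels of letters, digits and hyphens, separated by dots.
    foreach (string label in domain.Split('.')) {
      if (label.Length == 0) return false; // leading, trailing or double dot
      foreach (char c in label) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
               || (c >= '0' && c <= '9') || c == '-';
        if (!ok) return false;
      }
    }
    return true;
  }
}
```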
Use of Non-ASCII Characters (UTF8)
RFC 6530, RFC 6531 and RFC 6532 specify how any Unicode character can be used in email addresses, if the email software supports it. Since emails are often forwarded from server to server, it might very well be that one of these servers does not support UTF8 and returns an error message to the sender that the email could not be delivered. For this reason, it is safer not to use UTF8 in email addresses, although the following examples are actually valid addresses:
Pelé@example.com
δοκιμή@παράδειγμα.δοκιμή
我買@屋企.香港
संपर्क@डाटामेल.भारत
Especially troublesome is the domain-part, because the actual IP address must be looked up from a DNS server, which might not support UTF8. For that reason, Punycode was invented, which encodes Unicode in pure ASCII, so that it can also be handled by a non-UTF8 domain server. But does your email software support Punycode?
As mentioned before, the goal is to use email addresses which will not cause trouble. Using UTF8-based addresses is asking for trouble. If you must, accept UTF8, but your life is much easier if you don't.
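If you do have to accept international domain names, .NET can do the Punycode conversion with System.Globalization.IdnMapping. The helper below is a sketch with made-up names (PunycodeDemo, DomainToAscii). It simply splits at the last '@' and therefore assumes the domain-part itself contains no '@'.

```csharp
using System;
using System.Globalization;

public static class PunycodeDemo {
  // Converts the domain-part of an address to its ASCII (Punycode) form.
  // The local-part is returned unchanged, because Punycode only applies to DNS names.
  // IdnMapping.GetAscii throws an ArgumentException for an empty or invalid domain.
  public static string DomainToAscii(string addrSpec) {
    int at = addrSpec.LastIndexOf('@');
    if (at < 0) return addrSpec; // no domain-part, nothing to convert
    string local = addrSpec.Substring(0, at);
    string domain = addrSpec.Substring(at + 1);
    var idn = new IdnMapping();
    return local + "@" + idn.GetAscii(domain);
  }
}
```

Labels that needed encoding start with "xn--" in the result, while a pure ASCII domain such as example.com passes through unchanged.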
Length Restrictions
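In addition to restrictions on syntax, there is a length limit on email addresses: a maximum of 64 characters (UTF8 bytes) in the local-part and a maximum of 255 characters (UTF8 bytes) in the domain-part, for a total length of 320 characters. Note that an erratum to RFC 3696 lowers the usable total to 254 characters, because RFC 5321 limits the whole path to 256 characters including the angle brackets.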
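Because the limits count UTF8 bytes and not characters, string.Length is not the right measure once non-ASCII characters are involved. A minimal sketch of the per-part check (LengthCheck and IsWithinLimits are names chosen for this example):

```csharp
using System;
using System.Text;

public static class LengthCheck {
  public const int MaxLocalBytes = 64;
  public const int MaxDomainBytes = 255;

  // Compares the UTF8 encoded size of each part with the limits above.
  public static bool IsWithinLimits(string localPart, string domainPart) {
    return Encoding.UTF8.GetByteCount(localPart) <= MaxLocalBytes
        && Encoding.UTF8.GetByteCount(domainPart) <= MaxDomainBytes;
  }
}
```

A local-part of 33 'é' characters, for instance, is only 33 characters long but takes 66 bytes in UTF8 and is therefore too long.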
As mentioned above, there are many valid structures for email addresses. But there are also illegal ones:
Abc.example.com
No @ character
A@b@c@example.com
Only one '@' is allowed outside quoted strings
john..doe@example..com
Double dots '..' are not allowed in either part
a"b(c)d,e:f;g<h>i[j\k]l@example.com
None of the special characters in this local-part are allowed outside quotation marks
just"not"right@example.com
Quoted strings must be dot separated or the only element making up the local-part
1234567890123456789012345678901234567890123456789012345678901234+x@example.com
Local part is longer than 64 characters
- Warn the user if he enters a strange looking email address. Most likely it is a typo, but it might also be just a strange looking but valid email address. Let the user decide.
- Help the user to limit the number of errors he can make by reducing the character set he can enter. If possible, avoid UTF8.
It is very hard, some say nearly impossible, to write an email validation that correctly identifies all legal and all illegal email addresses. And even if that were possible, it still wouldn't mean that the email address actually works. The only way to verify that is to send an email and wait for a reply. Therefore, it is better to use a relatively simple validation that alerts the user if something looks strange, so that typos are found, but leaves the final decision to the user.
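Such a "looks strange" check might be as small as the following sketch. It is not the code used by EmailTextBox: the names EmailSanityCheck and GetWarnings are invented here, and the rules are just the common ones discussed above (exactly one '@', a '.' in the domain-part, no misplaced dots). Quoted addresses like "John@Doe"@example.com deliberately trigger a warning too, since the user has the last word anyway.

```csharp
using System;
using System.Collections.Generic;

public static class EmailSanityCheck {
  // Returns human-readable warnings. An empty list means nothing looks strange.
  public static List<string> GetWarnings(string email) {
    var warnings = new List<string>();
    int at = email.IndexOf('@');
    if (at < 0 || at != email.LastIndexOf('@')) {
      warnings.Add("Expected exactly one '@'.");
      return warnings; // the remaining checks need both parts
    }
    string local = email.Substring(0, at);
    string domain = email.Substring(at + 1);

    if (local.Length == 0) warnings.Add("The local-part is empty.");
    if (domain.Length == 0) {
      warnings.Add("The domain-part is empty.");
    } else if (!domain.Contains(".")) {
      warnings.Add("The domain-part has no '.'.");
    }
    if (HasMisplacedDot(local)) warnings.Add("Misplaced '.' in the local-part.");
    if (HasMisplacedDot(domain)) warnings.Add("Misplaced '.' in the domain-part.");
    return warnings;
  }

  // True if the part starts or ends with '.' or contains "..".
  private static bool HasMisplacedDot(string part) {
    return part.StartsWith(".", StringComparison.Ordinal)
        || part.EndsWith(".", StringComparison.Ordinal)
        || part.Contains("..");
  }
}
```

The caller would show the warnings and ask the user whether to save anyway, instead of rejecting the input outright.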
<wwl:CheckedWindow x:Class="Samples.SampleWindow"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:wwl="clr-namespace:WpfWindowsLib;assembly=WpfWindowsLib"
    SizeToContent="WidthAndHeight">
  <StackPanel>
    <Label Content="Name (required)"/>
    <wwl:CheckedTextBox x:Name="NameTextBox" MinWidth="100" IsRequired="True"/>
    <Label Content="Email (not required)"/>
    <wwl:EmailTextBox x:Name="EmailTextBox" MinWidth="100"/>
    <Button x:Name="SaveButton" Content="_Save"/>
  </StackPanel>
</wwl:CheckedWindow>
The upper window is the data entry window with a Name TextBox, where the user has to enter some data before saving, and the EmailTextBox. The Name TextBox is required, but the user has not entered any data yet; that's why its background is khaki. That is also the reason why the Save button is disabled. The user has entered an email address and then tried to close the window without saving the data. A second window with a warning opened and the EmailTextBox's background turned light green, to show the user which data has changed. For a detailed explanation of how this works, see my article, Base WPF window functionality for data entry.
In this screenshot, the user has entered a name. The Save button is therefore enabled. The user clicks the Save button, but gets a warning because the email address looks strange (no '.' in the domain-part) and can then decide whether the email address should be saved or needs some further editing.
In many cases, you don't need to configure anything. You might want to set IsRequired in XAML or, if the user wants to edit some existing data, call Initialise() from code behind, passing the existing email address and isRequired as parameters.
Instance Properties
Some properties can be set individually for every EmailTextBox:
- IsRequired (DependencyProperty): Does the user need to provide this control with a value?
- MaxLength (DependencyProperty): The maximum number of characters that can be manually entered into the text box.
Static Properties
Some properties apply to every EmailTextBox and are therefore declared as static:
- AsciiSpecialChars (string): Characters allowed in addition to letters and digits in the local-part of the email address. Default: ".@-_+". To allow more characters, assign your own string or call EmailTextBox.SetExtendedAsciiSpecialChars() or EmailTextBox.SetExtendedQuotedAsciiSpecialChars().
- IsBlankAllowed: Set to true if the user should be able to key in a blank.
- IsInternationalCharSetAllowed: Set to true if the user should be able to use Unicode characters greater than 0x7F.
- IsValidEmailChar (Func<char, bool>): Gets called to validate if the character the user just entered is allowed in the local-part of an email address. Assign your own function if you want to use a different validation.
- IsValidDnsChar (Func<char, bool>): Gets called to validate if the character the user just entered is allowed in the domain-part of an email address. Assign your own function if you want to use a different validation.
- IsValidEmail (Func<string, bool>): Gets called to validate the complete email address once the keyboard focus is leaving the EmailTextBox. Assign your own function if you want to use a different validation.
- ShowLooksStrangeWarning (Func<EmailTextBox, bool>): Gets called when IsValidEmail detects a problem. Assign your own function if you want to display the problem differently.
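As an illustration of assigning your own function, the sketch below defines a stricter character rule with the Func<char, bool> signature listed for IsValidEmailChar. The class CustomEmailRules and the rule itself are hypothetical. The assignment to the control is shown only as a comment, because it needs a project that references WpfWindowsLib, and whether the library also passes the '@' through this function is something to check in its source.

```csharp
using System;

public static class CustomEmailRules {
  // Hypothetical stricter rule: ASCII letters, digits and the characters . - _ only.
  public static readonly Func<char, bool> StrictLocalPartChar = c =>
      (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
      || c == '.' || c == '-' || c == '_';

  // In an application referencing WpfWindowsLib, the assignment would look like
  // this (assumed to run once at startup, since the property is static):
  //   EmailTextBox.IsValidEmailChar = CustomEmailRules.StrictLocalPartChar;
}
```

Since the property is static, such a rule applies to every EmailTextBox in the application, not just to one window.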
The latest version is available from GitHub: https://github.com/PeterHuberSg/WpfWindowsLib.
Download or clone everything to your PC, which gives you a solution WpfWindowsLib with the following projects:
- WpfWindowsLib: (.dll) to be referenced from your other solutions, contains EmailTextBox
- Samples: WPF Core application showing all WpfWindowsLib controls
- WpfWindowsLibTest: a few WpfWindowsLib unit tests
- 19th March, 2020: Initial version