Introduction
I had a requirement to format country-specific data strings such as national IDs, phone numbers, and postal codes. The .NET Framework provides several formatting options for numbers and dates, but the formatting options for strings are limited. The options available to format various data types using string.Format(), with lots of examples, can be found here. Another way to format strings is to use a regular expression, as follows:
Regex.Replace("1112224444", @"(\d{3})(\d{3})(\d{4})", "$1-$2-$3");
A good explanation of formatting strings using regular expressions can be found here. Although this provides a flexible way to format string data, the regular expressions can easily get complicated. If the patterns are going to be supplied by end users, regular expressions are not a user-friendly way to specify a format.
This article presents a way to format strings by specifying format patterns in an intuitive way, using specifiers for digits and letters. Besides invoking the format method directly, the source code includes a way to invoke it via a String extension method and via a custom formatter in string.Format().
Using the Code
A format can be specified using # to represent a digit and * to represent a letter. If the output needs to contain these characters literally, \ can be used as an escape character. There are no other special characters; all other characters are copied to the output as is.
The source includes a class, CustomStringFormatter, that exposes the formatting capability in multiple ways:
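// 1. Direct call to the static Format method
string formattedString = CustomStringFormatter.Format(inputString, formattingPattern, out cleanedInput);
// 2. Call via the String extension method
string formattedString = inputString.FormatString(formattingPattern, out cleanedInput);
// 3. Call via string.Format() with CustomStringFormatter as the format provider
string formattedString = string.Format(new CustomStringFormatter(), "{0:" + formattingPattern + "}", inputString);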
Some examples are as follows:
Input | Format | Output |
1234567890 | (###) ###-#### | (123) 456-7890 |
87712345 | (06) ## ## ## ## | (06) 87 71 23 45 |
1234XY | #### ** | 1234 XY |
XX-123(MN) | XX\\###**-\#\* | XX\123MN-#* |
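The third invocation style listed earlier, passing a formatter object to string.Format(), relies on the standard IFormatProvider and ICustomFormatter interfaces. The following is a minimal sketch of that plumbing only, not the article's CustomStringFormatter source. The class name PatternFormatProvider and the delegate parameter that stands in for the actual pattern-formatting routine are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical class: shows how a formatter can plug into string.Format().
public sealed class PatternFormatProvider : IFormatProvider, ICustomFormatter
{
    // Stand-in for the real formatting routine: (input, pattern) => formatted text.
    private readonly Func<string, string, string> core;

    public PatternFormatProvider(Func<string, string, string> core)
    {
        this.core = core ?? throw new ArgumentNullException(nameof(core));
    }

    // string.Format() asks the provider for an ICustomFormatter; return this object.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    // Called once per format item; 'format' is the text after the colon in "{0:...}".
    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        if (arg is string s && !string.IsNullOrEmpty(format))
        {
            return core(s, format);
        }

        // Anything else falls back to the argument's normal formatting.
        if (arg is IFormattable formattable)
        {
            return formattable.ToString(format, CultureInfo.CurrentCulture);
        }

        return arg?.ToString() ?? string.Empty;
    }
}
```

With a provider like this, string.Format(provider, "{0:(###) ###-####}", input) hands the text after the colon to the delegate as the pattern. Because braces are significant in composite format strings, a pattern that itself contains { or } would need extra care when it is spliced into "{0:...}".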
Points of Interest
Format picks the characters matching the digit and letter specifiers from the input string and takes the formatting characters from the format string. If the input string has extra characters, only the first matching characters are taken and the rest are ignored. It is an error when the input string doesn't contain enough characters matching the digit/letter specifiers in the format; an empty string is returned as the output in such cases. Additionally, the data part extracted from the input is available through an output parameter.
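The behavior described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the article's source: the class name PatternFormatSketch is invented, the left-to-right scan of the input is an assumption inferred from the examples table, and returning an empty cleanedInput on failure is also an assumption.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the pattern formatter; not the article's implementation.
public static class PatternFormatSketch
{
    public static string Format(string input, string pattern, out string cleanedInput)
    {
        var output = new StringBuilder();
        var data = new StringBuilder();   // characters actually taken from the input
        int pos = 0;                      // next unread position in the input

        for (int i = 0; i < pattern.Length; i++)
        {
            char p = pattern[i];
            if (p == '\\' && i + 1 < pattern.Length)
            {
                // Escaped character: emit the next pattern character literally.
                output.Append(pattern[++i]);
            }
            else if (p == '#' || p == '*')
            {
                // Skip input characters until one matches the specifier.
                while (pos < input.Length && !Matches(p, input[pos]))
                {
                    pos++;
                }

                if (pos == input.Length)
                {
                    // Not enough matching characters: report failure with empty strings.
                    // (Clearing cleanedInput here is an assumption.)
                    cleanedInput = string.Empty;
                    return string.Empty;
                }

                data.Append(input[pos]);
                output.Append(input[pos]);
                pos++;
            }
            else
            {
                // Any other character (including a trailing backslash) is copied as is.
                output.Append(p);
            }
        }

        cleanedInput = data.ToString();
        return output.ToString();
    }

    // '#' accepts a digit and '*' accepts a letter (Unicode-aware checks).
    private static bool Matches(char specifier, char c)
    {
        return specifier == '#' ? char.IsDigit(c) : char.IsLetter(c);
    }
}
```

Under these assumptions, the input XX-123(MN) with the pattern XX\\###**-\#\* yields XX\123MN-#* with 123MN as the extracted data, and the input 123 with the pattern #### yields an empty string.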
Formatting a string mainly involves two steps: a validation step, in which you ascertain what needs to be formatted, and a formatting step, which uses formatting characters to indicate how to format it. This article covers a simplified case where both steps are specified using one format, allowing only loose validation before the input is formatted. It would, however, be possible to first validate the input against a validation pattern (thus allowing more rigorous validation), extract the matching part, and then apply the formatting pattern to that.
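The two-step variant suggested above might look roughly like the sketch below. It is not part of the article's source: the class name ValidateThenFormatSketch, the use of a regular expression as the validation pattern, and the delegate standing in for the pattern formatter are all assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: validate and extract first, then format the extracted part.
public static class ValidateThenFormatSketch
{
    public static string Apply(
        string input,
        string validationRegex,
        string formatPattern,
        Func<string, string, string> formatter) // stand-in: (data, pattern) => formatted text
    {
        // Step 1: strict validation; also isolates the part to be formatted.
        Match match = Regex.Match(input, validationRegex);
        if (!match.Success)
        {
            return string.Empty; // mirror the empty-string-on-error convention
        }

        // Step 2: apply the formatting pattern to the validated part only.
        return formatter(match.Value, formatPattern);
    }
}
```

For example, a validation pattern such as ^\d{10}$ would reject an 11-digit input outright, whereas the single-pattern approach would format its first ten digits and ignore the rest.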
History
- 24th July, 2011: Initial version