Introduction
This project enables conversion between numbers and words for different cultures. So far, English and simplified Chinese are supported, but you can support other cultures by extending the base class.
Background
I tried to find a solution that converts a number to a string of words that is readable and follows the pattern people are accustomed to, but found nothing really usable.
The main reason may be that one digit can appear as different words in different circumstances, following some special rules.
A bigger challenge is retrieving the number from a paragraph, where different words are grouped together in a more complicated pattern.
It is still possible once the right rules are applied, so after failing to find such a tool, I wrote this one in the hope that people with the same requirements can benefit from it.
Currently, only integers can be converted to/from plain English or simplified Chinese. That is enough for my usage at the moment; you are welcome to extend it to support floating-point numbers or other cultures.
Using the Code
Overview of the Base Converter
The code is based on the abstract class NumberWordConverter, which defines the basic rules and delegates for number/words conversion.
The value of the public string Space property is inserted between words.
The Dictionary<int, List<string>> NumberNameDict contains the different names of each number/digit; the first string of each List<string> is used to represent the number by default. You must define it for each converter:
NumberNameDict = new Dictionary<int, List<string>>
{
    {0, new List<string>{"zero"}},
    {1, new List<string>{"one"}},
    {2, new List<string>{"two"}}, ...
    {20, new List<string>{"twenty", "score", "scores"}},
    {30, new List<string>{"thirty"}},
    {90, new List<string>{"ninety"}},
    {100, new List<string>{"hundred", "hundreds"}},
    {1000, new List<string>{"thousand", "thousands"}},
    {1000000, new List<string>{"million", "millions"}},
    {1000000000, new List<string>{"billion", "billions"}}
};
A corresponding dictionary, WordNameDict, contains all words for the reverse translation (word -> number/digit), and it is generated by:
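One way such a reverse lookup could be generated, as a minimal sketch that assumes it simply inverts NumberNameDict. BuildWordNameDict and the class name are hypothetical and not taken from the package:

```csharp
using System;
using System.Collections.Generic;

public static class ReverseLookupSketch
{
    // Hypothetical helper: maps every name in numberNames back to its number.
    public static Dictionary<string, int> BuildWordNameDict(
        Dictionary<int, List<string>> numberNames)
    {
        var wordNames = new Dictionary<string, int>();
        foreach (KeyValuePair<int, List<string>> entry in numberNames)
        {
            foreach (string name in entry.Value)
            {
                // The indexer overwrites silently; Add() would throw on a duplicate name.
                wordNames[name] = entry.Key;
            }
        }
        return wordNames;
    }
}
```

With the English dictionary above, both "hundred" and "hundreds" would map to 100, so singular and plural forms can be parsed the same way.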
- Convert each group of digits to words, collected as a list of strings, pluralizing the group name if needed;
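List<string> sections = new List<string>();
int remained = number;
for (int i = 0; i < groupNums.Count; i++)
{
    // Skip groups larger than what is left.
    if (remained < groupNums[i])
        continue;
    // Words for the count of this group, e.g. "twelve" in "twelve thousands".
    int whole = remained / groupNums[i];
    sections.Add(toWords(whole));
    // Group name, pluralized when the count is not one.
    if (ToPlural != null && whole != 1)
        sections.Add(ToPlural(NumberNameDict[groupNums[i]][0]));
    else
        sections.Add(NumberNameDict[groupNums[i]][0]);
    remained -= whole * groupNums[i];
    // Insert the connective word (e.g. "and") where the culture requires it.
    if (remained != 0 && NeedInsertAnd(number, remained))
        sections.Add(AndWords[0]);
}
// Convert the remainder that no group covers.
if (remained != 0)
    sections.Add(toWords(remained));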
- Combine the words into a single string, inserting Space between them if needed.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < sections.Count - 1; i++)
{
    sb.Append(sections[i] + Space);
}
sb.Append(sections.Last());
return sb.ToString();
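The two steps above can be sketched end to end for English. This is a minimal, self-contained illustration rather than the package's code: the class name EnglishWordsSketch, the fixed word tables, the plural rule (append "s") and the rule for inserting "and" (only when the remainder is below 100) are all assumptions, chosen so that the results agree with the sample output in the Try the Sample section.

```csharp
using System;
using System.Collections.Generic;

public static class EnglishWordsSketch
{
    static readonly string[] Small =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };
    // Group values from largest to smallest, with their singular names.
    static readonly int[] GroupNums = { 1000000000, 1000000, 1000, 100 };
    static readonly string[] GroupNames = { "billion", "million", "thousand", "hundred" };

    public static string ToWords(int number)
    {
        if (number < 0) throw new ArgumentOutOfRangeException("number");
        var sections = new List<string>();
        int remained = number;
        for (int i = 0; i < GroupNums.Length; i++)
        {
            if (remained < GroupNums[i]) continue;
            int whole = remained / GroupNums[i];
            // The count of the group is itself converted recursively.
            sections.Add(ToWords(whole));
            // Assumed plural rule: append "s" when the count is not one.
            sections.Add(whole == 1 ? GroupNames[i] : GroupNames[i] + "s");
            remained -= whole * GroupNums[i];
            // Assumed "and" rule: only before a remainder below one hundred.
            if (remained != 0 && remained < 100) sections.Add("and");
        }
        // Add the part below one hundred, or "zero" when nothing was added at all.
        if (remained != 0 || sections.Count == 0)
            sections.Add(Below100(remained));
        return string.Join(" ", sections);
    }

    static string Below100(int n)
    {
        if (n < 20) return Small[n];
        string tens = Tens[n / 10];
        return n % 10 == 0 ? tens : tens + "-" + Small[n % 10];
    }
}
```

For example, EnglishWordsSketch.ToWords(12345) returns "twelve thousands three hundreds and forty-five". A real converter would take the names from NumberNameDict and the separator from Space instead of hard-coding them.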
To get a number from a string, a Stack<int> is used for parsing instead of a direct word-to-digit mapping. This is tricky, considering that:
- Group names are generally ordered from larger to smaller.
- A larger group name following a smaller one means a compound group.
- A smaller group name following a larger one means the previous part is a complete unit.
protected int fromWords(string[] sectors)
{
    int result = 0, current, lastGroup = 1, temp, maxGroup = 1;
    Stack<int> stack = new Stack<int>();
    foreach (string s in sectors)
    {
        if (AllWords.Contains(s))
        {
            // Connective words such as "and" carry no value.
            if (AndWords.Contains(s))
                continue;
            if (WordNameDict.ContainsKey(s))
            {
                current = WordNameDict[s];
                if (groupNums.Contains(current))
                {
                    if (current >= maxGroup)
                    {
                        // At least as large as maxGroup: it scales everything parsed so far.
                        temp = stack.Pop();
                        while (stack.Count != 0)
                        {
                            temp += stack.Pop();
                        }
                        temp *= current;
                        stack.Push(temp);
                        maxGroup *= current;
                        lastGroup = 1;
                    }
                    else if (current > lastGroup)
                    {
                        // Larger than the previous group: a compound group, so sum
                        // the smaller entries on the stack before scaling.
                        temp = 0;
                        while (stack.Peek() < current)
                        {
                            temp += stack.Pop();
                        }
                        temp *= current;
                        stack.Push(temp);
                        lastGroup = current;
                    }
                    else
                    {
                        // Not larger than the previous group: scale only the last entry.
                        temp = stack.Pop();
                        temp *= current;
                        stack.Push(temp);
                        lastGroup = current;
                    }
                }
                else
                {
                    // Plain number word: keep it until a group name arrives.
                    stack.Push(current);
                }
            }
        }
        else
            throw new Exception("Unrecognized word: " + s);
    }
    // Sum whatever is left on the stack.
    do
    {
        result += stack.Pop();
    } while (stack.Count != 0);
    return result;
}
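To make the stack idea easier to follow, here is a reduced, self-contained sketch for English. It is not the fromWords() logic above: it replaces the three branches with a single simplified rule (a group name sums every pending entry smaller than itself and scales that sum), and the class name, the small word table and the splitting on spaces and hyphens are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class StackParseSketch
{
    // A deliberately small vocabulary, enough for the examples below.
    static readonly Dictionary<string, int> Words = new Dictionary<string, int>
    {
        {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4}, {"five", 5},
        {"twelve", 12}, {"twenty", 20}, {"thirty", 30}, {"forty", 40},
        {"hundred", 100}, {"hundreds", 100},
        {"thousand", 1000}, {"thousands", 1000},
        {"million", 1000000}, {"millions", 1000000}
    };
    static readonly HashSet<int> Groups = new HashSet<int> { 100, 1000, 1000000 };

    public static int Parse(string text)
    {
        var stack = new Stack<int>();
        string[] tokens = text.ToLowerInvariant()
            .Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (token == "and") continue; // connective word, no value
            int value;
            if (!Words.TryGetValue(token, out value))
                throw new FormatException("Unknown word: " + token);
            if (!Groups.Contains(value))
            {
                // Plain number word: wait for a group name or the end.
                stack.Push(value);
                continue;
            }
            // Group name: collect the pending entries smaller than the group.
            int sum = 0;
            while (stack.Count > 0 && stack.Peek() < value)
                sum += stack.Pop();
            // A bare group name such as "hundred" counts as one of that group.
            stack.Push((sum == 0 ? 1 : sum) * value);
        }
        int result = 0;
        while (stack.Count > 0) result += stack.Pop();
        return result;
    }
}
```

Tracing "twelve thousands three hundreds and forty-five": 12 is pushed, "thousands" turns it into 12000, 3 is pushed, "hundreds" turns it into 300 (12000 stays because it is not smaller than 100), then 40 and 5 are pushed, and the final sum is 12345.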
To parse a string into a number, tryParse() is recommended.
protected virtual bool tryParse(string numberInWords, out int result)
{
    result = -1;
    try
    {
        string words = IsCaseSensitive ? numberInWords.ToLower() : numberInWords;
        string[] sectors = split(words);
        var contained = from s in sectors
                        where AllWords.Contains(s)
                        select s;
        result = fromWords(contained.ToArray());
        return true;
    }
    catch
    {
        return false;
    }
}
Converter for English
Within the package, only English and simplified Chinese are supported. Number words may need to be converted to their plural form. There are tools available in .NET 4.0; alternatively, there is a simple tool I found at http://coreex.googlecode.com/svn-history/r195/branches/development/Source/CoreEx.Common/Extensions/Pluralizer.cs, and the public Func<string, string> ToPlural delegate refers to its Pluralizer.ToPlural method.
To get friendlier output, I have defined three values in the WordsFormat enum:
public enum WordsFormat
{
    CapitalOnFirst = 0,
    LowCaseOnly = 1,
    UpperCaseOnly = 2
}
The converted words string can then be returned in any of these formats.
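As a rough illustration of what the three formats mean, here is a minimal sketch. ApplyFormat and the class name are hypothetical and not the package's API, and the per-word capitalization used for CapitalOnFirst is an assumption based on the capitalized sample output (for example "One Hundred And Two").

```csharp
using System;
using System.Linq;

public enum WordsFormat
{
    CapitalOnFirst = 0,
    LowCaseOnly = 1,
    UpperCaseOnly = 2
}

public static class WordsFormatSketch
{
    // Hypothetical helper: applies one of the three formats to a words string.
    public static string ApplyFormat(string words, WordsFormat format)
    {
        switch (format)
        {
            case WordsFormat.LowCaseOnly:
                return words.ToLowerInvariant();
            case WordsFormat.UpperCaseOnly:
                return words.ToUpperInvariant();
            default:
                // CapitalOnFirst (assumed): upper-case the first letter of
                // every space-separated word and lower-case the rest.
                var parts = words.ToLowerInvariant().Split(' ')
                    .Select(w => w.Length == 0
                        ? w
                        : char.ToUpperInvariant(w[0]) + w.Substring(1));
                return string.Join(" ", parts);
        }
    }
}
```

For example, ApplyFormat("one hundred and two", WordsFormat.CapitalOnFirst) returns "One Hundred And Two".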
Converter for Simplified Chinese
Each digit has several sets of words/characters, so I defined a special function for the number-to-string conversion.
For example, the default conversion of 234002052 to words is "二亿三千四百万零二千零五十二"; if samples is set to "佰零壹贰叁肆拾", the matching characters are replaced with the preferred ones contained in samples.
private string toWords(int number, string samples)
{
    string result = ToWords(number);
    foreach (char ch in samples)
    {
        if (allCharacters.Contains(ch) && WordNameDict.ContainsKey(ch.ToString()))
        {
            int digit = WordNameDict[ch.ToString()];
            if (digit > 9 && !groupNums.Contains(digit))
                continue;
            string digitStr = NumberNameDict[digit][0];
            if (digitStr.Length != 1 || digitStr[0] == ch)
                continue;
            result = result.Replace(digitStr[0], ch);
        }
    }
    return result;
}
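The replacement step can be tried in isolation with the following sketch. It is a simplified stand-in for the function above: the class name, the method name UsePreferred and the small Forms table (default character first, alternative forms after it) are assumptions made so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class PreferredCharSketch
{
    // For each value: the default character first, then alternative forms.
    static readonly Dictionary<int, string> Forms = new Dictionary<int, string>
    {
        {0, "零〇"}, {1, "一壹"}, {2, "二贰"}, {3, "三叁"}, {4, "四肆"},
        {10, "十拾"}, {100, "百佰"}, {1000, "千仟"}
    };

    public static string UsePreferred(string words, string samples)
    {
        foreach (char ch in samples)
        {
            foreach (string forms in Forms.Values)
            {
                // Index 0 is the default itself, so only a later position
                // means ch is an alternative that should replace the default.
                if (forms.IndexOf(ch) > 0)
                    words = words.Replace(forms[0], ch);
            }
        }
        return words;
    }
}
```

Calling UsePreferred("二亿三千四百万零二千零五十二", "佰零壹贰叁肆拾") returns "贰亿叁千肆佰万零贰千零五拾贰": characters with no preferred form in samples, such as 五 and 千, are left unchanged.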
Try the Sample
A console project is included; you can run it to see results like the following:
5: 五 ==> 5
20: 廿 ==> 20
21: 二十一 ==> 21
99: 九十九 ==> 99
100: 一百 ==> 100
102: 一百零二 ==> 102
131: 一百三十一 ==> 131
356: 三百五十六 ==> 356
909: 九百零九 ==> 909
1000: 一千 ==> 1000
1021: 一千零二十一 ==> 1021
2037: 二千零三十七 ==> 2037
12345: 一万二千三百四十五 ==> 12345
31027: 三万一千零二十七 ==> 31027
40002: 四万零二 ==> 40002
90010: 九万零一十 ==> 90010
100232300: 一亿零二十三万二千三百 ==> 100232300
234002052: 二亿三千四百万零二千零五十二 ==> 234002052
5: five ==> 5
20: twenty ==> 20
21: twenty-one ==> 21
99: ninety-nine ==> 99
100: one hundred ==> 100
102: one hundred and two ==> 102
131: one hundred and thirty-one ==> 131
356: three hundreds and fifty-six ==> 356
909: nine hundreds and nine ==> 909
1000: one thousand ==> 1000
1021: one thousand and twenty-one ==> 1021
2037: two thousands and thirty-seven ==> 2037
12345: twelve thousands three hundreds and forty-five ==> 12345
31027: thirty-one thousands and twenty-seven ==> 31027
40002: forty thousands and two ==> 40002
90010: ninety thousands and ten ==> 90010
100232300: one hundred millions two hundreds and thirty-two thousands three hundreds ==> 100232300
234002052: two hundreds and thirty-four millions two thousands and fifty-two ==> 234002052
572030013: 五亿七千贰佰零叁万零壹拾叁 ==> 572030013
234002052: 贰亿叁千肆佰万零贰千零五拾贰 ==> 234002052
5: Five ==> 5
20: Twenty ==> 20
21: Twenty One ==> 21
99: Ninety Nine ==> 99
100: One Hundred ==> 100
102: One Hundred And Two ==> 102
131: One Hundred And Thirty One ==> 131
356: Three Hundreds And Fifty Six ==> 356
909: Nine Hundreds And Nine ==> 909
1000: One Thousand ==> 1000
1021: One Thousand And Twenty One ==> 1021
2037: Two Thousands And Thirty Seven ==> 2037
12345: Twelve Thousands Three Hundreds And Forty Five ==> 12345
31027: Thirty One Thousands And Twenty Seven ==> 31027
40002: Forty Thousands And Two ==> 40002
90010: Ninety Thousands And Ten ==> 90010
100232300: One Hundred Millions Two Hundreds And Thirty Two Thousands Three Hundreds ==> 100232300
234002052: Two Hundreds And Thirty Four Millions Two Thousands And Fifty Two ==> 234002052
第壹佰零八 张 = 108
Points of Interest
The package can be improved further by providing uniform output choices. I may update it when I am not so busy.
History
- 21st November, 2011: Initial post