Generic Number to/from Word Converter

21 Nov 2011
Converts Arabic numerals to and from their word forms

Introduction

This project aims to convert numbers to and from words for different cultures. So far, English and Simplified Chinese are supported, but you can add other cultures by extending the base class.

Background

I tried to find a solution that converts a number into words that are readable and follow the patterns people are accustomed to, but found nothing really usable.

The main reason may be that a single digit can appear as different words in different circumstances, following special rules.

A bigger challenge is retrieving the number from a paragraph, where the words are grouped together in more complicated patterns.

It is nevertheless possible once the right rules are applied, so after failing to find such a tool, I wrote this one and hope it benefits people with the same requirements.

Currently, only integers can be converted to and from plain English or Simplified Chinese. That is enough for my use at the moment; you are welcome to extend it to support floating-point numbers or other cultures.

Using the Code

General of the Base Converter

The code is based on the NumberWordConverter abstract class, which defines the basic rules and delegates for number/word conversion.

The value of the public string Space property is inserted between words.

The Dictionary<int, List<string>> NumberNameDict holds the different names of each number or digit; the first string in each List<string> is used to represent the number by default. You must define it for each converter:

 NumberNameDict = new Dictionary<int, List<string>>
            {
                {0, new List<string>{"zero"}},
                {1, new List<string>{"one"}},
                {2, new List<string>{"two"}}, ...
                {20, new List<string>{"twenty", "score", "scores"}},
                {30, new List<string>{"thirty"}},
                {90, new List<string>{"ninety"}},
                {100, new List<string>{"hundred", "hundreds"}},
                {1000, new List<string>{"thousand", "thousands"}},
                {1000000, new List<string>{"million", "millions"}},
                {1000000000, new List<string>{"billion", "billions"}}//,
                //{1000000000000, new List<string>{"Trillion", "Trillions"}}
            }  
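
A table like the one above maps each number to its names, so a word-to-number lookup can be derived by inverting it. The following is a minimal sketch of one plausible way to do that, not the package's actual code: the class ReverseLookupSketch, the helper BuildWordLookup and the case-insensitive comparer are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseLookupSketch
{
    // Hypothetical helper: inverts a number -> names table into a name -> number table.
    public static Dictionary<string, int> BuildWordLookup(
        Dictionary<int, List<string>> numberNames)
    {
        // Assumption: lookups ignore case, so "Hundreds" and "hundreds" match.
        var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (KeyValuePair<int, List<string>> pair in numberNames)
        {
            foreach (string name in pair.Value)
            {
                // A name listed under two numbers keeps the last one seen;
                // a real table should avoid such duplicates.
                lookup[name] = pair.Key;
            }
        }

        return lookup;
    }

    public static void Main()
    {
        var numberNames = new Dictionary<int, List<string>>
        {
            { 1, new List<string> { "one" } },
            { 20, new List<string> { "twenty", "score", "scores" } },
            { 100, new List<string> { "hundred", "hundreds" } }
        };

        Dictionary<string, int> lookup = BuildWordLookup(numberNames);

        Console.WriteLine(numberNames[20][0]);  // default name: twenty
        Console.WriteLine(lookup["score"]);     // 20
        Console.WriteLine(lookup["Hundreds"]);  // 100
    }
}
```

Keeping the first entry of each list as the default name means synonyms such as "score" are accepted on input but never produced on output.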

A corresponding dictionary, WordNameDict, holds all the words for the reverse translation (word -> number/digit), and it is generated automatically.

Converting a number to words takes two steps:

  • Convert each digit group to words, collected as a list of strings, with the group name in plural form if needed:
List<string> sections = new List<string>();
int remained = number;

for (int i = 0; i < groupNums.Count; i ++ )
{
    if (remained < groupNums[i])
        continue;

    int whole = remained / groupNums[i];
    sections.Add(toWords(whole));

    if (ToPlural != null && whole != 1)
        sections.Add(ToPlural(NumberNameDict[groupNums[i]][0]));
    else
        sections.Add(NumberNameDict[groupNums[i]][0]);

    remained -= whole * groupNums[i];

    if (remained != 0 && NeedInsertAnd(number, remained))
    //if(remained != 0 && remained < 100)
        sections.Add(AndWords[0]);
}

if (remained != 0)
    sections.Add(toWords(remained));
  • Combine the words into a single string, inserting Space between them if needed:
StringBuilder sb = new StringBuilder();

for (int i = 0; i < sections.Count-1; i++)
{
   sb.Append(sections[i] + Space);
}
sb.Append(sections.Last());

return sb.ToString();
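
The two listings above can be put together into a small self-contained program. This is a simplified sketch for English only, under stated assumptions: the names NumberToWordsSketch, Groups, Names and Collect are hypothetical, pluralization is left out, and the rule "insert 'and' before a trailing part below 100" stands in for the package's NeedInsertAnd, whose real logic is not shown in the article.

```csharp
using System;
using System.Collections.Generic;

public static class NumberToWordsSketch
{
    // Group values, largest first (plays the role of groupNums above).
    private static readonly int[] Groups = { 1000000000, 1000000, 1000, 100 };

    private static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        {0, "zero"}, {1, "one"}, {2, "two"}, {3, "three"}, {4, "four"},
        {5, "five"}, {6, "six"}, {7, "seven"}, {8, "eight"}, {9, "nine"},
        {10, "ten"}, {11, "eleven"}, {12, "twelve"}, {13, "thirteen"},
        {14, "fourteen"}, {15, "fifteen"}, {16, "sixteen"}, {17, "seventeen"},
        {18, "eighteen"}, {19, "nineteen"}, {20, "twenty"}, {30, "thirty"},
        {40, "forty"}, {50, "fifty"}, {60, "sixty"}, {70, "seventy"},
        {80, "eighty"}, {90, "ninety"}, {100, "hundred"}, {1000, "thousand"},
        {1000000, "million"}, {1000000000, "billion"}
    };

    public static string ToWords(int number)
    {
        if (number < 0)
            throw new ArgumentOutOfRangeException(nameof(number));

        var sections = new List<string>();
        Collect(number, sections);
        // Step 2: join the collected words with a single space.
        return string.Join(" ", sections);
    }

    // Step 1: split the number into groups and collect the words for each.
    private static void Collect(int number, List<string> sections)
    {
        int remained = number;

        foreach (int group in Groups)
        {
            if (remained < group)
                continue;

            int whole = remained / group;
            Collect(whole, sections);      // e.g. "twelve" in "twelve thousand"
            sections.Add(Names[group]);
            remained -= whole * group;

            // Assumed rule: say "and" before a trailing part below 100.
            if (remained != 0 && remained < 100)
                sections.Add("and");
        }

        if (remained == 0 && number != 0)
            return;                        // nothing left after the groups

        if (remained <= 20)
        {
            sections.Add(Names[remained]);
            return;
        }

        int tens = remained / 10 * 10;
        int units = remained % 10;
        sections.Add(units == 0 ? Names[tens] : Names[tens] + "-" + Names[units]);
    }

    public static void Main()
    {
        Console.WriteLine(ToWords(102));    // one hundred and two
        Console.WriteLine(ToWords(12345));  // twelve thousand three hundred and forty-five
    }
}
```

Because each group's count is converted by the same routine, a count such as 123 in 123000 is itself expanded into "one hundred and twenty-three" before "thousand" is appended.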

To get a number from a string, a Stack<int> is used for parsing instead of a direct word-to-digit mapping. This is tricky because:

  • Group names are generally ordered from larger to smaller.
  • A larger group name following a smaller one means a compound group.
  • A smaller group name following a larger one means the preceding part is a complete unit.
/// <summary>
/// Function to get number from split words.
/// </summary>
/// <param name="sectors">Words for each digits of the number</param>
/// <returns>The number</returns>
protected int fromWords(string[] sectors)
{
    int result = 0, current, lastGroup=1, temp, maxGroup=1;
    Stack<int> stack = new Stack<int>();

    foreach (string s in sectors)
    {
        if (AllWords.Contains(s))
        {
            if (AndWords.Contains(s))
                continue;

            if (WordNameDict.ContainsKey(s))
            {
                current = WordNameDict[s];

                if (groupNums.Contains(current))
                {
                    //The current group is at least as high as any existing group,
                    //thus everything parsed so far is summed and multiplied by it.
                    if(current>= maxGroup)
                    {
                        temp = stack.Pop();
                        while (stack.Count!= 0)
                        {
                            temp += stack.Pop();
                        };
                        temp *= current;
                        stack.Push(temp);
                        maxGroup *= current;
                        lastGroup = 1;
                    }
                    //The current group is higher than the last group, thus the smaller entries are summed and multiplied by it
                    else if (current > lastGroup)
                    {
                        temp = 0;

                        while(stack.Peek() < current)
                        {
                            temp += stack.Pop();
                        };

                        temp *= current;
                        stack.Push(temp);
                        lastGroup = current;
                    }
                    else
                    {
                        temp = stack.Pop();
                        temp *= current;
                        stack.Push(temp);
                        lastGroup = current;
                    }
                }
                else
                {
                    stack.Push(current);
                }
            }
        }
        else
            throw new Exception();
     }

     do
     {
        result += stack.Pop();
     } while (stack.Count != 0);

     return result;
}
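
For comparison, plain English input that follows the usual larger-to-smaller order can be parsed without a stack, using a running total. The sketch below shows that simpler, swapped-in technique and is not the author's fromWords(). The names WordsToNumberSketch, Values and FromWords are hypothetical. It does not cover cases the stack is meant for, such as Chinese group names that nest inside each other.

```csharp
using System;
using System.Collections.Generic;

public static class WordsToNumberSketch
{
    private static readonly Dictionary<string, int> Values =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
    {
        {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4},
        {"five", 5}, {"six", 6}, {"seven", 7}, {"eight", 8}, {"nine", 9},
        {"ten", 10}, {"eleven", 11}, {"twelve", 12}, {"thirteen", 13},
        {"fourteen", 14}, {"fifteen", 15}, {"sixteen", 16}, {"seventeen", 17},
        {"eighteen", 18}, {"nineteen", 19}, {"twenty", 20}, {"thirty", 30},
        {"forty", 40}, {"fifty", 50}, {"sixty", 60}, {"seventy", 70},
        {"eighty", 80}, {"ninety", 90}, {"hundred", 100}, {"thousand", 1000},
        {"million", 1000000}, {"billion", 1000000000}
    };

    public static int FromWords(string text)
    {
        long total = 0;    // groups already closed by thousand/million/billion
        long current = 0;  // the group currently being built

        string[] words = text.Split(new[] { ' ', '-' },
                                    StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            if (string.Equals(word, "and", StringComparison.OrdinalIgnoreCase))
                continue;

            int value;
            if (!Values.TryGetValue(word, out value))
                throw new FormatException("Unknown word: " + word);

            if (value < 100)
                current += value;           // units, teens and tens add up
            else if (value == 100)
                current *= 100;             // "three hundred" -> 300
            else
            {
                total += current * value;   // close the group: "twelve thousand"
                current = 0;
            }
        }

        // Throws OverflowException if the words describe a value beyond int.
        return checked((int)(total + current));
    }

    public static void Main()
    {
        Console.WriteLine(FromWords("one hundred and eight"));  // 108
        Console.WriteLine(FromWords("twelve thousand three hundred and forty-five"));  // 12345
    }
}
```

The running total assumes each of "thousand", "million" and "billion" closes a group exactly once. Input that breaks this assumption needs a rule set like the three bullets above.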

To parse a string into a number, tryParse() is recommended:

/// <summary>
/// The main function to try to retrieve number from string of words.
/// </summary>
/// <param name="numberInWords">The original word string of number</param>
/// <param name="result">The converted number if successful</param>
/// <returns>TRUE if parse successfully.</returns>
protected virtual bool tryParse(string numberInWords, out int result)
{
    result = -1;

    try
    {
         string words = IsCaseSensitive ? numberInWords.ToLower() : numberInWords;

         string[] sectors = split(words);

         var contained = from s in sectors
                         where AllWords.Contains(s)
                         select s;

         result = fromWords(contained.ToArray());
         return true;
     }
     catch
     {
         return false;
     }
}  

Converter For English

Within the package, only English and Simplified Chinese are supported. In English, the number words may need to be converted to plural form. There are tools for this in .NET 4.0; alternatively, there is a simple tool I found at http://coreex.googlecode.com/svn-history/r195/branches/development/Source/CoreEx.Common/Extensions/Pluralizer.cs, and the public Func<string, string> ToPlural refers to Pluralizer.ToPlural.

To get friendlier output, I have defined three values in the WordsFormat enum:

/// <summary>
/// Define the output format of the words from number
/// </summary>
public enum WordsFormat
{
    CapitalOnFirst = 0,
    LowCaseOnly = 1,
    UpperCaseOnly = 2
}
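
How such a format might be applied to an already converted string can be sketched as follows. The helper WordsFormatSketch.Apply is hypothetical and not part of the package. Reading CapitalOnFirst as "capitalize the first letter of every word" is an assumption based on sample output such as "One Hundred And Two".

```csharp
using System;

public enum WordsFormat
{
    CapitalOnFirst = 0,
    LowCaseOnly = 1,
    UpperCaseOnly = 2
}

public static class WordsFormatSketch
{
    // Hypothetical helper: applies a WordsFormat to space-separated words.
    public static string Apply(string words, WordsFormat format)
    {
        switch (format)
        {
            case WordsFormat.LowCaseOnly:
                return words.ToLowerInvariant();

            case WordsFormat.UpperCaseOnly:
                return words.ToUpperInvariant();

            default:
                // CapitalOnFirst: upper-case the first letter of each word.
                string[] parts = words.ToLowerInvariant().Split(' ');
                for (int i = 0; i < parts.Length; i++)
                {
                    if (parts[i].Length > 0)
                        parts[i] = char.ToUpperInvariant(parts[i][0])
                                   + parts[i].Substring(1);
                }
                return string.Join(" ", parts);
        }
    }

    public static void Main()
    {
        Console.WriteLine(Apply("one hundred and two", WordsFormat.CapitalOnFirst)); // One Hundred And Two
        Console.WriteLine(Apply("one hundred and two", WordsFormat.UpperCaseOnly));  // ONE HUNDRED AND TWO
    }
}
```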

The converted word string is then available in the chosen format.

Converter for Simplified Chinese

Chinese has several sets of words/characters for each digit, so I defined a special function for the number-to-string conversion.

The default conversion of 234002052 to words is "二亿三千四百万零二千零五十二". If samples is set to "佰零壹贰叁肆拾", the matching default characters are replaced with the preferred ones contained in samples.

/// <summary>
/// ToWords() for Chinese culture: converts a number to a string of words
/// using the preferred characters.
/// </summary>
/// <param name="number">The number</param>
/// <param name="samples">
/// The characters that shall replace the default ones.
/// <example>
/// For example, 234002052 is converted by default to "二亿三千四百万零二千零五十二",
///     but if samples is set to "佰零壹贰叁肆拾",
/// the output will be "贰亿叁千肆佰万零贰千零五拾贰".
///     Any character that appears in samples replaces the default one,
/// so "贰" replaces every "二" for the digit 2.
/// </example>
/// </param>
/// <returns>The converted string in words.</returns>
private string toWords(int number, string samples)
{
    string result = ToWords(number);

    foreach (char ch in samples)
    {
        if (allCharacters.Contains(ch) && WordNameDict.ContainsKey(ch.ToString()))
        {
            int digit = WordNameDict[ch.ToString()];
            if (digit > 9 && !groupNums.Contains(digit))
                continue;

            string digitStr = NumberNameDict[digit][0];

            if (digitStr.Length != 1 || digitStr[0] == ch)
                continue;

            result = result.Replace(digitStr[0], ch);
        }
    }

    return result;
} 
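
The substitution idea can be tried in isolation with a fixed table. This is a minimal sketch rather than the package's code: the class ChineseSamplesSketch, the table DefaultOf and the helper ApplySamples are hypothetical, and the table lists only the common formal numerals instead of deriving them from WordNameDict and NumberNameDict.

```csharp
using System;
using System.Collections.Generic;

public static class ChineseSamplesSketch
{
    // Formal character -> everyday character with the same value.
    private static readonly Dictionary<char, char> DefaultOf = new Dictionary<char, char>
    {
        {'壹', '一'}, {'贰', '二'}, {'叁', '三'}, {'肆', '四'}, {'伍', '五'},
        {'陆', '六'}, {'柒', '七'}, {'捌', '八'}, {'玖', '九'},
        {'拾', '十'}, {'佰', '百'}, {'仟', '千'}
    };

    // Replaces every default character whose formal variant appears in samples.
    public static string ApplySamples(string words, string samples)
    {
        string result = words;

        foreach (char ch in samples)
        {
            char plain;
            // Characters not in the table (such as '零') are left alone.
            if (DefaultOf.TryGetValue(ch, out plain))
                result = result.Replace(plain, ch);
        }

        return result;
    }

    public static void Main()
    {
        // Assumption: the console can display UTF-8 text.
        Console.OutputEncoding = System.Text.Encoding.UTF8;

        string words = "二亿三千四百万零二千零五十二";
        Console.WriteLine(ApplySamples(words, "佰零壹贰叁肆拾"));
        // 贰亿叁千肆佰万零贰千零五拾贰
    }
}
```

Only characters named in samples are swapped, which is why "五" and "千" stay in their default form in the output.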

Try the Sample

A console project is included; you can run it to see results like these:

5: 五  ==> 5
20: 廿  ==> 20
21: 二十一  ==> 21
99: 九十九  ==> 99
100: 一百  ==> 100
102: 一百零二  ==> 102
131: 一百三十一  ==> 131
356: 三百五十六  ==> 356
909: 九百零九  ==> 909
1000: 一千  ==> 1000
1021: 一千零二十一  ==> 1021
2037: 二千零三十七  ==> 2037
12345: 一万二千三百四十五  ==> 12345
31027: 三万一千零二十七  ==> 31027
40002: 四万零二  ==> 40002
90010: 九万零一十  ==> 90010
100232300: 一亿零二十三万二千三百  ==> 100232300
234002052: 二亿三千四百万零二千零五十二  ==> 234002052
5: five  ==> 5
20: twenty  ==> 20
21: twenty-one  ==> 21
99: ninety-nine  ==> 99
100: one hundred  ==> 100
102: one hundred and two  ==> 102
131: one hundred and thirty-one  ==> 131
356: three hundreds and fifty-six  ==> 356
909: nine hundreds and nine  ==> 909
1000: one thousand  ==> 1000
1021: one thousand and twenty-one  ==> 1021
2037: two thousands and thirty-seven  ==> 2037
12345: twelve thousands three hundreds and forty-five  ==> 12345
31027: thirty-one thousands and twenty-seven  ==> 31027
40002: forty thousands and two  ==> 40002
90010: ninety thousands and ten  ==> 90010
100232300: one hundred millions two hundreds and thirty-two thousands three hundreds  ==> 100232300
234002052: two hundreds and thirty-four millions two thousands and fifty-two  ==> 234002052
572030013: 五亿七千贰佰零叁万零壹拾叁  ==> 572030013
234002052: 贰亿叁千肆佰万零贰千零五拾贰  ==> 234002052
5: Five  ==> 5
20: Twenty  ==> 20
21: Twenty One  ==> 21
99: Ninety Nine  ==> 99
100: One Hundred  ==> 100
102: One Hundred And Two  ==> 102
131: One Hundred And Thirty One  ==> 131
356: Three Hundreds And Fifty Six  ==> 356
909: Nine Hundreds And Nine  ==> 909
1000: One Thousand  ==> 1000
1021: One Thousand And Twenty One  ==> 1021
2037: Two Thousands And Thirty Seven  ==> 2037
12345: Twelve Thousands Three Hundreds And Forty Five  ==> 12345
31027: Thirty One Thousands And Twenty Seven  ==> 31027
40002: Forty Thousands And Two  ==> 40002
90010: Ninety Thousands And Ten  ==> 90010
100232300: One Hundred Millions Two Hundreds And Thirty Two Thousands Three Hundreds  ==> 100232300
234002052: Two Hundreds And Thirty Four Millions Two Thousands And Fifty Two  ==> 234002052
第壹佰零八 张 = 108

Points of Interest

The package can be optimized further by providing uniform output options; I may update it when I am less busy.

History

  • 21st November, 2011: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
