Introduction
Recently, someone in the Lounge complained that sorting the following text was difficult:
- First
- Second
- Third
- ...
- Eleventh
The idea was that he wanted to sort the list above by their numeric equivalent. Well, it's actually NOT all that difficult once you convert the text to their actual numeric values, and even that conversion isn't really difficult. Here is my solution, which is probably one of many ways to approach the problem. If you have another way to do it, please feel free to publish an article.
Assumptions
This code utilizes recursion and extension methods. It is assumed that you are already familiar with the concepts behind these language features.
Background
As with any other programming problem, you typically start with a vague list of requirements, a hazy user story, and (usually) very specific results requirements - traits often associated with homework assignments, but that also, more often than not, reflect real-world conditions during your employment. This problem follows that paradigm. No programmer that is worth a damn (and left to his own devices) is going to provide a solution that merely meets the stated requirements. In this case, the example data stopped at "eleventh", but that doesn't mean squat. You simply MUST assume that the incoming data is not going to be limited to the provided samples, and besides, you may as well code for as many contingencies as is possible (and reasonable).
Code
To implement this solution, I created three classes. The first one - NumberDictionary - defines a dictionary of terms and their associated decimal values.
public static class NumberDictionary
{
public static Dictionary<string, decimal=""> Numbers = new Dictionary<string, decimal="">()
{
{"ZERO", 0}, {"TEN", 10}, {"HUNDRED", 100},
{"FIRST", 1}, {"ELEVEN", 11}, {"THOUSAND", 1000},
{"ONE", 1}, {"TWELF", 12}, {"MILLION", 1000000},
{"SECOND", 2}, {"TWELVE", 12}, {"BILLION", 1000000000},
{"TWO", 2}, {"THIRTEEN", 13}, {"MILLIARD", 1000000000},
{"THIRD", 3}, {"FOURTEEN", 14}, {"TRILLION", 1000000000000},
{"THREE", 3}, {"FIFTEEN", 15}, {"QUADRILLION", 1000000000000000},
{"FOUR", 4}, {"SIXTEEN", 16}, {"BILLIARD", 1000000000000000},
{"FIF", 5}, {"SEVENTEEN", 17}, {"QUINTILLION", 1000000000000000000},
{"FIVE", 5}, {"EIGHTEEN", 18},
{"SIX", 6}, {"NINETEEN", 19},
{"SEVEN", 7}, {"TWENTY", 20},
{"EIGH", 8}, {"THIRTY", 30},
{"NIN", 9}, {"FORTY", 40},
{"NINE", 9}, {"FIFTY", 50},
{"SIXTY", 60},
{"SEVENTY", 70},
{"EIGHTY", 80},
{"NINETY", 90},
};
}
</string,></string,>
People in the US might have noticed the weird ones - "BILLIARD" and "MILLIARD". These terms are used in long-scale countries, where billion and trillion are interpreted quite differently. If you want a lot of info about these and other differences, go to this Wiki page.
Text Handling
This functionality is encompassed in a static string extension class.
It made sense to me to make the outward-facing interface a string extension method. This is a simple method which normalizes the text and converts each text component into a numeric value. You may have already noticed that I'm using a decimal type. I'm doing this because of its ungodly max value which FAR exceeds the currently coded maximum possible value of 999 trillion.
There's really nothing special about this code. I merely replace some stuff, remove some stuff, and add some stuff (don't you love to hear programmers trivialize code?) so that the text is as normalized as I can reasonable make it. Once that's done, I convert each component "word" into its numeric equivalent by trying to find it in the numbers Dictionary. If I can't find even one of the component words in the dictionary, an exception is thrown. Once we're past the normalization/parsing part, we do the math to arrive at the numeric value.
UPDATE, 13 Feb 2015 - Apparently, everybody except the US interprets the value of billion and trillion incorrectly (grin). The difference is referred to as short-scale (US) or long-scale (GBR, et al). To appease the guys that mentioned it, I came up with a solution. I moved the numbers dictionary to its own class, and added a method here to adjust for long scale regions. This new method is called before we do anything else. I had neither the time nor desire to include all long-scale regions, so I left it as a very minor exercise to the programmer to add his/her 3-letter ISO region code to the longScaleRegions string. I also added some code to replace "AND" with a space, in case someone tries to do something like "one hundred and five".
public static class ExtendString
{
private static IntList values;
private static void AdjustForLongScale()
{
string longScaleRegions = "GBR,";
if (longScaleRegions.Contains(RegionInfo.CurrentRegion.ThreeLetterISORegionName))
{
numbers["BILLION"] = numbers["MILLION"] * numbers["MILLION"];
numbers["TRILLION"] = numbers["BILLION"] * numbers["MILLION"];
}
}
public static decimal Translate(this string text)
{
ExtendString.AdjustForLongScale();
text = text.ToUpper().Trim();
string trimChars = "TH";
text = text.Replace("TY", "TY ");
text = text.Replace("-"," ").Replace("_"," ").Replace("."," ").Replace(",", " ");
text = text.Replace(" AND", " ");
if (text.EndsWith(trimChars))
{
text = text.TrimEnd(trimChars.ToArray());
}
text = text.Replace(" ", " ");
values = new IntList();
string[] parts = text.Split(' ');
foreach (string numberText in parts)
{
if (numbers.Keys.Contains(numberText))
{
values.Add(numbers[numberText]);
}
else
{
throw new Exception("Not a number (might be spelled wrong)");
}
}
return values.NumericValue;
}
}
The Math
I created a class derived from List<decimal> that contains the parsed numeric values, and performs operations on those values. This effectively separates and contains what I consider to be discrete functionality. It also goes a long way toward keeping the extension method free of clutter. Given the textm "four hundred twenty nine thousand six hundred six", the contents of the list starts out like this after parsing:
- [0] 4
- [1] 100
- [2] 20
- [3] 9
- [4] 1000
- [5] 6
- [6] 100
- [7] 6
Given the domain of the problem, we should never really have many more that 15 or so elements in the list (more if you want to support more than 999 trillion), so we don't have to concern ourselves with the size of this list.
The first thing we do is break the list up into smaller chunks. This just makes it easier to manage. There's a"mini list" for each major value break, and are named to indicate their numeric purpose.
private List<decimal> trillions = new List<decimal>();
private List<decimal> billions = new List<decimal>();
private List<decimal> millions = new List<decimal>();
private List<decimal> thousands = new List<decimal>();
private List<decimal> hundreds = new List<decimal>();
NOTE: Notice that the mini-lists don't inlcude anything over a trillion, and don't include the long-scale-specific terms. Feel free to add them if you want to.
When the text has been parsed, the extension method retrieves the IntList.NumericValue property. Several actions are taken in order to retrieve the expected numeric value.
public decimal NumericValue
{
get
{
this.BuildMiniLists();
this.DoMiniMaths();
this.AddMiniValues();
decimal value = this.Sum(x=>x);
return value;
}
}
Each of the mini lists are populated (if necessary). We do this by looking for the big value and copying it - and all of the receding values - to the appropriate mini-list.
UPDATE - 13 Feb 2015 - I changed the call to BuildMiniList to refer to the NumberDictionary values. This only made sense since you could be in a long-scale region where the values can change for some of the dictionary numbers.
private void BuildMiniLists()
{
this.BuildMiniList(this.trillions, NumberDictionary.Numbers["TRILLION"]);
this.BuildMiniList(this.billions, NumberDictionary.Numbers["BILLION"]);
this.BuildMiniList(this.millions, NumberDictionary.Numbers["MILLION"]);
this.BuildMiniList(this.thousands, NumberDictionary.Numbers["THOUSAND"]);
this.BuildMiniList(this.hundreds, NumberDictionary.Numbers["HUNDRED"]);
}
private void BuildMiniList(List<decimal> list, decimal amount) {
int index = this.IndexOf(amount);
{
decimal[] values;
values = new decimal[index+1];
this.CopyTo(0, values, 0, index+1);
list.AddRange(values);
for (int i = index; i >= 0; i--)
{
this.RemoveAt(i);
}
}
}
Next, math is performed on each mini list. The DoMath method is recursive, and iterates through the specified list looking for this to multiply or add. Using our example above, the thousands mini list will look like this:
- [0] 4
- [1] 100
- [2] 20
- [3] 9
- [4] 1000
The DoMath method below will build the appropriate value by reading each value and either adding or multiplying it into the resulting value. The processing will do this:
((4 * 100) + 20 + 9) * 1000) = 429000
After process, the mini list will look like this:
- [0] 0
- [1] 0
- [2] 0
- [3] 0
- [4] 429000
private void DoMiniMaths()
{
this.DoMath(this.trillions, 0);
this.DoMath(this.billions, 0);
this.DoMath(this.millions, 0);
this.DoMath(this.thousands, 0);
this.DoMath(this.hundreds, 0);
}
private void DoMath(List<decimal> list, int lastBigIndex)
{
if (list.Count > 0)
{
decimal rollingValue = 0;
decimal[] big = new decimal[]{100,1000,1000000,1000000000,1000000000000};
for (int j = 0; j < list.Count; j++)
{
if (big.Contains(list[j]))
{
rollingValue = Math.Max(1, rollingValue) * list[j];
list[j] = rollingValue;
for (int i = lastBigIndex; i < j; i++)
{
list[i] = 0;
}
lastBigIndex = j;
if (j < list.Count - 1)
{
this.DoMath(list, lastBigIndex);
}
}
else
{
rollingValue += list[j];
}
}
}
}
After the math has been performed on each mini-list, we sum up the values inside each list, and add each sum back to the parent list.
private void AddMiniValues()
{
this.Add(this.trillions.Sum(x=>x));
this.Add(this.billions.Sum(x=>x));
this.Add(this.millions.Sum(x=>x));
this.Add(this.thousands.Sum(x=>x));
this.Add(this.hundreds.Sum(x=>x));
}
After summing the mini-lists, the parent list should have the following contents:
- [0] 6
- [1] 100
- [2] 6
- [3] 429000
When the values in the parent list are summed, we arrive at the result of "429606". Now that we've written all of that code, I leave it as an exercise for the programmer to associate the resulting numeric value with the text for sorting.
Using the Code
Usage goes something like this (from a console app):
class Program
{
static void Main(string[] args)
{
Translate("twelfth");
Translate("First");
Translate("One Hundredth");
Translate("five thousand seven Hundred thirtysecond");
Translate("four hundred twenty nine thousand");
Translate("four hundred twenty nine thousand six hundred six");
Translate("fortyfive");
Translate("twenty seven million two hundred thirtyfour thousand one");
Console.ReadKey();
}
static void Translate(string text)
{
decimal value = text.Translate();
Console.WriteLine(string.Format("{0} - {1:#,##0}", text, value));
}
}
Points of Interest
Nothing particularly wacky or unexpected occurred while writing this code, but it was interesting and allowed me to do some mental exercise outside my normal work-a-day life. I am programmer.
History
- 27 Feb 2018%nbsp; Fixed some spelling mistakes and partial HTML tags.
- 13 Feb 2015 (B) Added support for long-scale regions
- 13 Feb 2015 (A) Formatted a code block and fixed some misspellings.
- 12 Feb 2015 Initial release.