Extended Number to Numeral (Number Spelling) Converter

Shvetsov Evgeniy

25 May 2016, MIT license, 11 min read

Converts numbers (positive AND negative, integral AND fractional) to English / Russian words

Introduction

The subject is pretty self-descriptive. The most important thing about the module is that it uses a generalized strategy to process numbers. Currently it can convert numbers to two languages: Russian (Cyrillic) AND English (Latin), with support for two English dialects: American AND British. In the future, this design allows extending the module to support even more Cyrillic AND/OR Latin-script languages.

Background

Here I will list some helpful links to online conversion tools, which you can use to check spelling (including the module's output).

English:

http://www.webmath.com/saynum.html
http://www.mathcats.com/explore/reallybignumbers.html

English AND russian:

http://eng5.ru/en/numbers_translation
http://prutzkow.com/numbers/index_en.htm

By the way, I found some bugs in these tools, so part of their output might be incorrect.

Also, information about numerals in different languages:

English: http://en.wikipedia.org/wiki/English_numerals

Russian:

http://masterrussian.com/numbers/Russian_Numbers.htm
http://www.russianlessons.net/lessons/lesson2_main.php

Scales:

https://en.wikipedia.org/wiki/Names_of_large_numbers
https://en.wikipedia.org/wiki/Long_and_short_scales

Using the code

LocaleSettings struct used to configure the conversion:

C++

// Enables some language very specific rules for numbers spelling
//  (like pronouncing four-digit numbers in US & UK Eng.)
bool verySpecific = false;
bool positiveSign = false; // add positive sign [for positive nums]
// If the integral part is zero, it may be left unspoken: 0.75 (.75) - point seventy five
bool shortFormat  = false; // skip mention zero int. / fract. part
bool foldFraction = false; // try find repeated pattern & treat it
ELocale locale = ELocale::L_EN_GB;
size_t precison = size_t(LDBL_DIG); // max. digits count (<= 'LDBL_DIG')

Flags:

1) verySpecific

For ENG GB, ENG US:

- replaces zero / nought with the 'o' letter (1.02 = "one point o two")

- enables specific handling of four-digit numbers with non-zero hundreds: they are often named using multiples of "hundred" AND combined with tens AND/OR ones ("one thousand one", "eleven hundred three", "twelve hundred twenty-five", "four thousand forty-two", or "ninety-nine hundred ninety-nine" etc)

* for ENG GB this style is common for multiples of 100 between 1,000 AND 2,000 (e.g. 1,500 as "fifteen hundred") BUT NOT for higher numbers.

2) positiveSign: adds an explicit 'positive' / 'plus' / 'плюс' sign word to numbers > 0

Examples:

1.3 = "plus one point three" [EN GB]

1.181818181818 = "плюс одна целая и восемнадцать в периоде" [RU + foldFraction]

3) shortFormat: skips mentioning a zero integral OR fractional part of the number

Examples:

0.0 = "zero" [EN US]

0.01 = "point zero one" [EN US]

999000.0 = "nine hundred and ninety-nine thousand" [EN GB]

4) foldFraction: [ONLY for fractions] enables a search for a repeated digit pattern in the fractional part of a number AND (if one is found) shortens the part to the first occurrence of the pattern, adding a periodic marker.

Examples:

EN GB + verySpecific:

-7289.120912091209 = "minus seven thousand two hundred and eighty-nine point one two o nine repeating"

EN US + positiveSign:

28364768.07310731 = "positive twenty-eight million three hundred sixty-four thousand seven hundred sixty-eight point zero seven three one to infinity"
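The four-digit rule enabled by verySpecific can be sketched as follows. This is only an illustration under stated assumptions, NOT the module's code: HundredsSplit AND splitAsHundreds are hypothetical names, AND the British restriction is modelled exactly as worded above (multiples of 100 between 1,000 AND 2,000).

```cpp
#include <cassert>

// Hypothetical helper type: a number viewed as "<hundreds> hundred <rest>"
struct HundredsSplit {
  bool applicable;   // true if the "hundreds" style can be used
  unsigned hundreds; // e.g. 11 for 1103
  unsigned rest;     // e.g. 3 for 1103
};

// 'britishRules' restricts the style to multiples of 100 up to 2000 (assumption, see lead-in)
inline HundredsSplit splitAsHundreds(unsigned num, bool britishRules) {
  HundredsSplit result = {false, 0U, 0U};
  if (num < 1000U || num > 9999U) return result; // four-digit numbers ONLY
  if (0U == (num / 100U) % 10U) return result;   // zero hundreds digit: "one thousand one"
  if (britishRules && (num > 2000U || num % 100U)) return result;
  result.applicable = true;
  result.hundreds = num / 100U; // "eleven" in "eleven hundred three"
  result.rest = num % 100U;     // "three" in "eleven hundred three"
  return result;
}
```

With this split, 1225 becomes 12 hundreds AND 25 ("twelve hundred twenty-five"), while 4042 keeps the regular form because its hundreds digit is zero.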

Options:

1) precison: maximum count of digits (in the fractional part) to process. The resulting number representation is rounded to the last digit. Can be zero. Limited to the LDBL_DIG value. Trailing zeroes in the resulting number are ignored.

2) locale: selected language OR language dialect. Value selected from the ELocale enumeration (old C++ enum, NOT new C++11 enum class). Can have the following values:

C++

L_RU_RU, // Russian Federation Russian
L_EN_US, // United States English
L_EN_GB, // United Kingdom English

The flags AND options can be combined in ANY combination, BUT some flags (OR options) can be ignored OR reinterpreted in some cases.

Example: verySpecific + positiveSign + shortFormat + foldFraction

0.0034013401 = "plus o point o o three four o one repeating" [EN GB]

As you can see, although the shortFormat flag is set, the zero integral part is NOT skipped.
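A configuration with all four flags enabled could be prepared roughly like this. The sketch uses stand-in types that only mirror the fields listed above: LocaleSettingsSketch AND makeAllFlagsSettings are hypothetical names, the real LocaleSettings AND ELocale definitions live in the module.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>

// Stand-ins mirroring the fields listed in this article (NOT the module's own types)
enum ELocale { L_RU_RU, L_EN_US, L_EN_GB };

struct LocaleSettingsSketch {
  bool verySpecific = false;
  bool positiveSign = false;
  bool shortFormat = false;
  bool foldFraction = false;
  ELocale locale = L_EN_GB;
  std::size_t precison = std::size_t(LDBL_DIG); // identifier spelled as in the module
};

// Builds the "everything enabled" configuration from the example above
inline LocaleSettingsSketch makeAllFlagsSettings() {
  LocaleSettingsSketch settings;
  settings.verySpecific = true;
  settings.positiveSign = true;
  settings.shortFormat = true;
  settings.foldFraction = true;
  return settings; // locale AND precison keep their defaults
}
```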

Function call interface + short description:

C++

// 'ReserveBeforeAdding' can be used to DISABLE possible 'trade-space-for-time' optimization
template<class TStrType, const bool ReserveBeforeAdding = true>
// "Number to the numeric format string" (321 -> "three hundred twenty-one")
// Accepts negative numbers AND fractions
// Complexity: linear in the number's digit count
static bool numToNumFormatStr(long double num, TStrType& str,
                              LocaleSettings& localeSettings =
                                LocaleSettings::DEFAULT_LOCALE_SETTINGS,
                              const char** const errMsg = nullptr) {

The errMsg pointer can be used to get an error message (as a static const POD C string) explaining what exactly happened if something goes wrong.

As you can see, different container types are supported here; all of them, however, must meet the following requirements:

C++

'TStrType' SHOULD support operator '+=', 'empty' AND 'size' methods

The function appends the numeral text to the existing content of str, separating it with a delimiter if the container is not empty when the function starts.
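The appending behavior AND the container requirements can be illustrated with a small template. This is a sketch, NOT the module's code: appendDelimited is a hypothetical name, AND a single space is assumed as the delimiter.

```cpp
#include <cassert>
#include <string>

// Appends 'word' to 'str', inserting 'delimiter' first when 'str' already has content.
// Works with any type which provides operator '+=' AND an 'empty' method.
template <class TStrType>
void appendDelimited(TStrType& str, const char* word, const char* delimiter = " ") {
  if (!str.empty()) str += delimiter; // if needed
  str += word;
}
```

Because only '+=' AND 'empty' are used, the same code works for std::string as well as for a custom fixed-size string class.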

Conversion stages description

There are a total of four main steps.

1) checking the incoming value & treating its sign

auto negativeNum = false;
if (num < 0.0L) {
  negativeNum = true;
  num = -num; // revert
}
//// Check borders
static const auto VAL_UP_LIMIT_ = 1e100L; // see 'getOrderStr'
if (num >= VAL_UP_LIMIT_) {
  if (errMsg) *errMsg = "too big value";
  return false;
}
if (ELocale::L_RU_RU == localeSettings.locale) { // for rus. lang. ONLY
  static const auto VAL_LOW_LIMIT_RU_ = 10.0L / VAL_UP_LIMIT_;
  if (num && num < VAL_LOW_LIMIT_RU_) {
    if (errMsg) *errMsg = "too low value";
    return false;
  }
}
//// Treat sign
const auto delimiter = DEFAULT_DELIMITER;
auto getSignStr = [](const ELocale locale, const bool positive) throw() -> const char* {
  switch (locale) {
    case ELocale::L_EN_US: return positive ? "positive" : "negative";
    case ELocale::L_EN_GB: return positive ? "plus" : "minus";
    case ELocale::L_RU_RU: return positive ? "плюс" : "минус";
  }
  assert(false); // locale error
  // Design / implementation error, NOT runtime error!
  return "<locale error [" MAKE_STR_(__LINE__) "]>"; // works OK in GCC
};
if (negativeNum || (localeSettings.positiveSign && num)) { // add sign
  if (!str.empty()) str += delimiter; // if needed
  str += getSignStr(localeSettings.locale, !negativeNum);
}
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer"; return false;
}

VAL_UP_LIMIT_ is involved here because of the limitations of the getOrderStr language-specific morphological lambda for the Russian language. This lambda (AND others) will be presented later in this article.

truncated::ExecIfPresent is a special conditional optimization for StaticallyBufferedString-like classes (if one is provided as the storage). It uses the Exec-If-Present idiom.
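The Exec-If-Present idiom can be sketched with expression SFINAE. The code below is an assumption about how such a check might look, NOT the module's implementation: truncatedIfPresent AND FixedBufSketch are hypothetical names, AND the detected member is assumed to be called truncated().

```cpp
#include <cassert>
#include <string>

// Hypothetical fixed-capacity storage which remembers that it has overflowed
struct FixedBufSketch {
  bool overflowed = false;
  bool truncated() const { return overflowed; }
};

// Preferred overload: viable ONLY if 'obj.truncated()' is a valid expression
template <class T>
auto truncatedIfPresentImpl(const T& obj, int) -> decltype(obj.truncated(), bool()) {
  return obj.truncated();
}

// Fallback overload: the type has NO such method, so nothing can be truncated
template <class T>
bool truncatedIfPresentImpl(const T&, long) {
  return false;
}

template <class T>
bool truncatedIfPresent(const T& obj) {
  return truncatedIfPresentImpl(obj, 0); // 'int' overload wins if it exists
}
```

For std::string the call compiles down to a constant false, so ordinary containers pay nothing for the check.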

2) getting the number representation as a char. array & analysing it

C++

static const size_t MAX_DIGIT_COUNT_ = size_t(LDBL_DIG);
// Normalized form (mantissa is a 1 digit ONLY):
//  first digit (one of 'MAX_DIGIT_COUNT_') + '.' + [max. digits AFTER '.' - 1] + 'e+000'
//   [https://en.wikipedia.org/wiki/Scientific_notation#Normalized_notation]
static const size_t MAX_STR_LEN_ = 6U + MAX_DIGIT_COUNT_;

// +24 to be on a safe side in case if NOT normalized form (unlikely happen) + for str. terminator
static const size_t BUF_SIZE_ = AUTO_ADJUST_MEM(MAX_STR_LEN_ + 24U, 8U);
char strBuf[BUF_SIZE_];
// 21 digits is max. for 'long double' [https://msdn.microsoft.com/ru-ru/library/4hwaceh6.aspx]
//  (20 of them can be AFTER decimal point in the normalized scientific notation)
if (localeSettings.precison > MAX_DIGIT_COUNT_) localeSettings.precison = MAX_DIGIT_COUNT_;
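// '*' expects an 'int' argument, so the 'size_t' precision is converted explicitly
const ptrdiff_t len = sprintf(strBuf, "%.*Le", static_cast<int>(localeSettings.precison), num); // scientific format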
// On failure, a negative number is returned
if (len < static_cast<decltype(len)>(localeSettings.precison)) {
  if (errMsg) *errMsg = "number to string convertion failed";
  return false;
}

sprintf is used here because, compared to the naive conversion (applying a series of simple arithmetic operations, like *, / AND %), it gives no (OR almost no) precision penalty, at the cost of extra performance overhead. The function assumes that the representation received from sprintf is in the normalized form of scientific notation, but the code was designed (though NOT tested) to work even if the output is not normalized.
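The sprintf-based step can be tried in isolation with the following sketch. It is a simplified illustration, NOT the module's code: toScientific is a hypothetical name, snprintf is used instead of sprintf, AND the result is split with std::string rather than with raw pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats 'num' as d.ddd...e+XX AND splits the text into mantissa AND exponent
inline bool toScientific(long double num, int precision,
                         std::string& mantissa, long& exponent) {
  char buf[64];
  const int len = std::snprintf(buf, sizeof(buf), "%.*Le", precision, num);
  if (len < 0 || static_cast<std::size_t>(len) >= sizeof(buf)) return false; // failed / cut
  const std::string text(buf);
  const std::size_t expPos = text.find('e');
  if (std::string::npos == expPos) return false; // NOT a finite number ("inf" / "nan")
  mantissa = text.substr(0, expPos);
  // 'strtol' accepts both "+04" AND "+004" (exponent width differs between runtimes)
  exponent = std::strtol(text.c_str() + expPos + 1, nullptr, 10);
  return true;
}
```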

The analysis consists of gathering information about the number representation (like the exponent value in the scientific notation) AND splitting the char. array into parts (by adjusting specific pointers, like fractPartEnd).

C++

char* currSymbPtr;    // ptr. used to iterate over the numeric str.
char* fractPartStart; // in the original scientific representation
char* fractPartEnd;   // past the end [will point to the str. terminator, replacing the exp. sign]
long int expVal;      // 3 for '1.0e3'
auto fractPartLen = ptrdiff_t();
size_t intPartLen; // real len.
size_t intPartBonusOrder; // of the current digit
size_t fractPartLeadingZeroesCount; // extra zeroes count BEFORE first meaning digit
static const auto DECIMAL_DELIM_ = '.'; // [decimal separator / decimal mark] to use
auto analyzeScientificNotationRepresentation = [&]() throw() {
  currSymbPtr = strBuf + len - size_t(1U); // from the end to start (<-)
  //// Get exp.
  static const auto EXP_SYMB_ = 'e';
  while (EXP_SYMB_ != *currSymbPtr) {
    --currSymbPtr; // rewind to the exp. start
    assert(currSymbPtr > strBuf);
  }
  fractPartEnd = currSymbPtr;
  *currSymbPtr = '\0'; // break str.: 2.22044604925031310000e+016 -> 2.22044604925031310000 +016
  const char* errMsg;
  const auto result = strToL(expVal, currSymbPtr + size_t(1U), errMsg);
  assert(result);
  //// Get int. part len.
  fractPartStart = currSymbPtr - localeSettings.precison;
  intPartLen = fractPartStart - strBuf;
  assert(intPartLen);
  if (localeSettings.precison) --intPartLen; // treat zero fract. precison ('1e0')
  assert((currSymbPtr - strBuf - int(localeSettings.precison) - 1) >= 0);
  assert(localeSettings.precison ? DECIMAL_DELIM_ == *(strBuf + intPartLen) : true);
  //// Finishing analyse (partition the number): get int. part real len.
  if (expVal < 0L) { // negative exp.
    if (static_cast<size_t>(-expVal) >= intPartLen) { // NO int. part
      fractPartLeadingZeroesCount = -(expVal + static_cast<long int>(intPartLen));
      intPartLen = size_t(); // skip processing int. part
    } else { // reduce int. part
      intPartLen += expVal; // decr. len.
      fractPartLeadingZeroesCount = size_t();
    }
    intPartBonusOrder = size_t();
    if (localeSettings.precison) // if fract. part exists [in the scientific represent.]
      --fractPartLen; // move delim. into the fract part., so reduce it length
  } else { // non-negative exp.: incr. len.
    const auto additive =
      std::min<decltype(localeSettings.precison)>(expVal, localeSettings.precison);
    intPartLen += additive;
    fractPartLeadingZeroesCount = size_t();
    intPartBonusOrder = expVal - additive;
  }
};
analyzeScientificNotationRepresentation();
// Rewind to the fract. start [BEFORE getting fract. part real len.]
currSymbPtr = strBuf + intPartLen +
(expVal > decltype(expVal)() ? size_t(1U) : size_t()); // 1.23e1 = 12.3e0 [move right +1]
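The partitioning done by the analysis above can be easier to follow in a simplified form. The sketch below is NOT the module's algorithm: splitDigits is a hypothetical name, it works on a std::string of mantissa digits (decimal point already removed), AND it pads with explicit zeros where the module instead keeps counters such as intPartBonusOrder AND fractPartLeadingZeroesCount.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// 'digits' are the mantissa digits of d.ddd (without the point), 'exponent' is the
// decimal exponent; the value is split into integral AND fractional digit strings
inline void splitDigits(const std::string& digits, long exponent,
                        std::string& intPart, std::string& fractPart) {
  intPart.clear();
  fractPart.clear();
  if (exponent < 0L) { // NO int. part: 3.7e-3 -> 0.0037
    fractPart.assign(static_cast<std::size_t>(-exponent - 1L), '0'); // leading zeros
    fractPart += digits;
    return;
  }
  const std::size_t intLen = static_cast<std::size_t>(exponent) + 1U;
  if (intLen >= digits.size()) { // NO fract. part: 1.2e6 -> 1200000
    intPart = digits;
    intPart.append(intLen - digits.size(), '0');
  } else { // both parts: 1.2345e2 -> 123 AND 45
    intPart = digits.substr(0U, intLen);
    fractPart = digits.substr(intLen);
  }
}
```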

After the main analysis is finished, the fractional part of the number (if it exists) is inspected to determine whether it has meaningless trailing zeros AND (if requested) whether it consists of a repeated pattern.

C++

auto fractPartTrailingZeroesCount = size_t(), fractPartAddedCount = size_t();
char* fractPartRealStart;
auto folded = false; // true if a repeated pattern is found
auto calcFractPartRealLen = [&]() throw() {
  if (DECIMAL_DELIM_ == *currSymbPtr) ++currSymbPtr; // skip delimiter when it separates ('1.1e0')
  assert(fractPartEnd >= currSymbPtr); // 'currSymbPtr' SHOULD now be a real fract. part start
  fractPartRealStart = currSymbPtr;
  fractPartLen += fractPartEnd - currSymbPtr; // 'fractPartLen' CAN be negative BEFORE addition
  assert(fractPartLen >= ptrdiff_t()); // SHOULD NOT be negative now
  if (!fractPartLen) return; // NO fract. part
  //// Skip trailing zeroes
  auto fractPartCurrEnd = fractPartEnd - size_t(1U); // will point to the last non-zero digit symb.
  // Check the border BEFORE dereferencing the ptr.
  while (fractPartCurrEnd >= currSymbPtr && '0' == *fractPartCurrEnd) --fractPartCurrEnd;
  assert(fractPartCurrEnd >= strBuf); // SHOULD NOT go out of the buf.
  fractPartTrailingZeroesCount = fractPartEnd - fractPartCurrEnd - size_t(1U);
  assert(fractPartLeadingZeroesCount >= size_t() &&
         fractPartLen >= static_cast<ptrdiff_t>(fractPartTrailingZeroesCount));
  fractPartLen -= fractPartTrailingZeroesCount;
  //// Fraction folding (if needed)
  if (fractPartLen > size_t(1U) && localeSettings.foldFraction) {
    //// Remove delim. (if needed)
    assert(fractPartStart && fractPartStart > strBuf); // SHOULD be set (delim. found)
    if (fractPartRealStart < fractPartStart) { // move: "12.1e-1" -> "1 21e-1"
      currSymbPtr = fractPartStart - size_t(1U);
      assert(*currSymbPtr == DECIMAL_DELIM_);
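      while (currSymbPtr > fractPartRealStart) { // reversed move
        *currSymbPtr = *(currSymbPtr - size_t(1U)); // read first, then step back
        --currSymbPtr; // separate statement: NO unsequenced read / modify of the ptr.
      }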
      *currSymbPtr = '\0';
      fractPartRealStart = currSymbPtr + size_t(1U); // update, now SHOULD point to the new real start
      assert(fractPartLen);
    }
    //// Actual folding (if needed)
    if (fractPartLen > size_t(1U)) {
      const auto patternLen = tryFindPattern(fractPartRealStart, fractPartLen);
      if (patternLen) {
        fractPartLen = patternLen; // actual folding (reduce fract. part len. to the pattern. len)
        folded = true;
      }
    }
  }
};
// We are NOT using 'modfl' to get part values trying to optimize by skipping zero parts
calcFractPartRealLen(); // update len.
assert(fractPartLen ? localeSettings.precison : true);
const auto fractPartWillBeMentioned = fractPartLen || !localeSettings.shortFormat;
currSymbPtr = strBuf; // start from the beginning, left-to-right (->)

Recognition of the repeated pattern (which may be present in the fractional part) is performed by sequential step-by-step scanning.

C++

// Returns nullptr if the whole str. consists of repeats of its first 'patternSize' symbols
//  (otherwise returns the first occurrence which did NOT match)
auto testPattern = [](const char* const str, const char* const strEnd,
                      const size_t patternSize) throw() {
  assert(str); // SHOULD NOT be nullptr
  auto nextOccurance = str + patternSize;
  while (true) {
    if (memcmp(str, nextOccurance, patternSize)) return nextOccurance; // NOT matched
    nextOccurance += patternSize;
    if (nextOccurance >= strEnd) return decltype(nextOccurance)(); // ALL matched, return nullptr
  }
};

// Returns pattern size if pattern exists, 0 otherwise
// TO DO: add support for advanced folding: 1.25871871 [find repeated pattern NOT ONLY from start]
//  [in cycle: str+1, str+2, ...; get pattern start, pattern len. etc in 'tryFindPatternEx']
//   ['сто двадцать целых двадцать пять до периода и шестьдесят семь в периоде']
//    [controled by 'enableAdvancedFolding' new option]]
auto tryFindPattern = [&](const char* const str, const size_t totalLen) throw() {
  const size_t maxPatternLen = totalLen / size_t(2U);
  auto const strEnd = str + totalLen; // past the end
  for (auto patternSize = size_t(1U); patternSize <= maxPatternLen; ++patternSize) {
    if (totalLen % patternSize) continue; // skip invalid dividers [OPTIMIZATION]
    if (!testPattern(str, strEnd, patternSize)) return patternSize;
  }
  return size_t();
};

For example, given the number 1.23452345, we first test whether the fractional part consists only of repeated 2 (no), then only of repeated 23 (wrong again); a three-digit pattern such as 234 is skipped, because 3 does NOT divide the eight-digit length; AND finally 2345 hits the spot. This inspection is performed only if a fractional part exists AND only at the explicit request of the user (it is disabled by default).
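The same search can be written compactly over a std::string, which makes the example easy to reproduce. This is a sketch of the technique described above, NOT the module's code; findRepeatedPatternLen is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length of the shortest pattern whose repeats form the whole 'digits'
//  str. (at least two repeats are required), OR 0 if there is NO such pattern
inline std::size_t findRepeatedPatternLen(const std::string& digits) {
  const std::size_t total = digits.size();
  for (std::size_t len = 1U; len <= total / 2U; ++len) {
    if (total % len) continue; // a pattern of this len. can NOT tile the str.
    bool matched = true;
    for (std::size_t pos = len; pos < total && matched; pos += len)
      matched = (0 == digits.compare(pos, len, digits, 0U, len));
    if (matched) return len;
  }
  return 0U;
}
```

For "23452345" the lengths 1 AND 2 fail, 3 is skipped, AND 4 matches, so the fractional part folds to "2345".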

3) processing integral part of the number

This is the first step where all preparation is finished AND the real processing starts.

C++

processDigitsPart(intPartLen, getIntSubPartSize(), intPartBonusOrder, false);
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer"; return false;
}
if (intPartLen) { // if int. part exist
  assert(currSymbPtr > strBuf);
  intPartLastDigit = *(currSymbPtr - ptrdiff_t(1)) - '0';
  assert(intPartLastDigit > ptrdiff_t(-1) && intPartLastDigit < ptrdiff_t(10));
  if (intPartLen > size_t(1U)) { // there is also prelast digit
    auto intPartPreLastDigitPtr = currSymbPtr - ptrdiff_t(2);
    if (DECIMAL_DELIM_ == *intPartPreLastDigitPtr) --intPartPreLastDigitPtr; // skip delim.: 2.3e1
    assert(intPartPreLastDigitPtr >= strBuf); // check borders
    intPartPreLastDigit = *intPartPreLastDigitPtr - '0';
    assert(intPartPreLastDigit > ptrdiff_t(-1) && intPartPreLastDigit < ptrdiff_t(10));
  }
}
strLenWithoutFractPart = str.size(); // remember (for future use)
intPartAddedCount = addedCount;
addedCount = decltype(addedCount)(); // reset

Both integral AND fractional parts are processed by the processDigitsPart generic processing lambda. This unified processing strategy will be presented later in this article.

After the main processing, two additional internal parameters, intPartLastDigit AND intPartPreLastDigit, are also determined. They are required for Russian language processing, to choose an appropriate ending for the int. part AND for the fraction delimiter:

5.1 = "пять целых одна десятая"

1.5 = "одна целая пять десятых"

1 = "один" [shortFormat]
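Why both digits are needed can be shown with the word for the integral part. The sketch below encodes the general Russian agreement rule as I understand it (singular after a final 1, except for 11); it is an assumption for illustration, NOT a copy of the module's getFractionDelimiter, AND ruWholePartWord is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// 'preLastDigit': tens digit of the int. part (0 if absent); 'lastDigit': ones digit
inline const char* ruWholePartWord(int preLastDigit, int lastDigit) {
  // "одна целая", "двадцать одна целая", BUT "одиннадцать целых"
  const bool singular = (1 == lastDigit) && (1 != preLastDigit);
  return singular ? "целая" : "целых";
}
```

The last digit alone is not enough: 21 AND 11 both end with 1, but only the first one takes the singular form.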

4) processing fractional part of the number

C++

if (fractPartLen) {
  addFractionDelimiter();
  addFractionPrefix(); // if needed
  currSymbPtr = fractPartRealStart; // might be required if folded [in SOME cases]
}
processDigitsPart(fractPartLen, getFractSubPartSize(localeSettings), size_t(), true);
if (addedCount) { // smth. added (even if zero part)
  fractPartAddedCount = addedCount;
  //// Add specific ending (if needed, like 'десятимиллионная')
  assert(fractPartLen >= decltype(fractPartLen)());
  size_t fractPartLastDigitOrderExt = fractPartLeadingZeroesCount + fractPartLen;
  if (!fractPartLastDigitOrderExt) fractPartLastDigitOrderExt = size_t(1U); // at least one
  addFractionEnding(fractPartLastDigitOrderExt);
}
assert(totalAddedCount); // SHOULD NOT be zero
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer"; return false;
} return true;

addFractionDelimiter is another generic processing lambda, while addFractionPrefix is a language-specific processing lambda (these types of lambdas will soon be described more precisely).

addFractionDelimiter is, obviously, used to add the fraction separator.

addFractionPrefix is used to add some language-specific content before the actual processing of the fractional part starts. For example, for the English language it adds leading zeros: in the scientific notation they might NOT be present in the processed char. array. 0.0037 would be represented as "3.7e-3" (normalized form), so those zeros would NOT be processed during the main processing cycle AND have to be added elsewhere.
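How many zero words have to be added follows directly from the exponent. The sketch below illustrates that arithmetic only; leadingZeroWords is a hypothetical name, AND the real addFractionPrefix works on the module's internal counters instead.

```cpp
#include <cassert>
#include <string>

// For a normalized mantissa d.ddd with a negative exponent, the value 0.00ddd has
//  (-exponent - 1) zeros between the decimal point AND the first significant digit
inline std::string leadingZeroWords(long exponent, const char* zeroWord) {
  std::string words;
  for (long i = 0L; i < -exponent - 1L; ++i) { // NO iterations if exponent >= -1
    if (!words.empty()) words += ' ';
    words += zeroWord;
  }
  return words;
}
```

For "3.7e-3" this yields two zero words, which precede the digits three AND seven in the spelled fraction.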

There are three groups of lambdas which have not been described yet AND which are used during the conversion process:

1) language-specific lambdas: their run time behavior depends heavily on the selected language

a) morphological lambdas: provide morphemes of the selected language

b) processing lambdas: used to configure generic processing lambdas based on the language

2) generic processing lambdas: their internal logic is totally independent of the selected language; however, their execution is configured by the language-specific processing lambdas

Now we'll talk about all those functions.
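The split between generic AND language-specific code can be shown on a toy scale. This sketch is NOT the module's architecture in detail: DigitNamer, englishDigitName, russianDigitName AND spellDigitByDigit are hypothetical names, AND the generic part here only spells digits one by one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Language-specific part: maps a digit to a word of the selected language
typedef std::function<const char*(unsigned)> DigitNamer;

inline const char* englishDigitName(unsigned digit) {
  static const char* const NAMES[] = {"zero", "one", "two", "three", "four",
                                      "five", "six", "seven", "eight", "nine"};
  return NAMES[digit % 10U];
}

inline const char* russianDigitName(unsigned digit) {
  static const char* const NAMES[] = {"ноль", "один", "два", "три", "четыре",
                                      "пять", "шесть", "семь", "восемь", "девять"};
  return NAMES[digit % 10U];
}

// Generic part: walks over the digits AND knows nothing about any language
inline std::string spellDigitByDigit(const std::string& digits, const DigitNamer& nameOf) {
  std::string result;
  for (std::size_t i = 0U; i < digits.size(); ++i) {
    if (digits[i] < '0' || digits[i] > '9') continue; // skip non-digits
    if (!result.empty()) result += ' ';
    result += nameOf(static_cast<unsigned>(digits[i] - '0'));
  }
  return result;
}
```

Supporting another language then means supplying another namer, while the walking code stays untouched.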

Language-specific morphological lambdas

In fact, these functions represent the language itself. They provide the morphemes used to construct the resulting numeral.

Each word can have up to 3 morphemes (affixes) in addition to the root:

1) prefix: placed before the stem of a word
2) infix: inserted inside a word stem
OR
interfix: [linkage] placed in between two morphemes AND does NOT have a semantic meaning
3) postfix: (suffix OR ending) placed after the stem of a word

Word = [prefix]<root>[infix / interfix][postfix (suffix, ending)]

Each function returns a root AND can optionally provide an infix AND/OR a postfix.

Do NOT consider the returned values, however, to be the root / the postfix etc. in the exact linguistic sense (as morphemes obtained from a proper morphological analysis). Consider them to be a "root" / a "postfix" specific to the current project.

1) getZeroOrderNumberStr

Returns numerals for numbers 0 - 9 (step 1) in the form of root + postfix.

Examples: "th" + "ree" (3), "вос" + "емь" (8)

C++

auto getZeroOrderNumberStr = [&](const size_t currDigit, const size_t order, const char*& postfix,
                                 const LocaleSettings& localeSettings) throw() -> const char* {
  static const char* const EN_TABLE[] = // roots
    {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
  static const char* const EN_POSTFIXES[] = // endings
    {"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
  static const char* const RU_TABLE[] =
    {"нол", "од", "дв", "тр", "четыр", "пят", "шест", "сем", "вос", "девят"};
  static const char* const RU_POSTFIXES[] = // восЕМЬ восЬМИ восЕМЬЮ
    // одИН одНОГО одНОМУ одНИМ; двА двУХ двУМ двУМЯ; трИ трЕМЯ; четырЕ четырЬМЯ четырЁХ
    {"ь", "ин", "а", "и", "е", "ь", "ь", "ь", "емь", "ь"};
  // НолЬ нолЯ нолЮ; пятЬ пятЬЮ пятЕРЫХ; шестЬ шестЬЮ шестИ; семЬ семИ семЬЮ; девятЬ девятЬЮ девятИ
  static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE) && sizeof(EN_TABLE) == sizeof(EN_POSTFIXES) &&
                sizeof(RU_TABLE) == sizeof(RU_POSTFIXES) &&
                size_t(10U) == std::extent<decltype(EN_TABLE)>::value,
                "Tables SHOULD have the same size (10)");
  assert(currDigit < std::extent<decltype(EN_TABLE)>::value); // is valid digit?
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      postfix = EN_POSTFIXES[currDigit];
      if (!currDigit) { // en.wikipedia.org/wiki/Names_for_the_number_0_in_English
        // American English:
        //  zero:       number by itself, decimals, percentages, phone numbers, some fixed expressions
        //  o (letter): years, addresses, times and temperatures
        //  nil:        sports scores
        if (localeSettings.verySpecific) return "o"; // 'oh'
        return localeSettings.locale == ELocale::L_EN_US ? "zero" : "nought";
      }
      return EN_TABLE[currDigit];
    case ELocale::L_RU_RU:
      postfix = "";
      switch (order) {
        case size_t(0U): // last digit ['двадцать две целых ноль десятых']
          // Один | одНА целая ноль десятых | одна целая одНА десятая
          if (!fractPartWillBeMentioned) break;
        case size_t(3U): // тысяч[?]
          switch (currDigit) {
            case size_t(1U): postfix = "на"; break; // 'ста двадцать одНА тысяча'
            case size_t(2U): postfix = "е"; break; // 'ста двадцать двЕ тысячи' []
          }
        break;
      }
      if (!*postfix) postfix = RU_POSTFIXES[currDigit]; // if NOT setted yet
      return RU_TABLE[currDigit];
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};

2) getFirstOrderNumberStr

Returns numerals for numbers 10 - 19 (step 1) AND 20 - 90 (step 10) in the form of root + infix + postfix.

Example: "дв" + "адцат" + "ь" (20)

C++

auto getFirstOrderNumberStr = [&](const size_t currDigit, const size_t prevDigit,
                                  const char*& infix, const char*& postfix,
                                  const LocaleSettings& localeSettings) throw() -> const char* {
  //// Sub. tables: 10 - 19 [1]; Main tables: 20 - 90 [10]
  
  static const char* const EN_SUB_TABLE[] = {"ten", "eleven"}; // exceptions [NO infixes / postfixes]
  static const char* const EN_SUB_INFIXES[] = // th+ir+teen; fo+ur+teen; fi+f+teen
    {"", "", "", "ir", "ur", "f", "", "", "", ""};
  #define ESP_ "teen" // EN_SUB_POSTFIX
  static const char* const EN_SUB_POSTFIXES[] = // tw+elve ["a dozen"]; +teen ALL others
    {"", "", "elve", ESP_, ESP_, ESP_, ESP_, ESP_, ESP_, ESP_}; // +teen of ALL above 2U (twelve)
  static const char* const EN_MAIN_INFIXES[] = // tw+en+ty ["a score"]; th+ir+ty; fo+r+ty; fi+f+ty
    {"", "", "en", "ir", "r", "f", "", "", "", ""}; // +ty ALL

  #define R23I_ "дцат" // RU_20_30_INFIX [+ь]
  #define RT1I_ "на" R23I_ // RU_TO_19_INFIX [на+дцат+ь]
  static const char* const RU_SUB_INFIXES[] = // +ь; одиннадцатЬ одиннадцатИ одиннадцатЬЮ
    // ДесятЬ десятИ десятЬЮ; од и надцат ь / тр и надцат ь; дв е надцат ь; вос ем надцат ь
    {"", "ин" RT1I_, "е" RT1I_, "и" RT1I_, RT1I_, RT1I_, RT1I_, RT1I_, "ем" RT1I_, RT1I_};

  // ДвадцатЬ двадцатЬЮ двадцатЫЙ двадцатОМУ двадцатИ; семьдесят BUT семидесяти!
  #define R5T8I_ "ьдесят" // RU_50_TO_80_INFIX [NO postfix]
  static const char* const RU_MAIN_INFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
    {"", "", "а" R23I_, "и" R23I_, "", R5T8I_, R5T8I_, R5T8I_, "ем" R5T8I_, ""}; // вос ем +ьдесят
  static const char* const RU_MAIN_POSTFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
    {"", "", "ь", "ь", "", "", "", "", "", "о"}; // сорок; вос ем +ьдесят; девяност о девяност а

  static_assert(sizeof(EN_SUB_INFIXES) == sizeof(EN_MAIN_INFIXES) &&
                sizeof(EN_SUB_POSTFIXES) == sizeof(RU_MAIN_POSTFIXES) &&
                sizeof(RU_SUB_INFIXES) == sizeof(RU_MAIN_INFIXES), "Tables SHOULD have the same size");
  assert(prevDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value); // is valid digits?
  assert(currDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value);
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      switch (prevDigit) {
        case size_t(1U): // ten - nineteen
          infix = EN_SUB_INFIXES[currDigit], postfix = EN_SUB_POSTFIXES[currDigit];
          if (currDigit < size_t(2U)) return EN_SUB_TABLE[currDigit]; // exceptions
        break;
        default: // twenty - ninety
          assert(!prevDigit && currDigit > size_t(1U));
          infix = EN_MAIN_INFIXES[currDigit], postfix = "ty"; // +ty for ALL
        break;
      }
    break;
    case ELocale::L_RU_RU:
      switch (prevDigit) {
        case size_t(1U): // десять - девятнадцать
          infix = RU_SUB_INFIXES[currDigit], postfix = "ь"; // +ь for ALL
          if (!currDigit) return "десят";
        break;
        default: // двадцать - девяносто
          assert(currDigit > size_t(1U));
          infix = RU_MAIN_INFIXES[currDigit], postfix = RU_MAIN_POSTFIXES[currDigit];
          switch (currDigit) {
            case size_t(4U): return "сорок"; // сорокА
            case size_t(9U): return "девяност"; // девяностО девяностЫХ девяностЫМ
          }
        break;
      }
    break;
    default: assert(false); // locale error
      return "<locale error [" MAKE_STR_(__LINE__) "]>";
  } // END switch (locale)
  const char* tempPtr;
  return getZeroOrderNumberStr(currDigit, size_t(), tempPtr, localeSettings);
};
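To see how the English roots, infixes AND postfixes combine, the tables above can be exercised in a reduced form. This is a standalone sketch, NOT the lambda itself: englishTeen AND englishTens are hypothetical names, while the string fragments are copied from the tables shown above.

```cpp
#include <cassert>
#include <string>

// Roots as in 'EN_TABLE' of 'getZeroOrderNumberStr'
static const char* const EN_ROOTS_SKETCH[] =
  {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};

inline std::string englishTeen(unsigned onesDigit) { // 10 - 19
  static const char* const EXCEPTIONS[] = {"ten", "eleven"};
  static const char* const INFIXES[] = {"", "", "", "ir", "ur", "f", "", "", "", ""};
  assert(onesDigit < 10U);
  if (onesDigit < 2U) return EXCEPTIONS[onesDigit]; // NO infixes / postfixes
  const char* const postfix = (2U == onesDigit) ? "elve" : "teen"; // tw+elve
  return std::string(EN_ROOTS_SKETCH[onesDigit]) + INFIXES[onesDigit] + postfix;
}

inline std::string englishTens(unsigned tensDigit) { // 20 - 90
  static const char* const INFIXES[] = {"", "", "en", "ir", "r", "f", "", "", "", ""};
  assert(tensDigit > 1U && tensDigit < 10U);
  return std::string(EN_ROOTS_SKETCH[tensDigit]) + INFIXES[tensDigit] + "ty"; // +ty for ALL
}
```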

3) getSecondOrderNumberStr

Returns numerals for numbers 100 - 900 (step 100) in the form of root + infix + postfix.

Examples: "fi" + "ve" + " hundred" (500), "дв" + "е" + "сти" (200)

C++

// 100 - 900 [100]
auto getSecondOrderNumberStr = [&](const size_t currDigit, const char*& infix, const char*& postfix,
                                   const LocaleSettings& localeSettings) throw() -> const char* {
  static const char* const RU_POSTFIXES[] =
    {"", "", "сти", "ста", "ста", "сот", "сот", "сот", "сот", "сот"};
  static_assert(size_t(10U) == std::extent<decltype(RU_POSTFIXES)>::value,
                "Table SHOULD have the size of 10");
  assert(currDigit && currDigit < std::extent<decltype(RU_POSTFIXES)>::value);
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      postfix = " hundred";
      return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
    case ELocale::L_RU_RU:
      postfix = RU_POSTFIXES[currDigit];
      switch (currDigit) {
        case size_t(1U): infix = ""; return "сто"; break;
        case size_t(2U): {
            const char* temp;
            infix = "е"; //ALWAYS 'е'
            return getZeroOrderNumberStr(currDigit, size_t(), temp, localeSettings); // дв е сти
          }
      }
      return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
  } // END switch (locale)
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
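For the Russian hundreds the composition is root + infix + postfix, where the infix is the ending that getZeroOrderNumberStr would return for the digit. The sketch below flattens that into three tables for illustration; russianHundreds is a hypothetical name, AND the fragments are taken from the tables shown above.

```cpp
#include <cassert>
#include <string>

inline std::string russianHundreds(unsigned digit) { // 100 - 900 [100]
  static const char* const ROOTS[] =
    {"", "сто", "дв", "тр", "четыр", "пят", "шест", "сем", "вос", "девят"};
  static const char* const INFIXES[] = // 'е' for 200, zero order endings for the rest
    {"", "", "е", "и", "е", "ь", "ь", "ь", "емь", "ь"};
  static const char* const POSTFIXES[] =
    {"", "", "сти", "ста", "ста", "сот", "сот", "сот", "сот", "сот"};
  assert(digit > 0U && digit < 10U);
  return std::string(ROOTS[digit]) + INFIXES[digit] + POSTFIXES[digit];
}
```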

4) getOrderStr: returns name of the large number based on its order

Uses the short scale for the English language (both American AND British).

C++

// Up to 10^99 [duotrigintillions]
auto getOrderStr = [](size_t order, const size_t preLastDigit, const size_t lastDigit,
                      const char*& postfix, const LocaleSettings& localeSettings)
                      throw() -> const char* {
  // https://en.wikipedia.org/wiki/Names_of_large_numbers
  static const char* const EN_TABLE[] = // uses short scale (U.S., part of Canada, modern British)
    {"", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion",
     "septillion", "octillion", "nonillion", "decillion", "undecillion", "duodecillion" /*10^39*/,
     "tredecillion", "quattuordecillion", "quindecillion", "sedecillion", "septendecillion",
     "octodecillion", "novemdecillion", "vigintillion", "unvigintillion", "duovigintillion",
     "tresvigintillion", "quattuorvigintillion", "quinquavigintillion", "sesvigintillion",
     "septemvigintillion", "octovigintillion", "novemvigintillion", "trigintillion" /*10^93*/,
     "untrigintillion", "duotrigintillion"};
  // https://ru.wikipedia.org/wiki/Именные_названия_степеней_тысячи
  static const char* const RU_TABLE[] = // SS: short scale, LS: long scale
    {"", "тысяч", "миллион", "миллиард" /*SS: биллион*/, "триллион" /*LS: биллион*/,
     "квадриллион" /*LS: биллиард*/, "квинтиллион" /*LS: триллион*/,
     "секстиллион" /*LS: триллиард*/, "септиллион" /*LS: квадриллион*/, "октиллион", "нониллион",
     "дециллион", "ундециллион", "додециллион", "тредециллион", "кваттуордециллион" /*10^45*/,
     "квиндециллион", "седециллион", "септдециллион", "октодециллион", "новемдециллион",
     "вигинтиллион", "анвигинтиллион", "дуовигинтиллион", "тревигинтиллион", "кватторвигинтиллион",
     "квинвигинтиллион", "сексвигинтиллион", "септемвигинтиллион", "октовигинтиллион" /*10^87*/,
     "новемвигинтиллион", "тригинтиллион", "антригинтиллион", "дуотригинтиллион"}; // 10^99
  static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE), "Tables SHOULD have the same size");
  static const size_t MAX_ORDER_ =
    (std::extent<decltype(EN_TABLE)>::value - size_t(1U)) * size_t(3U); // first empty

  static const char* const RU_THOUSAND_POSTFIXES[] = // десять двадцать сто двести тысяч
    // Одна тысячА | две три четыре тысячИ | пять шесть семь восемь девять тысяч
    {"", "а", "и", "и", "и", "", "", "", "", ""};
  static const char* const RU_MILLIONS_AND_BIGGER_POSTFIXES[] = // один миллион; два - четыре миллионА
    // Пять шесть семь восемь девять миллионОВ [миллиардОВ триллионОВ etc]
    // Десять двадцать сто двести миллионОВ миллиардОВ etc
    {"ов", "", "а", "а", "а", "ов", "ов", "ов", "ов", "ов"};
  static_assert(size_t(10U) == std::extent<decltype(RU_THOUSAND_POSTFIXES)>::value &&
                size_t(10U) == std::extent<decltype(RU_MILLIONS_AND_BIGGER_POSTFIXES)>::value,
                "Tables SHOULD have the size of 10");
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      postfix = "";
      if (size_t(2U) == order) return "hundred"; // 0U: ones, 1U: tens
      order /= 3U; // 0 - 1: empty, 3 - 5: thousands, 6 - 8: millions, 9 - 11: billions etc
      assert(order < std::extent<decltype(EN_TABLE)>::value);
      return EN_TABLE[order]; // [0, 33]
    case ELocale::L_RU_RU:
      assert(preLastDigit < size_t(10U) && lastDigit < size_t(10U));
      if (size_t(3U) == order) { // determine actual postfix first
        if (size_t(1U) != preLastDigit) {
          postfix = RU_THOUSAND_POSTFIXES[lastDigit];
        } else postfix = ""; // 'тринадцать тысяч'
      } else if (order > size_t(3U)) { // != 3U
        if (size_t(1U) == preLastDigit) { // десять одиннадцать+ миллионОВ миллиардОВ etc
          postfix = "ов";
        } else postfix = RU_MILLIONS_AND_BIGGER_POSTFIXES[lastDigit];
      }
      order /= 3U; // 6 - 8: миллионы, 9 - 11: миллиарды etc
      assert(order < std::extent<decltype(RU_TABLE)>::value);
      return RU_TABLE[order]; // [0, 33]
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
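
The Russian postfix selection above can be tried in isolation. The following is a minimal standalone sketch, NOT the module's code: the helper name ruOrderWord AND its shortened root table are assumptions made for illustration, only the postfix rules are copied from the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (NOT part of the module): builds a declined Russian
// order word from the last two digits of the triad that precedes it.
// 'order' is the power of ten of the triad's last digit (3, 6, 9 OR 12 here).
std::string ruOrderWord(std::size_t order, std::size_t preLastDigit, std::size_t lastDigit) {
  static const char* const ROOTS[] = {"", "тысяч", "миллион", "миллиард", "триллион"};
  static const char* const THOUSAND_POSTFIXES[] = {"", "а", "и", "и", "и", "", "", "", "", ""};
  static const char* const BIGGER_POSTFIXES[] =
    {"ов", "", "а", "а", "а", "ов", "ов", "ов", "ов", "ов"};
  assert(order >= 3 && order % 3 == 0 && order / 3 < 5);
  assert(preLastDigit < 10 && lastDigit < 10);
  const char* postfix = "";
  if (order == 3) { // thousands: the teens (pre-last digit is 1) take NO postfix
    if (preLastDigit != 1) postfix = THOUSAND_POSTFIXES[lastDigit];
  } else { // millions AND bigger: the teens always take "ов"
    postfix = (preLastDigit == 1) ? "ов" : BIGGER_POSTFIXES[lastDigit];
  }
  return std::string(ROOTS[order / 3]) + postfix;
}
```

For example, ruOrderWord(3, 2, 2) gives the form used in "двадцать две тысячи", while ruOrderWord(9, 1, 1) gives the form used in "одиннадцать миллиардов".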

5) getFractionDelimiter

Returns POD C str., which represents the fractional separator used in the selected language.

C++

// 'intPartPreLastDigit' AND 'intPartLastDigit' CAN be negative (in case of NO int. part)
auto getFractionDelimiter = [](const ptrdiff_t intPartPreLastDigit, const ptrdiff_t intPartLastDigit,
                               const char*& postfix, const bool folded,
                               const LocaleSettings& localeSettings) throw() -> const char* {
  assert(intPartPreLastDigit < ptrdiff_t(10) && intPartLastDigit < ptrdiff_t(10));
  postfix = "";
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: return "point"; // also 'decimal'
    case ELocale::L_RU_RU: // "целые" is NOT used in textbooks!
      if (intPartLastDigit < ptrdiff_t() && localeSettings.shortFormat) return ""; // NO int. part
      if (folded) postfix = "и";
      return ptrdiff_t(1) == intPartLastDigit ?
        (ptrdiff_t(1) == intPartPreLastDigit ? "целых" : "целая") : // одиннадцать целЫХ | одна целАЯ
        "целых"; // ноль, пять - девять целЫХ; две - четыре целЫХ; десять целЫХ
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
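
The choice between "целая" AND "целых" depends only on the last two digits of the integral part. Here is a small sketch of that rule; ruFractionDelimiter is a hypothetical helper that takes the integral part as a number, while the real lambda receives the two digits separately AND also handles the shortFormat AND folded cases.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: picks the Russian word that separates the integral
// part from the fraction, using the last two digits of the integral part.
std::string ruFractionDelimiter(unsigned long long intPart) {
  const unsigned lastDigit = static_cast<unsigned>(intPart % 10ULL);
  const unsigned preLastDigit = static_cast<unsigned>((intPart / 10ULL) % 10ULL);
  // 'одна целая', 'двадцать одна целая', BUT 'одиннадцать целых'
  return (1U == lastDigit && 1U != preLastDigit) ? "целая" : "целых";
}
```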

6) getFoldedFractionEnding

If the number has a fractional part with a repeated pattern that was folded, this specific ending is added to the end of the numeral string to indicate that the pattern recurs.

C++

auto getFoldedFractionEnding = [](const LocaleSettings& localeSettings) throw() {
  // Also possibly 'continuous', 'recurring'; 'reoccurring' (Australian)
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: return "to infinity"; // also 'into infinity', 'to the infinitive'
    case ELocale::L_EN_GB: return "repeating"; // also 'repeated'
    case ELocale::L_RU_RU: return "в периоде";
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
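
The module's own folding routine is NOT shown in this section, so the following is only a sketch of the general idea, under the assumption that a pattern counts as repeated when the fraction digits consist of at least two whole copies of it. The name findRepeatedPattern is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the shortest prefix whose whole repetitions
// (at least two) reproduce 'digits' exactly, OR an empty string if there is none.
std::string findRepeatedPattern(const std::string& digits) {
  const std::size_t len = digits.size();
  for (std::size_t period = 1; period * 2 <= len; ++period) {
    if (len % period) continue; // only whole repetitions are accepted here
    bool repeats = true;
    for (std::size_t idx = period; idx < len && repeats; ++idx)
      repeats = (digits[idx] == digits[idx - period]);
    if (repeats) return digits.substr(0, period);
  }
  return std::string(); // NO repeated pattern found
}
```

With this rule the fraction digits "25672567" fold to "2567", which would then be spelled once AND followed by the folded fraction ending.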

Generic processing lambdas

As I have already said, these are language-independent AND are used to process both the integral AND the fractional parts of the number (one at a time).

1) processDigitsPart: main processing cycle

C++

size_t intPartAddedCount, strLenWithoutFractPart;
// Strategy used to process both integral AND fractional parts of the number
// 'digitsPartSize' is a total part. len. in digits (i. e. 1 for 4, 3 for 123, 6 for 984532 etc)
//  [CAN be zero in some cases]
// 'partBonusOrder' will be 3 for 124e3, 9 for 1.2e10, 0 for 87654e0 etc
// 'fractPart' flag SHOULD be true if processing fraction part
auto processDigitsPart = [&](size_t digitsPartSize, const size_t digitsSubPartSize,
                             size_t partBonusOrder, const bool fractPart) {
  currDigit = size_t(), prevDigit = size_t(); // reset
  if (digitsPartSize) {
    assert(digitsSubPartSize); // SHOULD be NOT zero
    size_t currDigitsSubPartSize =
      (digitsPartSize + partBonusOrder) % digitsSubPartSize; // 2 for 12561, 1 for 9 etc
    if (!currDigitsSubPartSize) currDigitsSubPartSize = digitsSubPartSize; // if zero remainder
    // Will be 2 for '12.34e4' ('1234e2' = '123 400' - two last unpresented zeroes); 1 for 1e1
    auto subPartOrderExt = size_t(); // used ONLY for a last subpart

    // OPTIMIZATION HINT: redesign to preallocate for the whole str., NOT for the different parts?
    if (ReserveBeforeAdding) // optimization [CAN acquire more / less space than really required]
      str.reserve(str.length() + estimatePossibleLength(digitsPartSize, fractPart, localeSettings));
    do {
      if (currDigitsSubPartSize > digitsPartSize) { // if last AND unnormal [due to the '%']
        subPartOrderExt = currDigitsSubPartSize - digitsPartSize;
        partBonusOrder -= subPartOrderExt;
        currDigitsSubPartSize = digitsPartSize; // correct
      }
      digitsPartSize -= currDigitsSubPartSize;
      processDigitsSubPart(currDigitsSubPartSize, digitsSubPartSize,
                           digitsPartSize + partBonusOrder, subPartOrderExt, fractPart);
      currDigitsSubPartSize = digitsSubPartSize; // set default [restore]
    } while (digitsPartSize);
  }
  auto mentionZeroPart = [&]() {
    if (!str.empty()) str += delimiter;
    const char* postfix;
    str += getZeroOrderNumberStr(size_t(), size_t(), postfix, localeSettings);
    str += postfix;
    ++totalAddedCount;
  };
  if (!addedCount) { // NO part
    if (!localeSettings.shortFormat || folded) { // NOT skip mention zero parts
      if (fractPart) {
        addFractionDelimiter(); // 'ноль целых'
      } else intPartLastDigit = ptrdiff_t(); // now. IS int. part
      mentionZeroPart();
      ++addedCount;
    } else if (fractPart) { // short format AND now processing fraction part
      assert(!folded); // NO fract. part - SHOULD NOT be folded
      assert(strLenWithoutFractPart <= str.size()); // SHOULD NOT incr. len.
      if (!intPartAddedCount) { // NO int. part [zero point zero -> zero] <EXCEPTION>
        mentionZeroPart(); // do NOT incr. 'addedCount'!!
      }
    }
  }
};

This function takes a part of the number (for example, 1278 from 1278.45) AND processes it in subparts of the specified size (currently 3, 2 OR 1). With digitsSubPartSize = 2, there will be two such subparts: 12 AND 78. Each subpart is processed by the other generic processing lambda: processDigitsSubPart (see below).

In fact, processDigitsPart performs a series of calls to processDigitsSubPart, correctly splitting the part into subparts until no subparts remain. At the end it performs a special action if nothing was actually added (in order to correctly process numbers like 0.0 with the shortFormat flag turned ON, AND some other specific cases).

This function also uses the estimatePossibleLength language-specific processing lambda (described later) AND the addFractionDelimiter generic processing lambda (already mentioned, described in detail later).
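
The splitting itself can be sketched separately. This is a simplified illustration under stated assumptions: splitIntoSubParts AND SubPart are hypothetical names, the digits are passed as a plain string, AND the partBonusOrder / subPartOrderExt handling of the real lambda is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: one subpart AND the order of its last digit.
struct SubPart {
  std::string digits;
  std::size_t order;
};

// Splits 'digits' into subparts of 'subPartSize' digits; only the FIRST
// subpart may be shorter (2 digits for "12561" split by 3, 1 for "9").
std::vector<SubPart> splitIntoSubParts(const std::string& digits, std::size_t subPartSize) {
  assert(subPartSize); // SHOULD be NOT zero
  std::vector<SubPart> subParts;
  std::size_t remaining = digits.size();
  std::size_t pos = 0;
  std::size_t currSize = remaining % subPartSize;
  if (!currSize) currSize = subPartSize; // if zero remainder
  while (remaining) {
    remaining -= currSize;
    // The digits still to be processed give the order of this subpart
    subParts.push_back({digits.substr(pos, currSize), remaining});
    pos += currSize;
    currSize = subPartSize; // every next subpart has the full size
  }
  return subParts;
}
```

Splitting "1278" by 2 yields "12" (order 2) AND "78" (order 0), while splitting "12561" by 3 yields "12" (order 3) AND "561" (order 0).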

2) processDigitsSubPart: subprocessing cycle

Processes a subpart received from the parent cycle (processDigitsPart). Both of these functions are closures that do not process an actual number; instead, they process the strBuf char. array, which was previously filled by the sprintf function during stage 1 of the conversion (see the 'Conversion stages description' section above).

C++

auto addedCount = size_t(); // during processing curr. part
auto emptySubPartsCount = size_t();
// Part order is an order of the last digit of the part (zero for 654, 3 for 456 of the 456654 etc)
// Part (integral OR fractional) of the number consists of the subparts of specified size
//  (usually 3 OR 1; for ENG.: 3 for int. part., 1 for fract. part)
// 'subPartOrderExt' SHOULD exist ONLY for a LAST subpart
auto processDigitsSubPart = [&](const size_t currDigitsSubPartSize,
                                const size_t normalDigitsSubPartSize,
                                const size_t order, size_t subPartOrderExt, const bool fractPart) {
  assert(currDigitsSubPartSize && currDigitsSubPartSize <= size_t(3U));
  auto currAddedCount = size_t(); // reset
  auto emptySubPart = true; // true if ALL prev. digits of the subpart is zero
  prevDigit = std::decay<decltype(prevDigit)>::type(); // reset
  for (size_t subOrder = currDigitsSubPartSize - size_t(1U);;) {
    if (DECIMAL_DELIM_ != *currSymbPtr) { // skip decimal delim.
      currDigit = *currSymbPtr - '0'; // assuming ANSI ASCII
    PPOCESS_DIGIT_:
      assert(*currSymbPtr >= '0' && currDigit < size_t(10U));
      emptySubPart &= !currDigit;
      processDigitOfATriad(subOrder + subPartOrderExt, order, currAddedCount,
                           normalDigitsSubPartSize, fractPart);
      if (subPartOrderExt) { // treat unpresented digits [special service]
        --subPartOrderExt;
        prevDigit = currDigit;
        currDigit = std::decay<decltype(currDigit)>::type(); // remove ref. from type
        goto PPOCESS_DIGIT_; // don't like 'goto'? take a nyan cat here: =^^=
      }
      if (!subOrder) { // zero order digit
        ++currSymbPtr; // shift to the symb. after the last in an int. part
        break;
      }
      --subOrder, prevDigit = currDigit;
    }
    ++currSymbPtr;
  }
  if (emptySubPart) ++emptySubPartsCount; // update stats
  // Add order str. AFTER part (if exist)
  if (currAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
    const char* postfix;
    auto const orderStr = getOrderStr(order, prevDigit, currDigit, postfix, localeSettings);
    assert(orderStr && postfix);
    if (*orderStr) { // if NOT empty (CAN be empty for zero order [EN, RU])
      assert(str.size()); // NOT zero
      str += delimiter, str += orderStr, str += postfix;
      ++currAddedCount;
    }
  }
  addedCount += currAddedCount;
};

This function calls the processDigitOfATriad language-specific processing lambda for each digit in the processed subpart.

As the names AND the listing suggest, subparts usually have size 3. In fact, subparts of size 1, 2 OR 3 can be processed (AND all of those sizes are really required at some point).

When all digits of the subpart are processed, the function appends the order string (like "thousand") if it is needed. This happens only when processing subparts of at least minDigitsSubPartSizeToAddOrder size, which is set by a call to the getMinDigitsSubPartSizeToAddOrder language-specific processing lambda (presented in the next section of the article).

3) addFractionDelimiter

A very simple function, used to correctly separate integral AND fractional parts of the number.

C++

auto intPartPreLastDigit = ptrdiff_t(-1), intPartLastDigit = ptrdiff_t(-1); // NO part by default
auto addFractionDelimiter = [&]() {
  const char* postfix;
  auto const fractionDelim =
    getFractionDelimiter(intPartPreLastDigit, intPartLastDigit, postfix, folded, localeSettings);
  if (*fractionDelim) { // if NOT empty
    if (!str.empty()) str += delimiter;
    str += fractionDelim;
  }
  if (*postfix) {
    if (*fractionDelim) str += delimiter;
    str += postfix;
  }
};

Language-specific processing lambdas

Final pack of lambdas, used during the processing.

The following ones are used to configure the conversion strategy, based on the selected language.

1) getMinDigitsSubPartSizeToAddOrder

Returns the minimal subpart size for which an order string (like "hundred" OR "thousand" for English) should be appended during the conversion.

For example, again for English: when processing 1256 in subparts of size 2, we would append "hundred" after 12, while when processing the same number in subparts of size 1, we would append nothing.

C++

auto getMinDigitsSubPartSizeToAddOrder = [](const LocaleSettings& localeSettings) throw() {
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: return size_t(2U); // hundreds
    case ELocale::L_RU_RU: return size_t(3U); // тысячи
  }
  assert(false); // locale error
  return size_t();
};
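
To illustrate why a subpart size of 2 is enough to append "hundred" in English, here is a small sketch that spells a four-digit number in pairs. The helpers spellBelowHundred AND spellByPairs are hypothetical AND cover only 1000..9999 with a non-zero hundreds digit; the module itself works digit by digit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells 1..99 in English ('twelve', 'fifty-six').
std::string spellBelowHundred(unsigned value) {
  static const char* const ONES[] =
    {"", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
     "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
     "eighteen", "nineteen"};
  static const char* const TENS[] =
    {"", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
  assert(value && value < 100U);
  if (value < 20U) return ONES[value];
  std::string result = TENS[value / 10U];
  if (value % 10U) { // 'fifty-six'
    result += '-';
    result += ONES[value % 10U];
  }
  return result;
}

// Hypothetical helper: reads a four-digit number as two pairs,
// appending "hundred" after the first pair.
std::string spellByPairs(unsigned num) {
  assert(num >= 1000U && num < 10000U && (num / 100U) % 10U); // non-zero hundreds digit
  std::string result = spellBelowHundred(num / 100U);
  result += " hundred";
  if (num % 100U) {
    result += ' ';
    result += spellBelowHundred(num % 100U);
  }
  return result;
}
```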

2) getSpecificCaseSubPartSize

Returns the subpart size when some specific processing is required. You can see samples of such specific cases in the function's listing.

C++

// Returns zero (NOT set, undefined) if NOT spec. case
auto getSpecificCaseSubPartSize = [](const long double& num,
                                     const LocaleSettings& localeSettings) throw() {
  switch (localeSettings.locale) {
    /*
    In American usage, four-digit numbers with non-zero hundreds
    are often named using multiples of "hundred"
    AND combined with tens AND/OR ones:
    "One thousand one", "Eleven hundred three", "Twelve hundred twenty-five",
    "Four thousand forty-two", or "Ninety-nine hundred ninety-nine"
    */
    case ELocale::L_EN_US:
      if (num < 10000.0L) {
        bool zeroTensAndOnes;
        const auto hundreds =
          MathUtils::getDigitOfOrder(size_t(2U), static_cast<long long int>(num), zeroTensAndOnes);
        if (hundreds && !zeroTensAndOnes) return size_t(2U); // if non-zero hundreds
      }
    break;
    // In British usage, this style is common for multiples of 100 between 1,000 and 2,000
    //  (e.g. 1,500 as "fifteen hundred") BUT NOT for higher numbers
    case ELocale::L_EN_GB:
      if (num >= 1000.0L && num < 2001.0L) {
        // If ALL digits of order below 2U [0, 1] are zero
        if (!(static_cast<size_t>(num) % size_t(100U))) return size_t(2U); // if a multiple of 100
      }
    break;
  }
  return size_t();
};
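
The decision logic can be restated without the MathUtils dependency. This sketch assumes that the third argument of MathUtils::getDigitOfOrder reports whether all digits below the requested order are zero (that function is NOT shown in the article); Dialect AND specificSubPartSize are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

enum class Dialect { US, GB }; // hypothetical stand-in for ELocale

// Returns 2 when the number should be read in pairs, 0 (NOT set) otherwise.
std::size_t specificSubPartSize(unsigned long long num, Dialect dialect) {
  switch (dialect) {
    case Dialect::US:
      if (num < 10000ULL) {
        const unsigned long long hundreds = (num / 100ULL) % 10ULL;
        const bool zeroTensAndOnes = (num % 100ULL == 0ULL); // assumed meaning
        // Same condition as in the listing: 1103 qualifies, 1001 AND 1200 do NOT
        if (hundreds && !zeroTensAndOnes) return 2;
      }
      break;
    case Dialect::GB: // only multiples of 100 in the listing's range
      if (num >= 1000ULL && num < 2001ULL && num % 100ULL == 0ULL) return 2;
      break;
  }
  return 0;
}
```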

3) getIntSubPartSize

Returns the subpart size, when processing an integral part of the number.

C++

auto getIntSubPartSize = [&]() throw() {
  auto subPartSize = size_t();
  if (localeSettings.verySpecific)
    subPartSize = getSpecificCaseSubPartSize(num, localeSettings); // CAN alter digits subpart size
  if (!subPartSize) { // NOT set previously
    switch (localeSettings.locale) { // triads by default
      // For eng. numbers step = 1 can be ALSO used: 64.705 — 'six four point seven nought five'
      case ELocale::L_EN_US: case ELocale::L_EN_GB: case ELocale::L_RU_RU: subPartSize = size_t(3U);
    }
  }
  return subPartSize;
};

4) getFractSubPartSize

Returns the subpart size, when processing a fractional part of the number.

C++

auto getFractSubPartSize = [](const LocaleSettings& localeSettings) throw() {
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      // Step = 2 OR 3 can be ALSO used: 14.65 - 'one four point sixty-five'
      return size_t(1U); // point one two seven
    case ELocale::L_RU_RU: return size_t(3U); // сто двадцать семь сотых
  }
  assert(false); // locale error
  return size_t();
};
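
With a subpart size of 1, the English fractional part is simply read digit by digit. A minimal sketch of that reading follows; spellFractionDigits is a hypothetical helper that works on a digit string AND uses "nought" for zero in the British variant, as the comments in the listings suggest.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells fraction digits one by one after "point".
std::string spellFractionDigits(const std::string& digits, bool british) {
  static const char* const NAMES[] =
    {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};
  std::string result = "point";
  for (const char symb : digits) {
    assert(symb >= '0' && symb <= '9'); // assuming ASCII digits
    result += ' ';
    result += (british && '0' == symb) ? "nought" : NAMES[symb - '0'];
  }
  return result;
}
```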

5) estimatePossibleLength

A heuristic function used to predict the possible length of the string that will represent the targeted part of the number. It is used to optionally preallocate memory in the provided storage before the actual processing begins, in order to reduce the overall execution time (optimization).

C++

// Currently there is NO specific handling for 'short format' AND 'very specific' options
auto estimatePossibleLength = [](const size_t digitsPartSize, const bool fractPart,
                                 const LocaleSettings& localeSettings) throw() {
  // If processing by the one digit per time; EN GB uses 'nought' instead of 'zero'
  static const auto EN_US_AVG_CHAR_PER_DIGIT_NAME_ = size_t(4U); // 40 / 10 ['zero' - 'nine']
  static size_t AVG_SYMB_PER_DIGIT_[ELocale::COUNT]; // for ALL langs; if processing by triads

  struct ArrayIniter { // 'AVG_SYMB_PER_DIGIT_' initer
    ArrayIniter() throw() {
      //// All these values are a result of the statistical analysis
      AVG_SYMB_PER_DIGIT_[ELocale::L_EN_GB] = size_t(10U); // 'one hundred and twenty two thousand'
      AVG_SYMB_PER_DIGIT_[ELocale::L_EN_US] = size_t(9U);  // 'one hundred twenty two thousand'
      AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] = size_t(8U);  // 'сто двадцать две тысячи'
    }
  }; static const ArrayIniter INITER_; // static init. is thread-safe in C++11

  static const auto RU_DELIM_LEN_ = size_t(5U); // "целых" / "целая"
  // Frequent postfixes (up to trillions: 'десятитриллионных')
  static const auto RU_MAX_FREQ_FRACT_POSTFIX_LEN_ = size_t(17U);

  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      if (!fractPart) return AVG_SYMB_PER_DIGIT_[localeSettings.locale] * digitsPartSize;
      // For the fract part [+1 for the spacer]
      return (EN_US_AVG_CHAR_PER_DIGIT_NAME_ + size_t(1U)) * digitsPartSize;
    case ELocale::L_RU_RU: // RU RU processes fract. part by the triads (like an int. part)
      {
        size_t len_ = AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] * digitsPartSize;
        if (fractPart && digitsPartSize) len_ += RU_DELIM_LEN_ + RU_MAX_FREQ_FRACT_POSTFIX_LEN_;
        return len_;
      }
  }
  assert(false); // locale error
  return size_t();
};
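
How such an estimate is typically consumed can be shown with a reduced, English-only sketch. The names estimateEnglishLength AND reserveForPart are hypothetical; the per-digit averages are the constants from the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, English-only variant of the heuristic.
std::size_t estimateEnglishLength(std::size_t digitsCount, bool fractPart, bool british) {
  const std::size_t avgPerTriadDigit = british ? 10U : 9U; // int. part, read by triads
  const std::size_t avgPerSingleDigit = 4U + 1U; // digit name + spacer, fract. part
  return (fractPart ? avgPerSingleDigit : avgPerTriadDigit) * digitsCount;
}

// Grows the capacity once, before the words of the part are appended.
void reserveForPart(std::string& str, std::size_t digitsCount, bool fractPart, bool british) {
  str.reserve(str.length() + estimateEnglishLength(digitsCount, fractPart, british));
}
```

The estimate does NOT have to be exact: if it is too low the string simply reallocates later, if it is too high some memory stays unused.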

The next ones perform some language-specific actions.

6) addFractionPrefix

Used for a fractional part preprocessing.

For the English language it adds leading zeroes, which could otherwise be missed due to the format (scientific representation) of the data in the underlying char. array. Does nothing for the Russian language.

C++

auto addFractionPrefix = [&]() {
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'nought nought nought' for 1.0003
      {
        const char* postfix;
        for (auto leadingZeroIdx = size_t(); leadingZeroIdx < fractPartLeadingZeroesCount;) {
          assert(str.size()); // NOT empty
          str += delimiter;
          str += getZeroOrderNumberStr(size_t(), leadingZeroIdx, postfix, localeSettings);
          str += postfix;
          ++leadingZeroIdx;
        }
        return;
      }
    case ELocale::L_RU_RU: return; // NO specific prefix
  }
  assert(false); // locale error
};

7) addFractionEnding

Used for fraction postprocessing.

For the Russian language it appends a specific ending (like "десятимиллионная") based on the order (of magnitude) of the fractional part (AND on some other parameters, like the last two digits). Does nothing for the English language.

C++

size_t currDigit, prevDigit;
// 'orderExt' is an order of the last digit of a fractional part + 1 (1 based idx.)
//  [1 for the first, 2 for the second etc]
auto addFractionEnding = [&](const size_t orderExt) {
  if (folded) { // add postfix for the folded fraction
    auto const ending = getFoldedFractionEnding(localeSettings);
    if (*ending) { // if NOT empty
      str += delimiter;
      str += ending;
    }
    return;
  }
  //// Add 'normal' postfix
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: break; // NO specific ending currently
    case ELocale::L_RU_RU: {
        auto toAdd = "";
        //// Add prefix / root
        assert(orderExt); // SHOULD NOT be zero
        const size_t subOrder = orderExt % size_t(3U);
        switch (subOrder) { // zero suborder - empty prefix
          case size_t(1U): // ДЕСЯТ ая(ых) | ДЕСЯТ И тысячная(ых) ДЕСЯТ И миллиардная(ых)
            toAdd = orderExt < size_t(3U) ? "десят" : "десяти"; break;
          case size_t(2U): // СОТ ая(ых) | СТО тысячная(ых) СТО миллиардная(ых)
            toAdd = orderExt < size_t(3U) ? "сот" : "сто"; break;
        }
        if (*toAdd) {
          str += delimiter;
          str += toAdd;
        }
        //// Add root (if NOT yet) + part of the postfix (if needed)
        if (orderExt > size_t(2U)) { // from 'тысяч н ая ых'
          if (!*toAdd) str += delimiter; // delim. is NOT added yet
          const char* temp;
          str += getOrderStr(orderExt, size_t(), size_t(), temp, localeSettings);
          str += "н"; // 'десят И тысяч Н ая ых', 'сто тысяч Н ая ых'
        }
        //// Add postfix
        assert(prevDigit < size_t(10U) && currDigit < size_t(10U));
        if (size_t(1U) == prevDigit) { // одиннадцать двенадцать девятнадцать сотЫХ десятитысячнЫХ
          toAdd = "ых";
        } else { // NOT 1U prev. digit
          if (size_t(1U) == currDigit) {
            toAdd = "ая"; // одна двадцать одна десятАЯ, тридцать одна стотысячнАЯ
          } else toAdd = "ых"; // ноль десятых; двадцать две тридцать пять девяносто девять тысячнЫХ
        }
        }
        str += toAdd;
      }
    break;
    default: // locale NOT present
      assert(false); // locale error
      str += "<locale error [" MAKE_STR_(__LINE__) "]>";
  }
};
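
The way the Russian ending is assembled (prefix, root, "н", postfix) can be followed in a standalone sketch. ruFractionEnding is a hypothetical helper with a shortened root table; it returns the ending as a separate word instead of appending it to str, AND it ignores the folded case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: 'orderExt' is the 1-based position of the last fraction
// digit; 'prevDigit' AND 'currDigit' are the last two digits of the fraction.
std::string ruFractionEnding(std::size_t orderExt, std::size_t prevDigit, std::size_t currDigit) {
  static const char* const ROOTS[] = {"", "тысяч", "миллион", "миллиард", "триллион"};
  assert(orderExt && orderExt / 3 < 5 && prevDigit < 10 && currDigit < 10);
  std::string ending;
  switch (orderExt % 3) { // zero suborder: empty prefix
    case 1: ending = orderExt < 3 ? "десят" : "десяти"; break;
    case 2: ending = orderExt < 3 ? "сот" : "сто"; break;
  }
  if (orderExt > 2) { // from thousandths on: root + "н"
    ending += ROOTS[orderExt / 3];
    ending += "н";
  }
  // Singular only for a last digit of 1 that is NOT part of a teen (11)
  ending += (1 != prevDigit && 1 == currDigit) ? "ая" : "ых";
  return ending;
}
```

For the fraction of 1.0000300501 (ten digits, ending in 01) this gives "десятимиллиардная".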

8) processDigitOfATriad

This is one of the three main processing functions (along with processDigitsPart AND processDigitsSubPart). It is used to process individual digits of a subpart of size up to 3 (a triad), so subOrder is the digit index within the subpart, in the range [0, 2]: zero for 9 in 639, 2 for 6 in the same subpart. order is the actual order of magnitude of the current digit (3 for 8 in 208417).

C++

// Also for 'and' in EN GB
const auto minDigitsSubPartSizeToAddOrder = getMinDigitsSubPartSizeToAddOrder(localeSettings);
auto totalAddedCount = size_t();
// ONLY up to 3 digits
auto processDigitOfATriad = [&](const size_t subOrder, const size_t order, size_t& currAddedCount,
                                const size_t normalDigitsSubPartSize, const bool fractPart) {
  auto addFirstToZeroOrderDelim = [&]() {
    char delim_;
    switch (localeSettings.locale) { // choose delim.
      case ELocale::L_EN_US: case ELocale::L_EN_GB: delim_ = '-'; break; // 'thirty-four'
      case ELocale::L_RU_RU: default: delim_ = delimiter; break; // 'тридцать четыре'
    }
    str += delim_;
  };
  auto addDelim = [&](const char delim) {
    if (ELocale::L_EN_GB == localeSettings.locale) {
      // In AMERICAN English, many students are taught NOT to use the word "and"
      //  anywhere in the whole part of a number
      if (totalAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
        str += delim;
        str += ENG_GB_VERBAL_DELIMITER;
      }
    }
    str += delim;
  };
  assert(subOrder < size_t(3U) && prevDigit < size_t(10U) && currDigit < size_t(10U));
  const char* infix, *postfix;
  switch (subOrder) {
    case size_t(): // ones ('three' / 'три') AND numbers like 'ten' / 'twelve'
      if (size_t(1U) == prevDigit) { // 'ten', 'twelve' etc
        if (!str.empty()) addDelim(delimiter); // if needed
        str += getFirstOrderNumberStr(currDigit, prevDigit, infix, postfix, localeSettings);
        str += infix, str += postfix;
        ++currAddedCount, ++totalAddedCount;
      } else if (currDigit || size_t(1U) == normalDigitsSubPartSize) { // prev. digit is NOT 1
        //// Simple digits like 'one'
        if (prevDigit) { // NOT zero
          assert(prevDigit > size_t(1U));
          addFirstToZeroOrderDelim();
        } else if (!str.empty()) addDelim(delimiter); // prev. digit IS zero
        str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
        str += postfix;
        ++currAddedCount, ++totalAddedCount;
      }
    break;

    case size_t(1U): // tens ['twenty' / 'двадцать']
      if (currDigit > size_t(1U)) { // numbers like ten / twelve are processed later
        if (!str.empty()) addDelim(delimiter); // if needed
        str += getFirstOrderNumberStr(currDigit, size_t(), infix, postfix, localeSettings);
        str += infix, str += postfix;
        ++currAddedCount, ++totalAddedCount;
      } // if 'currDigit' is '1U' - skip (is processed later)
    break;

    case size_t(2U): // hundred(s?)
      if (!currDigit) break; // zero = empty
      if (!str.empty()) str += delimiter; // if needed
      switch (localeSettings.locale) {
        case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'three hundred'
          str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
          str += postfix;
          str += delimiter;
          {
            const char* postfix_; // NO postfix expected, just a placeholder var.
            str += getOrderStr(size_t(2U), size_t(0U), currDigit, postfix_, localeSettings);
            assert(postfix_ && !*postfix_);
          }
        break;
        case ELocale::L_RU_RU: // 'триста'
          str += getSecondOrderNumberStr(currDigit, infix, postfix, localeSettings);
          str += infix, str += postfix;
        break;
      }
      ++currAddedCount, ++totalAddedCount;
    break;
  } // 'switch (subOrder)' END
};
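
For comparison, the result that the three subOrder branches produce for one English triad can be reproduced with a much simpler, value-based sketch. spellTriad is a hypothetical helper: it works on the whole 0..999 value instead of going digit by digit, AND it inserts the British "and" only after the hundreds of the same triad.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells 0..999 in English.
std::string spellTriad(unsigned value, bool british) {
  static const char* const ONES[] =
    {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
     "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
     "eighteen", "nineteen"};
  static const char* const TENS[] =
    {"", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
  assert(value < 1000U);
  std::string result;
  const unsigned hundreds = value / 100U, rest = value % 100U;
  if (hundreds) { // 'three hundred'
    result += ONES[hundreds];
    result += " hundred";
  }
  if (rest || !hundreds) { // tens AND ones (OR a lone zero)
    if (hundreds) result += british ? " and " : " ";
    if (rest < 20U) { // 'three', 'ten', 'twelve'
      result += ONES[rest];
    } else { // 'thirty-four': tens AND ones are joined by a hyphen
      result += TENS[rest / 10U];
      if (rest % 10U) {
        result += '-';
        result += ONES[rest % 10U];
      }
    }
  }
  return result;
}
```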

Tests

There are over 4k lines of tests (over 380 test cases) in the ConvertionUtilsTests module (see "TESTS" folder).

Test using Ideone online compiler:

C++

...

#include <cfloat> // LDBL_DIG
#include <iostream>
#include <string>

int main() {
  std::string str;
  ConvertionUtils::LocaleSettings localeSettings;
  auto errMsg = "";
  std::cout.precision(LDBL_DIG);
  
  auto num = 6437268689.4272L;
  localeSettings.locale = ConvertionUtils::ELocale::L_EN_US;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str << std::endl << std::endl;
  
  num = 1200.25672567L;
  str.clear();
  localeSettings.locale = ConvertionUtils::ELocale::L_EN_GB;
  localeSettings.foldFraction = true;
  localeSettings.verySpecific = true;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str << std::endl << std::endl;
  
  num = 1.0000300501L;
  str.clear();
  localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str << std::endl << std::endl;
  
  num = 9432654671318.0e45L;
  str.clear();
  localeSettings.shortFormat = true;
  localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str;
  
  return 0;
}

Result:

6437268689.4272 =>
 six billion four hundred thirty-seven million two hundred sixty-eight thousand six hundred eighty-nine point four two seven two

1200.25672567 =>
 twelve hundred point two five six seven repeating

1.0000300501 =>
 одна целая триста тысяч пятьсот одна десятимиллиардная

9.432654671318e+57 =>
 девять октодециллионов четыреста тридцать два септдециллиона шестьсот пятьдесят четыре седециллиона шестьсот семьдесят один квиндециллион триста восемнадцать кваттуордециллионов

Points of Interest

The developed strategy makes it possible to extend the module to support other languages, like Spanish, for example: 0.333333333333 = "cero coma treinta y tres periodico".

The class is using FuncUtils, MathUtils, MacroUtils AND MemUtils modules.

This module [ConvertionUtils] is just a small part of a library that uses C++11 features AND that I am working on now. I decided to make it public.

If you see ANY errors in the processing, please notify me here in the comments AND/OR on GitHub.

History

License

This article, along with any associated source code and files, is licensed under The MIT License