Introduction
The title is pretty self-descriptive. The most important thing about the module is that it uses a generalized strategy to process numbers. Currently the module can convert numbers to two languages: Russian (Cyrillic) AND English (Latin), with support for two English dialects: American AND British. In the future, this design allows the module to be extended to support even more Cyrillic AND/OR Latin-script languages.
Background
Here I will list some helpful links to online conversion tools, which you can use to check spelling (including the module's output).
English:
http://www.webmath.com/saynum.html
http://www.mathcats.com/explore/reallybignumbers.html
English AND russian:
http://eng5.ru/en/numbers_translation
http://prutzkow.com/numbers/index_en.htm
By the way, I found some bugs in these tools, so part of their output might be incorrect.
Also, information about numerals in different languages:
English: http://en.wikipedia.org/wiki/English_numerals
Russian:
http://masterrussian.com/numbers/Russian_Numbers.htm
http://www.russianlessons.net/lessons/lesson2_main.php
Scales:
https://en.wikipedia.org/wiki/Names_of_large_numbers
https://en.wikipedia.org/wiki/Long_and_short_scales
Using the code
LocaleSettings
struct used to configure the conversion:
bool verySpecific = false;
bool positiveSign = false;
bool shortFormat = false;
bool foldFraction = false;
ELocale locale = ELocale::L_EN_GB;
size_t precison = size_t(LDBL_DIG);
Flags:
1) verySpecific
For ENG GB, ENG US:
- replaces zero / nought with the 'o' letter (1.02 = "one point o two")
- enables specific handling of four-digit numbers with non-zero hundreds: they are often named using multiples of "hundred" combined with tens AND/OR ones ("eleven hundred three", "twelve hundred twenty-five", OR "ninety-nine hundred ninety-nine"), while numbers with zero hundreds keep the usual form ("one thousand one", "four thousand forty-two")
* for ENG GB this style is common only for multiples of 100 between 1,000 AND 2,000 (e.g. 1,500 as "fifteen hundred"), NOT for higher numbers.
2) positiveSign: adds an explicit 'positive' / 'plus' / 'плюс' sign word for numbers > 0
Examples:
1.3 = "plus one point three" [EN GB]
1.181818181818 = "плюс одна целая и восемнадцать в периоде" [RU + foldFraction]
3) shortFormat: skips mentioning a nonexistent (zero) integral OR fractional part of the number
Examples:
0.0 = "zero" [EN US]
0.01 = "point zero one" [EN US]
999000.0 = "nine hundred and ninety-nine thousand" [EN GB]
4) foldFraction: [ONLY for fractions] enables the search for a repeated digit pattern in the fractional part of a number AND (if one is found) shortens the fraction to the first occurrence of the pattern, followed by a periodic marker.
Examples:
EN GB + verySpecific:
-7289.120912091209 = "minus seven thousand two hundred and eighty-nine point one two o nine repeating"
EN US + positiveSign:
28364768.07310731 = "positive twenty-eight million three hundred sixty-four thousand seven hundred sixty-eight point zero seven three one to infinity"
Options:
1) precison: the maximum count of digits (in the fractional part) to process. The resulting number representation is rounded to the last digit. Can be zero. Limited to the LDBL_DIG value. Trailing zeros in the resulting number are ignored.
2) locale: the selected language OR language dialect. The value is selected from the ELocale enumeration (an old C++ enum, NOT a new C++11 enum class). It can have the following values:
L_RU_RU, L_EN_US, L_EN_GB
The flags AND options can be combined in ANY combination, BUT some flags (OR options) can be ignored OR reinterpreted in some cases.
Example: verySpecific + positiveSign + shortFormat + foldFraction
0.0034013401 = "plus o point o o three four o one repeating" [EN GB]
As you can see, although the shortFormat flag is set, the zero integral part is NOT skipped.
Function call interface + short description:
template<class TStrType, const bool ReserveBeforeAdding = true>
static bool numToNumFormatStr(long double num, TStrType& str,
LocaleSettings& localeSettings =
LocaleSettings::DEFAULT_LOCALE_SETTINGS,
const char** const errMsg = nullptr) {
The errMsg pointer can be used to get an error message (as a static constant POD C string) explaining what exactly happened if something goes wrong.
As you can see, different container types are supported here; all of them, however, should meet the following requirements:
'TStrType' SHOULD support operator '+=' AND the 'empty' AND 'size' methods
The function adds the numeral text to the existing content of str, delimiting it with a spacer if the container is not empty when the function starts.
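The appending rule described above can be sketched in isolation. This is only an illustration under stated assumptions: appendWord is a hypothetical helper that is not part of the module, and it relies on nothing but operator '+=' and 'empty' of the container.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT part of the module): appends one word to any
// container that offers operator+= and empty(), e.g. std::string.
template <class TStrType>
void appendWord(TStrType& str, const char* const word, const char* const delimiter = " ") {
  assert(word && delimiter);
  if (!str.empty()) str += delimiter; // separate the word from the existing content
  str += word;
}
```

Calling appendWord twice on an empty std::string with "minus" and "one" should leave "minus one" in it, while the first word never gets a leading delimiter.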
Conversion stages description
There are a total of four main steps.
1) checking the incoming value & handling its sign
auto negativeNum = false;
if (num < 0.0L) {
negativeNum = true;
num = -num;
}
static const auto VAL_UP_LIMIT_ = 1e100L;
if (num >= VAL_UP_LIMIT_) {
if (errMsg) *errMsg = "too big value";
return false;
}
if (ELocale::L_RU_RU == localeSettings.locale) {
static const auto VAL_LOW_LIMIT_RU_ = 10.0L / VAL_UP_LIMIT_;
if (num && num < VAL_LOW_LIMIT_RU_) {
if (errMsg) *errMsg = "too low value";
return false;
}
}
const auto delimiter = DEFAULT_DELIMITER;
auto getSignStr = [](const ELocale locale, const bool positive) throw() -> const char* {
switch (locale) {
case ELocale::L_EN_US: return positive ? "positive" : "negative";
case ELocale::L_EN_GB: return positive ? "plus" : "minus";
case ELocale::L_RU_RU: return positive ? "плюс" : "минус";
}
assert(false);
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
if (negativeNum || (localeSettings.positiveSign && num)) {
if (!str.empty()) str += delimiter;
str += getSignStr(localeSettings.locale, !negativeNum);
}
if (truncated::ExecIfPresent(str)) {
if (errMsg) *errMsg = "too short buffer"; return false;
}
VAL_UP_LIMIT_ is involved here because of the limitations of the getOrderStr language-specific morphological lambda for the Russian language. This lambda (AND others) will be presented later in this article.
truncated::ExecIfPresent is a special conditional optimization for StaticallyBufferedString-like classes (if one is provided as the storage). It uses the Exec-If-Present idiom.
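For readers unfamiliar with the Exec-If-Present idiom, here is a minimal sketch of one common way to build it with expression SFINAE. It is an assumption about the general technique, not the module's actual truncated::ExecIfPresent; TinyBuffer, truncatedIfPresent and isTruncated are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a fixed-capacity string that remembers overflow.
struct TinyBuffer {
  bool overflowed = false;
  bool truncated() const { return overflowed; }
};

// Viable only when T has a callable truncated() member (expression SFINAE).
template <class T>
auto truncatedIfPresent(const T& obj, int) -> decltype(obj.truncated()) {
  return obj.truncated();
}

// Fallback for every other type: there is nothing to ask, report "not truncated".
template <class T>
bool truncatedIfPresent(const T&, long) {
  return false;
}

template <class T>
bool isTruncated(const T& obj) {
  // The literal 0 is an int, so the first overload wins whenever it is viable.
  return truncatedIfPresent(obj, 0);
}
```

With this sketch, isTruncated compiles for std::string as well (and returns false), so the calling code does not need to know which storage type it was given.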
2) getting the number representation as a char array & analysing it
static const size_t MAX_DIGIT_COUNT_ = size_t(LDBL_DIG);
static const size_t MAX_STR_LEN_ = 6U + MAX_DIGIT_COUNT_;
static const size_t BUF_SIZE_ = AUTO_ADJUST_MEM(MAX_STR_LEN_ + 24U, 8U);
char strBuf[BUF_SIZE_];
if (localeSettings.precison > MAX_DIGIT_COUNT_) localeSettings.precison = MAX_DIGIT_COUNT_;
// The '*' precision argument of printf-like functions must be an int, NOT a size_t
const ptrdiff_t len = sprintf(strBuf, "%.*Le", static_cast<int>(localeSettings.precison), num);
if (len < static_cast<decltype(len)>(localeSettings.precison)) {
  if (errMsg) *errMsg = "number to string conversion failed";
  return false;
}
sprintf is used here because, compared to the naive conversion approach (applying a series of simple arithmetic operations, like *, / AND %), it gives no (OR almost no) precision penalty (at the cost, however, of extra performance overhead). The function assumes that the resulting representation (received from sprintf) will be in the normalized form of scientific notation, but the code was designed (though NOT tested) to work even if the output is not normalized.
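To make the idea more tangible, here is a simplified, standalone sketch of formatting a value in scientific notation and splitting the result into mantissa digits and exponent. It is not the module's code: SciParts and splitScientific are hypothetical names, and std::snprintf with std::strtol replaces the module's own helpers.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical result type for this sketch.
struct SciParts {
  std::string digits; // mantissa digits without the decimal mark
  long exponent;      // decimal exponent printed after 'e'
};

// Formats a non-negative value as "d.ddde+xx" and splits it into parts.
inline bool splitScientific(const long double num, const int precision, SciParts& out) {
  assert(num >= 0.0L && precision >= 0);
  char buf[64];
  const int len = std::snprintf(buf, sizeof(buf), "%.*Le", precision, num);
  if (len <= 0 || static_cast<std::size_t>(len) >= sizeof(buf)) return false;
  char* const expMark = std::strchr(buf, 'e');
  if (!expMark) return false; // e.g. "inf" OR "nan"
  out.exponent = std::strtol(expMark + 1, nullptr, 10);
  out.digits.clear();
  for (const char* p = buf; p < expMark; ++p) {
    if (*p >= '0' && *p <= '9') out.digits += *p; // skip the decimal mark
  }
  return true;
}
```

For 0.0037 with a precision of 1 the formatted text is "3.7e-03", so the sketch yields the digits "37" and the exponent -3.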
The analysis consists of gathering information about the number representation (like the exponent value in the scientific notation) AND splitting the char array into parts (by adjusting specific pointers, like fractPartEnd).
char* currSymbPtr;
char* fractPartStart;
char* fractPartEnd;
long int expVal;
auto fractPartLen = ptrdiff_t();
size_t intPartLen;
size_t intPartBonusOrder;
size_t fractPartLeadingZeroesCount;
static const auto DECIMAL_DELIM_ = '.';
auto analyzeScientificNotationRepresentation = [&]() throw() {
  // Scan backwards for the exponent marker
  currSymbPtr = strBuf + len - size_t(1U);
  static const auto EXP_SYMB_ = 'e';
  while (EXP_SYMB_ != *currSymbPtr) {
    --currSymbPtr;
    assert(currSymbPtr > strBuf);
  }
  fractPartEnd = currSymbPtr;
  *currSymbPtr = '\0';
  const char* errMsg;
  const auto result = strToL(expVal, currSymbPtr + size_t(1U), errMsg);
  assert(result);
  fractPartStart = currSymbPtr - localeSettings.precison;
  intPartLen = fractPartStart - strBuf;
  assert(intPartLen);
  if (localeSettings.precison) --intPartLen;
  assert((currSymbPtr - strBuf - int(localeSettings.precison) - 1) >= 0);
  assert(localeSettings.precison ? DECIMAL_DELIM_ == *(strBuf + intPartLen) : true);
  if (expVal < 0L) {
    if (static_cast<size_t>(-expVal) >= intPartLen) {
      fractPartLeadingZeroesCount = -(expVal + static_cast<long int>(intPartLen));
      intPartLen = size_t();
    } else {
      intPartLen += expVal;
      fractPartLeadingZeroesCount = size_t();
    }
    intPartBonusOrder = size_t();
    if (localeSettings.precison) --fractPartLen;
  } else {
    const auto additive =
      std::min<decltype(localeSettings.precison)>(expVal, localeSettings.precison);
    intPartLen += additive;
    fractPartLeadingZeroesCount = size_t();
    intPartBonusOrder = expVal - additive;
  }
};
analyzeScientificNotationRepresentation();
currSymbPtr = strBuf + intPartLen +
  (expVal > decltype(expVal)() ? size_t(1U) : size_t());
After the main analysis is finished, the fractional part of the number (if it exists) is inspected precisely to determine whether it has meaningless trailing zeros AND (if requested) whether it consists of a repeated pattern.
auto fractPartTrailingZeroesCount = size_t(), fractPartAddedCount = size_t();
char* fractPartRealStart;
auto folded = false;
auto calcFractPartRealLen = [&]() throw() {
  if (DECIMAL_DELIM_ == *currSymbPtr) ++currSymbPtr;
  assert(fractPartEnd >= currSymbPtr);
  fractPartRealStart = currSymbPtr;
  fractPartLen += fractPartEnd - currSymbPtr;
  assert(fractPartLen >= ptrdiff_t());
  if (!fractPartLen) return;
  // Skip the trailing zeros (the bounds are checked BEFORE the symbol is read)
  auto fractPartCurrEnd = fractPartEnd - size_t(1U);
  while (fractPartCurrEnd >= currSymbPtr && '0' == *fractPartCurrEnd) --fractPartCurrEnd;
  assert(fractPartCurrEnd >= strBuf);
  fractPartTrailingZeroesCount = fractPartEnd - fractPartCurrEnd - size_t(1U);
  assert(fractPartLeadingZeroesCount >= size_t() &&
         fractPartLen >= static_cast<ptrdiff_t>(fractPartTrailingZeroesCount));
  fractPartLen -= fractPartTrailingZeroesCount;
  if (fractPartLen > size_t(1U) && localeSettings.foldFraction) {
    assert(fractPartStart && fractPartStart > strBuf);
    if (fractPartRealStart < fractPartStart) {
      // Shift the digits over the decimal delimiter to get a contiguous fractional part
      currSymbPtr = fractPartStart - size_t(1U);
      assert(*currSymbPtr == DECIMAL_DELIM_);
      while (currSymbPtr > fractPartRealStart) {
        // Copy first, then step back (doing both in one expression is unsequenced before C++17)
        *currSymbPtr = *(currSymbPtr - size_t(1U));
        --currSymbPtr;
      }
      *currSymbPtr = '\0';
      fractPartRealStart = currSymbPtr + size_t(1U);
      assert(fractPartLen);
    }
    if (fractPartLen > size_t(1U)) {
      const auto patternLen = tryFindPattern(fractPartRealStart, fractPartLen);
      if (patternLen) {
        fractPartLen = patternLen;
        folded = true;
      }
    }
  }
};
calcFractPartRealLen();
assert(fractPartLen ? localeSettings.precison : true);
const auto fractPartWillBeMentioned = fractPartLen || !localeSettings.shortFormat;
currSymbPtr = strBuf;
Recognition of the repeated pattern (which may be present in the fractional part) is performed by step-by-step sequential scanning.
auto testPattern = [](const char* const str, const char* const strEnd,
                      const size_t patternSize) throw() {
  assert(str);
  auto equal = true;
  auto nextOccurance = str + patternSize;
  while (true) {
    // A mismatch: return the position where the pattern breaks
    if (memcmp(str, nextOccurance, patternSize)) return nextOccurance;
    nextOccurance += patternSize;
    // All occurrences matched: return a null pointer
    if (nextOccurance >= strEnd) return decltype(nextOccurance)();
  }
};
auto tryFindPattern = [&](const char* const str, const size_t totalLen) throw() {
  const size_t maxPatternLen = totalLen / size_t(2U);
  auto const strEnd = str + totalLen;
  for (auto patternSize = size_t(1U); patternSize <= maxPatternLen; ++patternSize) {
    if (totalLen % patternSize) continue; // the pattern must tile the whole part
    if (!testPattern(str, strEnd, patternSize)) return patternSize;
  }
  return size_t();
};
For example, given the number 1.23452345, we first test whether the fractional part consists only of a repeated 2 (no), then only of a repeated 23 (wrong again); 234 is skipped, because the 8 digits cannot be split evenly into blocks of 3; AND finally 2345 hits the spot. This inspection is performed only if a fractional part exists AND only at the explicit request of the user (it is disabled by default).
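The same search can be sketched as a standalone function over std::string. This is a simplified illustration with a hypothetical name (findRepeatedPatternLen), not a drop-in replacement for the lambdas above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length of the shortest block that, repeated, yields the whole
// digit string; returns 0 when there is no such block.
inline std::size_t findRepeatedPatternLen(const std::string& digits) {
  const std::size_t total = digits.size();
  for (std::size_t len = 1; len <= total / 2; ++len) {
    if (total % len) continue; // the block must tile the string exactly
    bool repeats = true;
    for (std::size_t pos = len; pos < total && repeats; pos += len) {
      repeats = (digits.compare(pos, len, digits, 0, len) == 0);
    }
    if (repeats) return len;
  }
  return 0;
}
```

For "23452345" the candidates 1 and 2 fail, 3 is skipped, and 4 succeeds, so the function returns 4; for "12345" it returns 0.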
3) processing the integral part of the number
This is the first step where all the preparation is finished AND the real processing starts.
processDigitsPart(intPartLen, getIntSubPartSize(), intPartBonusOrder, false);
if (truncated::ExecIfPresent(str)) {
  if (errMsg) *errMsg = "too short buffer";
  return false;
}
if (intPartLen) {
  assert(currSymbPtr > strBuf);
  intPartLastDigit = *(currSymbPtr - ptrdiff_t(1)) - '0';
  assert(intPartLastDigit > ptrdiff_t(-1) && intPartLastDigit < ptrdiff_t(10));
  if (intPartLen > size_t(1U)) {
    auto intPartPreLastDigitPtr = currSymbPtr - ptrdiff_t(2);
    if (DECIMAL_DELIM_ == *intPartPreLastDigitPtr) --intPartPreLastDigitPtr;
    assert(intPartPreLastDigitPtr >= strBuf);
    intPartPreLastDigit = *intPartPreLastDigitPtr - '0';
    assert(intPartPreLastDigit > ptrdiff_t(-1) && intPartPreLastDigit < ptrdiff_t(10));
  }
}
strLenWithoutFractPart = str.size();
intPartAddedCount = addedCount;
addedCount = decltype(addedCount)();
Both the integral AND fractional parts are processed by the processDigitsPart generic processing lambda. This unified processing strategy will be presented later in this article.
After the main processing, two additional internal parameters, intPartLastDigit AND intPartPreLastDigit, are also determined. They are required for Russian language processing, to choose an appropriate ending for the integral part AND for the fraction delimiter:
5.1 = "пять целых одна десятая"
1.5 = "одна целая пять десятых"
1 = "один" [shortFormat]
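The choice between "целая" and "целых" can be sketched as follows. This is a simplified illustration of the rule that the getFractionDelimiter lambda (shown later) applies; ruFractionDelimiter is a hypothetical name, and the shortFormat and folding cases are left out.

```cpp
#include <cassert>
#include <string>

// Picks the form of the Russian fraction delimiter from the last two digits
// of the integral part (pass 0 as preLastDigit for one-digit numbers).
inline std::string ruFractionDelimiter(const int preLastDigit, const int lastDigit) {
  assert(preLastDigit >= 0 && preLastDigit < 10 && lastDigit >= 0 && lastDigit < 10);
  // Singular only for 1, 21, 31 ... but NOT for 11
  const bool singular = (1 == lastDigit) && (1 != preLastDigit);
  return singular ? "целая" : "целых";
}
```

So an integral part of 1 or 21 gives "целая", while 5 or 11 gives "целых".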
4) processing the fractional part of the number
if (fractPartLen) {
  addFractionDelimiter();
  addFractionPrefix();
  currSymbPtr = fractPartRealStart;
}
processDigitsPart(fractPartLen, getFractSubPartSize(localeSettings), size_t(), true);
if (addedCount) {
  fractPartAddedCount = addedCount;
  assert(fractPartLen >= decltype(fractPartLen)());
  size_t fractPartLastDigitOrderExt = fractPartLeadingZeroesCount + fractPartLen;
  if (!fractPartLastDigitOrderExt) fractPartLastDigitOrderExt = size_t(1U);
  addFractionEnding(fractPartLastDigitOrderExt);
}
assert(totalAddedCount);
if (truncated::ExecIfPresent(str)) {
  if (errMsg) *errMsg = "too short buffer";
  return false;
}
return true;
addFractionDelimiter is another generic processing lambda, while addFractionPrefix is a language-specific processing lambda (these types of lambdas will soon be described in more detail).
addFractionDelimiter is, obviously, used to add the fraction separator.
addFractionPrefix is used to add some language-specific content before the actual processing of the fractional part starts. For example, for the English language it is the leading zeros: in scientific notation they might NOT be present in the processed char array. 0.0037 would be represented as "3.7e-3" (normalized form), so those zeros would NOT be processed during the main processing cycle AND have to be added elsewhere.
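The arithmetic behind those missing zeros is simple enough to show in a short sketch. It assumes a normalized mantissa and a negative exponent; fractionDigits is a hypothetical helper, not one of the module's lambdas.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rebuilds the fractional digits of a value below one from its normalized
// scientific form: mantissa digits ("37" for 3.7) and a negative exponent.
inline std::string fractionDigits(const std::string& mantissaDigits, const long exponent) {
  assert(exponent < 0 && !mantissaDigits.empty());
  // d.ddd * 10^-k puts the first mantissa digit at position k after the point,
  // so k - 1 zeros precede it.
  const std::size_t leadingZeros = static_cast<std::size_t>(-exponent) - 1U;
  return std::string(leadingZeros, '0') + mantissaDigits;
}
```

For "3.7e-3" this gives "0037": the two zeros that never appear in the char array, followed by the mantissa digits.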
There are three groups of lambdas which haven't been described yet AND which are used during the conversion process:
1) language-specific lambdas: their run-time behavior depends heavily on the selected language
a) morphological lambdas: provide morphemes of the selected language
b) processing lambdas: used to configure the generic processing lambdas based on the language
2) generic processing lambdas: their internal logic is totally independent of the selected language; however, their execution is configured by the language-specific processing lambdas
Now we'll talk about all those functions.
Language-specific morphological lambdas
In fact, these functions represent the language itself. They provide the morphemes used to construct the resulting numeral.
Each word can have up to 3 morphemes (affixes) in addition to the root:
1) prefix: placed before the stem of a word
2) infix: inserted inside a word stem
OR
interfix: [linkage] placed in between two morphemes AND does NOT have a semantic meaning
3) postfix: (suffix OR ending) placed after the stem of a word
Word = [prefix]<root>[infix / interfix][postfix (suffix, ending)]
Each function returns the root AND can optionally provide an infix AND/OR a postfix.
Do NOT consider, however, the returned values to be the root / the postfix etc. in the exact linguistic sense (as morphemes obtained from a correct AND proper morphological analysis). Consider them to be a "root" / a "postfix" specific to the current project.
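A small sketch may help to see why the words are split this way: the same project-specific "root" can be reused with different infixes and postfixes. The table contents below follow the English tables of the listings in this section, while the function names (enDigitWord, enTeenWord) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Project-specific "roots" of the English numerals 1 - 9.
static const char* const EN_ROOTS[] =
  {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};

// root + postfix: "th" + "ree" (3), "eigh" + "t" (8)
inline std::string enDigitWord(const std::size_t digit) {
  static const char* const POSTFIXES[] = {"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
  assert(digit >= 1U && digit <= 9U);
  return std::string(EN_ROOTS[digit]) + POSTFIXES[digit];
}

// root + infix + postfix for 12 - 19: "th" + "ir" + "teen" (13)
inline std::string enTeenWord(const std::size_t digit) {
  static const char* const INFIXES[] = {"", "", "", "ir", "ur", "f", "", "", "", ""};
  assert(digit >= 2U && digit <= 9U);
  const char* const postfix = (2U == digit) ? "elve" : "teen";
  return std::string(EN_ROOTS[digit]) + INFIXES[digit] + postfix;
}
```

Here "th" serves both "three" and "thirteen", which is exactly the kind of reuse the root / infix / postfix split is meant to enable.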
1) getZeroOrderNumberStr
Returns numerals for numbers 0 - 9 (step 1) in the form of root + postfix.
Examples: "th" + "ree" (3), "вос" + "емь" (8)
auto getZeroOrderNumberStr = [&](const size_t currDigit, const size_t order, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const EN_TABLE[] = {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
static const char* const EN_POSTFIXES[] = {"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
static const char* const RU_TABLE[] =
{"нол", "од", "дв", "тр", "четыр", "пят", "шест", "сем", "вос", "девят"};
static const char* const RU_POSTFIXES[] = {"ь", "ин", "а", "и", "е", "ь", "ь", "ь", "емь", "ь"};
static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE) && sizeof(EN_TABLE) == sizeof(EN_POSTFIXES) &&
sizeof(RU_TABLE) == sizeof(RU_POSTFIXES) &&
size_t(10U) == std::extent<decltype(EN_TABLE)>::value,
"Tables SHOULD have the same size (10)");
assert(currDigit < std::extent<decltype(EN_TABLE)>::value);
switch (localeSettings.locale) {
  case ELocale::L_EN_US: case ELocale::L_EN_GB:
    postfix = EN_POSTFIXES[currDigit];
    if (!currDigit) {
      if (localeSettings.verySpecific) return "o";
      return localeSettings.locale == ELocale::L_EN_US ? "zero" : "nought";
    }
    return EN_TABLE[currDigit];
  case ELocale::L_RU_RU:
    postfix = "";
    switch (order) {
      case size_t(0U):
        if (!fractPartWillBeMentioned) break;
        // intentional fall-through: the feminine forms (одна, две) are also
        // used before the fraction delimiter
      case size_t(3U):
        switch (currDigit) {
          case size_t(1U): postfix = "на"; break;
          case size_t(2U): postfix = "е"; break;
        }
        break;
    }
    if (!*postfix) postfix = RU_POSTFIXES[currDigit];
    return RU_TABLE[currDigit];
}
assert(false);
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
2) getFirstOrderNumberStr
Returns numerals for numbers 10 - 19 (step 1) AND 20 - 90 (step 10) in the form of root + infix + postfix.
Example: "дв" + "адцат" + "ь" (20)
auto getFirstOrderNumberStr = [&](const size_t currDigit, const size_t prevDigit,
const char*& infix, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const EN_SUB_TABLE[] = {"ten", "eleven"}; static const char* const EN_SUB_INFIXES[] = {"", "", "", "ir", "ur", "f", "", "", "", ""};
#define ESP_ "teen" // EN_SUB_POSTFIX
static const char* const EN_SUB_POSTFIXES[] = {"", "", "elve", ESP_, ESP_, ESP_, ESP_, ESP_, ESP_, ESP_}; static const char* const EN_MAIN_INFIXES[] = {"", "", "en", "ir", "r", "f", "", "", "", ""};
#define R23I_ "дцат" // RU_20_30_INFIX [+ь]
#define RT1I_ "на" R23I_ // RU_TO_19_INFIX [на+дцат+ь]
static const char* const RU_SUB_INFIXES[] = {"", "ин" RT1I_, "е" RT1I_, "и" RT1I_, RT1I_, RT1I_, RT1I_, RT1I_, "ем" RT1I_, RT1I_};
#define R5T8I_ "ьдесят" // RU_50_TO_80_INFIX [NO postfix]
static const char* const RU_MAIN_INFIXES[] = {"", "", "а" R23I_, "и" R23I_, "", R5T8I_, R5T8I_, R5T8I_, "ем" R5T8I_, ""}; static const char* const RU_MAIN_POSTFIXES[] = {"", "", "ь", "ь", "", "", "", "", "", "о"};
static_assert(sizeof(EN_SUB_INFIXES) == sizeof(EN_MAIN_INFIXES) &&
sizeof(EN_SUB_POSTFIXES) == sizeof(RU_MAIN_POSTFIXES) &&
sizeof(RU_SUB_INFIXES) == sizeof(RU_MAIN_INFIXES), "Tables SHOULD have the same size");
assert(prevDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value); assert(currDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
switch (prevDigit) {
case size_t(1U): infix = EN_SUB_INFIXES[currDigit], postfix = EN_SUB_POSTFIXES[currDigit];
if (currDigit < size_t(2U)) return EN_SUB_TABLE[currDigit]; break;
default: assert(!prevDigit && currDigit > size_t(1U));
infix = EN_MAIN_INFIXES[currDigit], postfix = "ty"; break;
}
break;
case ELocale::L_RU_RU:
switch (prevDigit) {
case size_t(1U): infix = RU_SUB_INFIXES[currDigit], postfix = "ь"; if (!currDigit) return "десят";
break;
default: assert(currDigit > size_t(1U));
infix = RU_MAIN_INFIXES[currDigit], postfix = RU_MAIN_POSTFIXES[currDigit];
switch (currDigit) {
case size_t(4U): return "сорок"; case size_t(9U): return "девяност"; }
break;
}
break;
default: assert(false); return "<locale error [" MAKE_STR_(__LINE__) "]>";
} const char* tempPtr;
return getZeroOrderNumberStr(currDigit, size_t(), tempPtr, localeSettings);
};
3) getSecondOrderNumberStr
Returns numerals for numbers 100 - 900 (step 100) in the form of root + infix + postfix.
Examples: "fi" + "ve" + " hundred" (500), "дв" + "е" + "сти" (200)
auto getSecondOrderNumberStr = [&](const size_t currDigit, const char*& infix, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const RU_POSTFIXES[] =
{"", "", "сти", "ста", "ста", "сот", "сот", "сот", "сот", "сот"};
static_assert(size_t(10U) == std::extent<decltype(RU_POSTFIXES)>::value,
"Table SHOULD have the size of 10");
assert(currDigit && currDigit < std::extent<decltype(RU_POSTFIXES)>::value);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = " hundred";
return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
case ELocale::L_RU_RU:
postfix = RU_POSTFIXES[currDigit];
switch (currDigit) {
case size_t(1U): infix = ""; return "сто"; break;
case size_t(2U): {
const char* temp;
infix = "е"; return getZeroOrderNumberStr(currDigit, size_t(), temp, localeSettings); }
}
return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
} assert(false); return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
4) getOrderStr: returns the name of a large number based on its order
Uses the short scale for the English language (both American AND British).
auto getOrderStr = [](size_t order, const size_t preLastDigit, const size_t lastDigit,
const char*& postfix, const LocaleSettings& localeSettings)
throw() -> const char* {
static const char* const EN_TABLE[] = {"", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion",
"septillion", "octillion", "nonillion", "decillion", "undecillion", "duodecillion",
"tredecillion", "quattuordecillion", "quindecillion", "sedecillion", "septendecillion",
"octodecillion", "novemdecillion", "vigintillion", "unvigintillion", "duovigintillion",
"tresvigintillion", "quattuorvigintillion", "quinquavigintillion", "sesvigintillion",
"septemvigintillion", "octovigintillion", "novemvigintillion", "trigintillion",
"untrigintillion", "duotrigintillion"};
static const char* const RU_TABLE[] = {"", "тысяч", "миллион", "миллиард" , "триллион" ,
"квадриллион" , "квинтиллион" ,
"секстиллион" , "септиллион" , "октиллион", "нониллион",
"дециллион", "ундециллион", "додециллион", "тредециллион", "кваттуордециллион" ,
"квиндециллион", "седециллион", "септдециллион", "октодециллион", "новемдециллион",
"вигинтиллион", "анвигинтиллион", "дуовигинтиллион", "тревигинтиллион", "кватторвигинтиллион",
"квинвигинтиллион", "сексвигинтиллион", "септемвигинтиллион", "октовигинтиллион" ,
"новемвигинтиллион", "тригинтиллион", "антригинтиллион", "дуотригинтиллион"}; static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE), "Tables SHOULD have the same size");
static const size_t MAX_ORDER_ =
(std::extent<decltype(EN_TABLE)>::value - size_t(1U)) * size_t(3U);
static const char* const RU_THOUSAND_POSTFIXES[] = {"", "а", "и", "и", "и", "", "", "", "", ""};
static const char* const RU_MILLIONS_AND_BIGGER_POSTFIXES[] = {"ов", "", "а", "а", "а", "ов", "ов", "ов", "ов", "ов"};
static_assert(size_t(10U) == std::extent<decltype(RU_THOUSAND_POSTFIXES)>::value &&
size_t(10U) == std::extent<decltype(RU_MILLIONS_AND_BIGGER_POSTFIXES)>::value,
"Tables SHOULD have the size of 10");
switch (localeSettings.locale) {
  case ELocale::L_EN_US: case ELocale::L_EN_GB:
    postfix = "";
    if (size_t(2U) == order) return "hundred";
    order /= 3U;
    assert(order < std::extent<decltype(EN_TABLE)>::value);
    return EN_TABLE[order];
  case ELocale::L_RU_RU:
    assert(preLastDigit < size_t(10U) && lastDigit < size_t(10U));
    if (size_t(3U) == order) {
      if (size_t(1U) != preLastDigit) {
        postfix = RU_THOUSAND_POSTFIXES[lastDigit];
      } else postfix = "";
    } else if (order > size_t(3U)) {
      if (size_t(1U) == preLastDigit) {
        postfix = "ов";
      } else postfix = RU_MILLIONS_AND_BIGGER_POSTFIXES[lastDigit];
    }
    order /= 3U;
    assert(order < std::extent<decltype(RU_TABLE)>::value);
    return RU_TABLE[order];
}
assert(false);
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
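The Russian branch is easier to follow on a concrete word. The sketch below applies the same postfix table to the word for "thousand" only; ruThousandWord is a hypothetical name and the function takes the whole count instead of two separate digits.

```cpp
#include <cassert>
#include <string>

// Returns the form of the Russian word for "thousand" that agrees with the
// given count of thousands (only the last two digits of the count matter).
inline std::string ruThousandWord(const unsigned count) {
  static const char* const POSTFIXES[] = {"", "а", "и", "и", "и", "", "", "", "", ""};
  const unsigned lastDigit = count % 10U;
  const unsigned preLastDigit = (count / 10U) % 10U;
  // 10 - 19 always take the bare form, whatever the last digit is
  const char* const postfix = (1U == preLastDigit) ? "" : POSTFIXES[lastDigit];
  return std::string("тысяч") + postfix;
}
```

This gives "тысяча" for 1 and 21, "тысячи" for 3, and "тысяч" for 5, 11 and 40.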
5) getFractionDelimiter
Returns a POD C string, which represents the fractional separator used in the selected language.
auto getFractionDelimiter = [](const ptrdiff_t intPartPreLastDigit, const ptrdiff_t intPartLastDigit,
const char*& postfix, const bool folded,
const LocaleSettings& localeSettings) throw() -> const char* {
assert(intPartPreLastDigit < ptrdiff_t(10) && intPartLastDigit < ptrdiff_t(10));
postfix = "";
switch (localeSettings.locale) {
  case ELocale::L_EN_US: case ELocale::L_EN_GB:
    return "point";
  case ELocale::L_RU_RU:
    if (intPartLastDigit < ptrdiff_t() && localeSettings.shortFormat) return "";
    if (folded) postfix = "и";
    return ptrdiff_t(1) == intPartLastDigit ?
      (ptrdiff_t(1) == intPartPreLastDigit ? "целых" : "целая") : "целых";
}
assert(false); return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
6) getFoldedFractionEnding
If the number had a fractional part with a repeated pattern which was folded, this specific ending is added to the end of the numeral string to indicate that the pattern recurs.
auto getFoldedFractionEnding = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
  case ELocale::L_EN_US: return "to infinity";
  case ELocale::L_EN_GB: return "repeating";
  case ELocale::L_RU_RU: return "в периоде";
}
assert(false); return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
Generic processing lambdas
As I have already said, these are language-independent AND are used to process both the integral AND fractional parts of the number (one at a time).
1) processDigitsPart: main processing cycle
size_t intPartAddedCount, strLenWithoutFractPart;
auto processDigitsPart = [&](size_t digitsPartSize, const size_t digitsSubPartSize,
                             size_t partBonusOrder, const bool fractPart) {
  currDigit = size_t(), prevDigit = size_t();
  if (digitsPartSize) {
    assert(digitsSubPartSize);
    // The first subpart may be shorter than the rest
    size_t currDigitsSubPartSize =
      (digitsPartSize + partBonusOrder) % digitsSubPartSize;
    if (!currDigitsSubPartSize) currDigitsSubPartSize = digitsSubPartSize;
    auto subPartOrderExt = size_t();
    if (ReserveBeforeAdding)
      str.reserve(str.length() + estimatePossibleLength(digitsPartSize, fractPart, localeSettings));
    do {
      if (currDigitsSubPartSize > digitsPartSize) {
        subPartOrderExt = currDigitsSubPartSize - digitsPartSize;
        partBonusOrder -= subPartOrderExt;
        currDigitsSubPartSize = digitsPartSize;
      }
      digitsPartSize -= currDigitsSubPartSize;
      processDigitsSubPart(currDigitsSubPartSize, digitsSubPartSize,
                           digitsPartSize + partBonusOrder, subPartOrderExt, fractPart);
      currDigitsSubPartSize = digitsSubPartSize;
    } while (digitsPartSize);
  }
  auto mentionZeroPart = [&]() {
    if (!str.empty()) str += delimiter;
    const char* postfix;
    str += getZeroOrderNumberStr(size_t(), size_t(), postfix, localeSettings);
    str += postfix;
    ++totalAddedCount;
  };
  // Nothing was added: decide whether the zero part has to be spelled out
  if (!addedCount) {
    if (!localeSettings.shortFormat || folded) {
      if (fractPart) {
        addFractionDelimiter();
      } else intPartLastDigit = ptrdiff_t();
      mentionZeroPart();
      ++addedCount;
    } else if (fractPart) {
      assert(!folded);
      assert(strLenWithoutFractPart <= str.size());
      if (!intPartAddedCount) {
        mentionZeroPart();
      }
    }
  }
};
This function takes a part of the number, for example, 1278 from 1278.45, AND processes it in subparts of the specified size (currently 3, 2 OR 1). With digitsSubPartSize = 2, there will be two such subparts: 12 AND 78. Each such subpart is processed by the other generic processing lambda: processDigitsSubPart (see below).
In fact, processDigitsPart performs a series of calls to the processDigitsSubPart function, correctly splitting the part into subparts until no more subparts remain. It also performs a special action at the end if nothing was actually added (in order to correctly process numbers like 0.0 with the shortFormat flag turned ON, AND some other specific cases).
This function also uses the estimatePossibleLength language-specific processing lambda (described later) AND the addFractionDelimiter generic processing lambda (already mentioned, described in detail later).
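The splitting itself can be shown on plain strings. The sketch below is a simplification (it ignores the bonus order that the real lambda handles for large exponents) and splitIntoSubParts is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a digit string into groups of subPartSize, aligned to the right,
// so that only the first group may be shorter than the others.
inline std::vector<std::string> splitIntoSubParts(const std::string& digits,
                                                  const std::size_t subPartSize) {
  assert(subPartSize);
  std::vector<std::string> parts;
  std::size_t currSize = digits.size() % subPartSize;
  if (!currSize) currSize = subPartSize;
  // After the first (possibly shorter) group every group has the full size
  for (std::size_t pos = 0; pos < digits.size(); pos += currSize, currSize = subPartSize) {
    parts.push_back(digits.substr(pos, currSize));
  }
  return parts;
}
```

"1278" with a subpart size of 2 becomes "12" and "78", while "1234567" with a size of 3 becomes "1", "234" and "567".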
2) processDigitsSubPart: subprocessing cycle
Processes a subpart received from the parent cycle (processDigitsPart). Both of these functions are closures which do not process an actual number; they, of course, process the strBuf char array, which was previously filled by the sprintf function during stage 2 of the conversion (see the 'Conversion stages description' section above).
auto addedCount = size_t();
auto emptySubPartsCount = size_t();
auto processDigitsSubPart = [&](const size_t currDigitsSubPartSize,
                                const size_t normalDigitsSubPartSize,
                                const size_t order, size_t subPartOrderExt, const bool fractPart) {
  assert(currDigitsSubPartSize && currDigitsSubPartSize <= size_t(3U));
  auto currAddedCount = size_t();
  auto emptySubPart = true;
  prevDigit = std::decay<decltype(prevDigit)>::type();
  for (size_t subOrder = currDigitsSubPartSize - size_t(1U);;) {
    if (DECIMAL_DELIM_ != *currSymbPtr) {
      currDigit = *currSymbPtr - '0';
      PPOCESS_DIGIT_:
      assert(*currSymbPtr >= '0' && currDigit < size_t(10U));
      emptySubPart &= !currDigit;
      processDigitOfATriad(subOrder + subPartOrderExt, order, currAddedCount,
                           normalDigitsSubPartSize, fractPart);
      // Digits missing from the char array are processed as zeros
      if (subPartOrderExt) {
        --subPartOrderExt;
        prevDigit = currDigit;
        currDigit = std::decay<decltype(currDigit)>::type();
        goto PPOCESS_DIGIT_;
      }
      if (!subOrder) {
        ++currSymbPtr;
        break;
      }
      --subOrder, prevDigit = currDigit;
    }
    ++currSymbPtr;
  }
  if (emptySubPart) ++emptySubPartsCount;
  if (currAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
    const char* postfix;
    auto const orderStr = getOrderStr(order, prevDigit, currDigit, postfix, localeSettings);
    assert(orderStr && postfix);
    if (*orderStr) {
      assert(str.size());
      str += delimiter, str += orderStr, str += postfix;
      ++currAddedCount;
    }
  }
  addedCount += currAddedCount;
};
This function calls the processDigitOfATriad language-specific processing lambda for each digit in the processed subpart.
As the name processDigitOfATriad AND the listing suggest, processDigitsSubPart is usually used to process subparts of size 3. Actually, it can process subparts of size 1, 2, OR 3 (AND all those sizes are really required at some point).
When all digits of the subpart are processed, the function appends an order string (like "thousand") if needed. This happens only if we process subparts of at least minDigitsSubPartSizeToAddOrder size, which is set by a call to the getMinDigitsSubPartSizeToAddOrder language-specific processing lambda (presented in the next section of the article).
3) addFractionDelimiter
A very simple function, used to correctly separate the integral AND fractional parts of the number.
auto intPartPreLastDigit = ptrdiff_t(-1), intPartLastDigit = ptrdiff_t(-1);
auto addFractionDelimiter = [&]() {
  const char* postfix;
  auto const fractionDelim =
    getFractionDelimiter(intPartPreLastDigit, intPartLastDigit, postfix, folded, localeSettings);
  if (*fractionDelim) {
    if (!str.empty()) str += delimiter;
    str += fractionDelim;
  }
  if (*postfix) {
    if (*fractionDelim) str += delimiter;
    str += postfix;
  }
};
Language-specific processing lambdas
The final pack of lambdas used during the processing.
The following ones are used to configure the conversion strategy based on the selected language.
1) getMinDigitsSubPartSizeToAddOrder
Returns the minimal subpart size for which an order string (like "hundred" OR "thousand" for English) should be appended during the conversion.
For example, for English again, when processing 1256 in subparts of size 2, we would append "hundred" after 12, while when processing the same number in subparts of size 1, we would append nothing.
auto getMinDigitsSubPartSizeToAddOrder = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: return size_t(2U);
case ELocale::L_RU_RU: return size_t(3U);
}
assert(false); return size_t();
};
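The 1256 example can be sketched as follows. Both helpers are hypothetical and only illustrate the idea: the first one assumes that the digits are grouped into subparts counting from the right, the second one restates the size check that guards the order string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a string of decimal digits into subparts of at
// most `size` digits, grouping from the right (assumption of this sketch).
inline std::vector<std::string> splitIntoSubParts(const std::string& digits,
                                                  const std::size_t size) {
    assert(size >= 1U && size <= 3U);
    std::vector<std::string> parts;
    std::size_t pos = 0U;
    const std::size_t firstLen = digits.size() % size; // shorter leading subpart
    if (firstLen) {
        parts.push_back(digits.substr(0U, firstLen));
        pos = firstLen;
    }
    for (; pos < digits.size(); pos += size) parts.push_back(digits.substr(pos, size));
    return parts;
}

// An order string may be appended only if the subpart size is not less than
// the language-specific minimum (2 for English, 3 for Russian in the listing).
inline bool orderStringAllowed(const std::size_t subPartSize,
                               const std::size_t minSizeToAddOrder) {
    return subPartSize >= minSizeToAddOrder;
}
```

Here splitIntoSubParts("1256", 2) gives "12" and "56", so for English (minimum 2) "hundred" may follow 12; with subparts of size 1 the check fails and nothing is appended.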
2) getSpecificCaseSubPartSize
Returns the subpart size to use when some specific processing is required (zero if the number is not a specific case). Samples of such specific cases can be seen in the function's listing.
auto getSpecificCaseSubPartSize = [](const long double& num,
const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US:
if (num < 10000.0L) {
bool zeroTensAndOnes;
const auto hundreds =
MathUtils::getDigitOfOrder(size_t(2U), static_cast<long long int>(num), zeroTensAndOnes);
if (hundreds && !zeroTensAndOnes) return size_t(2U); }
break;
case ELocale::L_EN_GB:
if (num >= 1000.0L && num < 2001.0L) {
if (!(static_cast<size_t>(num) % size_t(100U))) return size_t(2U); }
break;
}
return size_t();
};
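The same decision can be tried out in isolation with the sketch below. The enum and the function name are hypothetical, and so is the reading of the zeroTensAndOnes flag returned by MathUtils::getDigitOfOrder: the sketch assumes it means that both the tens and the ones digits are zero.

```cpp
#include <cassert>
#include <cstddef>

enum class SketchLocale { EnUs, EnGb, RuRu }; // hypothetical stand-in for ELocale

// Returns 2 when the number should be read by pairs of digits
// ("eleven hundred three"), 0 when there is no specific case.
inline std::size_t specificCaseSubPartSize(const long double num, const SketchLocale locale) {
    assert(num >= 0.0L); // the sketch works with non-negative values only
    switch (locale) {
        case SketchLocale::EnUs:
            if (num < 10000.0L) {
                const auto intPart = static_cast<unsigned long long>(num);
                const auto hundreds = (intPart / 100ULL) % 10ULL;
                const bool zeroTensAndOnes = 0ULL == intPart % 100ULL; // assumed meaning
                if (hundreds && !zeroTensAndOnes) return 2U;
            }
            break;
        case SketchLocale::EnGb: // multiples of 100 between 1,000 and 2,000
            if (num >= 1000.0L && num < 2001.0L) {
                if (0ULL == static_cast<unsigned long long>(num) % 100ULL) return 2U;
            }
            break;
        case SketchLocale::RuRu:
            break; // no specific cases
    }
    return 0U;
}
```

Under these assumptions 1103 is a specific case for en-US and 1500 is one for en-GB, while 1001 and 4042 (zero hundreds) are processed in the usual way.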
3) getIntSubPartSize
Returns the subpart size used when processing the integral part of the number.
auto getIntSubPartSize = [&]() throw() {
auto subPartSize = size_t();
if (localeSettings.verySpecific)
subPartSize = getSpecificCaseSubPartSize(num, localeSettings);
if (!subPartSize) {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: case ELocale::L_RU_RU:
subPartSize = size_t(3U);
}
}
return subPartSize;
};
4) getFractSubPartSize
Returns the subpart size used when processing the fractional part of the number.
auto getFractSubPartSize = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
return size_t(1U); case ELocale::L_RU_RU: return size_t(3U); }
assert(false); return size_t();
};
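A subpart size of 1 for English means that the fractional part is read digit by digit. A minimal sketch of such reading is given below; the function is hypothetical and is not taken from the module, and its verySpecific parameter only imitates the flag described earlier (zero is replaced with the letter 'o').

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells the fractional digits one by one.
inline std::string spellFractionDigits(const std::string& digits,
                                       const bool verySpecific = false) {
    static const char* const NAMES[] = {"zero", "one", "two", "three", "four",
                                        "five", "six", "seven", "eight", "nine"};
    std::string out;
    for (const char ch : digits) {
        assert(ch >= '0' && ch <= '9');
        if (!out.empty()) out += ' '; // delimiter between the digit names
        out += (verySpecific && '0' == ch) ? "o" : NAMES[ch - '0'];
    }
    return out;
}
```

For the digits 4272 it produces "four two seven two", which matches the fractional part of the first sample in the Tests section.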
5) estimatePossibleLength
A heuristic function, used to predict the possible length of the string that will represent the targeted part of the number. It is used to optionally preallocate memory in the provided storage before the actual processing begins, in order to reduce the overall execution time (an optimization).
auto estimatePossibleLength = [](const size_t digitsPartSize, const bool fractPart,
const LocaleSettings& localeSettings) throw() {
static const auto EN_US_AVG_CHAR_PER_DIGIT_NAME_ = size_t(4U); static size_t AVG_SYMB_PER_DIGIT_[ELocale::COUNT];
struct ArrayIniter { ArrayIniter() throw() {
AVG_SYMB_PER_DIGIT_[ELocale::L_EN_GB] = size_t(10U); AVG_SYMB_PER_DIGIT_[ELocale::L_EN_US] = size_t(9U); AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] = size_t(8U); }
}; static const ArrayIniter INITER_;
static const auto RU_DELIM_LEN_ = size_t(5U); static const auto RU_MAX_FREQ_FRACT_POSTFIX_LEN_ = size_t(17U);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
if (!fractPart) return AVG_SYMB_PER_DIGIT_[localeSettings.locale] * digitsPartSize;
return (EN_US_AVG_CHAR_PER_DIGIT_NAME_ + size_t(1U)) * digitsPartSize;
case ELocale::L_RU_RU: {
size_t len_ = AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] * digitsPartSize;
if (fractPart && digitsPartSize) len_ += RU_DELIM_LEN_ + RU_MAX_FREQ_FRACT_POSTFIX_LEN_;
return len_;
}
}
assert(false); return size_t();
};
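How such an estimate can be used is sketched below. The helper is hypothetical (the module's own preallocation code is not shown in this section); it simply passes the estimate to std::string::reserve so that the following appends do not trigger repeated reallocations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reserves room for the estimated number of characters
// on top of what the string already holds; the content is not changed.
inline void reserveForDigits(std::string& str, const std::size_t digitsCount,
                             const std::size_t avgCharsPerDigit) {
    assert(avgCharsPerDigit > 0U);
    str.reserve(str.size() + digitsCount * avgCharsPerDigit);
}
```

A too low estimate is harmless (the string still grows on demand), and a too high one only costs some unused memory, which is why an average per-digit value is good enough here.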
The next ones perform some language-specific action.
6) addFractionPrefix
Used for preprocessing of the fractional part.
For English, it adds leading zeroes, which could otherwise be lost because of the format (scientific representation) of the data in the underlying char array. Does nothing for Russian.
auto addFractionPrefix = [&]() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: {
const char* postfix;
for (auto leadingZeroIdx = size_t(); leadingZeroIdx < fractPartLeadingZeroesCount;) {
assert(str.size()); str += delimiter;
str += getZeroOrderNumberStr(size_t(), leadingZeroIdx, postfix, localeSettings);
str += postfix;
++leadingZeroIdx;
}
return;
}
case ELocale::L_RU_RU: return; }
assert(false); };
7) addFractionEnding
Used for postprocessing of the fraction.
For Russian, it appends a specific ending (like "десятимиллионная") based on the order (of magnitude) of the fractional part (and on some other parameters, like the last two digits). Does nothing for English. If the fraction is folded, only the folded-fraction ending (returned by getFoldedFractionEnding) is appended.
size_t currDigit, prevDigit;
auto addFractionEnding = [&](const size_t orderExt) {
if (folded) { auto const ending = getFoldedFractionEnding(localeSettings);
if (*ending) { str += delimiter;
str += ending;
}
return;
}
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: break; case ELocale::L_RU_RU: {
auto toAdd = "";
assert(orderExt); const size_t subOrder = orderExt % size_t(3U);
switch (subOrder) { case size_t(1U): toAdd = orderExt < size_t(3U) ? "десят" : "десяти"; break;
case size_t(2U): toAdd = orderExt < size_t(3U) ? "сот" : "сто"; break;
}
if (*toAdd) {
str += delimiter;
str += toAdd;
}
if (orderExt > size_t(2U)) { if (!*toAdd) str += delimiter; const char* temp;
str += getOrderStr(orderExt, size_t(), size_t(), temp, localeSettings);
str += "н"; }
assert(prevDigit < size_t(10U) && currDigit < size_t(10U));
if (size_t(1U) == prevDigit) { toAdd = "ых";
} else { if (size_t(1U) == currDigit) {
toAdd = "ая"; } else toAdd = "ых"; }
str += toAdd;
}
break;
default: assert(false); str += "<locale error [" MAKE_STR_(__LINE__) "]>";
}
};
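The way the Russian ending is assembled can be reproduced with the standalone sketch below. It follows the listing above, but the function names are hypothetical, and orderStem is only a small stand-in table for the module's getOrderStr, covering orders up to the billionths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for getOrderStr(): the stem of the order name.
inline const char* orderStem(const std::size_t orderExt) {
    switch (orderExt / 3U) {
        case 1U: return "тысяч";
        case 2U: return "миллион";
        case 3U: return "миллиард";
        default: return "";
    }
}

// Builds the ending for a fraction with orderExt digits after the delimiter;
// prevDigit and currDigit are the last two digits of the fractional part.
inline std::string ruFractionEnding(const std::size_t orderExt,
                                    const std::size_t prevDigit,
                                    const std::size_t currDigit) {
    assert(orderExt > 0U && orderExt < 12U); // limits of the stand-in table
    assert(prevDigit < 10U && currDigit < 10U);
    std::string ending;
    switch (orderExt % 3U) { // "tenth" / "hundredth" prefix
        case 1U: ending += orderExt < 3U ? "десят" : "десяти"; break;
        case 2U: ending += orderExt < 3U ? "сот" : "сто"; break;
        default: break;
    }
    if (orderExt > 2U) { // order name turned into an adjective stem
        ending += orderStem(orderExt);
        ending += "н";
    }
    // Singular only for ...1, except for ...11
    ending += (1U != prevDigit && 1U == currDigit) ? "ая" : "ых";
    return ending;
}
```

For ten digits ending in 01 the sketch gives "десятимиллиардная", the same ending as in the Russian sample of the Tests section.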
8) processDigitOfATriad
This is one of the three main processing functions (along with processDigitsPart and processDigitsSubPart). It is used to process individual digits of a subpart of size up to 3 (a triad), so subOrder is the digit index within the subpart, in the range [0, 2]: 0 for 9 in 639, 2 for 6 in the same subpart. order is the actual order of magnitude of the current digit (3 for 8 in 208417).
const auto minDigitsSubPartSizeToAddOrder = getMinDigitsSubPartSizeToAddOrder(localeSettings);
auto totalAddedCount = size_t();
auto processDigitOfATriad = [&](const size_t subOrder, const size_t order, size_t& currAddedCount,
const size_t normalDigitsSubPartSize, const bool fractPart) {
auto addFirstToZeroOrderDelim = [&]() {
char delim_;
switch (localeSettings.locale) { case ELocale::L_EN_US: case ELocale::L_EN_GB: delim_ = '-'; break; case ELocale::L_RU_RU: default: delim_ = delimiter; break; }
str += delim_;
};
auto addDelim = [&](const char delim) {
if (ELocale::L_EN_GB == localeSettings.locale) {
if (totalAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
str += delim;
str += ENG_GB_VERBAL_DELIMITER;
}
}
str += delim;
};
assert(subOrder < size_t(3U) && prevDigit < size_t(10U) && currDigit < size_t(10U));
const char* infix, *postfix;
switch (subOrder) {
case size_t(): if (size_t(1U) == prevDigit) { if (!str.empty()) addDelim(delimiter); str += getFirstOrderNumberStr(currDigit, prevDigit, infix, postfix, localeSettings);
str += infix, str += postfix;
++currAddedCount, ++totalAddedCount;
} else if (currDigit || size_t(1U) == normalDigitsSubPartSize) { if (prevDigit) { assert(prevDigit > size_t(1U));
addFirstToZeroOrderDelim();
} else if (!str.empty()) addDelim(delimiter); str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
str += postfix;
++currAddedCount, ++totalAddedCount;
}
break;
case size_t(1U): if (currDigit > size_t(1U)) { if (!str.empty()) addDelim(delimiter); str += getFirstOrderNumberStr(currDigit, size_t(), infix, postfix, localeSettings);
str += infix, str += postfix;
++currAddedCount, ++totalAddedCount;
} break;
case size_t(2U): if (!currDigit) break; if (!str.empty()) str += delimiter; switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
str += postfix;
str += delimiter;
{
const char* postfix_; str += getOrderStr(size_t(2U), size_t(0U), currDigit, postfix_, localeSettings);
assert(postfix_ && !*postfix_);
}
break;
case ELocale::L_RU_RU: str += getSecondOrderNumberStr(currDigit, infix, postfix, localeSettings);
str += infix, str += postfix;
break;
}
++currAddedCount, ++totalAddedCount;
break;
} };
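To see the effect of the three subOrder branches on a whole triad, consider the simplified English-only sketch below. It is not the module's code: the function is hypothetical, it works on a ready number instead of a digit stream, and it assumes that ENG_GB_VERBAL_DELIMITER is the word "and".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells a number in the range [0, 999] in English.
inline std::string triadToWords(const unsigned value, const bool british) {
    assert(value < 1000U);
    static const char* const ONES[] = {"zero", "one", "two", "three", "four",
                                       "five", "six", "seven", "eight", "nine"};
    static const char* const TEENS[] = {"ten", "eleven", "twelve", "thirteen", "fourteen",
                                        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    static const char* const TENS[] = {"", "", "twenty", "thirty", "forty",
                                       "fifty", "sixty", "seventy", "eighty", "ninety"};
    const unsigned hundreds = value / 100U, tens = (value / 10U) % 10U, ones = value % 10U;
    std::string out;
    if (hundreds) { // subOrder 2: digit name followed by "hundred"
        out += ONES[hundreds];
        out += " hundred";
    }
    // Delimiter before the tens / ones group (with "and" for British English)
    auto addDelim = [&]() {
        if (!out.empty()) out += british ? " and " : " ";
    };
    if (1U == tens) { // 10..19 are named by a single word
        addDelim();
        out += TEENS[ones];
    } else {
        if (tens > 1U) { // subOrder 1: "twenty".."ninety"
            addDelim();
            out += TENS[tens];
        }
        if (ones || out.empty()) { // subOrder 0: ones (or a lonely zero)
            if (tens > 1U) out += '-'; else addDelim();
            out += ONES[ones];
        }
    }
    return out;
}
```

For the triad 639 it returns "six hundred thirty-nine", or "six hundred and thirty-nine" with the British flag set.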
Tests
There are over 4k lines of tests (over 380 test cases) in the ConvertionUtilsTests module (see the "TESTS" folder).
A test using the Ideone online compiler:
...
#include <cfloat> // LDBL_DIG
#include <iostream>
#include <string>
int main() {
std::string str;
ConvertionUtils::LocaleSettings localeSettings;
auto errMsg = "";
std::cout.precision(LDBL_DIG);
auto num = 6437268689.4272L;
localeSettings.locale = ConvertionUtils::ELocale::L_EN_US;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;
num = 1200.25672567L;
str.clear();
localeSettings.locale = ConvertionUtils::ELocale::L_EN_GB;
localeSettings.foldFraction = true;
localeSettings.verySpecific = true;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;
num = 1.0000300501L;
str.clear();
localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;
num = 9432654671318.0e45L;
str.clear();
localeSettings.shortFormat = true;
localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str;
return 0;
}
Result:
6437268689.4272 =>
six billion four hundred thirty-seven million two hundred sixty-eight thousand six hundred eighty-nine point four two seven two
1200.25672567 =>
twelve hundred point two five six seven repeating
1.0000300501 =>
одна целая триста тысяч пятьсот одна десятимиллиардная
9.432654671318e+57 =>
девять октодециллионов четыреста тридцать два септдециллиона шестьсот пятьдесят четыре седециллиона шестьсот семьдесят один квиндециллион триста восемнадцать кваттуордециллионо
Points of Interest
The developed strategy allows the module to be extended to support other languages, such as Spanish, for example: 0.333333333333 = "cero coma treinta y tres periodico".
The class uses the FuncUtils, MathUtils, MacroUtils and MemUtils modules.
This module [ConvertionUtils] is just a small part of a library that uses C++11 features and that I am currently working on; I decided to make it publicly available.
If you see any errors in the processing, please notify me here in the comments and/or on GitHub.
History