Introduction
Messages like "%d file(s) found" are notoriously hard to localize. English has only 2 forms: singular (1 file) and plural (0, 2, or more files), but some other languages use three, four, or even more plural forms. For example, there are 3 forms in Polish:
0 plików
1 plik
2-4 pliki
5-21 plików
22-24 pliki
25-31 plików
etc.
Other languages (French, Russian, Czech, etc.) also use rules different from English and from each other.
The gettext library reads the rule for plural form selection from the localization file. The rule is a C language expression, which is evaluated for each message. It's a universal solution, but in my opinion, an expression evaluator is overkill for this task.
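To make the gettext approach concrete, here is the Polish rule from the gettext manual written out as an ordinary C function. The function name polish_plural_index is made up for this illustration; gettext itself stores the expression as text in the catalog header and evaluates it at run time.

```c
/* Polish plural rule as listed in the gettext manual:
   nplurals=3;
   plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
   The function name is hypothetical; only the expression comes from gettext. */
int polish_plural_index(unsigned long n)
{
    return n == 1 ? 0                                   /* 1 plik          */
         : (n % 10 >= 2 && n % 10 <= 4 &&
            (n % 100 < 10 || n % 100 >= 20)) ? 1        /* 2-4, 22-24 pliki */
         : 2;                                           /* everything else */
}
```

Supporting arbitrary expressions like this one is what requires an expression evaluator inside the library.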
My Solution
I developed a simpler solution, which works for all languages mentioned on the gettext page. It is based on these observations:
- All additional plural forms are used for some range of numbers, e.g., from 2 to 4 in Slovak and Czech.
- The pattern is often repeated every 10 or 100 items. In Russian, it sounds like "twenty-one file", not "twenty-one files", because the noun agrees with the last digit, "one". The same pattern repeats for 31, 41, etc.
- The numbers from 10 to 19 (I call them "teens" for short) are often an exception to the rules, just as 16 is spelled differently from 26, 36, 46, etc. in English: "sixteen" vs. "twenty-six", "thirty-six", and "forty-six".
- Zero is treated differently in some languages, e.g. Romanian.
So, the rule for each plural form will consist of these components:
range_start range_end modulo_for_repetition skip_teens_flag
Here are some examples:
English
singular - range_start = 1, range_end = 1
plural - all other numbers
Polish
singular - range_start = 1, range_end = 1
plural1 - range_start = 2, range_end = 4, modulo = 10, skip_teens = true
plural2 - all other numbers
Irish
singular - range_start = 1, range_end = 1
plural1 - range_start = 2, range_end = 2
plural2 - all other numbers
Lithuanian
singular - range_start = 1, range_end = 1, modulo = 10, skip_teens = true
plural1 - range_start = 2, range_end = 9, modulo = 10, skip_teens = true
plural2 - all other numbers ("teens" and multiples of 10)
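The examples above can be sketched as a small table-driven matcher. This is only an illustration of the idea, not the code from plurals.c: the PluralRule struct, the plural_form function, and the two rule tables are hypothetical names, and the sketch assumes that "teens" means any number whose last two digits are 10 to 19 (so 112 counts as a teen, which matches the gettext rules for Polish and Lithuanian).

```c
#include <stddef.h>

/* Hypothetical rule layout; the real PLURAL_INFO in plurals.h may differ. */
typedef struct {
    unsigned start;      /* range_start */
    unsigned end;        /* range_end */
    unsigned modulo;     /* modulo_for_repetition, 0 = pattern is not repeated */
    int      skip_teens; /* nonzero: rule never matches numbers ending in 10..19 */
} PluralRule;

/* Returns the index of the first rule that matches n.
   If no rule matches, returns count, i.e. the "all other numbers" form. */
unsigned plural_form(const PluralRule *rules, size_t count, unsigned n)
{
    unsigned last2 = n % 100;
    for (size_t i = 0; i < count; i++) {
        const PluralRule *r = &rules[i];
        unsigned m = r->modulo ? n % r->modulo : n;
        if (r->skip_teens && last2 >= 10 && last2 <= 19)
            continue;
        if (m >= r->start && m <= r->end)
            return (unsigned)i;
    }
    return (unsigned)count;
}

/* The Polish and Lithuanian examples from the text, as rule tables. */
const PluralRule polish_rules[] = {
    { 1, 1, 0,  0 },   /* singular */
    { 2, 4, 10, 1 },   /* plural1  */
};
const PluralRule lithuanian_rules[] = {
    { 1, 1, 10, 1 },   /* singular */
    { 2, 9, 10, 1 },   /* plural1  */
};
```

With these tables, plural_form(polish_rules, 2, 22) selects plural1 ("pliki"), while 12 and 25 fall through to plural2 ("plików").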
The rules for each language can be written as a short string, which is stored in the language file (e.g., for Lithuanian, the string is "1 1 10 t; 2 9 10 t").
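A string in this format is easy to parse with standard C functions. The following is a rough sketch, not the actual PluralsReadCfg: the PluralRule struct and parse_plural_rules are hypothetical, and treating the trailing fields as optional (range_end defaulting to range_start, no modulo, no flag) is a guess about the format rather than something stated above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rule layout; the real PLURAL_INFO in plurals.h may differ. */
typedef struct {
    unsigned start, end, modulo;
    int      skip_teens;
} PluralRule;

/* Parses rules of the form "start end modulo t", separated by ';'.
   Assumption: trailing fields may be omitted.
   Returns the number of rules stored in out (at most max_rules). */
size_t parse_plural_rules(const char *cfg, PluralRule *out, size_t max_rules)
{
    size_t count = 0;
    while (*cfg != '\0' && count < max_rules) {
        PluralRule r = { 0, 0, 0, 0 };
        char flag = '\0';
        int fields = sscanf(cfg, "%u %u %u %c",
                            &r.start, &r.end, &r.modulo, &flag);
        if (fields < 1)
            break;                      /* nothing usable in this rule */
        if (fields < 2)
            r.end = r.start;            /* single number: one-element range */
        r.skip_teens = (fields == 4 && flag == 't');
        out[count++] = r;

        cfg = strchr(cfg, ';');         /* advance to the next rule, if any */
        if (cfg == NULL)
            break;
        cfg++;
    }
    return count;
}
```

For the Lithuanian string, this yields two rules; the final "all other numbers" form needs no entry because it is simply what remains when no rule matches.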
Using the Code
To use my solution, include plurals.h and plurals.c in your project. The interface consists of two functions. First, you call PluralsReadCfg to read the rules from the string. Next, you pass a number to PluralsGetForm. It returns the index of the correct plural form for this number, which you use to read the string from your language file:
PLURAL_INFO plurals;
PluralsReadCfg(&plurals, ReadFromLngFile("PluralRules"));
char lng_str_name[16], message[128];
sprintf(lng_str_name, "FilesFound%d", PluralsGetForm(&plurals, number));
sprintf(message, ReadFromLngFile(lng_str_name), number);
In the language file, you have strings for each plural form:
PluralRules = "1"
FilesFound0 = "%d file found"
FilesFound1 = "%d files found"
ReadFromLngFile is your own function. You could wrap the two sprintf calls in a higher-level function (and, of course, use a secure function instead of sprintf to protect your program from buffer overflow).
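One possible shape for such a wrapper is sketched below, using C99 snprintf as the bounds-checked replacement. FormatPlural is a hypothetical name. The PLURAL_INFO, PluralsGetForm, and ReadFromLngFile definitions here are stand-ins with guessed signatures so that the sketch compiles on its own; in a real project they come from plurals.h and from your own language-file code.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins (assumptions): replace with plurals.h and your own reader. */
typedef struct { int unused; } PLURAL_INFO;

static int PluralsGetForm(const PLURAL_INFO *p, int n)
{
    (void)p;
    return n == 1 ? 0 : 1;              /* English-only stub */
}

static const char *ReadFromLngFile(const char *name)
{
    if (strcmp(name, "FilesFound0") == 0) return "%d file found";
    if (strcmp(name, "FilesFound1") == 0) return "%d files found";
    return NULL;                        /* unknown string name */
}

/* Formats the message for `number` into out.
   Returns the message length, or -1 if the string name is unknown
   or something did not fit into a buffer. */
int FormatPlural(char *out, size_t out_size, const PLURAL_INFO *plurals,
                 const char *base_name, int number)
{
    char key[32];
    int len = snprintf(key, sizeof key, "%s%d",
                       base_name, PluralsGetForm(plurals, number));
    if (len < 0 || (size_t)len >= sizeof key)
        return -1;

    const char *fmt = ReadFromLngFile(key);
    if (fmt == NULL)
        return -1;

    /* The format string comes from the language file, so that file
       must be trusted to contain exactly one %d. */
    len = snprintf(out, out_size, fmt, number);
    if (len < 0 || (size_t)len >= out_size)
        return -1;
    return len;
}
```

A call such as FormatPlural(message, sizeof message, &plurals, "FilesFound", number) then replaces both sprintf calls and reports truncation instead of overflowing the buffer.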
Conclusion
The two functions, PluralsReadCfg and PluralsGetForm, take 500 bytes in your executable file when compiled with MSVC++. This is a small price to pay for spelling your messages correctly in any language.
History
- 15th April, 2008: Initial post