Plural Forms

Peter Kankowski

15 Apr 2008 | LGPL3 | 2 min read

Spelling messages like "5 file(s) found" correctly in any language

Download source code - 23.66 KB

Introduction

Messages like "%d file(s) found" are notoriously hard to localize. English has only 2 forms: "1 file" (singular) and "2 or more files" (plural), but other languages use up to 4 plural forms. Polish, for example, has 3 forms:

    0 plików
    1 plik
  2-4 pliki
 5-21 plików
22-24 pliki
25-31 plików
      etc.

Other languages (French, Russian, Czech, etc.) also use rules different from English and from each other.

The gettext library extracts a rule for plural form selection from the localization file. The rule is a C language expression, which is evaluated for each message. It's a universal solution, but IMHO, an expression evaluator is overkill for this task.
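To make the comparison concrete, here is the kind of expression gettext evaluates, written out as an ordinary C function. It follows the Polish rule given in the gettext manual; the function name polish_plural_form is only an illustrative choice and is not part of gettext.

```c
/* Polish plural rule in the style of a gettext "plural=" expression.
   Returns 0 for "plik", 1 for "pliki", 2 for "plików".
   The function name is hypothetical; gettext itself parses the
   expression from the catalog header at run time. */
static int polish_plural_form(unsigned long n)
{
    return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}
```

The results agree with the Polish table above: 22 gives form 1 ("pliki"), while 12 and 25 give form 2 ("plików").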

My Solution

I developed a simpler solution, which works for all languages mentioned on the gettext page. It is based on these observations:

All additional plural forms are used for some range of numbers, e.g., from 2 to 4 in Slovak and Czech.
The pattern often repeats every 10 or 100 numbers. In Russian, it sounds like "twenty-one file", not "twenty-one files", because the noun agrees with the last digit, "one". The same pattern repeats for 31, 41, etc.
The numbers from 10 to 19 (I call them "teens" for short) are often an exception to the rules, just as 16 is spelled differently from 26, 36, 46, etc. in English: "sixteen" vs. "twenty-six", "thirty-six", and "forty-six".
Zero is treated differently in some languages, e.g., Romanian.

So, the rule for each plural form will consist of these components:

range_start  range_end  modulo_for_repetition  skip_teens_flag

Here are some examples:

English
singular - range_start = 1, range_end = 1
plural   - all other numbers

Polish
singular - range_start = 1, range_end = 1
plural1  - range_start = 2, range_end = 4, modulo = 10, skip_teens = true
plural2  - all other numbers

Irish
singular - range_start = 1, range_end = 1
plural1  - range_start = 2, range_end = 2
plural2  - all other numbers

Lithuanian
singular - range_start = 1, range_end = 1, modulo = 10, skip_teens = true
plural1  - range_start = 2, range_end = 9, modulo = 10, skip_teens = true
plural2  - all other numbers ("teens" and numbers ending in 0)
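
The rule tables above can be turned into a small matching routine. What follows is only a sketch of one way to do it, not the code from plurals.c: the struct PluralRuleSketch, the function sketch_get_form, and the convention that a modulo of 0 means "no repetition" are all assumptions made for illustration, and the number is assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rule record; the real PLURAL_INFO in plurals.h may differ. */
typedef struct {
    int range_start;
    int range_end;
    int modulo;      /* 0 means the range is not repeated (assumption) */
    int skip_teens;  /* nonzero: never match when n % 100 is 10..19 */
} PluralRuleSketch;

/* Returns the index of the first rule that matches n, or rule_count
   (the "all other numbers" form) when no rule matches. */
static int sketch_get_form(const PluralRuleSketch *rules, size_t rule_count, int n)
{
    size_t i;

    assert(n >= 0);  /* negative counts are outside this sketch */
    for (i = 0; i < rule_count; i++) {
        const PluralRuleSketch *r = &rules[i];
        int m = r->modulo > 0 ? n % r->modulo : n;

        if (r->skip_teens && n % 100 >= 10 && n % 100 <= 19)
            continue;  /* "teens" fall through to a later form */
        if (m >= r->range_start && m <= r->range_end)
            return (int)i;
    }
    return (int)rule_count;
}

/* The Polish and Lithuanian examples from the text, as tables. */
static const PluralRuleSketch polish_rules[] = {
    { 1, 1,  0, 0 },   /* singular */
    { 2, 4, 10, 1 }    /* plural1  */
};
static const PluralRuleSketch lithuanian_rules[] = {
    { 1, 1, 10, 1 },   /* singular */
    { 2, 9, 10, 1 }    /* plural1  */
};
```

With these tables, sketch_get_form(polish_rules, 2, 22) yields 1 (plural1) and sketch_get_form(lithuanian_rules, 2, 11) yields 2, the catch-all form.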

The rules for each language can be written as a short string, which is stored in the language file (e.g., for Lithuanian, the string is "1 1 10 t; 2 9 10 t").
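
A string in this format could be parsed roughly as follows. This is a sketch under assumptions, not the parser from plurals.c: it treats ';' as the separator between rules, reads each rule as "start [end [modulo]] [t]", and fills in end = start and modulo = 0 when they are omitted. The names PluralRuleSketch and sketch_parse_rules are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical rule record; the real PLURAL_INFO in plurals.h may differ. */
typedef struct {
    int range_start;
    int range_end;
    int modulo;      /* 0 means the range is not repeated (assumption) */
    int skip_teens;  /* nonzero when the rule carries the 't' flag */
} PluralRuleSketch;

/* Parses up to max_rules rules from s and returns how many were read. */
static int sketch_parse_rules(const char *s, PluralRuleSketch *rules, int max_rules)
{
    int count = 0;

    assert(s != NULL && rules != NULL);
    while (*s != '\0' && count < max_rules) {
        PluralRuleSketch r = { 0, 0, 0, 0 };
        char *next;
        long v = strtol(s, &next, 10);   /* strtol skips leading spaces */

        if (next == s)
            break;                       /* no number: stop parsing */
        r.range_start = r.range_end = (int)v;
        s = next;

        v = strtol(s, &next, 10);        /* optional range_end */
        if (next != s) {
            r.range_end = (int)v;
            s = next;
            v = strtol(s, &next, 10);    /* optional modulo */
            if (next != s) {
                r.modulo = (int)v;
                s = next;
            }
        }
        while (isspace((unsigned char)*s))
            s++;
        if (*s == 't') {                 /* optional skip-teens flag */
            r.skip_teens = 1;
            s++;
        }
        rules[count++] = r;

        while (*s != '\0' && *s != ';')  /* move past the separator */
            s++;
        if (*s == ';')
            s++;
    }
    return count;
}
```

For the Lithuanian string "1 1 10 t; 2 9 10 t", this sketch reads two rules, each with modulo 10 and the skip-teens flag set.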

Using the Code

To use my solution, include plurals.h and plurals.c in your project. The interface consists of two functions. First, you call PluralsReadCfg to read rules from the string. Next, you pass a number to PluralsGetForm. It returns the index of the correct plural form for this number, which you use to read the string from your language file:

C++

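PLURAL_INFO plurals;
/* Parse the plural rules stored in the language file */
PluralsReadCfg(&plurals, ReadFromLngFile("PluralRules"));

char lng_str_name[16], message[128];
/* Build the name of the string for the right plural form, e.g. "FilesFound1" */
sprintf(lng_str_name, "FilesFound%d", PluralsGetForm(&plurals, number));
/* Format the translated message with the number */
sprintf(message, ReadFromLngFile(lng_str_name), number);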

In the language file, you have strings for each plural form:

C++

PluralRules = "1"
FilesFound0 = "%d file found"
FilesFound1 = "%d files found"

ReadFromLngFile is your own function. You could wrap the two sprintf calls in a higher-level function (and, of course, use a bounds-checked function instead of sprintf to protect your program from buffer overflows).
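
One possible shape for such a wrapper is sketched below, using C99 snprintf for bounds checking. To keep the example compilable on its own, stub_get_form and stub_read_lng stand in for PluralsGetForm(&plurals, number) and ReadFromLngFile; both stubs, the name format_plural, and the 0 / -1 return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PluralsGetForm(&plurals, number); English rule only. */
static int stub_get_form(int number)
{
    return number == 1 ? 0 : 1;
}

/* Stand-in for ReadFromLngFile; returns NULL for unknown names. */
static const char *stub_read_lng(const char *name)
{
    if (strcmp(name, "FilesFound0") == 0) return "%d file found";
    if (strcmp(name, "FilesFound1") == 0) return "%d files found";
    return NULL;
}

/* Formats the message whose name starts with base_name for the given number.
   Returns 0 on success, -1 if the name or the message does not fit
   or the string is missing from the language file. */
static int format_plural(char *out, size_t out_size, const char *base_name, int number)
{
    char name[32];
    const char *fmt;
    int n;

    assert(out != NULL && out_size > 0);
    n = snprintf(name, sizeof name, "%s%d", base_name, stub_get_form(number));
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;                      /* string name was truncated */

    fmt = stub_read_lng(name);
    if (fmt == NULL)
        return -1;                      /* no such string in the file */

    /* The format comes from the language file, so it must be trusted
       to contain exactly one %d. */
    n = snprintf(out, out_size, fmt, number);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                      /* message was truncated */
    return 0;
}
```

A call such as format_plural(message, sizeof message, "FilesFound", number) then replaces both sprintf calls. Because the format string is read from a file, a translation with the wrong conversion specifiers can still misbehave, so language files should be treated as trusted input.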

Conclusion

Two functions, PluralsReadCfg and PluralsGetForm, take 500 bytes in your executable file when compiled with MSVC++. This is a small price to pay for spelling your messages correctly in any language.

History

15th April, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)