pseudoLocalizer -- a tool to aid development and testing of internationalized applications

7 Oct 2004
An explanation of why you want to pseudo-localize, and a pair of tools for 'translating' into a useful pseudolocal language.

Introduction

Pseudo-localizing is one technique for validating the translation-readiness of your software. It allows you to run and test your software with strings that are distinctly non-English but still quite readable. Lingering hard-coded English strings are exposed. Logic dependencies on specific English values become apparent. Dialog sizing problems are visible. A single pass at pseudo-localizing can be done with any editor, but it is useful to have a consistent transformation that you can apply repeatedly as you iterate through build and test cycles.

Background

(My apologies to the non-English-first developers who read this. The article and the tool are aimed squarely at English-speaking developers readying their software for non-English-speaking markets. I'm sure the concept of pseudo-localization is useful no matter what language you start with, but this tool probably will not be.)

There are two basic steps to creating an international edition of your software: globalization and localization (including translation). Globalization involves designing and building your code so that, among other things, strings are separated from the source code, and all assumptions about number, date, time, and unit formatting can be changed at either build- or run-time. Localization involves sending your strings out to be translated, merging the translated strings back in when they return, resolving dialog sizing issues (non-English strings are often much longer than the original), and getting all that number, date, time, and unit formatting to really work right.

In practice, especially if you are trying to internationalize a substantial code base, these steps are done iteratively. It is remarkably difficult to identify, up front, all the native language assumptions you have made! There's really no substitute for seeing your product run in another language; it makes problems jump off the screen at you. But if you wait to start iterating until a translation contractor is in the loop and costing you money, you will spend far more than you need to. Using a pseudo-localization, you can begin to build, run, and test non-English versions of your product long before you send the actual strings off to the translation house. It won't help with those pesky date, time, number, and unit formatting problems, but it can give you confidence that you've isolated all your string resources, so your first translation pass can be a complete one. And while there is no substitute for a thorough visual inspection of all your screens in each of your published language editions, pseudo-localized strings can help you identify and clean up the worst of your dialog problems before the clock is ticking at the end of a release.

The pseudo-localization used here is a variation on that outlined in Developing International Software, Second Edition (Microsoft Press, 2003), pg. 372. Strings are bracketed with curly brackets ({}) so that you can see at a glance whether a string has been truncated. The first occurrence of each vowel is changed into a doubled non-English variant (e.g., 'a' becomes '��'), which drives character set problems and gives a suitable lengthening of each string. Selected other characters are converted into non-English variants (e.g., 'n' becomes '�'). And, emulating French, spaces are inserted in front of selected punctuation marks. {Th�� res��lt���g stri�gs ��re disti��tively ����-���glish but still quite readable !}
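
The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped in pLocalize.py: the rule table name `RULES`, the function name `pseudo_localize`, the specific accented characters, and the choice of punctuation marks are all assumptions made for the example. Only the overall recipe (bracket the string, double the first occurrence of each vowel, swap selected consonants, put a space before selected punctuation) comes from the description above.

```python
import re

# Hypothetical rule table: (pattern, replacement, count).
# count=1 replaces only the first match; count=0 replaces every match.
# The accented characters are illustrative, not the tool's actual mapping.
RULES = [
    ("a", "åå", 1), ("e", "éé", 1), ("i", "îî", 1), ("o", "öö", 1), ("u", "üü", 1),
    ("A", "ÅÅ", 1), ("E", "ÉÉ", 1), ("I", "ÎÎ", 1), ("O", "ÖÖ", 1), ("U", "ÜÜ", 1),
    ("n", "ñ", 0),                 # selected consonants, every occurrence
    ("c", "ç", 0),
    (r"\s*([!?:;])", r" \1", 0),   # French-style space before punctuation
]

def pseudo_localize(text):
    """Apply each rule in order, then wrap the result in curly brackets."""
    for pattern, replacement, count in RULES:
        text = re.sub(pattern, replacement, text, count=count)
    return "{" + text + "}"

if __name__ == "__main__":
    print(pseudo_localize("The resulting strings are quite readable!"))
```

Because upper and lower case vowels have separate rules, "English" still gets a doubled capital even after an earlier lower case 'e' has been transformed. The closing bracket makes truncation easy to spot: if a dialog shows the string without its final '}', the control is too small.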

Using the tool

I've supplied C# and Python source code and C# executable versions for two variations of the pseudoLocalizer. Python was originally chosen as an implementation language for the ease and conciseness with which the localization transforms could be implemented. (Read: I enjoy using it. :-)) But the localization engine (pLocalize.py) can easily be rewritten in any language that supports regular expressions, and the other two modules are straightforward wrappers that could be re-implemented in whatever language you are used to working in. So, when the 'compiled Python' executables proved too large to upload to the CodeProject site, I simply rewrote them in C#.

pLoc.exe is a command line tool that can be used to translate words or phrases passed on the command line or as lines of text in a file. Type 'pLoc /?' to get a description of the syntax. To be useful, you'll have to redirect output to a file. (Note that the output looks like garbage in a DOS box but fine if you redirect it to a file and look at it through Notepad or another editor. Welcome to the wonderful world of incompatible character sets.)
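
The 'garbage in a DOS box' effect can be reproduced in Python. The sketch below assumes, purely for illustration, that the output bytes are Windows-1252 while the console decodes them with code page 437; the actual encodings on your machine may differ, and the sample string is made up rather than real pLoc output.

```python
import os
import tempfile

# Illustrative pseudo-localized text (not actual pLoc output).
sample = "{Théé ñööñ-ÉÉñglish}"

# Assumption: the tool writes Windows-1252 bytes...
raw = sample.encode("cp1252")

# ...and the console interprets those same bytes as code page 437.
as_seen_in_console = raw.decode("cp437")

# Redirecting to a file and reading it back with the matching encoding
# preserves the text, which is why an editor shows it correctly.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="cp1252") as f:
    f.write(sample)
with open(path, encoding="cp1252") as f:
    round_tripped = f.read()

if __name__ == "__main__":
    print(as_seen_in_console == sample)  # the console view differs
    print(round_tripped == sample)       # the file view matches
```

The bytes are identical in both cases; only the code page used to interpret them changes.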

pLocWin.exe is a GUI tool that can be used standalone, much like GuidGen.exe, to translate words or sentences into text that can then be copied to the clipboard. It can also be added to the Tools menu of the MSVC IDE and used to translate strings that you've highlighted in the editor.

To add pLocWin as a tool to the IDE, choose External Tools from the Tools menu. Use the Add button and add the full path to the executable in the command edit box. In the arguments edit box, add: "$(CurText)". (The double quotes are part of what you need to type in -- they allow multiword phrases and sentences to be processed.) Your highlighted text will appear as the "text to be translated". Press the translate and copy buttons, then minimize pseudoLocalizer to return to the IDE. Press CTRL-V and the pseudo-localized text will replace the original.

Points of interest

The 'compiled Python' distributions produce zip files way too big for this site to handle or post. Sigh. But, since I thought that it would be pretty easy to rewrite all of the components in any language that handled regular expressions, I decided to treat that as a challenge and I quickly ported the code to C#.

I must say that producing the GUI app in the MSVC IDE C# environment was a pleasure. All the ease of producing a VB6 app without the pervasive feeling of code-voodoo. (Or maybe I'm just getting used to having so much done for me!) The C# versions have not had as much testing as my original Python versions; please do let me know if you have problems or questions.

Variations that might be useful

There are a lot of variations on this pseudo-localization that might be useful.

You'll note that I'm only translating/doubling the first instance of each vowel (with lower and upper case versions being treated as separate instances). My experience is that this provides enough string lengthening to expose the dialog resizing problems you'll find for French and German translations. But, if you want to push the string length issue, you might want to expand the transformation for vowels to a second instance or all instances. In the Python regular expression syntax, this is easy to do by changing the third member of the RawFindReplaceList tuple to 2 (to transform first and second instances) or 0 (to transform all instances).
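
The effect of that third tuple member maps directly onto the `count` argument of Python's `re.sub`, where 0 means 'replace every match'. The helper `lengthen` and the replacement 'éé' below are hypothetical names and values chosen for the example; the exact layout of RawFindReplaceList in the download is not reproduced here.

```python
import re

def lengthen(text, count):
    """Double-and-accent the letter 'e', limited to `count` matches (0 = all)."""
    return re.sub("e", "éé", text, count=count)

original = "Delete selected item"

first_only = lengthen(original, 1)      # first instance only
first_two = lengthen(original, 2)       # first and second instances
every_instance = lengthen(original, 0)  # all instances

if __name__ == "__main__":
    for variant in (original, first_only, first_two, every_instance):
        print(len(variant), variant)
```

Each transformed vowel adds one character, so the 'all instances' setting grows a vowel-heavy string much faster than the default.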

If the strings you are translating include C++ or Python format specifiers used at runtime to fill in variable text, it could be really useful to bracket those specifiers. For example, turn all instances of "%s" into "<%s>". (I left that out of my transformations so that they would remain language neutral.)
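
One way to do that bracketing is with a regular expression, sketched below. The pattern `SPEC` and the function `bracket_specifiers` are hypothetical and deliberately simplified: they cover common printf-style specifiers such as %s, %d, and %5.2f, leave a literal "%%" alone, and do not handle every form (for example Python's %(name)s).

```python
import re

# Simplified printf-style pattern (an assumption, not a complete grammar):
# either a literal "%%", or "%" + optional flags, width, precision,
# length modifier, and a conversion character.
SPEC = re.compile(r"%%|%[-+#0]*\d*(?:\.\d+)?[hlL]*[diouxXeEfgGcs]")

def bracket_specifiers(text):
    """Wrap each format specifier in angle brackets; leave '%%' untouched."""
    def wrap(match):
        spec = match.group(0)
        return spec if spec == "%%" else "<" + spec + ">"
    return SPEC.sub(wrap, text)

if __name__ == "__main__":
    print(bracket_specifiers("Copied %d of %5.2f MB to %s"))
    print(bracket_specifiers("100%% done, %s remaining"))
```

A step like this would need to run separately from the character substitutions, and it ties the transformation to one family of format syntax.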

It's useful to highlight substitutions because they are inherently difficult to translate correctly. Word order is different from language to language, so substitutions that hard-code the order of the variables within strings make the strings tricky for the translators to work with. (Although this is something that experienced translators are used to dealing with and can work around.) Even more of a problem are cases where an article, preposition, or adjective is hard-coded in front of a variable noun. Our English "the", "of", "quick", and "red" words are the same no matter what noun follows, but that's not the case in French, German, or Spanish.

History

  • 7 October 2004 -- initial version.
  • 12 October 2004 -- new downloadable executables in C# and source code to match.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.