How to program multilingual web sites

21 Jun 2008 | CPOL | 8 min read
How to extend web sites to support several languages

Introduction

Have you created a site that you now want to take global by adding multilingual capabilities? Are you wondering how to develop a site in several languages? If so, this article is for you. I discuss several options for writing sites that serve many languages, and give the pros and cons of each, focusing on ease of use, ease of programming, and performance impact.

Background

This guide is written by me, an experienced old-school ASP (Active Server Pages) programmer. I believe the article will bear fruit even for ASP.NET programmers, as they face the same fundamental problems when developing large-scale projects. You need to know your code to understand the implications of adding multilingual capabilities to it; I usually write my web projects, large and small, in Microsoft® Notepad®, so I know my code rather well. You should too!

Using the code

I don't supply a full archive of source code; I am supplying a general notion of how to prepare multi-language support. The code below illustrates the idea and should not be copied verbatim into your own web site; study it and, if I have convinced you, implement it in your own site.

Main Article

Global expansion and the problem of multilingual support

I had the pleasure of being the lead programmer at Bono Pie LTD, Israel, in 1999. I was only 19 at the time; these were the pioneering years of web services, and I was as inexperienced as almost everyone else in the field of large-scale web development. Nevertheless, and probably contrary to what you have imagined so far, Bono Pie LTD enjoyed lasting success with about 20 employees, about four of whom were programmers on my team.

The site was programmed in ASP using VBScript and had plenty of flashy HTML, CSS, Flash® files, and other ornaments. Our database was Microsoft® SQL Server®, after a migration from Microsoft® Access®, and it took many DBA hours, many of them my own. We worked hard every day to keep up with the pace of the business, creating new features and fixing bugs.

Seeing in advance that support for other languages would soon be needed, we started working on how to migrate our site. It wasn't an easy chore, as the language of our site was Hebrew (the language used in Israel), which is written right-to-left (RTL). I will spare you the thorough description of how to implement a site that supports both LTR (left-to-right) and RTL, as I assume most of you will not run into it. So our problem reduces to: "How do I create the pages of a web site so that they support different languages?"

First, let's examine what should be done. A code snippet from my site might look like this:
HTML
<table border=0>
    <tr width=100%>
        <td width=100%>
            Friends? Friends?!?
            We've only gone out together three times,
            and already you're telling me you want to be friends?
        </td>
    </tr>
</table>
In order to support multiple languages, I must somehow integrate the same text in several languages. A somewhat crude solution might be:
HTML
<table border=0>
    <tr width=100%>
        <td width=100%>
            <% if MyLanguage="English" then%>
                Friends? Friends?!?
                We've only gone out together three times,
                and already you're telling me you want to be friends?
            <% elseif MyLanguage="Spanish" then%>
                ¿Amigos? ¿Amigos?!?
                ¿Hemos salido solamente juntos tres veces,
                y ya me estás diciendo que quieres ser amigos?
            <% end if%>
        </td>
    </tr>
</table>
However, embedding the texts here makes it hard for ordinary translators to translate the site; I would have to sit with them to make sure they don't break my code, so this is an error-prone solution. Instead, I want a design that lets me gather all the text that needs translation in a single place. Moreover, it is easy to see that human judgment must be involved, since some parts (the text) need to be translated while others (the HTML or ASP) must not be.

I knew I had to add this ability while keeping my freedom to change the code arbitrarily. Therefore, a solution such as copying the site and manually changing each copy for a new language is disastrous, as I would no longer be able to maintain my code. So we knew we needed a single source of code. However, with only one source of code, how does it turn itself into several languages? An easy but somewhat flawed solution is to embed the language-specific data as an ASP directive, most notably as a function call with a unique ID identifying the specific string; this method is straightforward for modern developers, as it resembles a String Table. An example:
HTML
<table border=0>
    <tr width=100%>
        <td width=100%>
            <%= GetLinguisticText("FamousQuotationPage_TitleHeader")%>
        </td>
    </tr>
</table>
Although this mechanism superficially resembles my final solution, it is quite different. The problem with the proposed method is that not all language-specific content can carry such directives: we cannot add them to static HTML pages or to images. If we put such a directive into an HTML file, it is sent to the users as-is rather than executed, and does not help us achieve multilingual support. A different approach for HTML files is to physically split the languages into several HTML files; however, this approach is close to my first suggestion, and it adds new obstacles, as the code now has to take the language into account to pick the right HTML file. All of this is also true for image files (PNG, JPG, GIF, etc.). Someone might say a satisfying solution is simply to change the HTML files into ASP files; however, that has implications for performance and for the structure of the site. So we need a more general answer.

In order to support text in all file types without affecting the site, a different approach had to be taken. Instead of integrating the text directive into the served file, we put the text directive into one file, henceforth "the code", and the actual text into a different file, henceforth "the result"; that is, the code contains a marker for each text, and the result files contain the text for each language. To achieve this, we created a small ASP site, henceforth "the engine", which generates the result files. This way, when we want to change anything in our site, we edit the code and simply regenerate the result files. Sure, our solution is not as automatic as the aforementioned one; however, it supports all textual file types and was even extended to support other file types, such as image and Macromedia® Flash® files.

The proposed solution

First, we needed to put a specific mark inside the code pages so that our engine would know where to put the translated text. A pretty good notation is something like:
HTML
<table border=0>
    <tr width=100%>
        <td width=100%>
            <GetLinguisticText ID="FamousQuotationPage_TitleHeader" />
        </td>
    </tr>
</table>
Using such a notation in the code files leaves room for future options, expressed as additional attributes. Our engine looks for the GetLinguisticText tag inside all code pages and replaces each one with the proper text when constructing the result files. A straightforward engine can be:
VBScript
''' This sub reads the content of <CodeFileName>,
''' translates it by replacing every '<GetLinguisticText ID="XXX" />'
''' with the text indicated in the <ResultLanguage> String Table,
''' then writes it to the corresponding file in the <ResultLanguage> folder.
Sub GenerateResultFileFromCodeFile(CodeFileName, ResultLanguage)
    
    Dim FSO
    Set FSO=Server.CreateObject("Scripting.FileSystemObject")

    '''' Reading the code page into CodeText
    Dim CodeText
    Dim CodeFile
    ' Open the file for reading
    Set CodeFile=FSO.OpenTextFile(GetCodeDir() + CodeFileName, 1)
    CodeText=CodeFile.ReadAll
    CodeFile.Close
    Set CodeFile=Nothing

    '''' Translating CodeText into ResultText
    '''' by replacing each tag with its corresponding text
    Dim ResultText
    Dim IndexOfTag
    Dim TagStart
    TagStart="<GetLinguisticText "
    Dim TagEnd
    TagEnd=" />"
    IndexOfTag=InStr(CodeText,TagStart)
    While IndexOfTag>0
        ' Copy text before the appearance of the tag
        If IndexOfTag>1 Then
            ' Copying all characters before the tag into the result
            ResultText=ResultText+Left(CodeText, IndexOfTag-1)
            ' removing the copied part
            CodeText=Mid(CodeText,IndexOfTag)
        End If
        ' Trimming the tag's header
        CodeText=Mid(CodeText,Len(TagStart)+1)
        
        '''' Trimming the ID attribute
        '''' Should leave CodeText without the tag at all,
        '''' while appending the text into ResultText.
        ' The attribute of ID is: ID="XXX"
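        Dim IdAttributeStart, IdAttributeEnd
        IdAttributeStart="ID="""
        IdAttributeEnd=""""
        '''' Could not parse the ID attribute
        If Left(CodeText, Len(IdAttributeStart))<>IdAttributeStart Then
            ' Raise a user-defined error. The source and description are
            ' passed to Err.Raise itself, because execution leaves this
            ' procedure as soon as the error is raised.
            Err.Raise vbObjectError + 8, "Translator", _
                "Invalid input encountered while parsing code page to translate (" _
                & CodeFileName & ")"
        End If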

        ' Trimming the attribute's header
        CodeText=Mid(CodeText,Len(IdAttributeStart)+1)

        ' Locating the end of the attribute
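        Dim IdAttributeEndIndex
        IdAttributeEndIndex=InStr(CodeText, IdAttributeEnd)
        If IdAttributeEndIndex=0 Then
            ' The closing quote of the ID attribute is missing
            Err.Raise vbObjectError + 8, "Translator", _
                "Unterminated ID attribute in code page (" & CodeFileName & ")"
        End If
        Dim StringID
        ' Found the value of the ID attribute
        StringID=Left(CodeText, IdAttributeEndIndex-1)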
    
        ' Trimming the attribute's footer
        CodeText=Mid(CodeText, IdAttributeEndIndex-1+Len(IdAttributeEnd)+1)
        ' Trimming the tag's footer
        CodeText=Mid(CodeText, Len(TagEnd)+1)
        

        '''' Translating StringID into its text
        ResultText=ResultText+GetLinguisticText(StringID, ResultLanguage)



        ' looking for the next tag in the same file
        IndexOfTag=InStr(CodeText,TagStart)
    Wend
    ' Copying the rest of the Code into the Result
    ResultText=ResultText+CodeText

    '''' Writing the translated code page into the result file
    Dim ResultFile
    ' Open the file for writing and create the file if needed.
    Set ResultFile=FSO.OpenTextFile(GetResultDirForLanguage(ResultLanguage) _
        + CodeFileName, 2, True, -2)
    ResultFile.Write ResultText
    ResultFile.Close
    Set ResultFile=Nothing

    Set FSO=Nothing
End Sub
It took me about half an hour to write this code in Notepad® and about a minute and a half to debug it, in three iterations. All you need to do now is create the three missing functions: GetCodeDir(), GetResultDirForLanguage(Language), and GetLinguisticText(StringID, ResultLanguage); the last one should probably read from the String Table inside your database.
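A minimal sketch of those three functions might look like the following. The folder names (/code and /result/<Language>) are hypothetical, and an in-memory Scripting.Dictionary stands in for the database String Table only to keep the sketch self-contained; the result folders are assumed to exist already.

```vbscript
' Hypothetical layout: code pages live under /code,
' generated pages under /result/<Language>.
Function GetCodeDir()
    GetCodeDir = Server.MapPath("/code") & "\"
End Function

Function GetResultDirForLanguage(Language)
    GetResultDirForLanguage = Server.MapPath("/result/" & Language) & "\"
End Function

' In-memory stand-in for the String Table (assumption: a real site
' would query its database here instead).
Dim StringTable
Set StringTable = Nothing

Function GetLinguisticText(StringID, ResultLanguage)
    ' Fill the table on first use; keys are "<Language>|<StringID>"
    If StringTable Is Nothing Then
        Set StringTable = Server.CreateObject("Scripting.Dictionary")
        StringTable.Add "English|MainPage_Greeting", _
            "Hello <%= UserName%>, thank you for coming."
        StringTable.Add "Spanish|MainPage_Greeting", _
            "Hola <%= UserName%>, gracias por venir."
    End If

    Dim Key
    Key = ResultLanguage & "|" & StringID
    If StringTable.Exists(Key) Then
        GetLinguisticText = StringTable(Key)
    Else
        ' Leave a visible marker so missing translations are easy to spot
        GetLinguisticText = "[" & StringID & "]"
    End If
End Function
```

Returning a bracketed ID for a missing entry is a design choice of this sketch, not something the original engine is known to have done; raising an error instead would stop generation at the first untranslated string.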

Now you can actually rebuild your site using the new mechanism. This is where it gets tedious: you need to manually move all your text into the String Table and replace each piece with a GetLinguisticText tag carrying its corresponding ID.

Once you have finished this labor, you can regenerate your site by calling GenerateResultFileFromCodeFile for every page. You could use a table to list all your pages; another table to hold all the languages fits as well. I also created a table listing all the TextIDs and, for every language, a table of its translations with key and value columns. Feel free to express yourself.
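As a rough sketch, regenerating the whole site comes down to two nested loops. The page and language lists below are hypothetical arrays standing in for the database tables described above.

```vbscript
' Hypothetical lists; in the setup described above they would be
' read from the pages table and the languages table.
Dim Pages, Languages, Page, Language
Pages = Array("default.asp", "quotes.html")
Languages = Array("English", "Spanish")

' Generate every code page once per language
For Each Language In Languages
    For Each Page In Pages
        GenerateResultFileFromCodeFile Page, Language
    Next
Next
```

Because each result file is rewritten from scratch, running this after any change to the code pages or to the String Table brings all languages back in sync.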

Easing the translation process

Of course, you can ask a translator to translate texts directly inside your database, but don't expect too much of him or her. You will probably do better if you display the same text in a different language, or in several, alongside. I remember I created a form showing the ID, the text in Hebrew, and the text in the target language; the translator could change only the content of the language being translated into, and Previous and Next buttons were supplied for moving between IDs. So, to translate the site into a new language (Japanese, in my case), a translator goes through the numerous texts using a single-click routine.

Some of the text that is being displayed in web sites is actually fragmented in its code form, for example:
HTML
Hello <%= UserName%>, thank you for coming.
With the translation mechanism so far, the programmer will have to do something like this:
HTML
<GetLinguisticText ID="MainPage_GreetingBeforeUserName" />
<%= UserName%><GetLinguisticText ID="MainPage_GreetingAfterUserName" />


MainPage_GreetingBeforeUserName::English = "Hello "
MainPage_GreetingBeforeUserName::Spanish = "Hola "
MainPage_GreetingAfterUserName::English = ", thank you for coming."
MainPage_GreetingAfterUserName::Spanish = ", gracias por venir."
While this seems rather good, each fragment gets translated out of context, and the sentence tends to break when moved from one language to another, for instance when the word order differs. Since our engine generates other pages, even ASP pages, we can put the small piece of ASP code inside the translation unit itself, viz.:
HTML
<GetLinguisticText ID="MainPage_Greeting" />


MainPage_Greeting::English = "Hello <%= UserName%>, thank you for coming."
MainPage_Greeting::Spanish = "Hola <%= UserName%>, gracias por venir."
However, if the code is rather long and has many ASP directives, it is advisable not to include it. But feel free to include symbols (a smiley, a trademark sign), HTML code (a <br>, an <hr>), and the like, if they are part of the text and are easy to follow. Even so, a good addition I made was to connect each TextID with its page and to preview that page for the translator; that is, while translating a given TextID, the translator has a frame taking half the screen that shows the page containing the text being translated, which mitigates the context problem. In addition, grouping all the content of a specific page together is effective, as it shortens the translation time.

I only vaguely remember what exactly we did regarding other files, mainly image files. I remember our site had a prototype image in our mother tongue, Hebrew, and the translators had to upload the same image in their own language. I even remember we kept our source images (Photoshop® PSD files, for instance) on the same back-office site; so adding an images table with the image source file and the names of the pictures in all the different languages will suffice.
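One way to handle images along these lines is sketched below. It is an assumption, not a reconstruction of what we actually ran: GetImageDirForLanguage and the /images/<Language> folder are hypothetical, the fallback to the prototype image is my own addition, and GetResultDirForLanguage is the function the engine already relies on.

```vbscript
' Hypothetical folder holding the images uploaded for one language
Function GetImageDirForLanguage(Language)
    GetImageDirForLanguage = Server.MapPath("/images/" & Language) & "\"
End Function

' Copies one image into the result folder of <ResultLanguage>.
' If no translated image has been uploaded yet, the prototype image
' (the one in <PrototypeLanguage>, Hebrew in our case) is used instead.
Sub PublishImage(ImageName, ResultLanguage, PrototypeLanguage)
    Dim FSO, SourcePath
    Set FSO = Server.CreateObject("Scripting.FileSystemObject")

    SourcePath = GetImageDirForLanguage(ResultLanguage) & ImageName
    If Not FSO.FileExists(SourcePath) Then
        SourcePath = GetImageDirForLanguage(PrototypeLanguage) & ImageName
    End If

    ' Third argument True: overwrite an existing copy in the result folder
    FSO.CopyFile SourcePath, GetResultDirForLanguage(ResultLanguage) & ImageName, True
    Set FSO = Nothing
End Sub
```

A call such as PublishImage "logo.gif", "Spanish", "Hebrew" would then place either the Spanish logo or, failing that, the Hebrew prototype next to the generated Spanish pages.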

Ciao.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)