
SMTP Internationalization

13 Jul 2003
How to send non-English e-mail using .NET.

Introduction

You can find many articles on implementing SMTP in C# on this and other sites. I won't dwell on protocol implementation details here; instead, I'll focus on sending e-mail in languages other than English (Russian, in this scenario). English-only e-mail systems use the 7-bit System.Text.Encoding.ASCII encoding when text has to be converted to a sequence of bytes for network transmission. Such applications turn every non-English character (in an 8-bit code page, the codes 0x80-0xFF) into '?', which means the character has no representation in the target encoding.
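To make the problem concrete, here is a minimal sketch of what happens when Cyrillic text is pushed through the ASCII encoding. The class and method names (AsciiLossDemo, ThroughAscii) are made up for this illustration and are not part of the SMTP library discussed in the article.

```csharp
using System;
using System.Text;

class AsciiLossDemo
{
    // Round-trips text through 7-bit ASCII to show what a receiver would see.
    public static string ThroughAscii(string text)
    {
        byte[] wire = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(wire);
    }

    static void Main()
    {
        // The Russian word for "hello", written with escapes so that the
        // encoding of the source file itself does not matter.
        string russian = "\u041F\u0440\u0438\u0432\u0435\u0442";

        Console.WriteLine(ThroughAscii("Hello"));   // prints: Hello
        Console.WriteLine(ThroughAscii(russian));   // prints: ??????
    }
}
```

Each of the six Cyrillic characters is replaced with a single '?' byte (0x3F), so the original text cannot be recovered on the receiving side.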

Solution

A simple solution to this problem is to use the System.Text.Encoding instance that corresponds to the encoding of the source text. The source character set usually corresponds to the one set in Control Panel/Regional Settings.

I use Russian as my default language, so all Cyrillic characters appear properly inside text areas and on title bars. Fortunately, there is an easy way to find out which default encoding Windows uses:

System.Text.Encoding sourceEncoding = System.Text.Encoding.Default;

A little test:

Console.WriteLine( "Windows charset: " + sourceEncoding.HeaderName );
Console.WriteLine( "Windows code page: " + sourceEncoding.CodePage );

reveals that we are on the right track:

> Windows charset: Windows-1251

> Windows code page: 1251

Now the e-mail can be properly encoded for transmission. We just need to add the character set identifier to the message header:

text.AppendFormat( "Content-Type: text/plain;\r\n\tcharset=\"{0:G}\"\r\n", 
    sourceEncoding.HeaderName );

where text is a StringBuilder variable containing the resulting text. The message body would be transmitted like this:

byte[] data = sourceEncoding.GetBytes( text.ToString() );
smtpStream.Write( data, 0, data.Length );

That would be all, but in the real world not everything is that simple. For historical reasons, Russian-speaking countries use the KOI8-R encoding as the de facto e-mail standard (not everyone uses Windows, so code page 1251 might not be supported on some DOS or UNIX systems). That's why I set my default e-mail encoding in Outlook Express to KOI8-R (Options/Send), so that I can chat with my 'non-Windows' buddies.

Some investigation reveals that this value is also available from the default encoding object:

Console.WriteLine( "E-Mail charset: " + sourceEncoding.BodyName );

> E-Mail charset: koi8-r

Luckily, there is a static function, System.Text.Encoding.Convert(), that converts bytes from one encoding to another. Here is a snippet of code that must run before the message is sent. Don't forget that the resulting character set will now be different, so the 'Content-Type' charset header must refer to sourceEncoding.BodyName rather than sourceEncoding.HeaderName.

using System.Text;
/* ............ */
Encoding srcEnc = Encoding.Default; 
Encoding dstEnc;

/* src & dst refer to same object if no intermediate conversion is required */
if( srcEnc.HeaderName.Equals( srcEnc.BodyName ) )
  dstEnc = srcEnc;
else
  dstEnc = Encoding.GetEncoding( srcEnc.BodyName );

/* ............ */
byte[] srcData = srcEnc.GetBytes( messageString );
byte[] dstData;

/* see if we need to convert data */
if( dstEnc != srcEnc )
  dstData = Encoding.Convert( srcEnc, dstEnc, srcData );
else
  dstData = srcData;

/* write encoded data */
smtpStream.Write( dstData, 0, dstData.Length );
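The selection and conversion steps above can be gathered into a small helper. This is only a sketch under stated assumptions: the names MailCharset, PickMailEncoding, ContentTypeHeader and EncodeBody are hypothetical and do not come from the SMTP library, and the charset names are compared case-insensitively as a precaution. On newer .NET runtimes, legacy code pages such as koi8-r may need to be registered before Encoding.GetEncoding() can find them; the sketch assumes they are available.

```csharp
using System;
using System.Text;

static class MailCharset
{
    // Returns the encoding to use on the wire: the source encoding itself
    // when its header and body names agree, otherwise the one named by BodyName.
    public static Encoding PickMailEncoding(Encoding source)
    {
        if (string.Equals(source.HeaderName, source.BodyName,
                          StringComparison.OrdinalIgnoreCase))
            return source;
        return Encoding.GetEncoding(source.BodyName);
    }

    // Builds the Content-Type header; the charset is always the BodyName
    // of the source encoding, as the article requires.
    public static string ContentTypeHeader(Encoding source)
    {
        return string.Format(
            "Content-Type: text/plain;\r\n\tcharset=\"{0}\"\r\n", source.BodyName);
    }

    // Encodes the message in the source encoding and converts the bytes
    // only when the wire encoding is a different object.
    public static byte[] EncodeBody(string message, Encoding source)
    {
        Encoding wire = PickMailEncoding(source);
        byte[] srcData = source.GetBytes(message);
        return ReferenceEquals(wire, source)
            ? srcData
            : Encoding.Convert(source, wire, srcData);
    }

    static void Main()
    {
        Encoding source = Encoding.Default;
        Console.Write(ContentTypeHeader(source));
        byte[] body = EncodeBody("\u041F\u0440\u0438\u0432\u0435\u0442", source);
        Console.WriteLine("Body length in bytes: " + body.Length);
    }
}
```

The bytes returned by EncodeBody would then be written to the SMTP stream exactly as in the snippet above. What the program prints depends on the default encoding of the machine it runs on.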

That's all, folks. The latest version of the SMTP library source code and help file can be found here.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
