Introduction
To convert strings between UTF-8 and UTF-16 (as well as other encodings), Microsoft provides the MultiByteToWideChar and WideCharToMultiByte functions. These functions operate on null-terminated char/wchar_t strings. Using them requires some manual memory management, and if you call them extensively, your code can quickly become cluttered. That's why I decided to wrap these two functions for use with the more coder-friendly CString types.
The conversion functions
UTF16toUTF8
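CStringA UTF16toUTF8(const CStringW& utf16)
{
    CStringA utf8;
    // First call: query the required size in bytes, including the terminating null.
    int len = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    if (len > 1)
    {
        // GetBuffer(len - 1) holds len - 1 characters plus the null terminator.
        char *ptr = utf8.GetBuffer(len - 1);
        // Second call: convert directly into the CString buffer.
        if (ptr) WideCharToMultiByte(CP_UTF8, 0, utf16, -1, ptr, len, NULL, NULL);
        utf8.ReleaseBuffer();
    }
    return utf8;
}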
UTF8toUTF16
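CStringW UTF8toUTF16(const CStringA& utf8)
{
    CStringW utf16;
    // First call: query the required size in wide characters, including the terminating null.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (len > 1)
    {
        // GetBuffer(len - 1) holds len - 1 characters plus the null terminator.
        wchar_t *ptr = utf16.GetBuffer(len - 1);
        // Second call: convert directly into the CString buffer.
        if (ptr) MultiByteToWideChar(CP_UTF8, 0, utf8, -1, ptr, len);
        utf16.ReleaseBuffer();
    }
    return utf16;
}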
Using the code
Using the two helper functions is straightforward. Note, however, that they are only useful if your project is set to use the Unicode character set. They also require Visual Studio 7.1 or later: Visual Studio 6.0 does not provide CStringA and CStringW, so the code will not compile there. The following snippet shows a usage example:
CStringW utf16(L"òèçùà12345");
CStringA utf8 = UTF16toUTF8(utf16);
CStringW utf16_2 = UTF8toUTF16(utf8);
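To make it easier to see what the round trip above should produce, here is a small portable sketch that does not use the Win32 API or CString at all. It is only an illustration, not the code of this article: the names EncodeUtf8Portable and DecodeUtf8Portable are hypothetical, the encoder and decoder are hand-written, a C++11 compiler is assumed, and validation is minimal (overlong sequences and encoded surrogates are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes UTF-16 as UTF-8 by hand.
// Unpaired surrogates are replaced with U+FFFD.
std::string EncodeUtf8Portable(const std::u16string& utf16)
{
    std::string utf8;
    for (std::size_t i = 0; i < utf16.size(); ++i)
    {
        std::uint32_t cp = utf16[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < utf16.size() && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
                ++i;
            }
            else
                cp = 0xFFFD;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            cp = 0xFFFD; // stray low surrogate

        if (cp < 0x80)
            utf8 += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
}

// Hypothetical helper: decodes UTF-8 into UTF-16 by hand.
// Invalid lead bytes and truncated sequences become U+FFFD.
std::u16string DecodeUtf8Portable(const std::string& utf8)
{
    std::u16string utf16;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        std::uint32_t cp = 0xFFFD;
        int extra = 0; // number of continuation bytes expected
        if (lead < 0x80)               { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }

        for (; extra > 0; --extra)
        {
            if (i >= utf8.size() || (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD; // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp < 0x10000)
            utf16 += static_cast<char16_t>(cp);
        else
        {
            // Split into a surrogate pair.
            cp -= 0x10000;
            utf16 += static_cast<char16_t>(0xD800 + (cp >> 10));
            utf16 += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return utf16;
}
```

With the sample string from the snippet above, the 10 UTF-16 code units become 15 UTF-8 bytes: each of the five accented letters takes two bytes and each digit takes one. Decoding those 15 bytes gives back the original 10 code units, which is the same result you should expect from UTF16toUTF8 followed by UTF8toUTF16.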
History
After a comment by Ivo Beltchev, I decided to change the functions as he suggested. Initially, I designed the functions like this:
CStringA UTF16toUTF8(const CStringW& utf16)
{
    LPSTR pszUtf8 = NULL;
    CStringA utf8("");
    if (utf16.IsEmpty())
        return utf8;
    size_t nLen16 = utf16.GetLength();
    size_t nLen8 = 0;
    if ((nLen8 = WideCharToMultiByte (CP_UTF8, 0, utf16, nLen16,
                                      NULL, 0, 0, 0) + 2) == 2)
        return utf8;
    pszUtf8 = new char [nLen8];
    if (pszUtf8)
    {
        memset (pszUtf8, 0x00, nLen8);
        WideCharToMultiByte(CP_UTF8, 0, utf16, nLen16, pszUtf8, nLen8, 0, 0);
        utf8 = CStringA(pszUtf8);
    }
    delete [] pszUtf8;
    return utf8;
}
CStringW UTF8toUTF16(const CStringA& utf8)
{
    LPWSTR pszUtf16 = NULL;
    CStringW utf16("");
    if (utf8.IsEmpty())
        return utf16;
    size_t nLen8 = utf8.GetLength();
    size_t nLen16 = 0;
    if ((nLen16 = MultiByteToWideChar (CP_UTF8, 0, utf8, nLen8, NULL, 0)) == 0)
        return utf16;
    // One extra element: with an explicit input length, the output is not null-terminated.
    pszUtf16 = new wchar_t[nLen16 + 1];
    if (pszUtf16)
    {
        wmemset (pszUtf16, 0x00, nLen16 + 1);
        MultiByteToWideChar (CP_UTF8, 0, utf8, nLen8, pszUtf16, nLen16);
        utf16 = CStringW(pszUtf16);
    }
    // Free the temporary buffer, not the CStringW object.
    delete [] pszUtf16;
    return utf16;
}
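These functions work as well, but the current versions shown earlier are shorter and slightly more efficient, because they convert directly into the CString buffer instead of going through a temporary array. Thanks to Ivo for the suggestion!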