Introduction
This article will show how to port a Visual Studio C++ project from a multi-byte configuration into Unicode, with a special emphasis on:
- Automatically adding the _T("") macro to quoted strings.
- Porting from std::string, std::ostringstream, and std::ofstream to Unicode-compatible versions.
- Storing Unicode values in std::string.
Background
Once upon a time, I started a project in Visual Studio 6, C++, and the first line I ever wrote was:
AfxMessageBox("Hello World!");
When I hit "Build", it would not compile until I changed the project's configuration to Multibyte instead of the default Unicode.
I knew from that moment on that I had better wrap every string in the _T macro, but after some time, I stopped doing it.
A year later, when the project had become a 200,000-line code monster, I was asked to translate the program into other languages, such as Russian and Chinese. After I changed the project configuration back to Unicode, the build produced thousands of errors, mostly because quoted text had not been wrapped in the _T macro.
This article shows how to automatically add the _T("") macro to quoted strings, using Visual Studio's Macro Explorer.
Automatically adding the _T("") macro to quoted strings
Using the code
- Open your project in Visual Studio.
- In the top main menu, go to "Tools->Macros->Macro Explorer". The Macro Explorer panel should appear on the right side of the screen.
- Right-click on "MyMacros" and choose "New module".
- Type in, exactly, the following name: "AutoT".
- Right-click on the newly created module and choose "Edit".
- Paste in the code listed below, overwriting the few lines of automatically generated code.
- Save and close the macro.
Note that although the following code is written in Visual Basic, the language of Visual Studio macros, it is intended for use on C++ source files:
Imports System
Imports EnvDTE
Imports EnvDTE80
Imports System.Diagnostics

Public Module AutoT

    ' Runs one find-and-replace step over the open documents.
    Sub ReplaceXWithY(ByVal X As String, ByVal Y As String, _
            Optional ByVal MatchCase As Boolean = False, _
            Optional ByVal PatternSyntax As EnvDTE.vsFindPatternSyntax = _
                vsFindPatternSyntax.vsFindPatternSyntaxLiteral)
        DTE.Find.Action = vsFindAction.vsFindActionReplace
        DTE.Find.FindWhat = X
        DTE.Find.ReplaceWith = Y
        DTE.Find.Target = vsFindTarget.vsFindTargetOpenDocuments
        DTE.Find.MatchCase = MatchCase
        DTE.Find.MatchWholeWord = False
        DTE.Find.Backwards = False
        DTE.Find.MatchInHiddenText = False
        DTE.Find.PatternSyntax = PatternSyntax
        If (DTE.Find.Execute() = vsFindResult.vsFindResultNotFound) Then
            Throw New System.Exception("vsFindResultNotFound")
        End If
    End Sub

    ' {:q} is the Visual Studio regular expression for a quoted string;
    ' \1 refers to that tagged match, so "text" becomes _T("text").
    Sub QuotedTextTo_T()
        ReplaceXWithY("{:q}", "_T(\1)", True, _
            vsFindPatternSyntax.vsFindPatternSyntaxRegExpr)
    End Sub

End Module
When you go back to your project, you will see a macro named "AutoT" in the Macro Explorer panel on the right.
Each time you double-click that macro, it selects the next quoted text in the currently open C++ file. Double-clicking the macro again wraps the selected text in the _T macro:
AfxMessageBox("Hello World!");
which will be changed to:
AfxMessageBox(_T("Hello World!"));
and will compile both in Multibyte and Unicode configurations!
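To see why the wrapped form builds in both configurations, here is a minimal portable sketch of the mechanism. MY_T, MY_TCHAR, and MY_UNICODE are hypothetical stand-ins for the real _T, TCHAR, and _UNICODE (the real definitions live in TCHAR.H and are more elaborate), and Greeting and TextLength are illustrative helpers only.

```cpp
#include <cstddef>

// Hypothetical stand-in for the _UNICODE define set by the project settings.
#define MY_UNICODE 1

#if MY_UNICODE
typedef wchar_t MY_TCHAR;      // Unicode build: text is wchar_t
#define MY_T(x) L##x           // "Hello" becomes L"Hello"
#else
typedef char MY_TCHAR;         // multibyte build: text is char
#define MY_T(x) x              // "Hello" stays "Hello"
#endif

// The same source line compiles in either configuration.
const MY_TCHAR* Greeting()
{
    return MY_T("Hello World!");
}

// Counts the characters (code units) before the terminator.
std::size_t TextLength(const MY_TCHAR* s)
{
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

Changing MY_UNICODE to 0 switches the literal and the character type together, which is what the project-level Unicode/Multibyte setting does for _T and TCHAR.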
It is recommended to add a keyboard shortcut to the macro:
- In the top main menu of Visual Studio, go to "Tools->Options"
- Click on the + at the left of "Environment"
- Click "Keyboard"
- In the right pane, type "AutoT" in the "Show commands containing:" edit box to find our new macro
- Click on the macro, and assign a shortcut key to it (I chose Ctrl-Alt-Num0)
- Click OK
Now every time you press that keyboard combination, the macro will be executed.
Note: Don't be tempted to let the script do all the work blindly. Human verification is needed. The script will also try to add the _T macro to lines like:
#include "StdAfx.h"
In order to make the script skip such a line, simply press the "Right" arrow on your keyboard.
The script is also not smart enough to handle escaped quotes inside a string, as in:
AfxMessageBox(_T("Hello \"World!\" "));
It will also fail to skip quoted text that is already wrapped in the _T macro, but the good news is that it will never miss a quoted string :)
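The cases listed above can also be handled in code. The following is a rough sketch, not part of the macro: WrapLiteralsInLine and EndOfLiteral are hypothetical helpers that process one source line at a time, skip #include lines, honor escaped quotes, and leave already wrapped or L-prefixed literals alone. It assumes literals do not span lines and it does not understand /* */ comments, so human review is still needed.

```cpp
#include <cstddef>
#include <string>

// Returns the index just past the closing quote of the literal that opens
// at `start`, honoring backslash escapes such as \" inside the literal.
static std::size_t EndOfLiteral(const std::string& s, std::size_t start)
{
    const char quote = s[start];
    std::size_t i = start + 1;
    while (i < s.size() && s[i] != quote)
        i += (s[i] == '\\') ? 2 : 1;   // skip the escaped character too
    return (i < s.size()) ? i + 1 : s.size();
}

// Hypothetical helper: wraps each narrow string literal on one source line
// in _T(...). Lines starting with #include, literals that are already
// wrapped or L-prefixed, and text after // are left unchanged.
std::string WrapLiteralsInLine(const std::string& line)
{
    const std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos && line.compare(first, 8, "#include") == 0)
        return line;

    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        const char c = line[i];
        if (c == '/' && i + 1 < line.size() && line[i + 1] == '/') {
            out.append(line, i, std::string::npos);   // rest is a comment
            break;
        }
        if (c == '\'' || c == '"') {
            const std::size_t end = EndOfLiteral(line, i);
            const std::string literal = line.substr(i, end - i);
            const bool wrapped = out.size() >= 3 &&
                                 out.compare(out.size() - 3, 3, "_T(") == 0;
            const bool wide = !out.empty() && out[out.size() - 1] == 'L';
            if (c == '"' && !wrapped && !wide)
                out += "_T(" + literal + ")";
            else
                out += literal;   // char literal, or already Unicode-ready
            i = end;
            continue;
        }
        out += c;
        ++i;
    }
    return out;
}
```

A scanner like this trades the macro's interactive confirmation for batch processing, so it is best run on a copy of the sources and reviewed with a diff tool.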
Note that you will also have to replace C string functions such as strcmp with their generic-text equivalents from TCHAR.H, such as _tcscmp.
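As a rough illustration of how such generic-text routine names work, the sketch below maps one name to either the char or the wchar_t routine at compile time. The names my_tcscmp, my_tcslen, MY_T, MY_TCHAR, and MY_UNICODE are hypothetical stand-ins for _tcscmp, _tcslen, _T, TCHAR, and _UNICODE; the real TCHAR.H mappings cover many more functions.

```cpp
#include <cstring>
#include <cwchar>

// Hypothetical stand-in for the _UNICODE define.
#define MY_UNICODE 1

#if MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x)   L##x
#define my_tcscmp std::wcscmp   // Unicode build: wchar_t routines
#define my_tcslen std::wcslen
#else
typedef char MY_TCHAR;
#define MY_T(x)   x
#define my_tcscmp std::strcmp   // char build: char routines
#define my_tcslen std::strlen
#endif

// Written once against the generic names; compiles in both builds.
bool SameText(const MY_TCHAR* a, const MY_TCHAR* b)
{
    return my_tcscmp(a, b) == 0;
}
```

Code written against the generic names needs no further changes when the project configuration is switched.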
Porting from std::string, std::ostringstream, std::ofstream to Unicode compatible versions
If your multibyte project makes wide use of std::string, std::ostringstream, or std::ofstream, note that those types stay char-based in a Unicode build, so they no longer accept TCHAR strings and _T("") literals.
The easiest fix is to define the following types and rename all occurrences in your program, for example from std::string to tstring.
#include <string>
#include <sstream>
#include <fstream>
#include <tchar.h>
typedef std::basic_string<TCHAR> tstring;
typedef std::basic_ostringstream<TCHAR> tostringstream;
typedef std::basic_ofstream<TCHAR> tofstream;
Also, replace char with TCHAR wherever it is used for text (but not where it holds raw bytes).
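A short usage sketch of the generic string types follows. To keep it buildable outside Visual C++, MY_TCHAR and MY_T are hypothetical stand-ins for TCHAR and _T and model a Unicode build; FormatCount is an illustrative function, not part of the original project.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-ins for TCHAR and _T, modelling a Unicode build.
typedef wchar_t MY_TCHAR;
#define MY_T(x) L##x

typedef std::basic_string<MY_TCHAR>        tstring;
typedef std::basic_ostringstream<MY_TCHAR> tostringstream;

// Formats a label and a number using only the generic types, so the same
// body compiles whether MY_TCHAR is char or wchar_t.
tstring FormatCount(const tstring& label, int count)
{
    tostringstream oss;
    oss << label << MY_T(": ") << count;
    return oss.str();
}
```

Because the function never names std::string or std::wstring directly, only the typedef has to change between configurations.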
Storing Unicode values in std::string
To store Unicode values, std::wstring can be used, but when you must store a Unicode value in a standard std::string or char array, you can store it in UTF-8 format, which is also used by TinyXML, among others.
The following helper functions may help you with the conversions:
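// Requires <windows.h>, <atlstr.h> (CString, CT2CA) and the tstring typedef above.
#include <vector>

// Converts a CString to a std::string in the current ANSI code page.
// Characters outside that code page are lost.
std::string CStringToString(const CString& cs)
{
    CT2CA pszConvertedAnsiString(cs);
    return std::string(pszConvertedAnsiString);
}

tstring CStringTo_tstring(const CString& cs)
{
    std::basic_string<TCHAR> bsConverted(cs);
    return bsConverted;
}

std::string tstringTo_stdString(const tstring& ts)
{
    return CStringToString(CString(ts.c_str()));
}

// Converts a null-terminated UTF-8 string to a tstring.
tstring UTF8charTo_tstring(const char* strIn)
{
    // The first call returns the required size in wchar_t, terminator included.
    int nLen = MultiByteToWideChar(CP_UTF8, 0, strIn, -1, NULL, 0);
    if (nLen <= 0)
        return tstring();
    std::vector<wchar_t> buffer(nLen);
    MultiByteToWideChar(CP_UTF8, 0, strIn, -1, &buffer[0], nLen);
    return CStringTo_tstring(CString(&buffer[0]));
}

// Converts a tstring to a UTF-8 encoded std::string.
std::string tstringToUTF8string(const tstring& tsIn)
{
    CStringW wide(tsIn.c_str());   // widens first when TCHAR is char
    int nLen = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    if (nLen <= 0)
        return std::string();
    std::vector<char> buffer(nLen);
    WideCharToMultiByte(CP_UTF8, 0, wide, -1, &buffer[0], nLen, NULL, NULL);
    return std::string(&buffer[0]);
}

// Returns true if the text does not survive a round trip through the
// ANSI code page, that is, it contains characters that need Unicode.
bool HasUnicodeChars(const tstring& tsIn)
{
    std::string sNarrow = tstringTo_stdString(tsIn);
    tstring tsFromNarrow = CStringTo_tstring(CString(sNarrow.c_str()));
    return tsFromNarrow != tsIn;
}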
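To convert tstring to char* (the pointer is valid only until the end of the full expression, so copy the result into a std::string if you need to keep it):
CStringToString(sName.c_str()).c_str()
To convert std::string to tstring:
tstring ts30 = CStringTo_tstring(CString(stdS1.c_str()));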
Note: UTF-8 strings are stored like char strings, but each non-ASCII character takes two to four chars (bytes), so the string can be longer than its character count:
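tstring ts1;
ts1 = _T("Some foreign language text");
size_t nLen = ts1.length();     // number of TCHAR elements
size_t nSize = ts1.size();      // same value as length()
std::string s2 = tstringToUTF8string(ts1);
size_t nLen2 = s2.length();     // number of bytes; larger than nLen for non-ASCII text
size_t nSize2 = s2.size();      // same value as length()
tstring ts5 = UTF8charTo_tstring(s2.c_str());   // converts back to the original text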
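To make the size difference concrete, here is a small portable sketch of the UTF-8 encoding rule itself. Utf8FromCodePoint is a hypothetical helper written for illustration; it is not what WideCharToMultiByte does internally, and it assumes its input is a valid Unicode scalar value (not a surrogate, at most 0x10FFFF).

```cpp
#include <string>

// Returns the UTF-8 encoding of one Unicode code point.
// Assumes cp is a valid scalar value; no validation is performed.
std::string Utf8FromCodePoint(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {                       // ASCII: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The letter A (U+0041) encodes to one byte, é (U+00E9) to two, and 中 (U+4E2D) to three, so size() on a std::string holding UTF-8 reports bytes rather than characters.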