In this tip, you will see how to create a text (.txt) file with a Unicode file name, write a UTF-8 BOM to the beginning of the same file, and append a Unicode string to the same file.
Introduction
This post discusses the following:
- Creating a text (.txt) file with a Unicode file name
- Writing a UTF-8 BOM to the beginning of the same file
- Appending a Unicode string to the same file
Background
I used Microsoft Visual Studio a lot for many years and I wrote many programs in it, some of which dealt with intense engineering design and testing. I was aware that Visual Studio (VS) is a cripple that did things for me fast. It is called a Rapid Application Development (RAD) environment because it is that. But I wanted more control over my code and I did not want to have to guess whether the VS cripples were doing as I desired. I had to see the actual code. I had to be the one that coded it in.
Thus, when I retired, I stopped using Visual Studio and jumped directly into intermediate to advanced C++11. I studied available IDEs and available compilers and chose Code::Blocks 17.12 with MinGW and GCC 5.1 for its stability and usability. I rejected wxWidgets, which comes with Code::Blocks, since it (in my opinion) is just another cripple for RAD similar to VS and I did not want it.
A RAD is great if you are in a desperate hurry. I am retired and I can now program in C and C++ all day and all night if I like.
For the first few months, I studied from Bjarne's books. I have rarely needed them since then.
C and C++ are so much better than what I had used before.
About 2 years later, I can now write a large program that does more, and is more cross-platform adjustable via the IDE (Code::Blocks 17.12), than what I wrote before.
This article (yes, they told me to call it a "Tip", but it really is an article) addresses a realized need (to some extent) to use Unicode. Unicode is vast. Its potential is vast. When I left the limitations of Visual Studio, I also separately left the limitations of ANSI. Now my code is in C and C++, I can actually program in Unicode (yes, I now can whether you understand that or not), and my interface is in Unicode. This does not tell all about the use of Unicode in C and C++, but it does give a working example that is to a great extent backward compatible and forward compatible. Read the limits that I disclose herein and enjoy.
Problems getting here:
Many people have a disgusting and perverted attitude against anything previous, like versions of Microsoft Windows older than the one they personally are using, and they berate any and all that do not worship what is being advertised by marketers as "modern".
Microsoft has many problems with hard-coded C and C++ doing Unicode stuff. I understand. Microsoft, in the past, created lots of "code pages" in which they seem to have hard coded Unicode symbols to force their operating system to work with Unicode before C and C++ were developed enough to program sufficiently with it in mind. I get it. Microsoft did a great job of forcing their operating systems to be compatible with Unicode. But Microsoft does not seem to have gotten beyond that. They still own the market, so to say, but they need to deal with that limitation and, at the same time, be backward compatible. Maybe Bjarne can be hired for some astronomical price to help them.
Code::Blocks does not guess and suggest like Visual Studio did, but that is fine with me. It took some getting used to, but I like it, I like it a lot.
The biggest problem has been that the committee that stamps its approval on Bjarne's versions of C++ seems to have been overwhelmed by Unicode and thus has precipitated a lack of compiler development in that direction.
For these problems, I have and I do and I will get past them and my code will work for me as I desire.
Using the Code
I retain all of my rights for this article.
I proclaim this to be an open source article.
Codeproject.com now has rights to this article as they have chosen to publish it on their site.
The following sets up some basics (not all) that encapsulate or address the code that I am allowing you to see.
This was done on Microsoft Windows XP Professional 32-bit with Service Pack 2 (certainly not 3),
using Code::Blocks 17.12 with MinGW and GCC 5.1 set for C++11,
in a program with a Graphical User Interface (GUI).
THIS IS SO IMPORTANT:
In Code::Blocks 17.12 / Settings / Editor (Configure editor) (General Settings) / Encoding settings / Encoding / "Use encoding when opening files:" UTF-8 / "Use this encoding" (+) "As default encoding (bypassing C::B's auto-detection)" / [+] "If conversion fails using the settings above, try system locale settings"
and:
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE, LPSTR, int nCmdShow)
and at the start of the code before anything else (in particular, before #include <windows.h>):
#define _UNICODE
#define UNICODE
and other stuff, but that should get you close.
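The two defines have to come first because the Windows headers use the preprocessor to choose between the narrow and the wide version of each API name. The following is a simplified, portable imitation of that mechanism, not the real header code. The names MY_UNICODE, MY_TEXT, my_char, DescribeA, DescribeW, Describe and WhichVersion are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Must appear before the "header" part below, in the same way that the
// real UNICODE define must appear before #include <windows.h>.
#define MY_UNICODE

// ---- imitation of what a header such as windows.h does ----
std::string DescribeA(const char*)    { return "narrow (char) version"; }
std::string DescribeW(const wchar_t*) { return "wide (wchar_t) version"; }

#ifdef MY_UNICODE
typedef wchar_t my_char;
#define MY_TEXT(s) L##s
#define Describe DescribeW
#else
typedef char my_char;
#define MY_TEXT(s) s
#define Describe DescribeA
#endif
// ---- end of imitation header ----

// Generic-looking code: which function actually runs was decided by
// the define at the top of the file.
std::string WhichVersion()
{
    const my_char* name = MY_TEXT("file.txt");
    return Describe(name);
}
```

With MY_UNICODE defined, WhichVersion() goes through DescribeW. If the define is moved below the imitation header, the same source silently selects DescribeA instead, which is the kind of surprise the ordering rule avoids.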
For testing, put the example given under "The code:" below into your WinMain right before the message loop:
while(GetMessage(&Msg, nullptr, 0, 0) > 0)
{
TranslateMessage(&Msg);
DispatchMessage(&Msg);
}
return Msg.wParam;
Then after testing is done, and after you are satisfied with all of your adjustments and your own version of error checking, then move the following to a function so that you get a nice clean up.
__________
The code:
// Wide (UTF-16) file name. With UNICODE defined, CreateFile and
// MessageBox resolve to their wide versions.
const wchar_t* TheFile;
TheFile = L"utf8_UsingByteOrderMark_C_天堂.txt";
// Create the file, or truncate it if it already exists.
HANDLE hDFile01;
hDFile01 = CreateFile(TheFile, GENERIC_READ | GENERIC_WRITE, 0,
    NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (hDFile01 == INVALID_HANDLE_VALUE)
{
    MessageBox(nullptr,
        L"Could not open utf8_UsingByteOrderMark_C_天堂.txt",
        L"Could not open utf8_UsingByteOrderMark_C_天堂.txt",
        MB_ICONEXCLAMATION | MB_OK);
}
DWORD NumberOfBytesWritten01;
BOOL bErr01;
bErr01 = false;
unsigned char BOM01[3]{ 0xef, 0xbb, 0xbf };
bErr01 = WriteFile(hDFile01, (LPCVOID)BOM01, (DWORD)sizeof(BOM01),
&NumberOfBytesWritten01, NULL);
CloseHandle(hDFile01);
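// Reopen the file and append the text as UTF-8 so that it matches the BOM.
HANDLE hDFile03;
hDFile03 = CreateFile(TheFile, GENERIC_READ | GENERIC_WRITE, 0,
    NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
// OPEN_ALWAYS leaves the file pointer at offset 0. Move it to the end
// so that the write appends instead of overwriting the BOM.
SetFilePointer(hDFile03, 0, NULL, FILE_END);
// Convert the wide (UTF-16) string to UTF-8 bytes before writing.
const wchar_t* Text02 = L"hello - J - こんにちは - abcdefghijklmnopqrstuvwxyz";
char Utf8Text02[256];
int Utf8Length02 = WideCharToMultiByte(CP_UTF8, 0, Text02, -1,
    Utf8Text02, sizeof(Utf8Text02), NULL, NULL);
DWORD NumberOfBytesWritten02;
BOOL bErr02;
bErr02 = false;
// Utf8Length02 includes the terminating null, which is not written.
if (Utf8Length02 > 1)
    bErr02 = WriteFile(hDFile03, Utf8Text02, (DWORD)(Utf8Length02 - 1),
        &NumberOfBytesWritten02, NULL);
CloseHandle(hDFile03);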
That is it.
EOL
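For readers who want to check the byte layout away from Windows, here is a portable sketch of the same steps using standard C++ streams instead of the Win32 calls above. It is not the method used in this tip. The function names AppendUtf8, Utf16ToUtf8, WriteBom and AppendText are made up for this example, and a plain ASCII file name is assumed because standard streams do not portably accept wide file names. It shows that the BOM bytes EF BB BF are simply U+FEFF encoded as UTF-8, and that the text has to be converted to UTF-8 and appended after the BOM.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Encode one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
void AppendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8, combining surrogate pairs.
// Unpaired surrogates are not validated in this sketch.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        AppendUtf8(out, cp);
    }
    return out;
}

// Step 1: create (or truncate) the file and write only the BOM.
bool WriteBom(const std::string& path)
{
    std::ofstream f(path.c_str(), std::ios::binary | std::ios::trunc);
    // U+FEFF encoded as UTF-8 gives the three bytes EF BB BF.
    const std::string bom = Utf16ToUtf8(std::u16string(1, char16_t(0xFEFF)));
    f.write(bom.data(), static_cast<std::streamsize>(bom.size()));
    return f.good();
}

// Step 2: reopen in append mode so that the text lands after the BOM.
bool AppendText(const std::string& path, const std::u16string& text)
{
    std::ofstream f(path.c_str(), std::ios::binary | std::ios::app);
    const std::string bytes = Utf16ToUtf8(text);
    f.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return f.good();
}
```

Calling WriteBom("test.txt") and then AppendText("test.txt", u"...") with the sample string from this tip should give a file of 3 BOM bytes followed by 56 bytes of text: each of the five Japanese characters takes 3 bytes in UTF-8, and the remaining 41 characters take 1 byte each.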
Points of Interest
My avocados come from Brazil, or Mexico, etc. My olive oil comes from the USA, or Greece (via a hidden transfer to Germany for processing), etc. My gasoline and oil come from the Middle East, or South America, or even Russia (I guess if they pay off whichever the current flakes are in Washington at the time of delivery). The steel in my vehicle might come from Germany, or Russia, or China, but so many sources are mixed in the cauldron that it is hard to tell. With all that, it is wise to make all of your internationally marketable software very Unicode capable.
Two or three years into this, I am finding C and C++ to be easy. It starts out difficult, then gets worse, then goes through multiple levels of hair-pulling exasperation, then it is easy and fun.
Do the Unicode thing. Try not to pull your hair out. But, get comfortable with Unicode.
History
- 2nd July, 2022: Initial version