Unicode / Creating, Writing, Appending / a text (*.txt) File

Member 15078716

1 Jul 2022 · CPOL · 5 min read

Unicode / Creating, Writing, Appending / a text (*.txt) file - how to do it

In this tip, you will see how to create a text (.txt) file with a Unicode file name, write a UTF-8 byte order mark (BOM) to the beginning of that file, and append a Unicode string to the same file.

Introduction

This post discusses the following:

Creating a text (.txt) file with a Unicode utf-8 file name
Writing a BOM to the beginning of the same file
Appending a Unicode string to the same file
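
For readers who have not met a byte order mark before, here is a minimal portable sketch (standard C++ only, no Windows calls) of what the BOM step in the list above amounts to. The helper names `has_utf8_bom` and `with_utf8_bom` are made up for this illustration and are not part of any library.

```cpp
#include <string>

// The three bytes of the UTF-8 byte order mark: EF BB BF.
static const char kUtf8Bom[] = "\xEF\xBB\xBF";

// True when the buffer starts with the UTF-8 BOM.
bool has_utf8_bom(const std::string& bytes) {
    return bytes.compare(0, 3, kUtf8Bom, 3) == 0;
}

// Returns the bytes with a BOM in front, adding one only if it is missing.
std::string with_utf8_bom(const std::string& bytes) {
    if (has_utf8_bom(bytes)) return bytes;
    return std::string(kUtf8Bom, 3) + bytes;
}
```

The BOM is nothing more than those three bytes at offset 0 of the file. Everything after them is ordinary UTF-8 text.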

Background

I used Microsoft Visual Studio for many years and wrote many programs in it, some of which dealt with intense engineering design and testing. I was aware that Visual Studio (VS) is a crutch that did things for me fast. It is called a Rapid Application Development (RAD) environment because it is that. But I wanted more control over my code, and I did not want to have to guess whether the VS crutches were doing as I desired. I had to see the actual code. I had to be the one that coded it in.

Thus, when I retired, I stopped using Visual Studio and jumped directly into intermediate to advanced C++11. I studied available IDEs and available compilers and chose Code::Blocks 17.12 with MinGW and GCC 5.1 for its stability and usability. I rejected wxWidgets, which comes with Code::Blocks, since it (in my opinion) is just another crutch for RAD similar to VS and I did not want it.

A RAD is great if you are in a desperate hurry. I am retired and I can now program in C and C++ all day and all night if I like.

For the first few months, I studied from Bjarne Stroustrup's books. I have rarely needed them since.

C and C++ are so much better than what I had used before.

About 2 years later, I can now write a large program that does more, and is more adjustable for cross-platform builds via the IDE (Code::Blocks 17.12), than anything I wrote before.

This article (yea, they told me to call it a "Tip", but it really is an article) addresses a realized need (to some extent) to use Unicode. Unicode is vast. Its potential is vast. When I left the limitations of Visual Studio, I also separately left the limitations of ANSI and now my code is in C and C++ and I can now actually program in Unicode (yes, I now can whether you understand that or not) and my interface is in Unicode. This does not tell all about the use of Unicode in C and C++, but it does give a working example that is to a great extent backward compatible and forward compatible. Read the limits that I disclose herein and enjoy.

Problems getting here:

Many people have a disgusting and perverted attitude against anything previous, like versions of Microsoft Windows older than the one they personally are using, and they berate any and all that do not worship what is being advertised by marketers as "modern".

Microsoft has many problems with hard-coded C and C++ doing Unicode stuff. I understand. Microsoft, in the past, created lots of "code pages" in which they seem to have hard coded Unicode symbols to force their operating system to work with Unicode before C and C++ were developed enough to program sufficiently with it in mind. I get it. Microsoft did a great job of forcing their operating systems to be compatible with Unicode. But, Microsoft does not seem to have gotten beyond that. They still own the market, so to say, but they need to deal with that limitation and at the same time, be backward compatible. Maybe Bjarne Stroustrup can be hired for some astronomical price to help them.

Code::Blocks does not guess and suggest like Visual Studio did, but that is fine with me. It took some getting used to, but I like it, I like it a lot.

The biggest problem has been that the committee that stamps its approval on Bjarne Stroustrup's versions of C++ seems to have been overwhelmed by Unicode and thus has precipitated a lack of compiler development in that direction.

For these problems, I have and I do and I will get past them and my code will work for me as I desire.

Using the Code

I retain all of my rights for this article.
I proclaim this to be an open source article.
Codeproject.com now has rights to this article as they have chosen to publish it on their site.

The following sets up some basics (not all) that encapsulate or address the code that I am allowing you to see.

This was done on Microsoft Windows XP Professional 32 bit with service pack 2 (certainly not 3).

Using Code::Blocks 17.12 with MinGW with GCC 5.1 set for C++11.

Using a Graphical User Interface (GUI).

THIS IS SO IMPORTANT:

In Code::Blocks 17.12 / Settings / Editor (Configure editor) (General Settings) / Encoding settings / Encoding / "Use encoding when opening files:" UTF-8 / "Use this encoding" (+) "As default encoding (bypassing C::B's auto-detection)" / [+] "If conversion fails using the settings above, try system local settings"

and:

C++

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE, LPSTR, int nCmdShow)

and at the start of the code before anything else:

C++

#define _UNICODE
#define UNICODE

and other stuff, but that should get you close.
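
One quick way to confirm that the editor encoding setting above is taking effect is to compare a character typed into the source against the same character written as an escape. This is only a sketch under the assumption that the file is saved as UTF-8; the function name `source_was_read_as_utf8` is made up for illustration.

```cpp
#include <cwchar>

// Returns true when the compiler decoded this source file as UTF-8.
// The character U+5929 is typed directly in one literal and written as an
// escape in the other; they only match if the typed one was decoded correctly.
bool source_was_read_as_utf8() {
    const wchar_t typed[]   = L"天";
    const wchar_t escaped[] = L"\u5929";
    return std::wcscmp(typed, escaped) == 0;
}
```

If this returns false, the file was probably saved in, or read as, some other encoding, and the Unicode literals in the rest of the code will not contain the characters you typed.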

For testing, put the example code (given further down, under "The code:") into your WinMain right before the message loop:

C++

while(GetMessage(&Msg, nullptr, 0, 0) > 0)
    {
        TranslateMessage(&Msg);
        DispatchMessage(&Msg);
    }
return Msg.wParam;

After testing is done, and once you are satisfied with your adjustments and your own version of error checking, move the following code into a function of its own so that you get a nice clean up.

__________

The code:

C++

///
///         UNICODE     THIS WORKS      START                   ///
///

    // This is a very simple example and it does not include a lot of error checking.
    // After you test this example, and before moving on to other code, 
    // place error checking in as you like.

    // Read the file in Microsoft Windows with NotePad,
    // not with Wordpad as some versions of Wordpad do not handle BOM correctly.
    // Notepad, not Wordpad.
    // Notepad.

    // File name
    const wchar_t* TheFile;
    TheFile = L"utf8_UsingByteOrderMark_C_天堂.txt";  // 天堂 is Chinese for
                                                      // heaven, not hello.

// --------------------------------------------------

    // Create a file
    //     by using the CreateFile dwCreationDisposition of "CREATE_ALWAYS",
    //     by deleting any old file with the same name in the specified directory,
    //     and then creating a new file by that name in that directory.
    // This time use "CREATE_ALWAYS".
    HANDLE hDFile01;
    hDFile01 = CreateFile(TheFile, GENERIC_READ | GENERIC_WRITE, 0, 
               NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    // A simple error checking example
    // (check the handle that CreateFile just returned, hDFile01)
    if (hDFile01 == INVALID_HANDLE_VALUE)
    {
        MessageBox(nullptr,
                   L"Could not open utf8_UsingByteOrderMark_C_天堂.txt",
                   L"Could not open utf8_UsingByteOrderMark_C_天堂.txt",
                   MB_ICONEXCLAMATION | MB_OK);
        // return; or something else as you decide.
    }

    DWORD NumberOfBytesWritten01;

    BOOL bErr01;
    bErr01 = false;

    // In this new file which is blank,
    //     place a byte order mark to tell later file readers 
    //     that this is a Unicode encoded file which uses utf-8.
    unsigned char BOM01[3]{ 0xef, 0xbb, 0xbf };

    bErr01 = WriteFile(hDFile01, (LPCVOID)BOM01, (DWORD)sizeof(BOM01), 
             &NumberOfBytesWritten01, NULL);

    // This file was created and opened up new by using "CREATE_ALWAYS".
    // Close the file then later open it up again for the append version "OPEN_ALWAYS".
    CloseHandle(hDFile01);

// --------------------------------------------------

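    // Append to a file
    //     by using the CreateFile dwCreationDisposition of "OPEN_ALWAYS":
    //     if the old file exists with the same name in the specified directory
    //     then open it,
    //     but if the old file does not exist, then create it.
    // This time use "OPEN_ALWAYS".
    HANDLE hDFile03;
    hDFile03 = CreateFile(TheFile, GENERIC_READ | GENERIC_WRITE, 0,
               NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

    // "OPEN_ALWAYS" leaves the file pointer at the start of the file,
    //     so move it to the end or the write below will overwrite the BOM.
    SetFilePointer(hDFile03, 0, NULL, FILE_END);

    DWORD NumberOfBytesWritten02;

    BOOL bErr02;
    bErr02 = false;

    // The BOM marks this file as utf-8, so the text has to be utf-8 bytes.
    //     A wide (L"...") literal is UTF-16 on Windows and would not match the BOM.
    //     The u8 prefix (C++11) gives a utf-8 encoded char array.
    const char Text02[] = u8"hello - J - こんにちは - abcdefghijklmnopqrstuvwxyz";
                                  // I think that is Japanese for hello.

    // In the following, I am showing you the result of
    // limiting the size of your writing
    //     by the DWORD being 50 instead of the entire length of the text being sent
    //     (the entire length is sizeof(Text02) - 1, which is 56 bytes).
    // Examine the result to see the effect.
    bErr02 = WriteFile(hDFile03, (LPCVOID)Text02, 50, &NumberOfBytesWritten02, NULL);

    CloseHandle(hDFile03);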
///
///         UNICODE     THIS WORKS      END                                 ///
///

That is it.
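
One thing worth spelling out: the size passed to WriteFile counts bytes, not characters, and the same text takes a different number of bytes in UTF-16 than in UTF-8. The following is a rough portable illustration in standard C++ only (on Windows the conversion would normally be done with WideCharToMultiByte and CP_UTF8). The function names `to_utf8` and `utf8_safe_prefix` are hypothetical helpers written for this sketch, and the encoder assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Encode a UTF-16 string as UTF-8 bytes. Assumes well-formed input:
// an unpaired surrogate is encoded as-is rather than rejected.
std::string to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Largest byte count <= max_bytes that does not cut a UTF-8 character in half.
std::size_t utf8_safe_prefix(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) return s.size();
    std::size_t n = max_bytes;
    // Continuation bytes look like 10xxxxxx; back up until n is a character start.
    while (n > 0 && (static_cast<unsigned char>(s[n]) & 0xC0) == 0x80) --n;
    return n;
}
```

For the test string used in this article, the text is 46 UTF-16 code units (92 bytes) but 56 bytes once encoded as UTF-8. A byte limit that falls in the middle of a multi-byte character leaves a broken character at the end of the file, which is what `utf8_safe_prefix` is meant to avoid.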

EOL

Points of Interest

My avocados come from Brazil, or Mexico, etc. My olive oil comes from the USA, or Greece (via a hidden transfer to Germany for processing), etc. My gasoline and oil come from the Middle East, or South America, or even Russia (I guess if they pay off whichever the current flakes are in Washington at the time of delivery). The steel in my vehicle might come from Germany, or Russia, or China, but so many sources are mixed in the cauldron that it is hard to tell. With all that, it is wise to make all of your internationally marketable software very Unicode capable.

Two or three years into this and I am finding C and C++ to be easy. It starts out difficult, then gets worse, then goes through multiple levels of hair pulling exasperations, then it is easy and fun.

Do the Unicode thing. Try not to pull your hair out. But, get comfortable with Unicode.

History

2nd July, 2022: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)