
UTF-8 in Windows - INI Files

2 Apr 2020, MIT license, 4 min read
How to handle UTF-8 in Windows INI files
The code shown in this tip makes it easy to keep the application settings in an INI file using UTF-8 encoding.

Introduction

In my previous article, "Doing UTF-8 in Windows", I showed how you can work with UTF-8 using basically only two functions, utf8::narrow and utf8::widen. For general file I/O, you just have to convert the file name from UTF-8 to UTF-16 and all the reading and writing functions remain unchanged:

FILE *f = utf8::fopen (u8"ܐܪܡܝܐ.txt", "w");

fputs (u8"This text is in Aramaic ܐܪܡܝܐ", f);
fclose (f);

There is one case that is not covered by these rules: the INI files, also called "profile files" in Microsoft parlance. Although there are many other ways of storing application settings, INI files are still widely used either for compatibility reasons or because they are simple to work with.

The problem is that the basic Windows API calls for reading and writing INI files, GetPrivateProfileString and WritePrivateProfileString, combine both the file name and the information to be read or written in one API call. As an example, here is the signature of the GetPrivateProfileStringW function:

C++
DWORD GetPrivateProfileStringW(
  LPCWSTR lpAppName,
  LPCWSTR lpKeyName,
  LPCWSTR lpDefault,
  LPWSTR  lpReturnedString,
  DWORD   nSize,
  LPCWSTR lpFileName
);

If we used the utf8::widen function to convert all our UTF-8 strings, we would end up with an INI file that contains UTF-16 characters instead of UTF-8.

The solution is to forget about the Windows API functions altogether and roll our own implementation for accessing INI files. This is far from the only implementation of INI files out there; for a list of others, you can check the Wikipedia page. Some of them might be a bit over-hyped; one such project claims to be "the ultimate and most consistent INI file parser library written in C". The only claim I make is that my implementation strives to be as compatible as possible with the original Windows API.

As such, you will find no arbitrary extensions to the file format, and I've done a lot of testing to identify corner cases. Here are the rules I discovered by trying different combinations of calls to the original Windows API:

  • The only comment lines are the ones starting with a semicolon (hashes are not considered comments by the Windows API).
  • There are no trailing comments; anything after the '=' sign is part of the key value.
  • Leading and trailing spaces are removed both from returned strings and from parameters.
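
As a rough illustration of these rules, a line classifier could look like the sketch below. The names LineKind, ParsedLine, trim and parse_line are hypothetical and are not taken from the code that accompanies this article; trimming spaces inside the section brackets is also an assumption.

```cpp
#include <string>

// Hypothetical types for illustration only.
enum class LineKind { Blank, Comment, Section, KeyValue, Other };

struct ParsedLine {
  LineKind kind;
  std::string name;   // section name or key name
  std::string value;  // only meaningful for KeyValue lines
};

// Remove leading and trailing spaces, tabs and line-end characters.
inline std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const auto first = s.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

inline ParsedLine parse_line(const std::string& raw) {
  const std::string line = trim(raw);
  if (line.empty()) return {LineKind::Blank, "", ""};

  // Only ';' starts a comment; '#' gets no special treatment.
  if (line[0] == ';') return {LineKind::Comment, "", ""};

  if (line[0] == '[') {
    const auto close = line.find(']');
    if (close == std::string::npos) return {LineKind::Other, "", ""};
    // Assumption: spaces inside the brackets are trimmed as well.
    return {LineKind::Section, trim(line.substr(1, close - 1)), ""};
  }

  const auto eq = line.find('=');
  if (eq == std::string::npos) return {LineKind::Other, "", ""};

  // Everything after the first '=' is the value: no trailing comments.
  return {LineKind::KeyValue, trim(line.substr(0, eq)),
          trim(line.substr(eq + 1))};
}
```

With this sketch, a line such as "key1 = value ; still value" yields the key "key1" and the value "value ; still value", because the semicolon is only special at the start of a line.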

The only changes compared to the Windows API are:

  • Line length defaults to 1024 (the INI_BUFFER_SIZE value) while Windows limits it to 256 characters.
  • Files without a path are placed in the current directory while Windows places them in the Windows folder.

Implementation

An INI file is implemented as an IniFile object. The basic member functions IniFile::GetString and IniFile::PutString allow you to read or write settings in the INI file, as in the code below:

C++
utf8::IniFile test ("test.ini");
test.PutString ("key1", "value11", "section1");
string val = test.GetString ("key1", "section1");

The original Windows API handles only two data types for INI files: strings and integer numbers (the GetPrivateProfileString and GetPrivateProfileInt functions). I thought it was useful to extend these functions to additional data types and also add some utility functions. This is not an extension of the file format; it is just an extension of the API for accessing these files. Here are some of these functions:

  • PutInt and GetInt for integer values
  • PutDouble and GetDouble for floating point values
  • PutBool and GetBool for boolean variables (when reading, the code understands things like "on" or "0" or "OFF")
  • PutColor and GetColor for RGB color representations
  • PutFont and GetFont to save and retrieve font settings
  • HasKey and HasSection to check if a key or a section exists in the INI file
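
To give an idea of what the lenient boolean reading involves, here is a minimal sketch of a case-insensitive parser. The function parse_bool is a hypothetical helper, not the actual GetBool implementation, and the list of accepted words beyond "on", "0" and "OFF" is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper; the set of accepted words is an assumption.
// Returns default_value when the text is not recognized.
inline bool parse_bool(std::string text, bool default_value) {
  // Lower-case a copy so that "OFF", "Off" and "off" compare equal.
  std::transform(text.begin(), text.end(), text.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });

  if (text == "on" || text == "true" || text == "yes" || text == "1")
    return true;
  if (text == "off" || text == "false" || text == "no" || text == "0")
    return false;
  return default_value;
}
```

Falling back to a caller-supplied default mirrors the way the Windows profile functions take a default value for missing keys.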

Looking at the code, there are a few points of interest.

There is no in-memory buffering for the INI files. Everything is written out to disk as quickly as possible. This was a design decision because:

  1. that's what Windows does and I wanted to be as compatible as possible, and
  2. it is quite annoying when parameters don't get saved because the application has crashed or otherwise ended unexpectedly.

The drawback is that INI file access becomes less efficient, but INI files are not meant to be general data files.

Moreover, every time a key is written in an INI file, the whole file gets re-written; as I was saying, efficiency was not a design goal. 
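
The read-modify-rewrite cycle can be sketched as follows. This is a simplified illustration, not the actual IniFile::PutString: the free functions read_lines and put_string are hypothetical, matching is exact and case-sensitive, lines are assumed to be already trimmed, and the UTF-8 to UTF-16 conversion of the file name is left out.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: load every line of the file (none if it is missing).
inline std::vector<std::string> read_lines(const std::string& path) {
  std::vector<std::string> lines;
  std::ifstream in(path);
  for (std::string line; std::getline(in, line);) lines.push_back(line);
  return lines;
}

// Hypothetical sketch: set key=value in [section], rewriting the whole file.
inline bool put_string(const std::string& path, const std::string& section,
                       const std::string& key, const std::string& value) {
  std::vector<std::string> lines = read_lines(path);

  const std::string header = "[" + section + "]";
  const std::string prefix = key + "=";
  const std::string entry = prefix + value;

  // Locate the section header.
  std::size_t i = 0;
  while (i < lines.size() && lines[i] != header) ++i;

  if (i == lines.size()) {
    // Section not found: append it together with the new key.
    lines.push_back(header);
    lines.push_back(entry);
  } else {
    // Scan the section until the key, the next header or the end of file.
    std::size_t j = i + 1;
    while (j < lines.size() && (lines[j].empty() || lines[j][0] != '[') &&
           lines[j].compare(0, prefix.size(), prefix) != 0)
      ++j;
    if (j < lines.size() && lines[j].compare(0, prefix.size(), prefix) == 0)
      lines[j] = entry;  // key exists: replace its line
    else
      lines.insert(lines.begin() + static_cast<std::ptrdiff_t>(j), entry);
  }

  // Nothing is cached: the complete file is written and closed right away.
  std::ofstream out(path, std::ios::trunc);
  for (const auto& line : lines) out << line << '\n';
  out.close();
  return static_cast<bool>(out);
}
```

Each call costs a full read and a full write of the file, which is acceptable for a handful of settings but is the reason this approach is not suited to general data storage.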

Conclusion

The code shown in this article makes it easy to keep the application settings in an INI file using UTF-8 encoding.

This concludes the series about UTF-8 in Windows. The previous two articles in this series are:

For reference, the code included with this article also contains the code from the previous ones.

History

  • 2nd April, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The MIT License