SUTFCPP is a C++ header-only library that fills a gap in the C++17 standard library's support for Unicode strings. The standard provides no helpers for converting between strings of different code unit widths, and no tools for iterating by code point. The library works exclusively with Unicode and does not support other encodings. Its main features are:
- Easy to use: the library is header-only
- Small: consists of a few header files and has no dependencies
- Cross-platform: supports the MSVC, GCC and Clang C++17 compilers
- Compile-time support: the low-level functions are constexpr
The library implements a two-level API:
- Low level API for code point and code unit manipulations:
namespace sutf
{
constexpr it_t code_point_next(it_t it) noexcept;
constexpr uint_t code_point_read(it_t it) noexcept;
constexpr it_t code_point_write(it_t it, uint_t cp) noexcept;
constexpr uint_t code_point_count(it_t it, const it_t last) noexcept;
constexpr uint_t code_point_count(const type_t& str) noexcept;
constexpr out_t code_point_convert(in_t src, const in_t last, out_t dst) noexcept;
constexpr uint_t code_unit_count<codeuint_t>(uint_t cp) noexcept;
constexpr uint_t code_unit_count<codeuint_t>(it_t it, const it_t last) noexcept;
constexpr uint_t code_unit_count<codeuint_t>(const type_t& str) noexcept;
}
- High level API for strings and buffers:
namespace sutf
{
string to_string(const string_t& str);
wstring to_wstring(const string_t& str);
u8string to_u8string(const string_t& str);
u16string to_u16string(const string_t& str);
u32string to_u32string(const string_t& str);
basic_string<char_t> to_anystring(basic_string<char_t> str);
uint_t convert(const src_t& src, dst_t& dst);
}
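To make the split between the two levels concrete, here is a minimal standalone sketch of the idea for UTF-8 input. It does not use SUTFCPP and is not the library's implementation: the `sketch` namespace and the names `utf8_sequence_length`, `utf8_read` and `utf8_to_utf32` are hypothetical. The low-level helpers are constexpr and do no validation, while the high-level helper allocates a string and is built on top of them.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration only; not part of SUTFCPP.
namespace sketch
{
// Low level: number of code units in the UTF-8 sequence that starts with `lead`.
// Assumes `lead` is a valid lead byte (no validation is performed).
constexpr std::size_t utf8_sequence_length(unsigned char lead) noexcept
{
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 4;                             // 11110xxx
}

// Low level: decode the code point that starts at `it`.
// Assumes well-formed input that is aligned on a code point boundary.
constexpr char32_t utf8_read(const char* it) noexcept
{
    const auto b0 = static_cast<unsigned char>(*it);
    const std::size_t len = utf8_sequence_length(b0);
    if (len == 1) return b0;
    // Keep the payload bits of the lead byte: 5, 4 or 3 bits for lengths 2, 3, 4.
    char32_t cp = b0 & (0xFFu >> (len + 1));
    for (std::size_t i = 1; i < len; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(it[i]) & 0x3Fu);
    return cp;
}

// High level: convert a whole UTF-8 string to UTF-32.
// Allocates, so it cannot be constexpr in C++17.
inline std::u32string utf8_to_utf32(std::string_view src)
{
    std::u32string dst;
    for (std::size_t i = 0; i < src.size();) {
        dst.push_back(utf8_read(src.data() + i));
        i += utf8_sequence_length(static_cast<unsigned char>(src[i]));
    }
    return dst;
}
} // namespace sketch

// The low-level helpers can be evaluated at compile time.
static_assert(sketch::utf8_read("\xE2\x86\x90") == 0x2190);
```

Like the low-level API described above, the sketch trusts its input: a truncated or malformed sequence would be read past its end, so validation has to happen before these helpers are called.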
Implementation
Integration
#include <sutfcpplib/utf_codepoint.h> // Include only code unit and code point support
#include <sutfcpplib/utf_string.h> // Include full UTF support
using namespace std::literals;
String Literal Declarations
auto str_utf8 = u8"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
auto str_utf16_or_utf32 = L"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
auto str_utf16 = u"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
auto str_utf32 = U"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
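The same eight code points occupy a different number of code units in each encoding, which is why code unit counts and code point counts have to be kept apart. The standalone sketch below checks this with the standard library only. The `sketch` namespace and variable names are illustrative, and the `wchar_t` check assumes the common convention that a 2-byte `wchar_t` holds UTF-16 and a 4-byte one holds UTF-32.

```cpp
#include <string_view>

namespace sketch
{
using namespace std::literals;

// Eight code points: U+0041, U+00A9, U+2190, U+1F602, U+0042, U+00AE, U+2705, U+1F973.
constexpr auto utf8  = u8"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
constexpr auto utf16 = u"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
constexpr auto utf32 = U"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;
constexpr auto wide  = L"\U00000041\U000000a9\U00002190\U0001f602\U00000042\U000000ae\U00002705\U0001f973"sv;

// UTF-8: 1 + 2 + 3 + 4 code units, twice.
static_assert(utf8.size() == 20);
// UTF-16: the two code points above U+FFFF need a surrogate pair each.
static_assert(utf16.size() == 10);
// UTF-32: one code unit per code point.
static_assert(utf32.size() == 8);
// Assumption: wchar_t is UTF-16 when 2 bytes wide, UTF-32 otherwise.
static_assert(wide.size() == (sizeof(wchar_t) == 2 ? 10 : 8));
} // namespace sketch
```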
Limitations
- The high-level functions, such as to_string() or convert(), allocate memory and therefore cannot be used in compile-time expressions under the C++17 standard.
- The low-level functions do not check that code points are valid according to the Unicode standard; they always assume that the input buffer is code point aligned and that the output buffer has enough space.
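Because the low-level functions skip these checks, the caller is responsible for them. One possible way to do that is sketched below as standalone code. The `sketch` namespace and the names `is_scalar_value` and `utf8_units_needed` are hypothetical and not part of the library. The first helper rejects values that are not Unicode scalar values, and the second computes how many UTF-8 code units an output buffer must provide.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical caller-side checks; not part of SUTFCPP.
namespace sketch
{
// True if `cp` is a Unicode scalar value: at most U+10FFFF and not a surrogate.
constexpr bool is_scalar_value(char32_t cp) noexcept
{
    return cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF);
}

// Number of UTF-8 code units needed to encode `cp`.
// Assumes `cp` has already passed is_scalar_value().
constexpr std::size_t utf8_units_needed(char32_t cp) noexcept
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Minimum UTF-8 output buffer size, in code units, for a UTF-32 string.
constexpr std::size_t utf8_units_needed(std::u32string_view src) noexcept
{
    std::size_t total = 0;
    for (char32_t cp : src)
        total += utf8_units_needed(cp);
    return total;
}
} // namespace sketch

static_assert(sketch::is_scalar_value(0x1F602));
static_assert(!sketch::is_scalar_value(0xD800));   // surrogate
static_assert(!sketch::is_scalar_value(0x110000)); // beyond the Unicode range
```

Sizing the destination with a helper of this kind before writing is one way to satisfy the "output buffer has enough space" precondition.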
Examples
- main.cpp - examples of using the main interface of the library with comments