The View of the String - Breaking Down std::string_view

24 Apr 2023 | CPOL | 5 min read
std::string_view can improve both performance and readability in code that handles strings, but it can also lead to undefined behavior (UB) and memory issues if used incorrectly. It is a case where great power comes with great responsibility.

Communication is a powerful tool of humanity. It allows us to transfer ideas and thoughts from one person to another. There are many ways to communicate, and one of them is words. To communicate using words, we need to understand each word in a sentence and the way the words connect with each other. Text analysis has become a major topic in software development, and many AI tools try to perform it (NLP techniques and more). However, strings can be painful to work with in older C++ versions (C++11/14). The need for better string-handling facilities was addressed in C++17, with std::string_view.

std::string

Before getting into the details of why std::string_view is so important, we first need to discuss the abilities of std::string.

std::string is basically a friendly wrapper for char*/char[]. It lets us store a contiguous, owned buffer of chars, modify it, iterate over it, and eventually display it. For example:

C++
std::string str = "My str";
std::string prefix = "My ";
if (str.compare(0, prefix.size(), prefix) == 0) {
    std::cout << str.substr(prefix.size()); // "str"
}

Now, the compare call reads a little like C, but there is another pitfall here: std::string::substr costs us another std::string. Because it doesn't modify the original string instance, it returns a new string instance (a copy of the characters, and for longer strings a heap allocation), which really isn't needed here. To avoid that, we have to do something like this:

C++
for (size_t i = prefix.size(); i < str.size(); ++i) {
    std::cout << str[i];
}

Let’s see another usage example:

C++
bool validate(const std::string& str) {
    std::string start = "lstart", stop = "lstop";
    if (str.size() < stop.size()) return false; // avoid std::out_of_range below
    return str.compare(0, start.size(), start) == 0 &&
           str.compare(str.size() - stop.size(), stop.size(), stop) == 0;
}

Here, the argument is not copied. But it’s easy to cause a copy by forgetting the & in the function’s signature. And again, we are stuck with the C-like compare syntax.

std::string_view (C++17)

Since C++17, we can use a string_view instance to observe a contiguous block of memory that is already allocated. That means we can get a view of a substring, which supports iteration and comparison, without allocating a new std::string instance, and avoid the C-like syntax at the same time.

C++
std::string str = "My str";
std::string prefix = "My ";
std::string_view str_v = str;                   // no allocation performed
if (str_v.substr(0, prefix.size()) == prefix) { // no allocation
    std::cout << str_v.substr(prefix.size());   // no allocation
}

It means that for the validate function, we can now simply pass a std::string_view by value, without any need for const &:

C++
bool validate(std::string_view str) {
    std::string start = "lstart", stop = "lstop";
    if (str.size() < stop.size()) return false; // substr would throw std::out_of_range
    return str.substr(0, start.size()) == start &&
           str.substr(str.size() - stop.size()) == stop;
}

How Does It Work?

std::string_view is essentially a structure that contains a pointer to the start of the character buffer and a size. These are set by the constructors, and substr returns a new instance with an adjusted pointer and size. When constructing a std::string_view from a std::string, we are actually calling std::string::operator basic_string_view, and then constructing a std::string_view from the resulting std::string_view.
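
To make the pointer-plus-size idea concrete, here is a minimal sketch of such a structure. The class simple_view and its members are hypothetical names used only for illustration; real standard library implementations are templates with many more members, so treat this as a model rather than the actual implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical, simplified stand-in for std::string_view (illustration only).
class simple_view {
public:
    // View over a null-terminated C string: the length is measured with strlen.
    simple_view(const char* s) : data_(s), size_(std::strlen(s)) {}
    // View over an explicit (pointer, length) pair: no null terminator needed.
    simple_view(const char* s, std::size_t n) : data_(s), size_(n) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // substr only computes a new pointer and length; no characters are copied.
    simple_view substr(std::size_t pos,
                       std::size_t count = static_cast<std::size_t>(-1)) const {
        if (pos > size_) throw std::out_of_range("simple_view::substr");
        const std::size_t rest = size_ - pos;
        return simple_view(data_ + pos, count < rest ? count : rest);
    }

private:
    const char* data_;  // not owned: the buffer must outlive the view
    std::size_t size_;
};
```

Copying such an object copies only a pointer and an integer, which is why views are typically passed by value.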

One Step Forward

std::string_view can also be constructed from a const char*, or from a const char* and a size_t. That means that when we only need to inspect string literals (which are stored in the binary and have static storage duration, so their addresses stay valid for the whole program), we can assign them directly to a std::string_view instead of constructing a std::string first.

C++
std::string_view str = "My str"; // no string allocation
std::string_view prefix = "My ";
if (str.substr(0, prefix.size()) == prefix) {
    std::cout << str.substr(prefix.size());
}
bool validate(std::string_view str) {
    std::string_view start = "lstart", stop = "lstop"; // no string allocations
    if (str.size() < stop.size()) return false; // substr would throw std::out_of_range
    return str.substr(0, start.size()) == start &&
           str.substr(str.size() - stop.size()) == stop;
}

* Important to know: When constructing a std::string_view from a char* without specifying the length, the length is determined by the first null character, so it must be used carefully. We’ll return to this issue in Rule #2.
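
As a small illustration of that note, the sketch below builds two views over the same literal, which contains an embedded null character. The function names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <string_view>

// Length taken from the first '\0': everything after it is outside the view.
inline std::size_t length_until_null() {
    std::string_view v = "ab\0cd";
    return v.size(); // 2
}

// Length given explicitly: the embedded '\0' is an ordinary character of the view.
inline std::size_t length_explicit() {
    std::string_view v("ab\0cd", 5);
    return v.size(); // 5
}
```

The two-argument constructor is the one to reach for when the buffer is not null-terminated or may contain null characters.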

C++20/23 Extensions

Newer standards brought useful new features to std::string_view and std::string. In C++20, we got two new member functions, starts_with and ends_with (which are perfect for the examples above), and since C++23, we also have the contains member function:

C++
std::string_view str = "My str";
std::string_view prefix = "My ";
if (str.starts_with(prefix)) {
    std::cout << str.substr(prefix.size());
}
bool validate(std::string_view str) {
    return str.starts_with("lstart") && str.ends_with("lstop");
}
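
The contains member is not shown above. For toolchains that don't offer it yet, the same check can be sketched with find. The helper name contains_compat is an assumption made for this example and is not part of the standard library.

```cpp
#include <string_view>

// Hypothetical helper: same result as the C++23 member str.contains(needle),
// expressed with find so that it also compiles as C++17.
constexpr bool contains_compat(std::string_view str, std::string_view needle) {
    return str.find(needle) != std::string_view::npos;
}
```

With C++23 available, str.contains(needle) expresses the same intent directly.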

Constexpr

All of the functions above can be declared constexpr or used within a constexpr context. Since std::string_view doesn’t allocate any memory, it opens the door to compile-time programming.

C++
constexpr std::string_view str = "My str";
constexpr std::string_view prefix = "My ";
if (str.starts_with(prefix)) {
    std::cout << str.substr(prefix.size());
}
constexpr bool validate(std::string_view str) {
    return str.starts_with("lstart") && str.ends_with("lstop");
}
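
One way to confirm that the work really happens at compile time is static_assert. The sketch below assumes only C++17, so it uses substr instead of starts_with and ends_with; validate_cxx17 is a hypothetical name for this variant.

```cpp
#include <string_view>

// Hypothetical C++17-compatible variant of validate, written with substr.
constexpr bool validate_cxx17(std::string_view str) {
    constexpr std::string_view start = "lstart", stop = "lstop";
    // Guard first: substr throws if the position is past the end of the view.
    if (str.size() < start.size() || str.size() < stop.size()) return false;
    return str.substr(0, start.size()) == start &&
           str.substr(str.size() - stop.size()) == stop;
}

// Evaluated by the compiler: a failing check is a build error, not a runtime bug.
static_assert(validate_cxx17("lstart payload lstop"));
static_assert(!validate_cxx17("payload lstop"));
static_assert(!validate_cxx17("l"));
```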

Bad Practices & Potential Issues

std::string_view is designed to allow better performance when analysing strings. However, there is always a trade-off between performance and safety, and that trade-off plays a major role when dealing with std::string_view.

Rule #1: Never Return a std::string_view

C++
std::string_view func() {
    std::string str;
    std::cin >> str;
    return str;
}

This innocent-looking function leads to unsafe memory access. str owns the buffer that holds the input characters. The returned std::string_view points into that buffer, but the destructor of str releases it when the function exits. That means the returned std::string_view now points to released memory.

That being said, returning a std::string_view won’t always cause unsafe memory access. Cases where the returned std::string_view points to static memory, or to memory that outlives the function call, are still valid. However, they can become invalid after later code changes, so the safe approach is to forbid returning std::string_view altogether.

Rule #2: Careful with Null Terminator

As mentioned before, relying on a null terminator is strongly discouraged when using std::string_view, and this should always be kept in mind.

C++
std::string_view str = "my cool str";
str.remove_prefix(str.find(' ') + 1);           // drop "my " (including the space)
str.remove_suffix(str.size() - str.rfind(' ')); // drop " str"
std::cout << str;        // "cool" - OK
std::cout << str.data(); // "cool str"

remove_prefix and remove_suffix only change the start and end points of the view. That means remove_suffix doesn’t insert a null terminator at the new end, so printing the underlying data is not affected by it. To fix this, we can either modify the owner string (if one exists) or construct a std::string from the view, which adds the terminator for us (without modifying the original string).

C++
{ // Modifying owner
    std::string str = "cool str";
    std::string_view str_v = str;
    str_v.remove_suffix(4);
    str[4] = '\0';

    std::cout << str << "\n"; // "cool\0str"
    std::cout << str_v << "\n"; // "cool"
    std::cout << str_v.data(); // "cool"
}

{ // Allocating a new string
    std::string_view str = "cool str";
    str.remove_suffix(4);
    std::string modified_str(str);
    std::string_view mstr_v = modified_str;
    
    std::cout << str << "\n";          // "cool"
    std::cout << str.data() << "\n";   // "cool str"
    std::cout << modified_str << "\n"; // "cool"
    std::cout << mstr_v << "\n";       // "cool"
    std::cout << mstr_v.data();        // "cool"
}
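
The same concern applies whenever a view is handed to an API that expects a null-terminated string. The sketch below uses std::strlen as a stand-in for such an API; the helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a C API that reads until the first '\0'.
inline std::size_t c_length(const char* s) { return std::strlen(s); }

inline std::size_t length_via_data(std::string_view v) {
    // Risky: data() is not guaranteed to be null-terminated at v.size().
    // Only well-defined if the underlying buffer has a '\0' at or after the
    // end of the view, and even then the result may exceed v.size().
    return c_length(v.data());
}

inline std::size_t length_via_copy(std::string_view v) {
    // std::string copies exactly v.size() characters and adds its own '\0'.
    return c_length(std::string(v).c_str());
}
```

For a view created from "cool str" and shortened with remove_suffix(4), length_via_data reports 8 while length_via_copy reports 4. The copy costs a std::string construction, which is the price of a guaranteed terminator.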

Rule #3: Don’t Lose Ownership

It’s important to remember that std::string_view doesn’t own the string it refers to, and therefore won’t protect or release it. In addition to UB or illegal memory access, this may in some cases cause a memory leak (for example, when converting existing code that uses std::string to use std::string_view):

C++
const char* get() { return new char[]{"my new str"}; }
{
    std::string_view str = get();
    // Here we can still call: delete[] str.data()
    str.remove_prefix(1); // The original pointer is lost: memory leak!
    // delete[] str.data() // Invalid call here.
                           // The pointer doesn't point to the start of the allocation.
}
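
One way to avoid this leak is to keep an owning object next to the view. This is a minimal sketch that assumes get can be changed to return std::string; get_owned and remaining_after_prefix are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical replacement for get(): ownership is returned as std::string.
inline std::string get_owned() { return "my new str"; }

inline std::size_t remaining_after_prefix() {
    std::string owner = get_owned(); // owns the buffer and frees it at scope exit
    std::string_view str = owner;    // non-owning view into `owner`
    str.remove_prefix(1);            // only the view shrinks, `owner` is intact
    return str.size();               // 9: "y new str"
}
```

If the API must keep returning a raw pointer, holding it in a std::unique_ptr<char[]> alongside the view gives the same guarantee.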

Conclusion

std::string_view can be used to improve both performance and readability in code that handles strings. However, every use comes with the added responsibility of using it correctly and avoiding unwanted behavior (especially as the code base grows and modifications enter the picture). It is another example of great power being followed by great responsibility.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)