std::string_view can be used to optimize both performance and readability in code sections which handle strings. However, it can also cause unwanted behavior when misused. It is a case where great power is followed by great responsibility.
Communication is a powerful tool of humanity. It allows us to transfer ideas and thoughts from one person to another. There are many ways to communicate, and one of them is words. In order to communicate using words, we need the ability to understand each word in a sentence and to understand the way the words connect with each other. Word analysis has become a major topic in software development, and many AI tools try to perform it (NLP techniques and more). However, handling strings can be painful in older C++ standards (C++11/14). The need for better string-handling facilities was addressed in C++17 with std::string_view.
std::string
Before getting into the details of why std::string_view is so important, we first need to discuss the abilities of std::string.
std::string is basically a friendly wrapper for char*/char[]. It allows us to store a contiguous sequence of chars, modify it, iterate over it, and eventually display it. For example:
std::string str = "My str";
std::string prefix = "My ";
// compare returns 0 when the first prefix.size() characters of str equal prefix
if (str.compare(0, prefix.size(), prefix) == 0) {
    std::cout << str.substr(prefix.size());
}
Now, the compare function looks a little like C, but there is another pitfall here: std::string::substr constructs another std::string, which copies the characters and may allocate. Because it doesn't modify the original string instance, it returns a new string instance, which really isn't needed here. To avoid that, we have to do something like this:
for (size_t i = prefix.size(); i < str.size(); ++i) {
    std::cout << str[i];
}
Let’s see another usage example:
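bool validate(const std::string& str) {
    std::string start = "lstart", stop = "lstop";
    // The size check keeps the second compare from throwing std::out_of_range on short inputs
    return str.size() >= stop.size() &&
        str.compare(0, start.size(), start) == 0 &&
        str.compare(str.size() - stop.size(), stop.size(), stop) == 0;
}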
Here, the argument is not copied, but it is easy to cause a copy by forgetting the & sign in the function's signature. And again, we are left with the C-like compare syntax.
std::string_view (C++17)
Since C++17, we can use a string_view instance to view a contiguous character sequence that is already allocated. That means that we can get a substring view, which supports iteration and comparison, without allocating a new std::string instance for it, and avoid the C-like syntax at the same time.
std::string str = "My str";
std::string prefix = "My ";
std::string_view str_v = str;
if (str_v.substr(0, prefix.size()) == prefix) {
    std::cout << str_v.substr(prefix.size());
}
It means that for the validate function, we can now simply pass a std::string_view by value, without any need for const & specifiers:
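bool validate(std::string_view str) {
    std::string start = "lstart", stop = "lstop";
    // The size check keeps the second substr from throwing std::out_of_range on short inputs
    return str.size() >= stop.size() &&
        str.substr(0, start.size()) == start &&
        str.substr(str.size() - stop.size()) == stop;
}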
How Does It Work?
std::string_view is essentially a structure that contains a pointer to the start of the char buffer and a size. This information is passed to the constructors, and the substr function computes it for a new instance. When constructing a std::string_view instance from a std::string instance, we are actually using std::string::operator basic_string_view, and the std::string_view is then constructed from the std::string_view that this conversion operator returns.
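To make the pointer-plus-size idea concrete, here is a minimal sketch of such a structure. MiniView is a hypothetical, heavily simplified stand-in written for this illustration only; the real std::string_view is a template with many more members, and it reports a bad position by throwing rather than asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in for std::string_view (illustration only).
class MiniView {
public:
    // View over an explicit pointer + length pair.
    constexpr MiniView(const char* data, std::size_t size)
        : data_(data), size_(size) {}

    // View over a null-terminated string; the length is found with strlen.
    MiniView(const char* c_str) : data_(c_str), size_(std::strlen(c_str)) {}

    // View over the buffer of an existing std::string; nothing is copied.
    MiniView(const std::string& s) : data_(s.data()), size_(s.size()) {}

    constexpr const char* data() const { return data_; }
    constexpr std::size_t size() const { return size_; }

    // substr only computes a new pointer and a new size.
    MiniView substr(std::size_t pos, std::size_t count) const {
        assert(pos <= size_);  // std::string_view throws std::out_of_range instead
        const std::size_t remaining = size_ - pos;
        return MiniView(data_ + pos, count < remaining ? count : remaining);
    }

private:
    const char* data_;   // not owned: the viewed buffer must outlive the view
    std::size_t size_;
};
```

Given `std::string owner = "My str";`, the call `MiniView(owner).substr(3, 100)` yields a view of size 3 whose data() is `owner.data() + 3`, so no characters are copied at any point.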
One Step Forward
std::string_view can also be constructed from a const char* instance, or from const char* and size_t parameters. That means that when we only need to view and analyse compile-time strings (string literals, which are stored in the binary and therefore have addresses that stay valid for the whole program run), we can assign them directly to a std::string_view instance, instead of constructing a std::string instance first.
std::string_view str = "My str";
std::string_view prefix = "My ";
if (str.substr(0, prefix.size()) == prefix) {
    std::cout << str.substr(prefix.size());
}
bool validate(std::string_view str) {
    std::string_view start = "lstart", stop = "lstop";
    // The size check keeps the second substr from throwing on short inputs
    return str.size() >= stop.size() &&
        str.substr(0, start.size()) == start &&
        str.substr(str.size() - stop.size()) == stop;
}
* Important to know: when constructing a std::string_view instance from a char* without specifying the length, the length is determined by the first null character. It's important to use it carefully. We'll discuss this issue further below.
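The difference between the two constructors can be seen with a literal that contains an embedded null character. The sketch below is only an illustration; deduced_length and explicit_length are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>
#include <string_view>

// Length deduced from a const char*: counting stops at the first '\0'.
inline std::size_t deduced_length(const char* text) {
    return std::string_view(text).size();
}

// Length passed explicitly: embedded '\0' characters stay inside the view.
inline std::size_t explicit_length(const char* text, std::size_t length) {
    return std::string_view(text, length).size();
}
```

For the literal `"ab\0cd"`, `deduced_length("ab\0cd")` is 2, while `explicit_length("ab\0cd", 5)` is 5. The caller is responsible for passing a length that stays inside the buffer.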
C++20/23 Extensions
New standards brought useful new features to std::string_view and std::string objects. In C++20, we got two new member functions, starts_with and ends_with (which are perfect for the examples above), and since C++23 we also have the contains member function:
std::string_view str = "My str";
std::string_view prefix = "My ";
if (str.starts_with(prefix)) {
std::cout << str.substr(prefix.size());
}
bool validate(std::string_view str) {
return str.starts_with("lstart") && str.ends_with("lstop");
}
Constexpr
All of the above functions can be declared constexpr or used within a constexpr context. Since std::string_view doesn't allocate any new data, it's an open window for compile-time programming.
constexpr std::string_view str = "My str";
constexpr std::string_view prefix = "My ";
if (str.starts_with(prefix)) {
std::cout << str.substr(prefix.size());
}
constexpr bool validate(std::string_view str) {
return str.starts_with("lstart") && str.ends_with("lstop");
}
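One way to confirm that such a check really runs at compile time is to put it in a static_assert. The sketch below assumes only C++17, so it uses substr instead of starts_with and ends_with; validate_cxx17 is a hypothetical name chosen for this illustration.

```cpp
#include <string_view>

// Hypothetical C++17-compatible variant of validate, usable in constant expressions.
constexpr bool validate_cxx17(std::string_view str) {
    constexpr std::string_view start = "lstart";
    constexpr std::string_view stop = "lstop";
    // substr(0, n) clamps n to the view's size, so only the suffix needs a size check.
    return str.substr(0, start.size()) == start &&
           str.size() >= stop.size() &&
           str.substr(str.size() - stop.size()) == stop;
}

// Evaluated by the compiler: a failing check becomes a compilation error.
static_assert(validate_cxx17("lstart payload lstop"));
static_assert(!validate_cxx17("payload lstop"));
static_assert(!validate_cxx17("lst"));
```

The same function can still be called at run time with a view over user input; constexpr only permits compile-time evaluation, it does not require it.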
Bad Practices & Potential Issues
std::string_view is designed to allow better performance when analysing strings. However, there is always a trade-off between performance and safety, and that trade-off plays a major role when dealing with std::string_view.
Rule #1: Never Return a std::string_view
std::string_view func() {
std::string str;
std::cin >> str;
return str;
}
This innocent-looking function leads to unsafe memory access. str owns the storage for the input characters (on the heap, or inside the string object itself for short inputs). The return statement creates a std::string_view of this storage, and then str's destructor releases it. That means that the returned std::string_view now points to released memory.
That being said, returning a std::string_view won't always cause unsafe memory access. Cases where the returned std::string_view points to static memory, or to memory that outlives the function call, are still valid. They can become invalid as the code evolves, though, so the safe way is to forbid returning a std::string_view in any case.
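As a sketch of the alternatives, the function can either return an owning std::string, or return a view only into storage that the caller already owns. Both read_word and first_word are hypothetical names used for illustration.

```cpp
#include <istream>
#include <string>
#include <string_view>

// Option 1: return an owning std::string, so the caller keeps the characters alive.
inline std::string read_word(std::istream& in) {
    std::string word;
    in >> word;
    return word;  // moved or elided; nothing dangles
}

// Option 2: return a view into the caller's own storage.
// It stays valid only as long as the storage behind `text` does.
inline std::string_view first_word(std::string_view text) {
    // find returns npos when there is no space; substr(0, npos) keeps the whole view.
    return text.substr(0, text.find(' '));
}
```

Option 2 is the kind of case that is valid today but fragile: calling `first_word` with a temporary std::string and storing the result would dangle again, which is why the rule above prefers Option 1.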
Rule #2: Careful with Null Terminator
As mentioned before, relying on the null terminator is strongly discouraged, and this should always be kept in mind when using std::string_view.
std::string_view str = "my cool str";
str.remove_prefix(str.find(" "));
str.remove_suffix(str.size() - str.rfind(" "));
std::cout << str;        // " cool"
std::cout << str.data(); // " cool str"
remove_prefix and remove_suffix only change the start and end points of the view. That means that remove_suffix doesn't insert a null terminator at the end, so printing the underlying data isn't affected by it. In order to fix it, we can modify the owner string (if one exists), or construct a std::string from the view, which adds the terminator for us (without modifying the original string).
{
    std::string str = "cool str";
    std::string_view str_v = str;
    str_v.remove_suffix(4);
    str[4] = '\0';
    std::cout << str << "\n";    // "cool", then '\0' and "str": str still holds 8 chars
    std::cout << str_v << "\n";  // "cool"
    std::cout << str_v.data();   // "cool"
}
{
    std::string_view str = "cool str";
    str.remove_suffix(4);
    std::string modified_str(str);
    std::string_view mstr_v = modified_str;
    std::cout << str << "\n";           // "cool"
    std::cout << str.data() << "\n";    // "cool str"
    std::cout << modified_str << "\n";  // "cool"
    std::cout << mstr_v << "\n";        // "cool"
    std::cout << mstr_v.data();         // "cool"
}
Rule #3: Don’t Lose Ownership
It's important to remember that std::string_view doesn't own the string it refers to, and therefore won't protect or release it. In addition to UB or illegal memory access, this may in some cases cause a memory leak (for example, when converting existing code that uses std::string to use std::string_view):
const char* get() { return new char[]{"my new str"}; }
{
    std::string_view str = get();
    str.remove_prefix(1);
} // nobody calls delete[] on the buffer allocated in get()
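One possible way to avoid the leak is to keep an explicit owner and hand out views only for inspection. This is a sketch under that assumption; get_owned and drop_first are hypothetical names and not part of the original code.

```cpp
#include <string>
#include <string_view>

// Hypothetical replacement for get(): ownership is returned explicitly.
inline std::string get_owned() { return "my new str"; }

// Works on a view of storage that the caller owns and will release.
inline std::string_view drop_first(std::string_view text) {
    text.remove_prefix(text.empty() ? 0 : 1);  // remove_prefix(n) requires n <= size()
    return text;
}
```

With `std::string owner = get_owned();`, the expression `drop_first(owner)` views "y new str", and the buffer is released when owner goes out of scope.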
Conclusion
std::string_view can be used to optimize both performance and readability in code sections which handle strings. However, any usage comes with the additional responsibility to use it correctly and to avoid unwanted behavior (especially at scale and when code modifications enter the picture). It is another example of a case where great power is followed by great responsibility.