Before any discussion of string_view starts, we have to revisit a C API, strtok(), whose purpose is to split a char array into tokens.
char * strtok ( char * str, const char * delimiters );
#include <cstdio>
#include <cstring>   // strtok, memcpy
#include <algorithm> // std::replace
#include <iterator>  // std::begin, std::end
int main()
{
    using namespace std;
    char str[] = "Apple,Orange,Mango";
    char dup_arr[sizeof(str)];
    const char s[2] = ",";
    char *token;
    printf("str address:%p\n", str);
    token = strtok(str, s);
    while (token != NULL)
    {
        printf("%s\n", token);
        printf("token address:%p\n", token);
        // Copy str so the null terminators written by strtok can be shown as '#'
        memcpy(dup_arr, str, sizeof(dup_arr));
        std::replace(begin(dup_arr), end(dup_arr)-1, '\0', '#');
        printf("%s\n", dup_arr);
        token = strtok(NULL, s);
    }
    return(0);
}
The output is shown below. I added ^ to indicate where token is pointing in str. I use # to represent the null terminator since it is a non-printable character.
str address:008FF704
Apple
token address:008FF704
Apple#Orange,Mango
^
Orange
token address:008FF70A
Apple#Orange#Mango
      ^
Mango
token address:008FF711
Apple#Orange#Mango
             ^
strtok has 2 problems. The first is that it modifies its str parameter, although its redeeming point is that it is very fast because it does not have to allocate a string for each token. The other problem is that it cannot split a string with empty tokens. Example: for ",," strtok treats the run of adjacent delimiters as a single separator and skips it, so it returns null on the first call, which signals to the client code prematurely that it has reached the end of str.
This is where C++17 string_view comes to the rescue: string_view contains a char pointer and a size as data members, and includes many of the useful member functions that std::string has. Its length does not include a null terminator, meaning a string_view does not have to be null-terminated, making it a perfect candidate for writing a C++17 const-correct strtok. But this article is not about writing a string_view version of strtok.
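Although a full strtok replacement is out of scope here, a minimal sketch may help show why string_view fits the job. The helper name split_view is hypothetical and not part of the article's source code. It assumes a C++17 compiler, leaves the input untouched, and keeps empty tokens:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the article's repository): splits `text` on
// any character in `delims` without modifying or copying the input.
// Unlike strtok, empty tokens between adjacent delimiters are kept.
inline std::vector<std::string_view> split_view(std::string_view text,
                                                std::string_view delims)
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = text.find_first_of(delims, start);
        if (pos == std::string_view::npos)
        {
            tokens.push_back(text.substr(start)); // last token, may be empty
            break;
        }
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1; // step over the delimiter itself
    }
    return tokens;
}
```

Each returned string_view points into the original buffer, so the tokens are only valid while that buffer is alive. With this sketch, splitting ",," on "," yields three empty tokens instead of none.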
This in-place string modification is widely used in fast XML DOM parsers like RapidXML and pugixml. RapidJSON is a JSON parser that makes use of this technique as well.
<Fruit name="Orange" type="Citrus" />
<Fruit#name#"Orange# type#"Citrus# />
 ^     ^     ^       ^     ^
An XML parser cannot totally avoid string allocation when the text is strictly immutable and needs to be unescaped (shown below), or when the text is modified to be longer.
<Food name="Ben &amp; Jerry" type="Ice Cream" />
"Ben &amp; Jerry" needs to be unescaped to "Ben & Jerry"
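The allocating path can be sketched as follows. This is only an illustration under stated assumptions: unescape_xml is a hypothetical helper, not an API of RapidXML or pugixml, and it handles only the five predefined XML entities. Because the input is a read-only string_view, the unescaped text has to go into a newly allocated std::string:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from any parser library): copies `text` into a
// new std::string, replacing the five predefined XML entities.
inline std::string unescape_xml(std::string_view text)
{
    struct Entity { std::string_view name; char value; };
    static const Entity entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };
    std::string out;
    out.reserve(text.size()); // the result is never longer than the input
    std::size_t i = 0;
    while (i < text.size())
    {
        bool matched = false;
        if (text[i] == '&')
        {
            for (const Entity& e : entities)
            {
                if (text.compare(i, e.name.size(), e.name) == 0)
                {
                    out += e.value;       // emit the unescaped character
                    i += e.name.size();   // skip the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched)
            out += text[i++]; // ordinary character, or an unrecognised '&'
    }
    return out;
}
```

Since the unescaped text is never longer than the escaped text, a parser that owns a mutable buffer could write the result over the original characters instead of allocating. The copy is only forced when the buffer must stay unchanged.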
For conversion, we use Boost Spirit Qi. str_to_value is an overloaded template function which works for std::string, string_view and char array (not char pointer). For demo purposes, we use Boost string_ref because string_view is not yet available in Visual C++. For simplicity, other overloads are not shown; readers can view them in str_to_value.h. float, short, long and long long, together with the unsigned counterparts of the integer types, are supported.
#include <string>
#include <iostream>
#include <boost/utility/string_ref.hpp>
#include <boost/spirit/include/qi.hpp>
template<typename string_type>
inline bool str_to_value(const string_type& src, double& dest)
{
    namespace qi = boost::spirit::qi;
    return qi::parse(std::cbegin(src), std::cend(src), qi::double_, dest);
}
template<typename string_type>
inline bool str_to_value(const string_type& src, int& dest)
{
    namespace qi = boost::spirit::qi;
    return qi::parse(std::cbegin(src), std::cend(src), qi::int_, dest);
}
int main(int argc, char *argv [])
{
    boost::string_ref srd("123.456");
    double d = 0.0;
    if (str_to_value(srd, d))
    {
        std::cout << d << std::endl;
    }
    boost::string_ref srn("123");
    int n = 0;
    if (str_to_value(srn, n))
    {
        std::cout << n << std::endl;
    }
    return 0;
}
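For readers with a C++17 standard library, the same idea can be sketched without Boost by using std::from_chars, which also appears in the benchmarks below. The name str_to_value_fc is hypothetical and is not part of str_to_value.h. Only the int overload is shown because floating-point std::from_chars support varies between compilers:

```cpp
#include <charconv>
#include <iterator>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical variant of str_to_value (not part of str_to_value.h) built on
// C++17 std::from_chars instead of Boost Spirit Qi.
template<typename string_type>
inline bool str_to_value_fc(const string_type& src, int& dest)
{
    // std::data and std::size accept std::string, std::string_view and char
    // arrays, but not a bare char pointer, whose length is unknown.
    const char* first = std::data(src);
    const char* last = first + std::size(src);
    // from_chars stops at the first character that is not part of the number,
    // so the range does not need to be null-terminated.
    const std::from_chars_result res = std::from_chars(first, last, dest);
    return res.ec == std::errc();
}
```

Like the Spirit version, this sketch takes an explicit character range, so it can convert a string_view that points into the middle of a larger buffer.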
C++ String to Double Benchmark (Looping 1 million times)
Version 1.1.1 double benchmark
The latest double benchmark fixes the crack_atof scientific notation conversion problem and improves its performance by 10%, which puts it on par with fast_atof.
atof: 100ms
lexical_cast: 648ms
std::istringstream: 677ms <== Probably an unfair comparison since istringstream instantiates a string
std::stod: 109ms
std::strtod: 96ms
crack_atof: 7ms
fast_atof: 7ms <== do not use this one because conversion is not correct.
boost_spirit: 17ms <== reported to be inaccurate in some cases
google_dconv: 38ms
std::from_chars: 71ms
C++ String to Integer Benchmark (Looping 10 million times)
atol: 243ms
lexical_cast: 952ms
std::istringstream: 5338ms
std::stoll: 383ms
simple_atol: 74ms
sse4i_atol: 72ms
boost_spirit: 78ms
std::from_chars: 59ms
- string_view was originally called string_ref in Boost
- A string view is not necessarily null-terminated, so atoi() cannot be used
- Use Boost Spirit Qi
- Can be used for std::string and char array. char ptr is not supported
- Or any class that has cbegin() and cend()
- Caveat: Boost Spirit Qi floating point conversion has been reported to be inaccurate in some cases
Related Source Code Repositories are below.
- 1st Oct 2016: First release
- 5th Apr 2017: Updated string-to-float benchmark with strtod and Google double conversion
- 6th June 2018: Uploaded floatbench 1.1.0 which fixes the crack_atof() scientific notation conversion problem and improves its performance by 10%. Thanks to Tian Bo.
- 7th June 2018: Uploaded intbench 1.1.0 which includes std::from_chars in the benchmark. std::from_chars requires C++17 support and VC++ has problems compiling it for floating point conversion, thus it is only added in intbench for now.
- 27th Oct 2018: Uploaded floatbench 1.1.2 which includes std::from_chars from VS 2017 Update 15.8 in the benchmark.