C++17: string_view Conversion to Integral Types

5 Jun 2018 | CPOL | 3 min read
Implementing string_view conversion to integral types using Boost Spirit Qi v2

Rationale

Before discussing string_view, we have to revisit a C API, strtok(), whose purpose is to split a char array into tokens.

char * strtok ( char * str, const char * delimiters );
#include <cstdio>
#include <cstring>   // strtok, memcpy
#include <algorithm> // std::replace
#include <iterator>  // std::begin, std::end

int main()
{
    using namespace std;

    char str[] = "Apple,Orange,Mango";
    /* dup_arr will contain a copy of str later */
    char dup_arr[sizeof(str)];
    const char s[2] = ",";
    char *token;

    /* display str address (%p expects a void pointer) */
    printf("str address:%p\n", static_cast<void*>(str));

    /* get the first token */
    token = strtok(str, s);

    /* walk through other tokens */
    while (token != NULL)
    {
        /* display token */
        printf("%s\n", token);
        /* display token address (%p expects a void pointer) */
        printf("token address:%p\n", static_cast<void*>(token));

        /* copy str into dup_arr */
        memcpy(dup_arr, str, sizeof(dup_arr));
        /* replace each embedded null char with '#' (ASCII 35, 0x23),
           leaving the final terminator in place */
        std::replace(begin(dup_arr), end(dup_arr) - 1, '\0', '#');
        /* display dup_arr */
        printf("%s\n", dup_arr);

        /* get next token */
        token = strtok(NULL, s);
    }

    return(0);
}

The output is shown below. I added ^ to indicate where the token points in str. I use # to represent the null terminator since it is a non-printable character.

str address:008FF704

Apple
token address:008FF704
Apple#Orange,Mango
^

Orange
token address:008FF70A
Apple#Orange#Mango
      ^

Mango
token address:008FF711
Apple#Orange#Mango
             ^

strtok has two problems. First, it modifies its str parameter, although its redeeming point is that it is very fast because it does not have to allocate a string for each token. Second, it cannot return empty tokens: it treats a run of adjacent delimiters as a single separator, so the empty tokens in "a,,b" are silently skipped, and for an input such as ",," strtok returns NULL on the very first call, which signals to the client code that it has reached the end of str.

This is where C++17 string_view comes to the rescue. A string_view holds a char pointer and a size as data members and offers many of the useful member functions that std::string has. Its length does not include a null terminator, meaning a string_view does not have to be null-terminated, which makes it a good candidate for writing a const-correct C++17 strtok. But this article is not about writing a string_view version of strtok.
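
Although the article does not pursue it, a minimal sketch may help to show why string_view fits the job. The helper name split_view is my own assumption and is not part of any library; it only illustrates that tokens can be views into the original buffer, so the source is never modified and empty tokens are kept.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the article or any library): split `text`
// on `delim` without modifying or copying the underlying characters.
// Unlike strtok, empty tokens between adjacent delimiters are kept.
inline std::vector<std::string_view> split_view(std::string_view text, char delim)
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos)
        {
            // Last token: everything after the final delimiter.
            tokens.push_back(text.substr(start));
            break;
        }
        // Each token is only a pointer and a length into `text`.
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}
```

With this sketch, split_view(",,", ',') yields three empty tokens, and for "Apple,Orange,Mango" the second token's data() points six characters into the original buffer, the same offset strtok reported above. The returned views are only valid while the original buffer is alive.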

This in-place string modification is widely used in fast XML DOM parsers such as RapidXML and Pugixml. RapidJSON is a JSON parser that makes use of this technique as well.

// original xml text
<Fruit name="Orange" type="Citrus" />
// mutated xml text
<Fruit#name#"Orange# type#"Citrus# />
 ^     ^     ^       ^     ^

An XML parser cannot entirely avoid string allocation if the text must stay strictly immutable and needs to be unescaped (shown below), or if the text is modified to be longer.

<Food name="Ben &amp; Jerry" type="Ice Cream" />

"Ben &amp; Jerry" needs to be unescaped to "Ben & Jerry"

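To illustrate the immutable case, here is a rough sketch of an unescape step that has to allocate. The function name unescape_xml is hypothetical and is not taken from RapidXML or Pugixml; it handles only the five predefined XML entities and leaves numeric character references alone.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from RapidXML, Pugixml, or the article):
// expand the five predefined XML entities into a newly allocated string.
// Numeric character references such as &#38; are left untouched.
inline std::string unescape_xml(std::string_view text)
{
    struct Entity { std::string_view name; char value; };
    static constexpr Entity entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };

    std::string out;
    out.reserve(text.size()); // the result is never longer than the input
    std::size_t i = 0;
    while (i < text.size())
    {
        bool replaced = false;
        if (text[i] == '&')
        {
            for (const Entity& e : entities)
            {
                // Compare the characters at position i with the entity name.
                if (text.compare(i, e.name.size(), e.name) == 0)
                {
                    out.push_back(e.value);
                    i += e.name.size();
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced)
        {
            out.push_back(text[i]);
            ++i;
        }
    }
    return out;
}
```

Because the source view is read-only, the unescaped result needs its own storage, which is the allocation the paragraph above refers to.
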
Conversion to float and integer

For conversion, we use Boost Spirit Qi. str_to_value is an overloaded function template that works for std::string, string_view and char arrays (but not char pointers, because std::cbegin() and std::cend() cannot tell where a bare pointer ends). For demo purposes, we use Boost string_ref because string_view is not yet available in Visual C++. For simplicity, the other overloads are not shown; readers can view them in str_to_value.h. float, short, long and long long, together with their unsigned counterparts, are supported.

#include <string>
#include <iostream>
#include <iterator> // std::cbegin, std::cend
#include <boost/utility/string_ref.hpp>
#include <boost/spirit/include/qi.hpp>

template<typename string_type>
inline bool str_to_value(const string_type& src, double& dest)
{
    namespace qi = boost::spirit::qi;

    return qi::parse(std::cbegin(src), std::cend(src), qi::double_, dest);
}

template<typename string_type>
inline bool str_to_value(const string_type& src, int& dest)
{
    namespace qi = boost::spirit::qi;

    return qi::parse(std::cbegin(src), std::cend(src), qi::int_, dest);
}

int main(int argc, char *argv [])
{
    boost::string_ref srd("123.456");
    double d = 0.0;
    if (str_to_value(srd, d))
    {
        std::cout << d << std::endl; // display 123.456
    }

    boost::string_ref srn("123");
    int n = 0;
    if (str_to_value(srn, n))
    {
        std::cout << n << std::endl; // display 123
    }

    return 0;
}
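
For readers without Boost, the same idea can be sketched with C++17 std::from_chars, which also appears in the benchmarks below. This is a swapped-in technique, not the article's str_to_value: the name sv_to_int is hypothetical, and this sketch additionally insists that the whole view is consumed.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical counterpart to str_to_value(), built on C++17 std::from_chars
// instead of Boost Spirit Qi. It succeeds only if the entire view is a valid
// int, so trailing characters or an empty view are rejected.
inline bool sv_to_int(std::string_view src, int& dest)
{
    const char* first = src.data();
    const char* last = src.data() + src.size();
    int value = 0;
    const std::from_chars_result res = std::from_chars(first, last, value);
    if (res.ec != std::errc() || res.ptr != last)
        return false; // not a number, out of range, or trailing characters
    dest = value;     // dest is only written on success
    return true;
}
```

Since std::from_chars works on a pointer range, it never reads past the end of the view, so no null terminator is needed. Note that it does not skip leading whitespace and does not accept a leading plus sign.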

Boost Spirit Qi Benchmark

C++ String to Double Benchmark (Looping 1 million times)

Version 1.1.1 double benchmark

The latest double benchmark fixes the crack_atof scientific notation conversion problem and improves its performance by 10%, which puts it on par with fast_atof.

              atof:  100ms
      lexical_cast:  648ms
std::istringstream:  677ms <== Probably unfair comparison since istringstream instantiates a string
         std::stod:  109ms
       std::strtod:   96ms
        crack_atof:    7ms
         fast_atof:    7ms <== do not use this one because its conversion is not correct
      boost_spirit:   17ms <== reported to be inaccurate in some cases
      google_dconv:   38ms
   std::from_chars:   71ms

C++ String to Integer Benchmark (Looping 10 million times)

              atol:  243ms
      lexical_cast:  952ms
std::istringstream: 5338ms
        std::stoll:  383ms
       simple_atol:   74ms
        sse4i_atol:   72ms
      boost_spirit:   78ms
   std::from_chars:   59ms
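
For context, a timing loop of this kind can be sketched as follows. This is not the article's benchmark code; the names BenchResult and bench_from_chars are hypothetical, and the sketch times only std::from_chars using std::chrono::steady_clock.

```cpp
#include <charconv>
#include <chrono>
#include <cstdint>
#include <string_view>

struct BenchResult
{
    std::int64_t checksum; // sum of all parsed values, keeps the loop observable
    double milliseconds;   // wall-clock time for the whole loop
};

// Hypothetical harness, not the article's benchmark code:
// parse `text` `iterations` times with std::from_chars and time the loop.
inline BenchResult bench_from_chars(std::string_view text, int iterations)
{
    using clock = std::chrono::steady_clock;
    std::int64_t checksum = 0;
    const auto begin = clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        long long value = 0;
        std::from_chars(text.data(), text.data() + text.size(), value);
        checksum += value;
    }
    const auto end = clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - begin;
    return {checksum, elapsed.count()};
}
```

Returning the checksum gives the caller something to print, which makes it harder for the optimizer to discard the loop. Absolute timings depend heavily on the compiler, flags and hardware, so numbers from such a sketch are not comparable to the tables above.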

Summary

  • string_view was originally called string_ref in Boost
  • A string_view is not guaranteed to be null-terminated, so atoi() cannot be used on it safely
  • Use Boost Spirit Qi
    • Can be used with std::string and char arrays; char pointers are not supported
    • Also works with any class that has cbegin() and cend()
    • Caveat: Boost Spirit Qi floating-point conversion has been reported to be inaccurate in some cases
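
The atoi() point can be made concrete with a small sketch. Both function names are hypothetical and exist only for illustration: one shows the pitfall, the other shows the allocating workaround that a range-based conversion avoids.

```cpp
#include <cstdlib>
#include <string>
#include <string_view>

// Pitfall: atoi() ignores the view's length and keeps reading digits until
// it meets a non-digit or the terminator of the underlying buffer.
// If the buffer has no terminator at all, this is undefined behavior.
inline int atoi_unsafe(std::string_view sv)
{
    return std::atoi(sv.data());
}

// Workaround: copy the view into a std::string, which guarantees a
// null-terminated buffer, at the cost of a copy (and possibly an allocation).
inline int atoi_via_copy(std::string_view sv)
{
    const std::string copy(sv);
    return std::atoi(copy.c_str());
}
```

Given std::string_view whole("12345") and part = whole.substr(0, 3), atoi_unsafe(part) returns 12345 rather than 123, because the characters after the view are still digits in the same buffer, while atoi_via_copy(part) returns 123.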

History

  • 1st Oct 2016: First release
  • 5th Apr 2017: Updated string-to-float benchmark with strtod and Google double conversion
  • 6th June 2018: Uploaded floatbench 1.1.0 which fixes the crack_atof() scientific notation conversion problem and improves its performance by 10%. Thanks to Tian Bo.
  • 7th June 2018: Uploaded intbench 1.1.0 which adds std::from_chars to the benchmark. std::from_chars requires C++17 support and VC++ has problems compiling it for floating point conversion, thus it is only added in intbench for now.
  • 27th Oct 2018: Uploaded floatbench 1.1.2 which adds std::from_chars from Visual Studio 2017 version 15.8 to the benchmark.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)