Introduction
This article discusses several methods of converting strings to numeric data types and back again. I also give benchmarks of various techniques for performing these conversions and present a few custom functions of my own.
Background
When I first started programming in C++, I found that one of the hardest adjustments from scripting languages like PHP was converting back and forth between strings and numeric data types. The C++ language itself does not perform these conversions for you; you have to call a library function. There are various functions for performing these conversions (note that these are just a subset of the ones you can use):
- sprintf() can be used to convert numeric types to strings
- CRT functions like itoa, atoi, and atof can be used for converting numbers to strings and vice versa
- std::strstream and std::stringstream let you insert numeric values and extract them as strings, and vice versa
- you can relatively easily write custom functions for converting from strings to numeric types
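As a rough illustration, here is one call from each standard family in the list above. itoa() is a non-standard extension, so this sketch sticks to std::snprintf(), std::atoi(), std::atof(), and the string streams; all helper names are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Number to string with the printf family (32 chars is ample for an int).
inline std::string IntToStringPrintf(int val)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d", val);
    return buf;
}

// String to number with the CRT functions.
inline int StringToIntCrt(const char* str) { return std::atoi(str); }
inline double StringToDoubleCrt(const char* str) { return std::atof(str); }

// Number to string with a string stream.
inline std::string IntToStringStream(int val)
{
    std::ostringstream out;
    out << val;
    return out.str();
}

// String to number with a string stream; operator>> skips leading whitespace.
inline int StringToIntStream(const std::string& str)
{
    std::istringstream in(str);
    int val = 0;
    in >> val;  // on failure val is left as 0 (set to 0 since C++11)
    return val;
}
```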
The Benchmarks
The various conversion functions/methods are discussed in greater detail below. If you are like me, however, you skip to the good stuff first, so I'll save you all that scrolling and give you the empirical data right away.
Each test performed a million conversions, cycling through an array of a thousand randomly generated values (doubles or integers, in numeric or string form depending on the test). Each test was run several times and the average time was used for comparison. There was very little variance between runs (less than 5%), so I believe these measurements are quite accurate. I also tried varying the random number algorithm to make sure the results hold up across the full range of values for a double and an integer (and the different ways you can express them in strings).
All benchmarks were done in release builds optimized for speed. With no optimizations, the results were similar but slower overall, and the gap for string streams widened considerably: they performed as much as a hundred times slower than the CRT functions.
Method         | Double to String    | Integer to String | String to Double   | String to Integer
---------------|---------------------|-------------------|--------------------|------------------
strstream      | 4,078,000           | 1,765,000         | 2,851,000          | 1,850,000
stringstream   | 3,788,000           | 2,064,000         | 3,124,000          | 2,334,000
_snprintf*     | 2,155,000 ("%f")    | 589,000 ("%i")    | NA                 | NA
CRT functions  | 4,285,000 (_gcvt()) | 192,000 (itoa())  | 1,490,000 (atof()) | 60,000 (atoi())
my functions   | NA                  | NA                | 374,000            | 50,000

*There was no measurable performance difference between sprintf and _snprintf.
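The numbers above came from George Anescu's CPreciseTimer class. As a portable stand-in, a harness in the same shape (a thousand random inputs, cycled through a million conversions) might look like the sketch below. It uses std::chrono::steady_clock; the function name TimeAtoi and the fixed seed are assumptions for illustration, and its timings are not comparable to the table.

```cpp
#include <chrono>
#include <cstdlib>
#include <random>
#include <string>
#include <vector>

// Hypothetical harness: times 'total' calls to std::atoi, cycling over
// 'distinct' random integers stored as strings. Returns elapsed microseconds
// and writes a checksum so the optimizer cannot discard the conversions.
inline long long TimeAtoi(long long& checksum,
                          int distinct = 1000, int total = 1000000)
{
    std::mt19937 gen(12345);  // fixed seed for repeatable inputs
    std::uniform_int_distribution<int> dist(-1000000, 1000000);

    std::vector<std::string> inputs;
    for (int i = 0; i < distinct; ++i)
        inputs.push_back(std::to_string(dist(gen)));

    checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < total; ++i)
        checksum += std::atoi(inputs[i % distinct].c_str());
    auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Swapping the std::atoi call for another conversion gives the other columns; building in release mode matters, as noted above.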
The most surprising result to me was that my own functions actually outperformed their CRT equivalents. My functions are also more flexible than the CRT functions in that they search the string for the first number before starting to convert anything, which can make parsing server responses much easier! They also distinguish between a string that cannot be converted to a number and a string representing the number zero: instead of simply returning zero, they throw an InvalidConversionException when the string contains no numbers.
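The ambiguity these functions avoid is easy to demonstrate: std::atoi() returns 0 both for "0" and for a string with no digits at all. The standard std::strtol() can tell the two apart through its end pointer. The wrapper below is a hypothetical helper for illustration, not the article's InvalidConversionException-based code.

```cpp
#include <cstdlib>

// Returns true and stores the value if str begins (after optional whitespace)
// with a decimal integer; returns false if no digits could be converted.
// Range errors (strtol sets errno to ERANGE) are not handled in this sketch.
inline bool TryParseLong(const char* str, long& out)
{
    char* end = nullptr;
    long val = std::strtol(str, &end, 10);
    if (end == str)  // strtol consumed nothing: no number present
        return false;
    out = val;
    return true;
}
```

Unlike the functions described above, std::strtol() only skips leading whitespace; it does not search past other leading text for the first number.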
String Streams
Certainly, the easiest way to convert values to strings is with string streams. A ToString() function could be written like so:
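#include <strstream>

// Writes val straight into the caller's buffer str (count chars). std::ends
// appends the terminating null; if the text does not fit, failbit is set and
// the buffer may be left unterminated. (strstream is deprecated in standard C++.)
template<class T>
void ToString(const T& val, char * str, int count)
{
    std::ostrstream strm(str, count);
    strm << val << std::ends;
}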
Here val is the value to convert (just about any basic type) and str is the destination buffer, which holds count characters. No temporary buffer is created; the conversion is written directly into the destination string.
Using the STL string class is slower, but more intuitive:
#include <sstream>
#include <string>

template<class T>
std::string ToString(const T& val)
{
    std::ostringstream strm;  // std::stringstream is a typedef, not a template
    strm << val;
    return strm.str();
}
A lot of conversion functions are written like this. In fact, the above is the conversion function I've been using in my C++ projects. It's very simple to use, but it comes at a price. The table above shows that string streams are roughly 9 to 11 times slower than itoa() for integer to string conversions, about 8 times slower than my functions for string to double, and 37 to 47 times slower than my functions for string to integer. Only for double to string conversions is the gap small: they are less than twice as slow as _snprintf(), and both stream classes actually beat _gcvt(). The reason is that each conversion adds the creation and destruction of a fairly complex object, sometimes the allocation and deallocation of temporary buffers, and many layers of code on top of the basic conversion functions.
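Since part of the cost is constructing and destroying the stream object on every call, one common mitigation is to construct a stream once and reuse it. The sketch below was not part of the benchmarks above, and the class name StreamConverter is hypothetical; it resets the buffer with str("") and the error flags with clear() between conversions.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper that keeps one std::ostringstream alive across calls.
// Not thread-safe: each thread should use its own instance.
class StreamConverter {
public:
    template<class T>
    std::string ToString(const T& val)
    {
        strm_.str("");  // discard the previous contents
        strm_.clear();  // reset any error flags left by a previous insertion
        strm_ << val;
        return strm_.str();
    }

private:
    std::ostringstream strm_;
};
```

Formatting state such as precision() also persists between calls, which can be a benefit or a trap depending on how the converter is shared.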
CRT Functions
My biggest beef with the CRT conversion functions is that they have such obscure names: _itoa(), atoi(), atof(), _gcvt(), _fcvt(), _ecvt(), and so on. But once you figure out which function does what and how to use it properly, they are very useful. atoi() was only 20% slower than my custom conversion functions, and itoa() proved to be the fastest method for converting an integer to a string. The performance of some of these functions was disappointing, however: _gcvt() was slower than the string streams! I would be interested to know how _fcvt() stacks up.
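Note that itoa()/_itoa() and _gcvt() are Microsoft extensions rather than standard C or C++. Where they are unavailable, a hand-written integer-to-string routine is short. In this sketch (IntToStr is a hypothetical name) the digits are written backwards from the end of a local buffer and then copied out.

```cpp
#include <cstring>

// Hypothetical portable stand-in for itoa(val, dest, 10).
// dest must be large enough: 12 chars covers any 32-bit int plus the null.
inline char* IntToStr(int val, char* dest)
{
    char buf[24];
    char* p = buf + sizeof buf;
    *--p = '\0';

    // Work with the magnitude as unsigned so the most negative int
    // does not overflow when negated.
    unsigned mag = val < 0 ? 0u - static_cast<unsigned>(val)
                           : static_cast<unsigned>(val);
    do {
        *--p = static_cast<char>('0' + mag % 10);  // lowest digit first
        mag /= 10;
    } while (mag != 0);

    if (val < 0)
        *--p = '-';

    std::strcpy(dest, p);
    return dest;
}
```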
sprintf()
The function sprintf() works just like printf(), except that it writes the result into a string instead of printing it to standard output. sprintf(), however, pays no attention to the size of the destination buffer and will overflow it if you don't watch out. For this reason, I recommend using _snprintf(), which lets you specify a maximum size. Just be sure to pass a size one less than the allocated buffer and set the last byte to null yourself, because _snprintf() does not store the terminating null character when the output fills or exceeds count characters.
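#include <stdio.h>  // _snprintf is a Microsoft CRT function

void ToString(char * destStr, int count, double val)
{
    _snprintf(destStr, count - 1, "%f", val);  // leave room for the terminator
    destStr[count - 1] = '\0';                 // _snprintf omits it on truncation
}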
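For comparison, the standard std::snprintf() (C99, and C++11 via <cstdio>) always writes a terminating null when the size is greater than zero, and returns the length the complete output would have had, so truncation can be detected directly. A minimal sketch, with ToStringChecked as a hypothetical name:

```cpp
#include <cstddef>
#include <cstdio>

// Formats val into destStr (capacity count). Returns true if the whole
// result fit; on truncation the buffer still holds a null-terminated prefix.
inline bool ToStringChecked(char* destStr, std::size_t count, double val)
{
    int needed = std::snprintf(destStr, count, "%f", val);
    return needed >= 0 && static_cast<std::size_t>(needed) < count;
}
```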
For the benchmarks, I used "%f" and "%i" as format strings, but it is worth noting that sprintf() gives you more formatting options than the other techniques. For example, you can round floating point values to the desired precision and pad them with zeros.
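A small sketch of those options using the standard std::snprintf(); the wrapper name Format and its 64-character limit are assumptions for illustration. "%.2f" rounds to two decimal places, and "%08.3f" rounds to three places and pads with zeros to a total width of eight characters.

```cpp
#include <cstdio>
#include <string>

// Formats val with a printf-style format into a std::string.
// Assumes the result fits in 64 characters (longer output is truncated).
inline std::string Format(const char* fmt, double val)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, fmt, val);
    return buf;
}
```

For example, Format("%.2f", 3.14159) yields "3.14" and Format("%08.3f", 3.14159) yields "0003.142".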
Rolling your own
The basic concept for converting a string to a numerical type has to do with the way the computer represents numerals in a string. In an ASCII string, every character is represented by a number from 0 to 127 (the non-negative range of a signed char). The string "123" is therefore really just the bytes 49, 50, 51. You could use a switch statement to replace each numeral character with its numeric value, but there is a faster way. Since the digit characters are contiguous ('0' is 48 through '9' at 57), you can obtain a digit's value from its distance to ASCII zero (48). For example:
char numeral = '7';
int num = numeral-'0';
Add a loop that multiplies the running total by 10 before adding each digit, and you get:
const char * str = "12345";
int num = 0;
for (; *str != '\0' && *str >= '0' && *str <='9'; str++)
num = (num * 10) + (*str - '0');
Additionally, you need to handle a leading sign, and when converting to a double, you need to handle digits after the decimal point and exponents. A good conversion function should also be able to skip non-numerical characters at the beginning of a string (like whitespace) and ignore trailing characters. It should also distinguish between a string that could not be converted and a string like "0.00". When converting to an unsigned integral type, you also need a way of dealing with negative input. I chose to ignore sign characters if the target type is unsigned, but another valid solution would be to convert "-1" to 4294967295 (for a 32-bit unsigned int).
You can get these functions in the demo project.
These functions are fast as they are, but there is still considerable room for improvement. They are faster if you already know the length of the string (e.g. it's stored in a string class).
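The demo project's code is not reproduced here, but the integer requirements listed above can be sketched as follows. This is a hypothetical stand-in, not the author's implementation: ParseInt is an invented name, the InvalidConversionException definition is an assumption (the article names the exception but does not show it), and treating a '-' directly before the first digit as a sign is this sketch's own choice.

```cpp
#include <stdexcept>

// Hypothetical exception type; the article names InvalidConversionException
// but does not show its definition.
class InvalidConversionException : public std::runtime_error {
public:
    InvalidConversionException()
        : std::runtime_error("string contains no number") {}
};

// Sketch of a lenient string-to-int conversion:
//  - skips any leading characters up to the first digit,
//  - treats a '-' immediately before that digit as a sign,
//  - stops at the first non-digit (trailing text is ignored),
//  - throws instead of returning 0 when no digit is found.
// Overflow is not checked.
inline int ParseInt(const char* str)
{
    const char* p = str;
    while (*p != '\0' && (*p < '0' || *p > '9'))
        ++p;                                 // skip the non-numeric prefix
    if (*p == '\0')
        throw InvalidConversionException();  // no digits at all

    bool negative = (p != str && p[-1] == '-');
    int num = 0;
    for (; *p >= '0' && *p <= '9'; ++p)
        num = num * 10 + (*p - '0');         // same digit loop as in the text
    return negative ? -num : num;
}
```

With this sketch, ParseInt("Content-Length: 1234") returns 1234, ParseInt("0.00") returns 0, and ParseInt("none") throws.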
A Note on STL Strings
You'll note that many of the better conversion methods require creating a temporary character array, performing the conversion into it, and then copying the result into the STL string. There's not much you can do about this. Personally, this was the straw that broke the camel's back. I've had it with STL strings that require ten lines of code and temporary buffers all over the place to do something simple. My next article will be about creating a better string class.
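The pattern described above looks like this in practice: convert into a fixed-size stack buffer with one of the fast functions, then copy the result into a std::string. DoubleToString is a hypothetical name, and the 32-character buffer is an assumption that is ample for "%g" output.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Convert into a temporary char array, then copy into the STL string.
inline std::string DoubleToString(double val)
{
    char temp[32];                                       // temporary buffer
    int len = std::snprintf(temp, sizeof temp, "%g", val);
    if (len < 0)
        return std::string();                            // encoding error
    if (static_cast<std::size_t>(len) >= sizeof temp)
        len = static_cast<int>(sizeof temp) - 1;         // output was truncated
    return std::string(temp, static_cast<std::size_t>(len));  // the extra copy
}
```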
Appendix A: ASCII Table
Char Dec Oct Hex | Char Dec Oct Hex | Char Dec Oct Hex | Char Dec Oct Hex
-------------------------------------------------------------------------------------
(nul) 0 0000 0x00 | (sp) 32 0040 0x20 | @ 64 0100 0x40 | ` 96 0140 0x60
(soh) 1 0001 0x01 | ! 33 0041 0x21 | A 65 0101 0x41 | a 97 0141 0x61
(stx) 2 0002 0x02 | " 34 0042 0x22 | B 66 0102 0x42 | b 98 0142 0x62
(etx) 3 0003 0x03 | # 35 0043 0x23 | C 67 0103 0x43 | c 99 0143 0x63
(eot) 4 0004 0x04 | $ 36 0044 0x24 | D 68 0104 0x44 | d 100 0144 0x64
(enq) 5 0005 0x05 | % 37 0045 0x25 | E 69 0105 0x45 | e 101 0145 0x65
(ack) 6 0006 0x06 | & 38 0046 0x26 | F 70 0106 0x46 | f 102 0146 0x66
(bel) 7 0007 0x07 | ' 39 0047 0x27 | G 71 0107 0x47 | g 103 0147 0x67
(bs) 8 0010 0x08 | ( 40 0050 0x28 | H 72 0110 0x48 | h 104 0150 0x68
(ht) 9 0011 0x09 | ) 41 0051 0x29 | I 73 0111 0x49 | i 105 0151 0x69
(nl) 10 0012 0x0a | * 42 0052 0x2a | J 74 0112 0x4a | j 106 0152 0x6a
(vt) 11 0013 0x0b | + 43 0053 0x2b | K 75 0113 0x4b | k 107 0153 0x6b
(np) 12 0014 0x0c | , 44 0054 0x2c | L 76 0114 0x4c | l 108 0154 0x6c
(cr) 13 0015 0x0d | - 45 0055 0x2d | M 77 0115 0x4d | m 109 0155 0x6d
(so) 14 0016 0x0e | . 46 0056 0x2e | N 78 0116 0x4e | n 110 0156 0x6e
(si) 15 0017 0x0f | / 47 0057 0x2f | O 79 0117 0x4f | o 111 0157 0x6f
(dle) 16 0020 0x10 | 0 48 0060 0x30 | P 80 0120 0x50 | p 112 0160 0x70
(dc1) 17 0021 0x11 | 1 49 0061 0x31 | Q 81 0121 0x51 | q 113 0161 0x71
(dc2) 18 0022 0x12 | 2 50 0062 0x32 | R 82 0122 0x52 | r 114 0162 0x72
(dc3) 19 0023 0x13 | 3 51 0063 0x33 | S 83 0123 0x53 | s 115 0163 0x73
(dc4) 20 0024 0x14 | 4 52 0064 0x34 | T 84 0124 0x54 | t 116 0164 0x74
(nak) 21 0025 0x15 | 5 53 0065 0x35 | U 85 0125 0x55 | u 117 0165 0x75
(syn) 22 0026 0x16 | 6 54 0066 0x36 | V 86 0126 0x56 | v 118 0166 0x76
(etb) 23 0027 0x17 | 7 55 0067 0x37 | W 87 0127 0x57 | w 119 0167 0x77
(can) 24 0030 0x18 | 8 56 0070 0x38 | X 88 0130 0x58 | x 120 0170 0x78
(em) 25 0031 0x19 | 9 57 0071 0x39 | Y 89 0131 0x59 | y 121 0171 0x79
(sub) 26 0032 0x1a | : 58 0072 0x3a | Z 90 0132 0x5a | z 122 0172 0x7a
(esc) 27 0033 0x1b | ; 59 0073 0x3b | [ 91 0133 0x5b | { 123 0173 0x7b
(fs) 28 0034 0x1c | < 60 0074 0x3c | \ 92 0134 0x5c | | 124 0174 0x7c
(gs) 29 0035 0x1d | = 61 0075 0x3d | ] 93 0135 0x5d | } 125 0175 0x7d
(rs) 30 0036 0x1e | > 62 0076 0x3e | ^ 94 0136 0x5e | ~ 126 0176 0x7e
(us) 31 0037 0x1f | ? 63 0077 0x3f | _ 95 0137 0x5f | (del) 127 0177 0x7f
Credits
Thanks to George Anescu and his CPreciseTimer class. And special thanks to the (unknown) person who created the ASCII table above.