The problem
1. Character buffers are inconvenient and bug prone
Whenever you want to handle a character string in C++ (or C), a few questions arise. A big one is about the length: how long can this string be?
It is easy to grow tired of using fixed size character buffers, like:
char buf[200];
This generates endless problems (and bugs) because no matter how big the buffer, the day when the string overflows always comes. On that day, if you coded right, your string is just truncated. In many cases it will still be a bug, even if your application does not crash. You might lose or corrupt data, which is often worse.
2. Dynamic strings are slower
The obvious solution to this maximum length problem is to use dynamically allocated strings. Almost every C++ library offers a class to handle dynamic strings. By using such a class, you get rid of the maximum length problem, and you get several advantages in the bargain: a nice object-oriented interface, much friendlier than the standard strxxx functions; a safer implementation; etc.
But everything has a price: dynamic strings are slower, since they need to allocate memory on the heap and free it later.
The result is a dilemma between speed and convenience. If you need things to be fast, you might be reluctant to use dynamic strings. Luckily, since string manipulation is usually not the most time-consuming task in general applications, the choice is usually easy.
The idea
In most cases, each character string that your application handles has a reasonable maximum length that will be respected 90% of the time. But because of the remaining 10% (or less), you end up using dynamic strings, which are slower but can handle almost any length...
Then why not adapt dynamic strings so that they accept a stack-allocated buffer at construction time, and then grow if necessary, becoming heap-allocated only when needed?
This way you can have dynamic strings while saving the first heap allocation in 90% of the cases.
A short code sample
Here is a very simple example. We allocate a string with enough room for 25 characters, then assign constant strings to it.
First, with a fixed-length character buffer (note that this code is broken: the second strcpy writes past the end of the buffer):
char name[25+1];
strcpy(name,"John Lennon");
strcpy(name,"Blondaux Georges Jacques Babylas");
With usual dynamic strings, it would become (MFC version):
CString name('\0',25);
name="John Lennon";
name="Blondaux Georges Jacques Babylas";
Or (STL version):
std::string name(25,'\0');
name="John Lennon";
name="Blondaux Georges Jacques Babylas";
And now using my own implementation of "stack-allocated dynamic strings", called tstr:
tstrDecl(name,25);
name.set("John Lennon");
name.set("Blondaux Georges Jacques Babylas");
Note that we only saved the time of the first allocation. Apart from that, things work in the same way (with the slight difference that std::string and tstr allocate more space than needed, trading memory for speed).
Implementation details
The following line declares a stack-allocated dynamic string with an initial maximum length of 25:
tstrDecl(name,25);
Here is how it works in the implementation (tstr.h):
class tstr {
public:
tstr(char* buf,size_t tlenofbuf);
...
};
#define tstrDecl(str,len) char str##_tbuf[(len)+_tstr_overx]; \
    tstr str(str##_tbuf,sizeof(str##_tbuf)/sizeof(char));
The macro declares an auxiliary character buffer (to reserve some space on the stack). Then, it declares the string object, passing the auxiliary buffer's address to its constructor. Of course you don't need to know about this to use the string.
For more details, please look in the provided source code.
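To give a rough idea of what happens behind such a constructor, here is a minimal sketch. It is not the actual tstr class: the name StackFirstString, its members, and the "double the size" growth policy are all assumptions made for illustration. The object borrows the caller's buffer and switches to the heap only when a string no longer fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch, NOT the tstr class from the provided source code.
class StackFirstString {
public:
    // Borrows a caller-provided buffer (typically a local array).
    StackFirstString(char* buf, std::size_t cap)
        : data_(buf), cap_(cap), onHeap_(false) {
        assert(buf != nullptr && cap > 0);
        data_[0] = '\0';
    }
    // Only a heap block is owned; the borrowed buffer is never freed.
    ~StackFirstString() { if (onHeap_) delete[] data_; }

    // Copying is disabled to keep the sketch short.
    StackFirstString(const StackFirstString&) = delete;
    StackFirstString& operator=(const StackFirstString&) = delete;

    void set(const char* s) {
        assert(s != nullptr);
        std::size_t need = std::strlen(s) + 1;  // include the terminator
        if (need > cap_) {
            // Does not fit: move to the heap, with some slack (assumed policy).
            std::size_t newCap = need * 2;
            char* p = new char[newCap];
            std::memcpy(p, s, need);
            if (onHeap_) delete[] data_;  // release a previous heap block
            data_ = p;
            cap_ = newCap;
            onHeap_ = true;
        } else {
            // Fits in the current buffer: no allocation at all.
            std::memmove(data_, s, need);
        }
    }

    const char* c_str() const { return data_; }
    bool onHeap() const { return onHeap_; }

private:
    char* data_;       // current storage (borrowed buffer or heap block)
    std::size_t cap_;  // capacity of data_, terminator included
    bool onHeap_;      // true once the string has outgrown the buffer
};
```

With a local char buf[25+1] passed to the constructor, setting "John Lennon" stays in the buffer, while setting "Blondaux Georges Jacques Babylas" (32 characters) triggers the one and only heap allocation.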
Remarks / drawbacks
- Everything said here can apply to Unicode character strings. One can define strings to use wchar_t instead of char, for example. The provided source code works with wchar_t if you define _UNICODE in stdafx.h.
- Everything said here can apply to C code as well (for those of you who are still "forced" to use it, or chose to...).
- Since the string is allocated on the stack (at least when created), you cannot return it to the caller. If you need to return a dynamic string to the caller, you can still create one by using the other constructors (i.e., do not use tstrDecl in this case).
- My tstr class does not count references (CString does, and so do some std::string implementations). I think this can be added without problem.
- I know it is rather "uncool" to use preprocessor macros these days... If someone finds a nice way to do it without macros, I will be glad to update this article.
- I did not implement the assignment operator in tstr (to allow something like myTstr="abc"). This is mostly a question of taste; I find an explicit call to "set" less ambiguous than redefining "=". Feel free to add it if you prefer.
- For strings that are data members of a class (and not just local variables), the same idea can be applied by creating an auxiliary member to hold the character buffer. However, the code becomes less readable because it requires two different macros (one for the declaration of the class, one for the constructor). An example is included in the provided source code. If you create a large number of objects (whether on the stack or on the heap), this can still be a worthwhile optimization, saving one heap allocation per string data member.
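Regarding the remarks on macros and data members, one possible macro-free approach is sketched below. It is only an illustration under stated assumptions, not part of the provided source code: the names InlineString and Person are hypothetical, and the growth policy is arbitrary. The initial buffer becomes a member array whose size is a template parameter, so the same type works for local variables and for data members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro-free variant: the initial buffer lives inside the object.
template <std::size_t N>
class InlineString {
public:
    InlineString() : data_(local_), cap_(N + 1) { local_[0] = '\0'; }
    ~InlineString() { if (data_ != local_) delete[] data_; }

    // A default copy would leave data_ pointing into another object's
    // buffer, so copying is disabled in this short sketch.
    InlineString(const InlineString&) = delete;
    InlineString& operator=(const InlineString&) = delete;

    void set(const char* s) {
        assert(s != nullptr);
        std::size_t need = std::strlen(s) + 1;  // include the terminator
        if (need > cap_) {
            // Does not fit: allocate on the heap with some slack.
            char* p = new char[need * 2];
            std::memcpy(p, s, need);
            if (data_ != local_) delete[] data_;
            data_ = p;
            cap_ = need * 2;
        } else {
            std::memmove(data_, s, need);
        }
    }

    const char* c_str() const { return data_; }
    bool onHeap() const { return data_ != local_; }

private:
    char local_[N + 1];  // room for N characters plus the terminator
    char* data_;         // points to local_ until the string outgrows it
    std::size_t cap_;    // capacity of data_, terminator included
};

// Used as a data member: no auxiliary member and no macro are needed.
struct Person {
    InlineString<25> name;
};
```

A local variable is declared as InlineString<25> name; instead of tstrDecl(name,25);. The trade-off is that each distinct size is a distinct type, so functions taking such strings would need to be templates or accept a common base class.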
Mini benchmark
The provided source code ("tstrSample") runs a mini benchmark to compare the speed of CString, std::string and tstr. It was of course designed to show where tstr performs better! For example, if one uses a test that often reallocates strings (increasing their length many times), then CString performs best. This is no real surprise since, I suppose, the people at Microsoft must have spent the necessary time optimizing CString's speed. I have not optimized tstr yet, and even if I do, I have little chance of doing better than them :).
So the provided test just declares tstr strings allocated on the stack, and reallocates them quite rarely (there is a setting in the test that you can change to see how it influences the benchmark's result; look for ReallocCondition).
I won't even give benchmark results here, as they are not very important. In cases that "look like" real life, I would say that tstr is between 10 and 15 percent faster than CString. Strangely, std::string is often far behind, but I did not investigate why.
The important thing, I believe, is that this feature could be implemented in existing libraries.
Philosophic question: can this article be useful?
I had a philosophy teacher who told us:
"Please, please, do not try to express your own ideas. All ideas have been expressed since a long time before, by people who thought a lot deeper, and wrote a lot clearer than you do."
Stack-allocated dynamic strings are so useful to me now, and at the same time so "obvious", that I have a hard time believing no one else thought about them before. I still cannot believe it, but my searches on the Internet were unsuccessful. I asked a few friends too, with no result. So even if it is just to help spread the idea, I suppose that this article can be useful.
In brief, if you know who came up with this idea and published it somewhere before me, please send me a link, and I will update this article accordingly.