
Allocate your dynamic strings on the stack

Olivier Lombart


5 Jun 2005

An article about stack-allocated dynamic strings in C++.

Download demo project - 10.5 Kb

The problem

1. Character buffers are inconvenient and bug-prone

Whenever you want to handle a character string in C++ (or C), a few questions arise. A major one concerns length: how long can this string be?

It is easy to grow tired of using fixed-size character buffers, like:

char buf[200];

This causes endless problems (and bugs) because no matter how big the buffer is, the day always comes when a string overflows it. On that day, if you coded carefully, the string is merely truncated. Even then it is often still a bug, although your application does not crash: you might lose or corrupt data, which is often worse.
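Careful code can at least detect the truncation. The following is a minimal sketch, assuming a C++11 compiler; copy_checked is a hypothetical helper, not part of the article's code. It relies on std::snprintf returning the length the complete output would have had:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies src into dst (dst_size bytes, including room for
// the terminating zero). Never overflows; returns false if src was truncated.
bool copy_checked(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && dst_size > 0);
    // std::snprintf returns the length the complete output would have had
    // (or a negative value on an encoding error).
    int needed = std::snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```

With char name[25+1], copying "John Lennon" returns true, while copying "Blondaux Georges Jacques Babylas" returns false and leaves only its first 25 characters in name. The program stays safe, but the data is still lost.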

2. Dynamic strings are slower

The perfect solution to this maximum-length problem is to use dynamically allocated strings. Virtually every C++ library provides a class to handle them. Such a class gets rid of the maximum-length problem and brings several advantages into the bargain: a nice object-oriented interface, much friendlier than the standard strxxx functions, a safer implementation, and so on.

But everything has a price: dynamic strings are slower, because they need to allocate memory on the heap and free it later.
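That cost comes from the allocator calls. One way to observe them, sketched here under the assumption of a C++11 standard library, is to give std::basic_string a hypothetical CountingAllocator that counts its allocate() calls. A 100-character string is used because many modern std::string implementations keep short strings in a small internal buffer and would not allocate for them at all:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical counter, incremented on every allocation made through CountingAllocator.
static std::size_t g_allocs = 0;

// Minimal allocator that forwards to std::allocator but counts allocate() calls.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocs;                              // one heap allocation requested
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// A std::string variant whose heap traffic goes through the counting allocator.
using counted_string = std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Returns how many allocations were made to build a string of `len` characters.
std::size_t allocations_for_length(std::size_t len) {
    std::size_t before = g_allocs;
    counted_string s(len, 'x');
    assert(s.size() == len);
    return g_allocs - before;
}
```

allocations_for_length(100) reports at least one allocation, whereas filling a local char array of the same size never goes through an allocator.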

The result is a dilemma between speed and convenience. If you need things to be fast, you might be reluctant to use dynamic strings. Luckily, since string manipulation is usually not the most time-consuming task in general applications, the choice is easy most of the time.

The idea

In most cases, each character string that your application handles has a reasonable maximum length that it respects 90% of the time. But because of the remaining 10% (or less), you end up using dynamic strings, which are slower but can handle almost any length.

So why not adapt dynamic strings so that they accept a stack-allocated buffer at construction time, and then grow if necessary, becoming heap-allocated only when needed?

This way you get dynamic strings while avoiding the first heap allocation in 90% of cases.

A short code sample

Here is a very simple example. We allocate a string with enough room for 25 characters, then assign constant strings to it.

First, with a fixed-length character buffer (note that the last line below overflows the buffer):

char name[25+1]; //+1 for final zero
strcpy(name,"John Lennon"); //no problem
strcpy(name,"Blondaux Georges Jacques Babylas"); //*** bug here; string overflowed

With usual dynamic strings, it would become (MFC version):

CString name('\0',25); //string's memory is allocated in the heap (time consuming)
name="John Lennon"; //no reallocation (fast)
name="Blondaux Georges Jacques Babylas"; //reallocated (time consuming)

Or (STL version):

//31 character size allocated in the heap (time consuming)
std::string name(25,'\0');
//no reallocation (fast)
name="John Lennon";
//reallocated (time consuming)
name="Blondaux Georges Jacques Babylas";

And now using my own implementation of "stack-allocated dynamic strings", called tstr:

//declares a stack-allocated dynamic string (fast)
tstrDecl(name,25);
//no reallocation, still on the stack (fast)
name.set("John Lennon");
//reallocation; now in the heap (time consuming)
name.set("Blondaux Georges Jacques Babylas");

Note that we only saved the cost of the first allocation. Apart from that, things work the same way (with the slight difference that std::string and tstr allocate more space than needed, favoring speed over memory).
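Allocating more than needed pays off when a string grows repeatedly. The sketch below uses a hypothetical helper, count_reallocations, and assumes a growth factor of 2 purely for illustration (real libraries choose their own policies). It counts how many reallocations each policy needs when characters are appended one at a time:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the reallocations needed to append n characters
// one at a time to a buffer of initial_capacity characters.
//   doubling == false: grow to exactly the required size (exact fit)
//   doubling == true:  grow to max(required, 2 * old capacity)
std::size_t count_reallocations(std::size_t initial_capacity, std::size_t n, bool doubling) {
    std::size_t capacity = initial_capacity;
    std::size_t reallocations = 0;
    for (std::size_t len = 1; len <= n; ++len) {   // len = length after this append
        if (len > capacity) {
            std::size_t grown = doubling ? 2 * capacity : len;
            capacity = grown > len ? grown : len;
            ++reallocations;
        }
        assert(len <= capacity);                   // the new character always fits
    }
    return reallocations;
}
```

Starting from a capacity of 25 characters and appending up to 1000 characters, exact-fit growth reallocates 975 times, whereas doubling reallocates only 6 times (capacities 50, 100, 200, 400, 800 and 1600).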

Implementation details

The following line declares a stack-allocated dynamic string with an initial maximum length of 25:

tstrDecl(name,25);

Here is how it works in the implementation (tstr.h):

class tstr {
public:
    //constructor; uses given buffer as non-dynamic buffer
    tstr(char* buf,size_t tlenofbuf);
    ...
};

// declaration of a local tstr that uses a stack allocated
// buffer (fast first allocation, dynamic increase possible)
#define tstrDecl(str,len) char str##_tbuf[(len)+_tstr_overx]; \
        tstr str(str##_tbuf,sizeof(str##_tbuf)/sizeof(char));

The macro declares an auxiliary character buffer (reserving some space on the stack), then declares the string object, passing the buffer's address and size to its constructor. You don't need to know about this to use the string.
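To make the mechanism concrete, here is a minimal sketch of a string that uses a caller-supplied buffer until it outgrows it. It is not the article's tstr: the class stack_str, the macro STACK_STR_DECL and the "+1 for the terminating zero" sizing (used in place of the elided _tstr_overx) are hypothetical, and a C++11 compiler is assumed. Copying is disabled because a copy would still point at the original object's stack buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: a string that starts in an external (typically stack)
// buffer and moves to the heap only when that buffer becomes too small.
class stack_str {
public:
    // buf/size: caller-supplied buffer used until the string outgrows it.
    stack_str(char* buf, std::size_t size) : data_(buf), cap_(size), on_heap_(false) {
        assert(buf != nullptr && size > 0);
        data_[0] = '\0';
    }
    ~stack_str() {
        if (on_heap_) delete[] data_;     // never free the caller's buffer
    }

    // Non-copyable: a copy would alias (and could outlive) the external buffer.
    stack_str(const stack_str&) = delete;
    stack_str& operator=(const stack_str&) = delete;

    // Replaces the contents; allocates on the heap only if s does not fit.
    void set(const char* s) {
        std::size_t len = std::strlen(s);
        reserve(len + 1);                 // +1 for the terminating zero
        std::memmove(data_, s, len + 1);  // memmove: s may point into data_
    }

    const char* c_str() const { return data_; }
    bool on_heap() const { return on_heap_; }

private:
    // Ensures room for `needed` chars, growing to at least twice the old capacity.
    void reserve(std::size_t needed) {
        if (needed <= cap_) return;       // still fits: no allocation
        std::size_t new_cap = needed > 2 * cap_ ? needed : 2 * cap_;
        char* p = new char[new_cap];
        std::memcpy(p, data_, std::strlen(data_) + 1);  // keep current contents
        if (on_heap_) delete[] data_;
        data_ = p;
        cap_ = new_cap;
        on_heap_ = true;
    }

    char* data_;
    std::size_t cap_;
    bool on_heap_;
};

// Hypothetical counterpart of tstrDecl: a local buffer plus the string using it.
#define STACK_STR_DECL(name, len)     \
    char name##_buf[(len) + 1];       \
    stack_str name(name##_buf, sizeof(name##_buf))
```

After STACK_STR_DECL(name, 25);, name.set("John Lennon") stays in the 26-byte stack buffer, while name.set("Blondaux Georges Jacques Babylas") moves the string to a 52-byte heap block.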

For more details, please look in the provided source code.

Remarks / drawbacks

Everything said here also applies to Unicode strings: strings can, for example, be defined to use wchar_t instead of char. The provided source code works with wchar_t if you define _UNICODE in stdafx.h.
Everything said here applies to C code as well (for those of you who are still "forced" to use it, or who choose to...).
Since the string's buffer is allocated on the stack (at least when the string is created), you cannot return the string to the caller: the buffer is destroyed when the function returns. If you need to return a dynamic string to the caller, create it with one of the other constructors instead (that is, do not use tstrDecl in this case).
My tstr class does not count references (CString does, and so do some std::string implementations). I think this can be added without problem.
I know it is rather "uncool" to use preprocessor macros these days. If someone finds a clean way to do this without macros, I will gladly update this article.
I did not implement the assignment operator in tstr (which would allow something like myTstr="abc"). This is largely a matter of taste: I find an explicit call to set less ambiguous than overloading "=". Feel free to add it if you prefer.
For strings that are data members of a class (rather than local variables), the same idea can be applied by adding an auxiliary member to hold the character buffer. However, the code becomes less readable because it requires two different macros (one in the class declaration, one in the constructor). An example is included in the provided source code. If you create a large number of objects (whether on the stack or on the heap), this can still be a worthwhile optimization, since it saves one heap allocation per string data member.
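One possible macro-free approach, which also lets a data member be declared with a single line, is a class template whose size parameter dimensions a member array. This is a sketch assuming C++11, with the hypothetical name inline_str; it is not the article's code. It also shows the optional operator= forwarding to set(). The trade-off is that every object carries its N+1 bytes inline, even after the string has moved to the heap:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: the initial buffer is a member array sized by N, so the
// object works as a local variable or a data member without auxiliary macros.
template <std::size_t N>
class inline_str {
public:
    inline_str() : data_(buf_), cap_(N + 1) { buf_[0] = '\0'; }
    explicit inline_str(const char* s) : inline_str() { set(s); }
    ~inline_str() {
        if (data_ != buf_) delete[] data_;   // only free heap storage
    }

    // Copying would require re-pointing data_ at the copy's own buffer;
    // this sketch simply forbids it.
    inline_str(const inline_str&) = delete;
    inline_str& operator=(const inline_str&) = delete;

    // Replaces the contents; moves to the heap only if s does not fit.
    void set(const char* s) {
        std::size_t len = std::strlen(s);
        if (len + 1 > cap_) {                // +1 for the terminating zero
            std::size_t new_cap = len + 1 > 2 * cap_ ? len + 1 : 2 * cap_;
            char* p = new char[new_cap];
            if (data_ != buf_) delete[] data_;
            data_ = p;
            cap_ = new_cap;
        }
        std::memmove(data_, s, len + 1);     // memmove: s may point into data_
    }

    // Optional assignment operator, simply forwarding to set().
    inline_str& operator=(const char* s) { set(s); return *this; }

    const char* c_str() const { return data_; }
    bool on_heap() const { return data_ != buf_; }

private:
    char buf_[N + 1];   // initial in-object buffer (+1 for the terminating zero)
    char* data_;        // points to buf_ or to a heap block
    std::size_t cap_;   // capacity of *data_, terminator included
};
```

For example, struct person { inline_str<25> name; }; needs no constructor macro: each person object embeds its own 26-byte buffer, and p.name = "John Lennon"; stays in it.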

Mini benchmark

The provided source code ("tstrSample") runs a mini benchmark comparing the speed of CString, std::string and tstr. It was, of course, designed to show where tstr performs better! For example, with a test that reallocates strings often (increasing their length many times), CString performs best. This is no real surprise: I suppose the people at Microsoft spent the time necessary to optimize CString's speed. I have not optimized tstr yet, and even if I do, I have little chance of beating them :).

So the provided test just declares tstr strings allocated on the stack, and reallocates them quite rarely (there is a setting in the test that you can change to see how it influences the benchmark's result; look for ReallocCondition).

I won't give benchmark results here; they are not very important. In cases that "look like" real life, I would say that tstr is 10 to 15 percent faster than CString. Strangely, std::string often lags far behind, but I did not investigate why.

The important thing, I believe, is that this feature could be implemented in existing libraries.

Philosophic question: can this article be useful?

I had a philosophy teacher who told us:

"Please, please, do not try to express your own ideas. All ideas have been expressed since a long time before, by people who thought a lot deeper, and wrote a lot clearer than you do."

Stack-allocated dynamic strings are so useful to me now, and at the same time so "obvious", that I find it hard to believe no one thought of them before. Yet my searches on the Internet were unsuccessful, and asking a few friends gave no result either. So, if only to help spread the idea, I suppose this article can be useful.

In short, if you know who came up with this idea and published it before me, please send me a link, and I will update this article accordingly.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
