Introduction
The above image shows that to concatenate about 40,000 strings in C# takes about 3 seconds. With the StringBuilderPlus
, you can concatenate those same strings about a hundred times faster (or, 1/30th of a second). The experienced developer will note that StringBuilder
can do this too, but there are a few weaknesses in StringBuilder
that many people don't know about (skip to the "Problem" section to read more about that). StringBuilderPlus
remedies those, as explained below, in the "Solution" section of the article.
Using the Code
For those of you who want to skip the details and dive into the code, here's pretty much everything you'd want to do with StringBuilderPlus
:
using Zigrem.Text;
StringBuilderPlus plus = new StringBuilderPlus();
plus.Suffix(" world");
plus.Prefix("Hello");
StringBuilderPlus tabbed = new StringBuilderPlus();
tabbed.Suffix("Message:");
tabbed.Suffix("\r\n");
tabbed.Suffix("\t");
tabbed.Suffix(plus);
MessageBox.Show(tabbed.ToString());
That code will cause the following message to be displayed:
Background
Heads Up: If you already understand why StringBuilder
speeds up string concatenations, skip this section. Otherwise, read on!
If you've ever dealt with a large number of string concatenations, you know that they can slow an application down to a crawl. It doesn't seem to make sense that combining a few kilobytes of strings can take several seconds.
The problem is that each time you do a string concatenation, the left string and the right string are copied to an entirely new string. So, even with a single string concatenation, you have twice the data you need loaded in memory – that of the original two strings and the concatenated string. If you build large strings out of a bunch of small strings, this can get out of hand quickly. The below diagram demonstrates this:
This diagram shows a few things. The blue squares are the strings you want to combine. They are:
- "The "
- "dog "
- "ate "
- "the "
- "cat "
- "all "
- "day "
- "for "
- "fun."
The green square is the result of combining each of the above 9 strings together, which turns out to be:
- "The dog ate the cat all day for fun."
The gloomy looking gray sections are the strings that are unnecessarily created when you use the typical C# method of combining strings. A typical line of code that would do this would be:
string sentence = "The " + "dog " + "ate " + "the " + "cat " + "all " + "day " +
"for " + "fun.";
That innocent looking line of code actually takes up much more processing power and memory than it appears to. If strings were combined in the ideal way, you would expect that the sentence would be the only string created from this operation. However, since each string is combined to its neighbor in succession, it turns out that 7 other strings are also created (shown in gray in the diagram above). The total amount of unnecessary memory allocations created from this operation is equal to the following equation, where N is the number of strings you are combining:
In big-O notation, this turns out to be the following number of allocations:
So, to combine a mere N strings, you are actually taking up about N * N resources on your computer. Luckily, there is a way around this.
Garbage collection prevents memory from getting too out of hand (those strings created in the gray area are automatically released from memory), but there is still the problem of the amount of processing taken up by the numerous string copies. To solve this problem, the creators of the .NET Framework were kind enough to include StringBuilder
. StringBuilder
acts like a string, except you can append strings to it without a huge copy operation each time. This effectively reduces the amount of memory allocations required to be about the size of the string that results after all the append operations (see code below). However, the StringBuilder
is not as perfect as it appears.
string sentence = "The " + "dog " + "ate " + "the " + "cat " + "all " +
"day " + "for " + "fun.";
StringBuilder sb = new StringBuilder();
sb.Append("The ");
sb.Append("dog ");
sb.Append("ate ");
sb.Append("the ");
sb.Append("cat ");
sb.Append("all ");
sb.Append("day ");
sb.Append("for ");
sb.Append("fun.");
sentence = sb.ToString();
Problem
There is one major problem with the StringBuilder
– append operations are cheap, but prefix operations are not. A StringBuilder
is a lot like a list; you can add to the end of it in O(1) time, but adding to the beginning of it takes O(N) time, where N is the number of existing elements. Also like a list, the only way to perform a prefix to a StringBuilder
is to do an insertion at index 0. This makes the StringBuilder
structure just as inefficient as strings when it comes to prefixing strings. You could take my word for it, or you could look at the below image, which shows that 40,000 StringBuilder
inserts are just as inefficient as 40,000 string concatenations. To get around this limitation, I have created StringBuilderPlus
.
Solution
StringBuilderPlus
acts much like StringBuilder
, except that it can do both prefixes and suffixes in O(1) time. An added benefit of StringBuilderPlus
is that it is nestable; that is, one StringBuilderPlus
can be added to another StringBuilderPlus
. Just like affixing strings to a StringBuilderPlus
, affixing a StringBuilderPlus
to a StringBuilderPlus
takes O(1) time.
Under the Hood
Internally, StringBuilderPlus
is fairly simple. In fact, I wrote an optimized version, but chose to rewrite a simpler version for this article to demonstrate the basics more clearly. Perhaps surprisingly, the data structure that stores each component of the Builder+ is just a run of the mill linked list (shown below). When a string is prefixed, it is added to the beginning of the linked list. When a string is suffixed, it is added to the end of the linked list. However, nested StringBuilderPlus
es make the algorithm a little more complicated.
public class StringBuilderPlus
{
private LinkedList<IString> strings;
public Suffix(string str)
{
this.strings.AddLast(new PlainString(str));
}
public Prefix(string str)
{
this.strings.AddFirst(new PlainString(str));
}
public string Value
{
get
{
StringBuilder combined = new StringBuilder();
foreach(IString str in this.strings)
{
str.AppendTo(combined);
}
return combined.ToString();
}
}
}
Before I describe how I efficiently implement Builder+ nesting, I will go into a little more detail about the linked list stored internally by each Builder+. The linked list stores elements of type IString
(shown below), which is an interface I created to store multiple types of strings. Given this structure, the simplest way of affixing a Builder+ to another would be to inherit a StringBuilderPlus
from IString
(hypothetical class shown below). However, that creates some problems.
private interface IString
{
void AppendTo(StringBuilder builder);
int Length
{
get;
}
}
public class StringBuilderPlus : IString
{
}
For one, using this strategy would cause a problem that modifying a nested Builder+ would also modify the containing Builder+. This would cause problems with caching a Builder+ to a string (i.e., a previously cached string would be invalidated if a nested builder+ changed). While that problem could be solved with some more sophisticated code, there is another problem to deal with. If a Builder+ was added to itself, this would create an infinite recursion when the Builder+ gets converted to a string. Again, this is a soluble problem. However, I wanted to keep things simple and intuitive, so I instead decided to create a StringsWrapper
(shown below).
private class StringsWrapper : IString
{
private LinkedList<IString> strings;
public int Length
{
}
public StringsWrapper(LinkedList<IString> strings, int length)
{
this.strings = strings;
}
public void AppendTo(StringBuilder builder)
{
}
}
A StringsWrapper
is just a class that implements IString
and that stores a linked list of IString
s in an immutable fashion. That is, a linked list is passed to the StringsWrapper
on instantiation and that linked list can never be modified after that (at least not from the StringsWrapper
). When one Builder+ is added to another, the nested Builder+ is wrapped with a StringsWrapper
, the nested Builder+ is cleared, then the wrapped strings are added to both StringBuilderPlus
es. Since StringsWrapper
is immutable, it can be passed around without fear of it modifying a different Builder+.
public class StringBuilderPlus
{
public void Suffix(string str)
{
}
public void Suffix(StringBuilderPlus other)
{
StringsWrapper wrapper =
new StringsWrapper(other.Strings, other.Length);
other.Strings = new LinkedList<istring>();
other.Strings.AddLast(wrapper);
this.Strings.AddLast(wrapper);
}
public void Prefix(string str)
{
}
public void Prefix(StringBuilderPlus other)
{
}
}
I've really only mentioned one class so far, StringsWrapper
, that implements IString
. However, my implementation contains one other class that implements IString
, PlainString
(shown below). The function of this class is very simple; it is just a container for a string. I chose to create a sealed implementation of StringBuilderPlus
, but other classes could inherit from IString
in order to extend the functionality of StringBuilderPlus
. An example would be a stream object that loads strings from, say, a file, network, or algorithm.
private class PlainString : IString
{
private string stringValue;
public int Length
{
get
{
return this.stringValue.Length;
}
}
public PlainString(string str)
{
this.stringValue = str;
}
public void AppendTo(StringBuilder builder)
{
builder.Append(this.stringValue);
}
}
Before I get into optimization details and possible improvements, I'll summarize the basic structure of the StringBuilderPlus
(refer to the below diagram as you read). A StringBuilderPlus
contains a linked list of IString
s. It is this linked list that facilitates affixing strings to either end of a Builder+. PlainString
and StringsWrapper
implement IString
. PlainString
is just a wrapper for a string. StringsWrapper
acts as an immutable wrapper for a linked list of IString
s. Its purpose is effectively to make a collection of strings immutable, so that that collection may be passed around as a string-like object. This immutability is what facilitates nesting StringBuilderPluses
.
Optimizations
I have added a few simple optimizations to StringBuilderPlus
to enhance performance of simple operations. There are more optimizations to be added, but here are some of them that I have already added.
- When nesting
StringBuilderPluses
, the linked list the nested Builder+ contains is wrapped. If the nested Builder+ is added to another Builder+ again, the wrapped linked list will not be wrapped again, as it is already immutable. This prevents an arbitrarily deep structure of wrappers. - When an empty string is affixed to a
StringBuilderPlus
that is not null
, the affix operation is not performed. That way, if you add a thousand empty strings to a Builder+, you will not have to deal with them later on when you convert the Builder+ to a string.
Possible Improvements
While StringBuilderPlus
improves upon the functionality of StringBuilder
, it is certainly not the holy grail of string handling. There are a ton of other operations that might be added to make StringBuilderPlus
more useful. I will leave that as an exercise for the reader. However, here are some ideas of how you might improve StringBuilderPlus
:
- Add the ability to insert strings into a Builder+ at an arbitrary index. Using a binary tree instead of a linked list would be prudent at this point. To prevent slowing down the affix operations, you'd probably want to keep track of the start node and the end node of the tree (to avoid turning an O(1) operation into an O(log(n)) operation).
- Add string handling functions directly to
StringBuilderPlus
. An example would be substring and replace. - Support alternative types of strings, such as file streams. That way, you can store gigs and gigs of strings on your hard drive, but represent those strings with only a few bytes in RAM.
- If you think of something else, let me know and I'll add it to this list!
History of Changes
This section lists changes I've made to the article.
- 2011-05-31: Removed extra "
for
" in string contatenation example and removed excessive whitespace. - 2009-08-21: Changed tester form to more accurately reflect 40,000 string concatenations