Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

StringBuilderPlus Improves Upon StringBuilder

4.85/5 (62 votes)
1 Jun 2011CPOL9 min read 150.3K   818  
StringBuilderPlus facilitates prefixing and suffixing strings and StringBuilderPluses in an efficient manner.
The StringBuilderPlus tester application.

Introduction

The above image shows that to concatenate about 40,000 strings in C# takes about 3 seconds. With the StringBuilderPlus, you can concatenate those same strings about a hundred times faster (or, 1/30th of a second). The experienced developer will note that StringBuilder can do this too, but there are a few weaknesses in StringBuilder that many people don't know about (skip to the "Problem" section to read more about that). StringBuilderPlus remedies those, as explained below, in the "Solution" section of the article.

Using the Code

For those of you who want to skip the details and dive into the code, here's pretty much everything you'd want to do with StringBuilderPlus:

C#
// This goes near the top of your code file.
// Also, you must first add the Zigrem.Text project to your solution, 
// then add a reference to that project from your project.
using Zigrem.Text;

// This can be tossed into any old method in your code file.
StringBuilderPlus plus = new StringBuilderPlus();
plus.Suffix(" world");
plus.Prefix("Hello");
StringBuilderPlus tabbed = new StringBuilderPlus();
tabbed.Suffix("Message:");
tabbed.Suffix("\r\n");
tabbed.Suffix("\t");
tabbed.Suffix(plus);
MessageBox.Show(tabbed.ToString());

That code will cause the following message to be displayed:

"Hello World" demonstration of StringBuilderPlus.

Background

Heads Up: If you already understand why StringBuilder speeds up string concatenations, skip this section. Otherwise, read on!

If you've ever dealt with a large number of string concatenations, you know that they can slow an application down to a crawl. It doesn't seem to make sense that combining a few kilobytes of strings can take several seconds.

The problem is that each time you do a string concatenation, the left string and the right string are copied to an entirely new string. So, even with a single string concatenation, you have twice the data you need loaded in memory – that of the original two strings and the concatenated string. If you build large strings out of a bunch of small strings, this can get out of hand quickly. The below diagram demonstrates this:

Concatenating numerous strings can create performance issues.

This diagram shows a few things. The blue squares are the strings you want to combine. They are:

  1. "The "
  2. "dog "
  3. "ate "
  4. "the "
  5. "cat "
  6. "all "
  7. "day "
  8. "for "
  9. "fun."

The green square is the result of combining each of the above 9 strings together, which turns out to be:

  • "The dog ate the cat all day for fun."

The gloomy looking gray sections are the strings that are unnecessarily created when you use the typical C# method of combining strings. A typical line of code that would do this would be:

C#
// Note that this is an oversimplification.
// The compiler will probably optimize this line.
// However, were the data not hard coded, this would not be the case.
// Just pretend this is a for loop that is combining data from a database.
string sentence = "The " + "dog " + "ate " + "the " + "cat " + "all " + "day " + 
	"for " + "fun.";

That innocent looking line of code actually takes up much more processing power and memory than it appears to. If strings were combined in the ideal way, you would expect that the sentence would be the only string created from this operation. However, since each string is combined to its neighbor in succession, it turns out that 7 other strings are also created (shown in gray in the diagram above). The total amount of unnecessary memory allocations created from this operation is equal to the following equation, where N is the number of strings you are combining:

N^2/2 - N/2 - 1

In big-O notation, this turns out to be the following number of allocations:

big-O(N^2)

So, to combine a mere N strings, you are actually taking up about N * N resources on your computer. Luckily, there is a way around this.

Garbage collection prevents memory from getting too out of hand (those strings created in the gray area are automatically released from memory), but there is still the problem of the amount of processing taken up by the numerous string copies. To solve this problem, the creators of the .NET Framework were kind enough to include StringBuilder. StringBuilder acts like a string, except you can append strings to it without a huge copy operation each time. This effectively reduces the amount of memory allocations required to be about the size of the string that results after all the append operations (see code below). However, the StringBuilder is not as perfect as it appears.

C#
// Ignoring compiler optimizations, this takes O(N^2) time.
string sentence = "The " + "dog " + "ate " + "the " + "cat " + "all " + 
	"day " + "for " + "fun.";

// This takes O(N) time.
StringBuilder sb = new StringBuilder();
sb.Append("The ");
sb.Append("dog ");
sb.Append("ate ");
sb.Append("the ");
sb.Append("cat ");
sb.Append("all ");
sb.Append("day ");
sb.Append("for ");
sb.Append("fun.");
sentence = sb.ToString();

Problem

There is one major problem with the StringBuilder – append operations are cheap, but prefix operations are not. A StringBuilder is a lot like a list; you can add to the end of it in O(1) time, but adding to the beginning of it takes O(N) time, where N is the number of existing elements. Also like a list, the only way to perform a prefix to a StringBuilder is to do an insertion at index 0. This makes the StringBuilder structure just as inefficient as strings when it comes to prefixing strings. You could take my word for it, or you could look at the below image, which shows that 40,000 StringBuilder inserts are just as inefficient as 40,000 string concatenations. To get around this limitation, I have created StringBuilderPlus.

string_builder_insert_test.png

Solution

StringBuilderPlus acts much like StringBuilder, except that it can do both prefixes and suffixes in O(1) time. An added benefit of StringBuilderPlus is that it is nestable; that is, one StringBuilderPlus can be added to another StringBuilderPlus. Just like affixing strings to a StringBuilderPlus, affixing a StringBuilderPlus to a StringBuilderPlus takes O(1) time.

Under the Hood

Internally, StringBuilderPlus is fairly simple. In fact, I wrote an optimized version, but chose to rewrite a simpler version for this article to demonstrate the basics more clearly. Perhaps surprisingly, the data structure that stores each component of the Builder+ is just a run of the mill linked list (shown below). When a string is prefixed, it is added to the beginning of the linked list. When a string is suffixed, it is added to the end of the linked list. However, nested StringBuilderPluses make the algorithm a little more complicated.

C#
// This is a simplified version of how StringBuilderPlus implements 
// string prefixing and suffixing.
public class StringBuilderPlus
{
	// The IString interface is shown in the next code snippet.
	private LinkedList<IString> strings;
	public Suffix(string str)
	{
		// PlainString implements IString.
		this.strings.AddLast(new PlainString(str));
	}
	public Prefix(string str)
	{
		this.strings.AddFirst(new PlainString(str));
	}
	public string Value
	{
		get
		{
			StringBuilder combined = new StringBuilder();
			foreach(IString str in this.strings)
			{
				str.AppendTo(combined);
			}
			return combined.ToString();
		}
	}
}

Before I describe how I efficiently implement Builder+ nesting, I will go into a little more detail about the linked list stored internally by each Builder+. The linked list stores elements of type IString (shown below), which is an interface I created to store multiple types of strings. Given this structure, the simplest way of affixing a Builder+ to another would be to inherit a StringBuilderPlus from IString (hypothetical class shown below). However, that creates some problems.

C#
// The IString interface.
private interface IString
{
	void AppendTo(StringBuilder builder);
	int Length
	{
		get;
	}
}

// This is NOT how StringBuilderPlus is defined, but is one way it could be.
public class StringBuilderPlus : IString
{
	// ... Fill in code for AppendTo and Length...
}

For one, using this strategy would cause a problem that modifying a nested Builder+ would also modify the containing Builder+. This would cause problems with caching a Builder+ to a string (i.e., a previously cached string would be invalidated if a nested builder+ changed). While that problem could be solved with some more sophisticated code, there is another problem to deal with. If a Builder+ was added to itself, this would create an infinite recursion when the Builder+ gets converted to a string. Again, this is a soluble problem. However, I wanted to keep things simple and intuitive, so I instead decided to create a StringsWrapper (shown below).

C#
// Wraps a collection of strings.
private class StringsWrapper : IString
{
	private LinkedList<IString> strings;
	
	public int Length
	{
		// Manages string length...
	}
	
	public StringsWrapper(LinkedList<IString> strings, int length)
	{
		this.strings = strings;
		// ... more code ...
	}
	
	public void AppendTo(StringBuilder builder)
	{
		// Appends the wrapped strings to the builder...
	}
}

A StringsWrapper is just a class that implements IString and that stores a linked list of IStrings in an immutable fashion. That is, a linked list is passed to the StringsWrapper on instantiation and that linked list can never be modified after that (at least not from the StringsWrapper). When one Builder+ is added to another, the nested Builder+ is wrapped with a StringsWrapper, the nested Builder+ is cleared, then the wrapped strings are added to both StringBuilderPluses. Since StringsWrapper is immutable, it can be passed around without fear of it modifying a different Builder+.

C#
// Notice that builder+ now has methods for affixing other builder+'s.
public class StringBuilderPlus
{
	public void Suffix(string str)
	{
		// Suffix string...
	}
	public void Suffix(StringBuilderPlus other)
	{
		// Wrap strings.
		StringsWrapper wrapper = 
			new StringsWrapper(other.Strings, other.Length);

		// "other" builder+ should use the wrapper instead of the strings.
		other.Strings = new LinkedList<istring>();
		other.Strings.AddLast(wrapper);

		// Add wrapped strings to this builder+.
		this.Strings.AddLast(wrapper);
	}
	public void Prefix(string str)
	{
		// Prefix string...
	}
	public void Prefix(StringBuilderPlus other)
	{
		// Prefix builder+ (basically does what "Suffix" does above)...
	}
}

I've really only mentioned one class so far, StringsWrapper, that implements IString. However, my implementation contains one other class that implements IString, PlainString (shown below). The function of this class is very simple; it is just a container for a string. I chose to create a sealed implementation of StringBuilderPlus, but other classes could inherit from IString in order to extend the functionality of StringBuilderPlus. An example would be a stream object that loads strings from, say, a file, network, or algorithm.

C#
private class PlainString : IString
{
	private string stringValue;
	
	public int Length
	{
		get
		{
			return this.stringValue.Length;
		}
	}
	
	public PlainString(string str)
	{
		this.stringValue = str;
	}
	
	public void AppendTo(StringBuilder builder)
	{
		builder.Append(this.stringValue);
	}
}

Before I get into optimization details and possible improvements, I'll summarize the basic structure of the StringBuilderPlus (refer to the below diagram as you read). A StringBuilderPlus contains a linked list of IStrings. It is this linked list that facilitates affixing strings to either end of a Builder+. PlainString and StringsWrapper implement IString. PlainString is just a wrapper for a string. StringsWrapper acts as an immutable wrapper for a linked list of IStrings. Its purpose is effectively to make a collection of strings immutable, so that that collection may be passed around as a string-like object. This immutability is what facilitates nesting StringBuilderPluses.

A simple UML diagram of the classes and interfaces used to create StringBuilderPlus.

Optimizations

I have added a few simple optimizations to StringBuilderPlus to enhance performance of simple operations. There are more optimizations to be added, but here are some of them that I have already added.

  • When nesting StringBuilderPluses, the linked list the nested Builder+ contains is wrapped. If the nested Builder+ is added to another Builder+ again, the wrapped linked list will not be wrapped again, as it is already immutable. This prevents an arbitrarily deep structure of wrappers.
  • When an empty string is affixed to a StringBuilderPlus that is not null, the affix operation is not performed. That way, if you add a thousand empty strings to a Builder+, you will not have to deal with them later on when you convert the Builder+ to a string.

Possible Improvements

While StringBuilderPlus improves upon the functionality of StringBuilder, it is certainly not the holy grail of string handling. There are a ton of other operations that might be added to make StringBuilderPlus more useful. I will leave that as an exercise for the reader. However, here are some ideas of how you might improve StringBuilderPlus:

  • Add the ability to insert strings into a Builder+ at an arbitrary index. Using a binary tree instead of a linked list would be prudent at this point. To prevent slowing down the affix operations, you'd probably want to keep track of the start node and the end node of the tree (to avoid turning an O(1) operation into an O(log(n)) operation).
  • Add string handling functions directly to StringBuilderPlus. An example would be substring and replace.
  • Support alternative types of strings, such as file streams. That way, you can store gigs and gigs of strings on your hard drive, but represent those strings with only a few bytes in RAM.
  • If you think of something else, let me know and I'll add it to this list!

History of Changes

This section lists changes I've made to the article.

  • 2011-05-31: Removed extra "for " in string contatenation example and removed excessive whitespace.
  • 2009-08-21: Changed tester form to more accurately reflect 40,000 string concatenations

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)