Introduction
How you handle strings in your code can have a surprising effect on performance. In this article, I look at two common issues that strings can cause: temporary string variables and string concatenation.
Background
There comes a time in every project when you have to start looking at coding standards. Using FxCop is a good place to start. My favourite set of FxCop rules is the 'Performance' set.
So there I was, checking my project against FxCop and seeing lots of issues with strings. I must admit something: I have always had problems with C#'s immutable strings. When I see myString.ToUpper(), I always forget that it won't change the contents of myString but will return a new string entirely (this is because strings are immutable in C#).
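To make the immutability point concrete, here is a minimal sketch. The class name ImmutabilityDemo and the helper Shout are made up for illustration and are not part of the article's test code.

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: returns an upper-cased copy of the argument.
    // The argument itself is never modified.
    public static string Shout(string text)
    {
        return text.ToUpper();
    }

    static void Main()
    {
        string myString = "hello";
        string upper = Shout(myString);

        Console.WriteLine(myString); // still "hello"
        Console.WriteLine(upper);    // "HELLO"

        // To "change" myString, the result has to be assigned back to it.
        myString = myString.ToUpper();
        Console.WriteLine(myString); // now "HELLO"
    }
}
```

The call that discards its result (myString.ToUpper(); on a line by itself) compiles without complaint, which is why this mistake is so easy to make.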
I proceeded to fix my code to remove FxCop's warnings and then I noticed something - my code was faster. I decided to investigate and ended up writing the test code that I present here.
Using the code
The test code is very simple. A console application calls four test methods. Each method runs a string-processing routine 1000 times, so that the execution time is long enough for performance differences to show up.
The four test methods are split into two groups of two. The first group compares two ways of doing a case-insensitive string comparison.
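The figures in this article come from the nprof profiler, but if you do not have a profiler to hand, a rough timing harness along the following lines can stand in for it. This is only a sketch: the TimingHarness class and its Time method are hypothetical names, and it assumes a framework version that provides System.Diagnostics.Stopwatch and lambda expressions, both of which are newer than the original test code.

```csharp
using System;
using System.Diagnostics;

static class TimingHarness
{
    // Runs the given routine 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Action routine, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            routine();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        // Placeholder workload; substitute one of the four test routines here.
        TimeSpan elapsed = Time(() => "abc".ToUpper(), 1000);
        Console.WriteLine("1000 iterations took {0} ms", elapsed.TotalMilliseconds);
    }
}
```

Wall-clock timings like this are noisier than profiler output, so it is worth running each measurement several times before drawing conclusions.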
String Comparison and Temporary String Creation
The first test routine is a bad case-insensitive string comparison. The routine for the comparison is:
    static bool BadCompare(string stringA, string stringB)
    {
        return (stringA.ToUpper() == stringB.ToUpper());
    }
For this code, FxCop shows the following advice:
"StringCompareTest.BadCompare(String, String):Boolean calls
String.op_Equality(String, String):Boolean after converting 'stack1', a local,
to upper or lowercase. If possible, eliminate the string creation and call the
overload of String.Compare that performs a case-insensitive comparison."
What this means is that each call to ToUpper() creates a temporary string, which has to be allocated and later cleaned up by the garbage collector. This takes extra time and uses more memory. The String.Compare method is more efficient.
The second test routine uses String.Compare:
    static bool GoodCompare(string stringA, string stringB)
    {
        return (string.Compare(stringA, stringB, true,
            System.Globalization.CultureInfo.CurrentCulture) == 0);
    }
This method prevents the creation of unnecessary temporary strings.
According to nprof, the Good Comparison takes 1.69% of the total execution time of the code, while the Bad Comparison takes 5.50%.
So the String.Compare method is over three times as fast as the ToUpper method. If you have code that performs a lot of string comparisons (especially in a loop), then using String.Compare can make a big difference.
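As an aside, later versions of the framework also offer string.Equals overloads that take a StringComparison value, which express the same intent without the "== 0" check. The sketch below is not part of the original test code and was not profiled for this article; the class and method names are invented for illustration.

```csharp
using System;

static class CompareSketch
{
    // Culture-sensitive, case-insensitive equality; closest in spirit to GoodCompare.
    public static bool EqualsIgnoreCaseCulture(string a, string b)
    {
        return string.Equals(a, b, StringComparison.CurrentCultureIgnoreCase);
    }

    // Ordinal, case-insensitive equality; ignores culture-specific rules.
    public static bool EqualsIgnoreCaseOrdinal(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(EqualsIgnoreCaseCulture("FxCop", "FXCOP"));
        Console.WriteLine(EqualsIgnoreCaseOrdinal("FxCop", "fxcop"));
        Console.WriteLine(EqualsIgnoreCaseOrdinal("FxCop", "nprof"));
    }
}
```

Like String.Compare, neither overload needs to build an upper-cased copy of its arguments. Which of the two is appropriate depends on whether the strings are user-facing text or identifiers, so treat this as a starting point rather than a recommendation.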
String Concatenation inside a loop
The final pair of test routines considers string concatenation within a loop.
The 'bad' test routine is as follows:
    static string BadConcatenate(string[] items)
    {
        string strRet = string.Empty;
        foreach(string item in items)
        {
            strRet += item;
        }
        return strRet;
    }
When FxCop sees this code, it is so outraged that it even marks the broken rule in red! FxCop says the following:
"Change StringCompareTest.BadConcatenate(String[]):String to use StringBuilder
instead of String.Concat or +="
The 'good' test routine was written as follows:
    static string GoodConcatenate(string[] items)
    {
        System.Text.StringBuilder builder = new System.Text.StringBuilder();
        foreach(string item in items)
        {
            builder.Append(item);
        }
        return builder.ToString();
    }
This is an almost archetypal example of when to use the System.Text.StringBuilder class. The issue with the bad example is the creation of more temporary strings. Because strings are immutable, the concatenation operator (+=) actually creates a new string out of the two originals and then points the variable at the new string, leaving the old one for the garbage collector.
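The effect of += can be observed directly by keeping a second reference to the string before appending to it. This is a small illustrative sketch; the ConcatDemo class and its method are hypothetical names, not part of the test project.

```csharp
using System;

static class ConcatDemo
{
    // Returns true if appending 'suffix' to 'text' produced a different object.
    public static bool AppendCreatesNewObject(string text, string suffix)
    {
        string original = text;
        text += suffix; // builds a new string; 'original' still refers to the old one
        return !object.ReferenceEquals(original, text);
    }

    static void Main()
    {
        string first = "abc";
        string copyOfReference = first;

        first += "def";

        Console.WriteLine(first);           // "abcdef"
        Console.WriteLine(copyOfReference); // "abc": the old string was never modified
        Console.WriteLine(AppendCreatesNewObject("abc", "def"));
    }
}
```

Inside a loop, every pass leaves another orphaned intermediate string behind, and each one is longer than the last.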
However, when we look at performance, nprof shows that the 'Bad' concatenation takes 5.67% of the total execution time, while the 'Good' concatenation takes 22.09%. I'll run that by you again:
Using StringBuilder took almost four times longer than simple string concatenation!
Why?
The answer is partly in the design of the test; the concatenation routines only concatenate ten short strings. The StringBuilder class is a more complex class than a simple immutable string, so creating one StringBuilder is more expensive than doing ten simple string concatenations.
I repeated the test with differing numbers of string concatenations, and found the following results:
Note: The values shown here are the % of the total execution time taken by the test routines. The 'Good Concatenation' test is not actually getting faster, but takes less relative time than the 'Bad Concatenation' routine.
So, it would seem that the StringBuilder class is only really faster if you are concatenating more than about 600 strings.
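If you want to look for the crossover point on your own machine, a sketch like the following repeats both routines for several input sizes. Everything here other than the two concatenation routines is an assumption made for illustration: the CrossoverSketch class, the MakeItems and TimeMs helpers, the chosen sizes, and the use of Stopwatch in place of nprof. The numbers it prints will depend on your hardware, runtime version and string lengths, so do not expect them to reproduce the figure of 600.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class CrossoverSketch
{
    public static string BadConcatenate(string[] items)
    {
        string result = string.Empty;
        foreach (string item in items)
        {
            result += item;
        }
        return result;
    }

    public static string GoodConcatenate(string[] items)
    {
        StringBuilder builder = new StringBuilder();
        foreach (string item in items)
        {
            builder.Append(item);
        }
        return builder.ToString();
    }

    // Hypothetical helper: builds an array of 'count' short strings as test input.
    public static string[] MakeItems(int count)
    {
        string[] items = new string[count];
        for (int i = 0; i < count; i++)
        {
            items[i] = "item";
        }
        return items;
    }

    // Hypothetical helper: times 'repeats' calls of the routine, in milliseconds.
    public static double TimeMs(Func<string[], string> routine, string[] items, int repeats)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repeats; i++)
        {
            routine(items);
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        int[] counts = { 10, 100, 600, 1000 };
        foreach (int count in counts)
        {
            string[] items = MakeItems(count);
            double bad = TimeMs(BadConcatenate, items, 1000);
            double good = TimeMs(GoodConcatenate, items, 1000);
            Console.WriteLine("{0,5} strings: += {1:F2} ms, StringBuilder {2:F2} ms",
                count, bad, good);
        }
    }
}
```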
Of course, the other reason for using the StringBuilder class is memory allocation. Using the CLRProfiler produced the following memory use timeline for concatenation of 100 simple strings:
The section marked 'A' shows the effect of the bad string concatenation routine on memory allocation and de-allocation. The maximum allocated memory is increasing rapidly and there is a high number of garbage collections occurring (roughly 215 collections for this section).
The section immediately following the 'A' section shows the memory profile for the good string concatenation routine. The maximum allocated memory is increasing less rapidly and there are far fewer garbage collections being made (roughly 60 collections for this section).
So using the StringBuilder class may not be faster in some cases, but it is kinder to the garbage collector.
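A rough way to see the same effect without CLRProfiler is to read GC.CollectionCount before and after each routine. This is a sketch under stated assumptions: the GcCountSketch class, the Gen0Collections helper and the repeat count of 10000 are all invented for illustration, and because it only counts generation-0 collections its numbers are not comparable with the collection counts quoted above.

```csharp
using System;
using System.Text;

static class GcCountSketch
{
    // Runs the routine and returns how many generation-0 collections occurred meanwhile.
    public static int Gen0Collections(Action routine)
    {
        int before = GC.CollectionCount(0);
        routine();
        return GC.CollectionCount(0) - before;
    }

    static string BadConcatenate(string[] items)
    {
        string result = string.Empty;
        foreach (string item in items)
        {
            result += item;
        }
        return result;
    }

    static string GoodConcatenate(string[] items)
    {
        StringBuilder builder = new StringBuilder();
        foreach (string item in items)
        {
            builder.Append(item);
        }
        return builder.ToString();
    }

    static void Main()
    {
        // 100 short strings, matching the size used for the memory timeline.
        string[] items = new string[100];
        for (int i = 0; i < items.Length; i++)
        {
            items[i] = "item";
        }

        int bad = Gen0Collections(() =>
        {
            for (int r = 0; r < 10000; r++) BadConcatenate(items);
        });
        int good = Gen0Collections(() =>
        {
            for (int r = 0; r < 10000; r++) GoodConcatenate(items);
        });

        Console.WriteLine("+=            : {0} gen-0 collections", bad);
        Console.WriteLine("StringBuilder : {0} gen-0 collections", good);
    }
}
```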
Conclusions
Use the String.Compare method for case-insensitive string comparison. It's just faster. Nice and simple.
Use the StringBuilder class for speed increases only if you are concatenating more than about 600 strings within a loop. The caveat here is that the length of the strings you are manipulating may also affect the speed tradeoff, as may the effects on the garbage collector, so you should really perform your own tests for your specific code.
Points of Interest
I was surprised at what a difference using the correct string manipulation methods made to code in the real world (although we do perform a lot of string comparisons and concatenations in my current project).
FxCop's performance rules are a good starting point for finding potentially slow code, and they can direct you to some easy fixes. Both of the issues discussed here are marked by FxCop as 'non-breaking', which means that the changes should not break any code that depends on the code being changed. This should be close to a no-brainer: a non-breaking change that improves performance is usually worth making, although, as the concatenation test shows, it pays to measure before assuming that it does.
History
- April 2005 - First draft of the article.