Introduction
A string is a sequence of characters. There are different types of characters, but that is a topic for a different article (for a better understanding of character types, go here). I'm not covering the whole structure of the String type here, just highlighting some of its "special" features!
Immutable and Interned
In .NET, strings are immutable. This means that once a String object has been created, its value can never be changed. That's right: you can't change a String's value! Take a look at this code:
 1  class Test
 2  {
 3      public static void Main()
 4      {
 5          string myString = "1234";
 6          System.Console.WriteLine(myString);
 7          myString += "5678";
 8          System.Console.WriteLine(myString);
 9      }
10  }
The output from this is:
1234
12345678
Though it seems as if we just changed the value of myString from "1234" to "12345678", we really didn't! Let's step through the above code. In line 5, myString is pointed at a String object on the heap with a value of "1234". In line 7, a new string is allocated on the heap with the value "12345678", and myString now points to this new memory location. So you actually end up with two string objects on the heap, even though you're only referencing one of them. The "1234" string is a literal, so it is interned: the CLR keeps a reference to it in its intern pool, and its memory is not likely to be released until the runtime shuts down. The "12345678" string was built at run time, so it is an ordinary object that is garbage collected once nothing references it any more.
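If you now declare any number of string variables, all initialized with the literal "1234", they would all point to the one interned instance. This ensures that repeated literals use memory very efficiently. Strings that are built at run time are not interned automatically, as the section on references to interned strings shows.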
When instantiating a string object with the value of "1234", your string could thus be pointing to the same location as other already existing strings. Now, imagine the chaos you could cause by changing the content of your string: you'd change the content of *ALL* other strings pointing to that location! This is the reason for strings' immutability.
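To make the two claims above concrete, here is a minimal sketch. The class and method names (ImmutabilityDemo, OriginalSurvivesAppend, LiteralsShareInstance) are my own, invented for illustration. The first method checks that a second reference still sees "1234" after the first variable was "changed"; the second checks that two identical literals are the same object.

```csharp
using System;

public static class ImmutabilityDemo
{
    // True when a second reference still sees the original value
    // after the first variable has been "changed" with +=.
    public static bool OriginalSurvivesAppend()
    {
        string first = "1234";
        string alias = first;   // both variables reference the same object
        first += "5678";        // allocates a new string; "1234" is untouched
        return alias == "1234" && !ReferenceEquals(first, alias);
    }

    // True when two identical literals refer to one interned instance.
    public static bool LiteralsShareInstance()
    {
        string a = "1234";
        string b = "1234";
        return ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(OriginalSurvivesAppend());
        Console.WriteLine(LiteralsShareInstance());
    }
}
```

Both lines should print True: the append produced a new object and left the old one alone, and the two literals share one instance.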
Performance
The performance gain from interning strings is a memory optimization, and is quite obvious. When you have one thousand interned strings with the same value, you'd use only the memory space needed for one instance: the strings would all point to the same memory address. However, consider the following scenario:
class Test
{
    public static void Main()
    {
        string myString = "1";
        myString += "2";   // allocates "12"
        myString += "3";   // allocates "123"
        myString += "4";   // allocates "1234"
        System.Console.WriteLine(myString);
    }
}
We needed one string here with the value of "1234", but in actual fact, we now have four strings on the heap for it! ("1", "12", "123" and "1234": our variable myString points to the last one, and that is not even counting the literals "2", "3" and "4".) This seems like a bad situation, where the string's behavior actually decreases performance!
For this reason, there is the StringBuilder class, which lives in the System.Text namespace. By using StringBuilder, we can improve the performance of the string concatenation in our previous listing as follows:
using System.Text;   // StringBuilder lives here

class Test
{
    public static void Main()
    {
        // Initial capacity of 4 characters: the known final length
        StringBuilder mySB = new StringBuilder(4);
        mySB.Append("1");
        mySB.Append("2");
        mySB.Append("3");
        mySB.Append("4");
        System.Console.WriteLine(mySB.ToString());
    }
}
The StringBuilder constructor accepts a parameter to specify the initial buffer size. In our case, we chose 4, since we know that this would be the length of our string. This will create a contiguous memory block of the specified size, where you can chop and change your string to your heart's content. If you append anything to your string that would overrun the buffer size, the StringBuilder's memory buffer will automagically increase. Note that it is better to choose a buffer size that is slightly too big than to have your StringBuilder's buffer grow often.
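If you want to watch the buffer grow, the Capacity property reports its current size. The sketch below is only an illustration: CapacityDemo and Grow are names I made up, and the exact capacity after growth is an implementation detail of the runtime, so I don't show a specific number for it.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Creates a builder with the given initial capacity, appends the text,
    // and returns the capacity before and after the append.
    public static (int Before, int After) Grow(int initialCapacity, string text)
    {
        var sb = new StringBuilder(initialCapacity);
        int before = sb.Capacity;
        sb.Append(text);   // grows the buffer if the text does not fit
        return (before, sb.Capacity);
    }

    public static void Main()
    {
        // 8 characters do not fit into a buffer of 4, so the buffer must grow.
        var (before, after) = Grow(4, "12345678");
        Console.WriteLine($"Capacity before: {before}, after: {after}");
    }
}
```

The capacity afterwards is at least the length of the text. How much extra room the runtime adds is up to the runtime.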
Appending to a StringBuilder outperforms repeated string concatenation by far when many pieces are involved, since there is much less overhead in terms of allocating new objects and collecting the old ones.
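A rough way to see the difference for yourself is to time both approaches with a Stopwatch. This is a quick sketch and not a proper benchmark: the names ConcatBenchmark, WithConcatenation and WithStringBuilder and the count of 20000 are arbitrary choices of mine, and the timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatBenchmark
{
    // Builds a string of 'count' characters with +=.
    public static string WithConcatenation(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += "x";   // each iteration allocates a new, longer string
        }
        return result;
    }

    // Builds the same string by appending to one pre-sized buffer.
    public static string WithStringBuilder(int count)
    {
        var sb = new StringBuilder(count);
        for (int i = 0; i < count; i++)
        {
            sb.Append('x');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        const int count = 20000;

        var sw = Stopwatch.StartNew();
        WithConcatenation(count);
        Console.WriteLine($"Concatenation: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        WithStringBuilder(count);
        Console.WriteLine($"StringBuilder: {sw.ElapsedMilliseconds} ms");
    }
}
```

Both methods return the same string; only the number of intermediate objects differs.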
References to interned strings
Look at the following code:
 1  class Test
 2  {
 3      public static void Main()
 4      {
 5          string firstString = "1234";
 6          System.Text.StringBuilder sb = new System.Text.StringBuilder(4);
 7          sb.Append("1234");
 8          string secondString = sb.ToString();
 9          string thirdString = string.Intern(sb.ToString());
10          System.Console.WriteLine((object)secondString == (object)firstString);
11          System.Console.WriteLine((object)thirdString == (object)firstString);
12      }
13  }
The output of this would be:
False
True
In line 6, the StringBuilder is allocated space on the heap, separate from the space of firstString. This makes perfect sense, since at this time the CLR does not yet know that the value of sb will be the same as that of firstString. So in line 8, instead of pointing to the interned firstString, secondString points to a new string object that ToString() creates from the StringBuilder's contents. If you wish to make use of an interned string (if it exists), do it as in line 9.
The Intern method returns a reference to the interned string if it exists. If it does not exist, it will create an interned string with the value specified, and return a reference to this new interned string.
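Here is a small sketch of that behavior. The names InternDemo, Build, SeparateBuildsAreDistinct and InternedCopiesAreShared are invented for this example. It builds the same value twice at run time, shows that the two results are different objects, and then shows that passing each through String.Intern yields one shared reference.

```csharp
using System;

public static class InternDemo
{
    // Builds a fresh "1234" at run time, so the result is a new object
    // and not the compile-time literal.
    private static string Build()
    {
        return new string(new[] { '1', '2', '3', '4' });
    }

    // Two separately built strings are equal in value but distinct objects.
    public static bool SeparateBuildsAreDistinct()
    {
        return !ReferenceEquals(Build(), Build());
    }

    // Interning both copies returns the same reference for each.
    public static bool InternedCopiesAreShared()
    {
        return ReferenceEquals(string.Intern(Build()), string.Intern(Build()));
    }

    public static void Main()
    {
        Console.WriteLine(SeparateBuildsAreDistinct());
        Console.WriteLine(InternedCopiesAreShared());
    }
}
```

Both lines should print True.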
Final words
It isn't necessary to use a StringBuilder for every concatenation. When you just append two strings (or a relatively small number of them) together, just concatenate them. If you have to concatenate in a loop with many iterations, use a StringBuilder. In my opinion, for readability, concatenate your strings, but if performance suffers noticeably, consider using a StringBuilder.
Once again, this article is published on my blog, and can be discussed over there.