
Strings in .NET

Ernst Kuschke


25 Apr 2004

Strings in .NET are special - this article shows why.

Introduction

A string is a sequence of characters. There are different types of characters, but that is a topic for a different article (for a better understanding of character types, go here). I'm not covering the whole structure of the String type here, just highlighting some of its "special" features!

Immutable and Interned

In .NET, strings are immutable. This means that once a value is assigned to a String object, it can never be changed. That's right: you can't change a String's value! Take a look at this code:

1  class Test
2  {
3      public static void Main()
4      {
5          string myString = "1234";
6          System.Console.WriteLine(myString);
7          myString += "5678";
8          System.Console.WriteLine(myString);
9      }
10 }

The output from this is:

1234
12345678

Though it seems as if we just changed the value of myString from "1234" to "12345678", we really didn't! Let's step through the above code. In line 5, myString is set to refer to the string literal "1234". String literals are interned, so this is the single pooled instance of that value. In line 7, the concatenation allocates a new string on the heap with the value "12345678", and myString now points to this new object. So you actually end up with two string objects on the heap, even though you're only referencing one of them. The new "12345678" string is not interned, because it was created at run time. The "1234" literal, on the other hand, stays in the intern pool, and memory held by interned strings is typically not released until the CLR terminates.
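To make the step-through above observable, here is a minimal sketch (the class name ImmutabilityDemo and the extra variable original are hypothetical, added only for illustration). It keeps a second reference to the first string before the += and then checks, with object.ReferenceEquals, that the concatenation produced a different object while the first one kept its value.

```csharp
using System;

// Illustrative sketch; class and variable names are hypothetical.
class ImmutabilityDemo
{
    public static void Main()
    {
        string myString = "1234";
        string original = myString;   // a second reference to the first string object

        myString += "5678";           // allocates a new string; the old object is not modified

        Console.WriteLine(original);                                    // 1234
        Console.WriteLine(myString);                                    // 12345678
        Console.WriteLine(object.ReferenceEquals(original, myString));  // False
    }
}
```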

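If you now assign the literal "1234" to any number of other string variables, they will all point to that one interned instance. This is what lets string literals use memory very efficiently. Note that this applies to literals (and other compile-time constants): strings built at run time, such as the result of a concatenation or of StringBuilder.ToString(), are not interned automatically.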
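The difference between literals and run-time strings can be checked directly. In this sketch (class and variable names are hypothetical), two identical literals turn out to be the same object, while an equal string assembled with a StringBuilder is a separate object that merely compares equal by value. The StringBuilder is used here only as one convenient way to force a string to be built at run time.

```csharp
using System;
using System.Text;

// Illustrative sketch; class and variable names are hypothetical.
class InterningDemo
{
    public static void Main()
    {
        string a = "1234";   // literal: taken from the intern pool
        string b = "1234";   // the same literal resolves to the same interned object

        // Built at run time, so it is a fresh object rather than the interned one.
        string c = new StringBuilder().Append("12").Append("34").ToString();

        Console.WriteLine(object.ReferenceEquals(a, b));  // True
        Console.WriteLine(object.ReferenceEquals(a, c));  // False
        Console.WriteLine(a == c);                        // True: same characters
    }
}
```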

When you assign the literal "1234" to a string variable, your string could thus be pointing to the same location as other, already existing strings. Now imagine the chaos you could cause by changing the content of your string: you'd change the content of *ALL* other strings pointing to that location! This is one important reason for strings' immutability.
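Because of this, every String method that appears to modify a string actually returns a new one. The sketch below (hypothetical names) calls Replace on one of two variables that share the interned "1234" and shows that neither shared reference sees a change; only the returned string carries the new value.

```csharp
using System;

// Illustrative sketch; class and variable names are hypothetical.
class NoInPlaceChangeDemo
{
    public static void Main()
    {
        string shared = "1234";
        string alsoShared = "1234";                 // same interned object as 'shared'

        string changed = shared.Replace("1", "9");  // returns a new string

        Console.WriteLine(shared);      // 1234
        Console.WriteLine(alsoShared);  // 1234
        Console.WriteLine(changed);     // 9234
    }
}
```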

Performance

Interning improves memory usage, and the gain is quite obvious: when you have one thousand string variables holding the same interned value, you use only the memory needed for one instance, since they all point to the same memory address. However, consider the following scenario:

class Test
{
    public static void Main()
    {
        string myString = "1";
        myString += "2";    // allocates a new string "12"
        myString += "3";    // allocates a new string "123"
        myString += "4";    // allocates a new string "1234"
        System.Console.WriteLine(myString);
    }
}

We needed only one string here, with the value "1234", but we actually created four strings along the way: "1", "12", "123" and "1234" (our variable myString points to the last one). Each += allocates a new string and copies the existing characters into it, so here the string's immutability actually decreases performance!

For this reason, .NET provides the StringBuilder class (in the System.Text namespace). Using StringBuilder, we can rewrite the concatenation in our previous listing as follows:

using System.Text;

class Test
{
    public static void Main()
    {
        StringBuilder mySB = new StringBuilder(4);   // initial capacity: 4 characters
        mySB.Append("1");
        mySB.Append("2");
        mySB.Append("3");
        mySB.Append("4");
        System.Console.WriteLine(mySB.ToString());
    }
}

The StringBuilder constructor accepts a parameter that specifies the initial capacity of its buffer, in characters. In our case we chose 4, since we know that this will be the length of our string. Within that buffer you can append, insert and remove characters to your heart's content without allocating a new string each time. If you append anything that would overrun the capacity, the StringBuilder automatically allocates more buffer space (since .NET Framework 4.0 it does so by adding a new chunk rather than copying everything into one larger contiguous block). Note that it is better to choose a capacity that is slightly too big than to have your StringBuilder's buffer grow often.
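The capacity behaviour described above can be inspected through the Capacity and Length properties. This is a small sketch with hypothetical names: it starts with a capacity of 4, fills it, then appends past it. The exact capacity after growth is an implementation detail, so the sketch only checks that it is at least as large as the content.

```csharp
using System;
using System.Text;

// Illustrative sketch; class and variable names are hypothetical.
class CapacityDemo
{
    public static void Main()
    {
        var mySB = new StringBuilder(4);   // initial capacity of 4 characters
        Console.WriteLine(mySB.Capacity);  // 4

        mySB.Append("1234");
        Console.WriteLine(mySB.Length);    // 4: the initial buffer is now full

        mySB.Append("5678");               // exceeds the capacity, so the builder grows
        Console.WriteLine(mySB.Length);    // 8
        // The new capacity is implementation-defined; it only has to fit the content.
        Console.WriteLine(mySB.Capacity >= mySB.Length);  // True
        Console.WriteLine(mySB.ToString());               // 12345678
    }
}
```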

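Appending to a StringBuilder outperforms repeated string concatenation, and the gap widens as the number of appends grows, since there is much less overhead in allocating new objects, copying their contents and collecting the old ones. For a handful of appends, as in this small example, the difference is negligible.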
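A rough way to see the difference is to build the same long string both ways. The sketch below uses hypothetical method names and an arbitrary count of 20,000 digits; it times each approach with Stopwatch and confirms both produce identical text. The timings are machine-dependent, so treat them as an illustration rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Illustrative sketch; class, method names and the count are hypothetical.
class ConcatVsBuilder
{
    // Repeated +=: every iteration allocates a new, longer string and copies the old one.
    public static string WithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
            result += i % 10;
        return result;
    }

    // StringBuilder: appends go into the builder's buffer; one string is created at the end.
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder(count);  // one digit per iteration, so capacity is known
        for (int i = 0; i < count; i++)
            sb.Append(i % 10);
        return sb.ToString();
    }

    public static void Main()
    {
        const int count = 20000;

        var timer = Stopwatch.StartNew();
        string a = WithConcat(count);
        Console.WriteLine($"+=            : {timer.ElapsedMilliseconds} ms");

        timer.Restart();
        string b = WithBuilder(count);
        Console.WriteLine($"StringBuilder : {timer.ElapsedMilliseconds} ms");

        Console.WriteLine(a == b);  // both approaches produce the same text
    }
}
```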

References to interned strings

Look at the following code:

1  class Test
2  {
3      public static void Main()
4      {
5          string firstString = "1234";
6          System.Text.StringBuilder sb = new System.Text.StringBuilder(4);
7          sb.Append("1234");
8          string secondString = sb.ToString();
9          string thirdString = string.Intern(sb.ToString());
10         System.Console.WriteLine((object)secondString == (object)firstString);
11         System.Console.WriteLine((object)thirdString == (object)firstString);
12     }
13 }

The output of this would be:

False
True

In line 6, the StringBuilder is allocated its own buffer on the heap, separate from the interned firstString. This makes sense, since at this point the runtime has no way of knowing that the contents of sb will end up equal to firstString. So in line 8, ToString() creates a brand-new string object from the builder's contents, and secondString points to that new object rather than to the interned firstString; hence the first comparison prints False. If you wish to make use of an interned string (if one exists), do it as in line 9.

The Intern method looks in the intern pool for a string with the same value. If one exists, it returns a reference to that interned string. If not, it adds the string you passed in to the pool and returns a reference to it.
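Both branches of Intern can be observed with String.IsInterned, which returns the pooled reference or null. In this sketch (hypothetical names), a GUID string is used on the assumption that its value cannot already be in the pool. Before interning, IsInterned reports null; Intern then adds that very instance, and a separately built copy with the same value resolves to it.

```csharp
using System;

// Illustrative sketch; class and variable names are hypothetical.
class InternDemo
{
    public static void Main()
    {
        // Assumption: a freshly generated GUID string is not in the intern pool yet.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unique) == null);       // True: not pooled

        string pooled = string.Intern(unique);                      // adds this instance
        Console.WriteLine(object.ReferenceEquals(pooled, unique));  // True

        // An equal string built separately is a different object...
        string copy = new string(unique.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(copy, unique));    // False
        // ...but interning it returns the instance already in the pool.
        Console.WriteLine(object.ReferenceEquals(string.Intern(copy), unique));  // True
    }
}
```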

Final words

It isn't necessary to use a StringBuilder for every concatenation. When you are joining just two (or a relatively small number of) strings, simply concatenate them. If you have to concatenate in a loop with many iterations, use a StringBuilder. In my opinion, concatenate your strings for readability, and switch to a StringBuilder if performance suffers noticeably.
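For the "small number of strings" case, the readable options are usually cheap. This sketch (hypothetical names) shows that a single expression such as first + second + third yields the same result as one String.Concat call, which is typically what the C# compiler emits for it, and that string.Join covers pieces already held in a collection without a hand-written loop.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch; class and variable names are hypothetical.
class ReadableConcat
{
    public static void Main()
    {
        string first = "12", second = "34", third = "56";

        // One expression: compiled to a single String.Concat call, not three separate +=.
        string joined = first + second + third;
        Console.WriteLine(joined);                                         // 123456
        Console.WriteLine(joined == string.Concat(first, second, third));  // True

        // Pieces already in a collection: string.Join avoids a manual loop.
        var parts = new List<string> { "1", "2", "3", "4" };
        Console.WriteLine(string.Join(", ", parts));                       // 1, 2, 3, 4
    }
}
```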

Once again, this article is also published on my blog, where it can be discussed.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author.
