
DIY String Interning

25 Oct 2018 · CPOL · 2 min read
Saving time and memory reading repetitive data

A Quick Performance Trick

Take a look at this peculiar thing:

C#
public class StringCache
{
	private readonly Dictionary<string, string> _cache;

	public StringCache()
	{
		_cache = new Dictionary<string, string>();
	}

	public virtual string Get(string value)
	{
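		// Return the canonical instance: the first reference seen for a
		// given value is handed back for every equal string thereafter.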
		if (value == null) return null;

		string result;
		if (!_cache.TryGetValue(value, out result))
		{
			result = value;
			_cache[result] = result;
		}
		return result;
	}
}

As you can see, it has a Dictionary mapping one string to another, with key and value always identical, which seems fairly pointless. But it is useful: when you call the Get method, it either returns the reference you supply (if it hasn't seen that value before) or a reference to a previously cached instance of the same string (if it has).
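To see it in action, here is a minimal sketch (assuming the usual using directives): two equal but separately allocated strings go in, one shared reference comes out.

C#
var cache = new StringCache();

// Two equal strings, allocated separately, so they are distinct references.
string first = new string(new[] { 'A', 'B' });
string second = new string(new[] { 'A', 'B' });
Console.WriteLine(ReferenceEquals(first, second)); // False

// The cache hands back the first instance it saw for "AB" both times.
Console.WriteLine(ReferenceEquals(cache.Get(first), cache.Get(second))); // True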

I created this as an experiment to use in the Data Access Layer of a project I was working on. ADO.NET was calling a stored procedure that returned a lot of rows, and often I would see the same string value appear for the same field over and over again. Millions of times in some cases. You might think this is indicative of a less than perfectly normalised database, to which I would reply that it's a 'real world' example. :)

Anyway, I figured I could save some memory by using this: lots of references to the same string instance, rather than lots of references to individual instances of the same string. At the start of the read loop, I'd create one of these, pass each string field through it, and let it drop out of scope at the end of the loop.
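Something like this, in sketch form - the command and the Record type here are invented for illustration, not lifted from the real project:

C#
// Hypothetical DAL read loop; 'command' is an ADO.NET DbCommand and
// 'Record' is a stand-in for whatever type the rows map to.
var cache = new StringCache();
var results = new List<Record>();

using (var reader = command.ExecuteReader())
{
	while (reader.Read())
	{
		results.Add(new Record
		{
			Id = (int)reader["Id"],
			// Repeated values collapse to one shared string instance.
			// (DBNull comes through 'as string' as null, which Get handles.)
			Region = cache.Get(reader["Region"] as string)
		});
	}
}
// 'cache' drops out of scope here, so the pooled strings aren't pinned
// beyond the lifetime of the results that reference them.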

It reduced the memory footprint of the service nicely, but I was expecting to take a hit on the overall read time; after all, each string now needs a hashcode calculated and some dictionary 'stuff' done on top of the normal work. In fact, it sped the read up. I'm not really sure why. With the IO being the bottleneck, there may have been room to absorb the extra work, but that could only make it the same speed, not faster, so I assume it's something to do with garbage collection - far fewer surviving string allocations means less for the collector to do.

Anyway, who cares? I've retrofitted these little caches into lots of DAL functions that return endless records, and the result is faster read times and less memory consumption.

Simple, and a little bit beautiful.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)