
“If All You Have is a Hammer…


…everything looks like a nail” says the proverb.

I’ll illustrate it with a little story about a common mistake I sometimes see.

(Code samples are in C#, but use only basic features so C++ and Java developers should not be too disoriented.)

Static Arrays

When you started programming, chances are the first collection you used was a fixed-length array, so you’ve become accustomed, not to say addicted, to its bracket [] indexing. And for good reason: it is a very simple and intuitive interface for accessing the elements of a collection, a good hammer.

So when you wanted to compute the sum of all the elements of a nail, uh I mean an int[] array, you used code like this:

C#
long sum = 0;
for (int i = 0; i < array.Length; ++i)
{
    sum += array[i];
}

Simple and efficient!

Dynamic Arrays

Later, you were faced with situations where you couldn’t know the final length of your collection in advance, and you discovered dynamic arrays, e.g. List<T>. You naturally copy-pasted your old code and quickly discovered that the only thing you needed to change was replacing the “Length” property with the “Count” property:

C#
long sum = 0;
for (int i = 0; i < list.Count; ++i)
{
    sum += list[i];
}

And everything worked fine, with no issue on the horizon even with big sets of data (a List<T> is backed by an array internally, so list[i] is still a constant-time access). It seems the list is a nail too after all, so far so good…

Associative Arrays

Later again, you found that indexing with consecutive integers is sometimes not enough and you needed to associate a richer identifier with each element of your collection. Google revealed that what you needed is called an associative array, map or dictionary, and that C# offers some implementations like Dictionary<TKey, TValue>.
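
As a reminder of why the detour is worth it: what a dictionary gives you in exchange is fast lookup by key. Here is a minimal sketch (the types and key names are purely illustrative) of access by key rather than by position, which is typically O(1) on average:

C#
// Keyed access: this is what a dictionary is designed for.
// (Assumes using System; and using System.Collections.Generic;)
Dictionary<string, int> ages = new Dictionary<string, int>();
ages["Alice"] = 30;           // insert or update by key
ages["Bob"] = 42;

int bobAge = ages["Bob"];     // average O(1) lookup by key

int carolAge;
if (ages.TryGetValue("Carol", out carolAge))  // safe lookup, no exception if the key is missing
{
    Console.WriteLine(carolAge);
}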

But this time, things are more involved: you can still use the Count property, but you’ve lost the sexy bracket indexing by position :( ! This is quite frustrating: you have a hammer, a really cool and powerful hammer that has worked for years in tons of situations, but now you have to deal with something that at first sight doesn’t really look like a nail. But by looking again at the Dictionary type, you see that thanks to LINQ and its ElementAt() extension method, it could resemble a nail, no it’s become a nail, no no it’s a nail, a nail, yes a nail!

And you do this:

C#
long sum = 0;
for (int i = 0; i < dictionary.Count; ++i)
{
    sum += dictionary.ElementAt(i).Value;
}

Woot! Woot! It’s working! :)

Well, at least in a toy program where you only have to process a small set of data…

It Almost Worked

Because later, when your boss asks you to process a huge 1 GB file, you sense something is going wrong because it’s dead slow. Later still, your boss comes back and starts looking over your shoulder, eagerly waiting for the results he needs for the 4 pm meeting. When you have exhausted all your jokes and it’s still running, you understand it might not finish in time and feel, well… embarrassed.

So what happens?

The answer is simple: a dictionary is not a nail. You can’t index it by position as quickly as you would an array or a list, which are both based on contiguous memory areas where elements can be accessed in constant time, aka O(1). So when you ask LINQ for the ith element of a dictionary, it simply enumerates the elements, counting until it reaches i, and returns the current element. The complexity of this enumeration is O(n), which is not so crazy on its own. But remember that you call it on every iteration of your main for loop, so the overall complexity of your sum is O(n²). Ouch!
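
To make that cost concrete, here is roughly the idea behind a positional lookup on a collection that cannot be indexed directly (a simplified sketch, not the actual LINQ implementation, which also has a fast path for IList<T> sources):

C#
// Simplified idea behind ElementAt() on a plain IEnumerable<T>:
// walk the sequence from the start and count until the requested index is reached.
// (Assumes using System; and using System.Collections.Generic;)
static T ElementAtNaive<T>(IEnumerable<T> source, int index)
{
    int current = 0;
    foreach (T item in source)   // O(n) walk from the beginning, every single call
    {
        if (current == index)
        {
            return item;
        }
        ++current;
    }
    throw new ArgumentOutOfRangeException("index");
}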

Reality knocking…

The Solution

We’ve seen that integer indexing is not a good fit for the dictionary, so we will use another abstraction: a simple iteration over its elements:

C#
long sum = 0;
foreach (KeyValuePair<int, int> pair in dictionary)
{
    sum += pair.Value;
}

This leads to quite different code, not that much more complex, and this time you get decent performance.
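
And since LINQ is already in the picture, the same single O(n) pass can also be written as a one-liner over the Values collection; the cast to long keeps the sum from overflowing an int:

C#
// One pass over the values, summed as long to avoid overflow.
long sum = dictionary.Values.Sum(v => (long)v);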

The Benchmark

Here is the benchmark I’ve used:

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;

class Test
{
    static void Main()
    {
        const int N = (int)5e4;
        
        int[] array = new int[N];
        List<int> list = new List<int>(N);
        Dictionary<int, int> dictionary = new Dictionary<int, int>(N);
        
        Random rand = new Random();
        
        for (int i = 0; i < N; ++i)
        {
            int n = rand.Next() % 100;
        
            array[i] = n;
            list.Add(n);
            dictionary[i] = n;
        }
        
        long sumA = 0, sumL = 0, sumD = 0, sumD2 = 0;
        
        Stopwatch sw = new Stopwatch();
        
        sw.Start();
        for (int i = 0; i < array.Length; ++i)
        {
            sumA += array[i];
        }
        sw.Stop();
        
        long arrayDuration = sw.ElapsedTicks;
        
        sw.Restart();
        for (int i = 0; i < list.Count; ++i)
        {
            sumL += list[i];
        }
        sw.Stop();
        
        long listDuration = sw.ElapsedTicks;
        
        sw.Restart();
        for (int i = 0; i < dictionary.Count; ++i)
        {
            sumD += dictionary.ElementAt(i).Value;
        }
        sw.Stop();
        
        long dictionaryDuration = sw.ElapsedTicks;
        
        sw.Restart();
        foreach (KeyValuePair<int, int> pair in dictionary)
        {
            sumD2 += pair.Value;
        }
        sw.Stop();
        
        long dictionaryDuration2 = sw.ElapsedTicks;

        string outputTemplate = @"+----------------------+
|TYPE  |TICKS   |FACTOR|
+----------------------+
|array |{0,8}|{4,6}|
+----------------------+
|list  |{1,8}|{5,6}|
+----------------------+
|dico1 |{2,8}|{6,6}|
+----------------------+
|dico2 |{3,8}|{7,6}|
+----------------------+";
        
        Console.WriteLine(outputTemplate, arrayDuration, listDuration, 
           dictionaryDuration, dictionaryDuration2,
           1, listDuration / arrayDuration, 
           dictionaryDuration / arrayDuration, 
           dictionaryDuration2 / arrayDuration);
    }
}

And here are some results:

+----------------------+
|TYPE  |TICKS   |FACTOR|
+----------------------+
|array |     240|     1|
+----------------------+
|list  |     704|     2|
+----------------------+
|dico1 |53844427|224351|
+----------------------+
|dico2 |    2367|     9|
+----------------------+

As you can see, the naive ElementAt iteration is more than 20,000 times slower than the foreach iteration and more than 200,000 times slower than the reference fixed-length array implementation!

Conclusion

The bottom line is simple: stop seeing nails everywhere, and adapt to each situation, here to each collection, to get the best out of it. :)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)