Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#7.0

Real Case of Premature Micro-Optimization

4.33/5 (3 votes)
31 Jan 2020CPOL2 min read 9.2K   21  
Ternary Operator vs Lookup Table Benchmark

The benchmark is hosted at Github.

I discovered a real case of premature micro-optimization when you don't measure it. You know it is pretty bad when you read premature and micro in the same sentence. On page 44 of Optimizing C++ ebook by Agner Fog, it is written ...

In some cases it is possible to replace a poorly predictable branch by a table lookup. For example:

C++
float a;
bool b;
a = b ? 1.5f : 2.6f;

The ?: operator here is a branch. If it is poorly predictable then replace it by a table lookup:

C++
float a;
bool b = 0;
const float lookup[2] = {2.6f, 1.5f};
a = lookup[b];

If a bool is used as an array index then it is important to make sure it is initialized or comes from a reliable source so that it can have no other values than 0 or 1. In some cases the compiler can automatically replace a branch by a conditional move.

I was trying to implement this lookup table optimization (to replace ternary operator) on a floating-point value which the code is compiled with G++ and ran on Linux. I also ran the integer benchmark and on other compilers such like Visual C++ 2019 and Clang and also the Visual C# 7 to see their differences. Below is the C++ benchmark code. The lookup array is declared as a static local variable inside the function.

C++
int64_t IntTernaryOp(bool value)
{
    return value ? 3 : 4;
}

int64_t IntArrayOp(bool value)
{
    static const int64_t arr[2] = { 3, 4 };
    return arr[value];
}

double FloatTernaryOp(bool value)
{
    return value ? 3.0f : 4.0f;
}

double FloatArrayOp(bool value)
{
    static const double arr[2] = { 3.0f, 4.0f };
    return arr[value];
}

int main()
{
    const size_t MAX_LOOP = 1000000000;
    
    int64_t sum = 0;
    double sum_f = 0;

    timer stopwatch;

    stopwatch.start("IntTernaryOp");
    sum = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntTernaryOp(k % 2);
    }
    stopwatch.stop(sum);

    stopwatch.start("IntArrayOp");
    sum = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntArrayOp(k % 2);
    }
    stopwatch.stop(sum);

    stopwatch.start("FloatTernaryOp");
    sum_f = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatTernaryOp(k % 2);
    }
    stopwatch.stop(sum_f);

    stopwatch.start("FloatArrayOp");
    sum_f = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatArrayOp(k % 2);
    }
    stopwatch.stop(sum_f);

    return 0;
}

In Visual C# code, static local variable is not permitted so the lookup array is declared as a static member variable inside the class. C# does not allow casting integer bool to integer (to be used as array index), so I use an integer instead.

C#
static Int64[] arrInt64 = new Int64[] { 3, 4 };
static Double[] arrDouble = new Double[] { 3.0, 4.0 };

static Int64 IntTernaryOp(Int64 value)
{
    return (value==1) ? 3 : 4;
}

static Int64 IntArrayOp(Int64 value)
{
    return arrInt64[value];
}

static double FloatTernaryOp(Int64 value)
{
    return (value == 1) ? 3.0f : 4.0f;
}

static double FloatArrayOp(Int64 value)
{
    return arrDouble[value];
}

static void Main(string[] args)
{
    const int MAX_LOOP = 1000000000;

    Int64 sum = 0;
    double sum_f = 0.0;

    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    sum = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntTernaryOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("IntTernaryOp", stopWatch.Elapsed, sum);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntArrayOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("IntArrayOp", stopWatch.Elapsed, sum);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum_f = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatTernaryOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("FloatTernaryOp", stopWatch.Elapsed, sum_f);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum_f = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatArrayOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("FloatArrayOp", stopWatch.Elapsed, sum_f);
}

These are the benchmark results on a computer with Intel i7-8700 at 3.2Ghz with 16GB RAM.

VC++ /Ox results
       IntTernaryOp:  562ms, result:3500000000
         IntArrayOp:  523ms, result:3500000000
     FloatTernaryOp: 3972ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
G++ 7.4.0 -O3 results
       IntTernaryOp:  306ms, result:3500000000
         IntArrayOp:  519ms, result:3500000000
     FloatTernaryOp: 1030ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
Clang++ 6.0.0 -O# results
       IntTernaryOp:  585ms, result:3500000000
         IntArrayOp:  523ms, result:3500000000
     FloatTernaryOp: 1030ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
C# 7 Release Mode, .NET Framework 4.7.2
       IntTernaryOp: 1311ms, result:3500000000
         IntArrayOp: 1038ms, result:3500000000
     FloatTernaryOp: 2448ms, result:3500000000
       FloatArrayOp: 1036ms, result:3500000000

For the integer benchmark, the lookup table got worse performance than lookup table with G++, while in Visual C++/C# and Clang++, there is improvement. As it turns out in the floating benchmark I am looking for (in G++), improvement is only seen in Visual C++/C# while G++ and Clang++ retained the same timing for both ternary and lookup table code. I did check the assembly code at the Godbolt Compiler Explorer, none of the ternary operator is converted to use conditional move as suggested by Agner Fog's book so I guess the short branch is still with the CPU cache, therefore it makes little difference in G++ and Clang++ floating-point test. The morale of the story is never take a book's word at its value and always measure to confirm your hypothesis. In my case, I forgo this lookup micro-optimization.

History

  • 31st January, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)