Something You May Not Know About the Switch Statement in C/C++

Zuoliu Ding

4.86/5 (152 votes)

19 Dec 2012CPOL13 min read

387K

463

A discussion on how switch/case is executed, by reverse engineering in VC++

Download samples - 24.65 KB

Introduction

Many programming languages such as C/C++, C#, Java, and Pascal provide the switch statement to let us implement selection logic. In some scenarios, it's a good alternative to if-then-else, making code clearer and more readable. When using switch in practice, you may want to know:

How the switch block is executed at runtime?
Is it running faster than an if-then-else for a long list of conditions?
For n conditions, what is the switch time complexity?

The C/C++ standard defines the specification of language elements, but it doesn't say anything about how to implement the switch statement. Every vendor is free to use any implementation as long as it fits the standard. This article discusses what happens when running a switch statement in Visual C++, with a few examples under different conditions. We'll analyze these examples by using the Microsoft Visual Studio IDE, since it can generate the corresponding assembly listing on compiling. Thus, a general understanding of Intel (x86) assembly language is assumed. As you can see shortly, all results here are based on reverse engineering, and so the article is never a comprehensive description of switch implementations in compilers. If you are learning Assembly Language Programming, this article might be a study material to read.

Our first example is switch1.cpp, a commonly used simple block as below:

C++

#include "functions.h"

int main()
{
    int i =3;    // or i =20

    switch (i)
    {
        case 1: f1(); break;
        case 2: f2(); break;
        case 5: f1(); break;
        case 7: f2(); break;
        case 10: f1(); break;
        case 11: f2(); break;
        case 12: f2(); break;
        case 17: f1(); break;
        case 18: f1(); break;

        default: f3(); 
    }

    return 0;
}

where three functions are defined in functions.cpp:

C++

void f1() { cout  "f1 called\n"; }
void f2() { cout  "f2 called\n"; }
void f3() { cout  "f3 called\n"; }

Could the worst case be thought of as i=3 or i=20? How does it execute: does it exhaust all nine cases and finally reach the default to call f3? Let's answer it from the assembly translations.

In order to generate an assembly listing in Visual Studio, open the switch1.cpp Property dialog and select the Output Files category under C/C++. On the right pane, choose the Assembly With Source Code (/FAs) option as shown below:

The option in the switch1.cpp Properties

Then, when you compile switch1.cpp, an assembly file called switch1.asm will be generated. Using this option, the listing includes the C++ source code, which is commented by the semi-colon with a line number as shown in the next section.

The Two-Level Jump Table

Let's analyze the assembly listing from top down. Here is where the switch starts:

ASM

; 5    :     int i =3;    // or i =20

    mov    DWORD PTR _i$[ebp], 3

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    mov    ecx, DWORD PTR tv64[ebp]
    sub    ecx, 1
    mov    DWORD PTR tv64[ebp], ecx
    cmp    DWORD PTR tv64[ebp], 17            ; 00000011H
    ja     SHORT $LN1@main
    mov    edx, DWORD PTR tv64[ebp]
    movzx  eax, BYTE PTR $LN15@main[edx]
    jmp    DWORD PTR $LN16@main[eax*4]

Assuming that the symbol _i$[ebp] is an alias of i, tv64[ebp] is another name of i2, $LN15@main is renamed as table1, $LN16@main as table2, and $LN1@main is a label named Default_Label. The snippet merely does this in pseudocode:

i2 = i;
i2 = i2-1;
if i2 > 17 goto Default_Label;
goto table2[4*table1[i2]];

Here, 17 indicates the last case condition value, because the integers 1 to 18 are mapped from 0 to 17. This is why i2 is decremented to make it a zero-based integer as an index. Now, if i2 is greater than 17 (e.g., n=20), the control goes to the default. Otherwise, it goes to where table2[4*table1[i2]] is pointed.

How about when i is zero? Then i2 becomes -1. Worry about the index out of range error? No, it will never happen. Back to the assembly listing, you can see that -1 is saved as DWORD, the double word as an unsigned 4-byte integer. Hence, it must be greater than 17 and goes to default.

Let's look at two tables and see how they work together. The table1 is pretty simple, with a starting address at $LN15@main, which you can just think of as an array name.

ASM

$LN15@main:
    DB    0
    DB    1
    DB    9
    DB    9
    DB    2
    DB    9
    DB    3
    DB    9
    DB    9
    DB    4
    DB    5
    DB    6
    DB    9
    DB    9
    DB    9
    DB    9
    DB    7
    DB    8

With this array, table1[0] is 0, table1[1] is 1, table1[2] and table1[3] are 9, etc. These values are created for the calculation of the index of table2, which starts at $LN16@main:

ASM

$LN16@main:
    DD    $LN10@main
    DD    $LN9@main
    DD    $LN8@main
    DD    $LN7@main
    DD    $LN6@main
    DD    $LN5@main
    DD    $LN4@main
    DD    $LN3@main
    DD    $LN2@main
    DD    $LN1@main

The above labels, from $LN10@main to $LN1@main, are ten calling targets in C++, for nine cases plus one default. Notice that DB represents defining byte (8 bits), while DD defines the double word type of four bytes (32 bits). This is why we need to multiply 4 in table2[4*table1[i2]]. By this formula, we calculate the calling address via table1 and table2:

if i equals 1, i2 is 0 and table1[0] is 0, jump to $LN10@main by table2[0], the first case.
if i equals 2, i2 is 1 and table1[1] is 1, jump to $LN9@main by table2[4*1], the second case.
if i equals 3, i2 is 2 and table1[2] is 9, jump to $LN1@main by table2[4*9], the default.
... ...

Now we come to the code segment labeled by LN10@main to $LN1@main as the calling targets:

ASM

$LN10@main:
; 8    :     {
; 9    :         case 1: f1(); break;

    call        ?f1@@YAXXZ            ; f1
    jmp    SHORT $LN11@main

$LN9@main:
; 10   :         case 2: f2(); break;

    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN11@main

$LN8@main:
; 11   :         case 5: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN11@main

; ... ...

$LN2@main:
; 17   :         case 18: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN11@main

$LN1@main:
; 18   : 
; 19   :         default: f3(); 

    call    ?f3@@YAXXZ                ; f3

$LN11@main:
; 20   :     }
; 21   : 
; 22   :     return 0;

Functions f1, f2, and f3 are converted to the decorated assembly procedures, and prefixed with a question mark, like ?f1@@YAXXZ, ?f2@@YAXXZ, and ?f3@@YAXXZ. Notice that $LN10@main is a label of case 1 to call f1, $LN9@main for case 2 to call f2, and $LN1@main for default to call f3.

An additional label is $LN11@main, pointing to the location after the last default clause. As soon as each case operation is done, the control jumps to $LN11@main. This implements the break statement. Without it, the control goes to the next case. This is why the break statement is necessary in the C/C++ switch block.

Obviously, based on such a two-level table mechanism, we have one comparison, one multiplication, and two address jumps. The time complexity of this switch pattern should be O(1). Put this all together, we have a big picture like this:

Recall that we use DB to define index values in table1, and DD for table2. In this example, there are only 18 case values for table1. But DB defining byte means the range is only from zero to 255. What if there are more cases in switch, with the number of values over 255?

The One-Level Jump Table

Here comes the second example, switch2.cpp with 1000 cases. For simplicity, we start from the condition value zero:

C++

int main2()
{
    int i =1000;

    switch (i)
    {
        case 0: f1(); break;
        case 1: f1(); break;
        case 2: f2(); break;
        case 3: f1(); break;
      // ... ...
        case 995: f1(); break;
        case 996: f1(); break;
        case 997: f1(); break;
        case 998: f2(); break;
        case 999: f1(); break;

        default: f3(); 
    }

    return 0;
}

Don't think it makes things complicated. On the contrary, it becomes simpler. This is the switch2.asm that Visual Studio generates for us:

ASM

; 5    :     int i =1000;

    mov    DWORD PTR _i$[ebp], 1000        ; 000003e8H

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    cmp    DWORD PTR tv64[ebp], 999        ; 000003e7H
    ja     $LN1@main2
    mov    ecx, DWORD PTR tv64[ebp]
    jmp    DWORD PTR $LN1006@main2[ecx*4]

Likewise, assume that the symbol _i$[ebp] is an alias of i. Because the first case starts from zero, no mapping is needed. So i2 (from tv64[ebp]) equals i. The array $LN1006@main2 is an only jump table, and the label $LN1@main2 is Default_Label. This snippet simply does this:

i2 = i;
if i2 > 999 goto Default_Label;
goto table[4*i2];

By searching $LN1006@main2 in switch2.asm, we find the code labels all defined by double words:

ASM

$LN1006@main2:
    DD    $LN1001@main2
    DD    $LN1000@main2
    DD    $LN999@main2
    DD    $LN998@main2
    DD    $LN997@main2
    DD    $LN996@main2
   ; ... ...
    DD    $LN5@main2
    DD    $LN4@main2
    DD    $LN3@main2
    DD    $LN2@main2

Totally 1000 calling targets, each label is followed by a procedure call, downwards from $LN1001@main2 to $LN2@main2:

ASM

$LN1001@main2:
; 8    :     {
; 9    :         case 0: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    $LN1002@main2

$LN1000@main2:
; 10   :         case 1: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    $LN1002@main2

$LN999@main2:
; 11   :         case 2: f2(); break;

    call    ?f2@@YAXXZ                ; f2
    jmp    $LN1002@main2

; ... ...

$LN3@main2:
; 1106 :         case 998: f2(); break;

    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN1002@main2

$LN2@main2:
; 1107 :         case 999: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN1002@main2

$LN1@main2:
; 1108 : 
; 1109 :         default: f3(); 

    call    ?f3@@YAXXZ                ; f3
$LN1002@main2:

; 1110 :     }
; 1111 : 
; 1112 :     return 0;

Here, $LN1@main2 directs to default, and the additional label $LN1002@main implements break. Although the complexity is also O(1), this mechanism should be a little bit efficient since only one address jump is required. The following is a picture of this example:

Until now, we see the switch illustrations executed so well. Is it really better to use switch than if-then-else? For n conditions without a match, definitely, if-then-else must check each condition until the last, which is the worst case of O(n). But for switch, do we always expect its complexity to be O(1)? Or is there some kind of code that would lead its execution beyond that?

Using Binary Search

We will give the third example showing the large gap between case condition values in switch3.cpp, where the switch execution behaves as the binary search:

C++

int main3()
{
    int i =1;

    switch (i)
    {
        case 100: f1(); break;
        case 200: f2(); break;
        case 250: f2(); break;
        case 500: f1(); break;
        case 700: f2(); break;
        case 750: f2(); break;
        case 800: f2(); break;
        case 900: f1(); break;

        default: f3(); 
    }

    return 0;
}

By generating the assembly listing switch3.asm, this time, we take a look from bottom up by checking the case operations first:

ASM

$LN9@main3:

; 8    :     {
; 9    :         case 100: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN10@main3

$LN8@main3:
; 10   :         case 200: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN7@main3:
; 11   :         case 250: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN6@main3:
; 12   :         case 500: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN10@main3

$LN5@main3:
; 13   :         case 700: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN4@main3:
; 14   :         case 750: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN3@main3:
; 15   :         case 800: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN2@main3:
; 16   :         case 900: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN10@main3

$LN1@main3:
; 17   : 
; 18   :         default: f3(); 
    call    ?f3@@YAXXZ                ; f3

$LN10@main3:
; 19   :     }
; 20   : 
; 21   :     return 0;

Pretty neat, with eight cases plus one default, we can easily write this part into a list of nine rows, each corresponding to a target label, a C++ case, and a procedure name:

Besides $LN9@main3 to $LN1@main3, we also have $LN10@main3 implement the break statement. Next, how do we jump to these calling targets? Look at the following code:

ASM

; 5    :     int i =1;

    mov    DWORD PTR _i$[ebp], 1

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    jg     SHORT $LN14@main3
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    je     SHORT $LN5@main3
    cmp    DWORD PTR tv64[ebp], 250        ; 000000faH
    jg     SHORT $LN15@main3
    cmp    DWORD PTR tv64[ebp], 250        ; 000000faH
    je     SHORT $LN7@main3
    cmp    DWORD PTR tv64[ebp], 100        ; 00000064H
    je     SHORT $LN9@main3
    cmp    DWORD PTR tv64[ebp], 200        ; 000000c8H
    je     SHORT $LN8@main3
    jmp    SHORT $LN1@main3
$LN15@main3:
    cmp    DWORD PTR tv64[ebp], 500        ; 000001f4H
    je     SHORT $LN6@main3
    jmp    SHORT $LN1@main3
$LN14@main3:
    cmp    DWORD PTR tv64[ebp], 750        ; 000002eeH
    je     SHORT $LN4@main3
    cmp    DWORD PTR tv64[ebp], 800        ; 00000320H
    je     SHORT $LN3@main3
    cmp    DWORD PTR tv64[ebp], 900        ; 00000384H
    je     SHORT $LN2@main3
    jmp    SHORT $LN1@main3

The logic is not too hard to understand. At first, the condition value i is saved in both _i$[ebp] and tv64[ebp]. To simplify, we just make use of all the labels here by removing the leading "$" and the tailing "@main3". Rewrite the snippet like this:

    i2 = i;
    if i2 > 700 goto LN14;
    if i2 == 700 goto LN5;
    if i2 > 250 goto LN15;
    if i2 == 250 goto LN7;
    if i2 == 100 goto LN9;
    if i2 == 200 goto LN8;
    goto LN1;
LN15:
    if i2 == 500 goto LN6;
    goto LN1;
LN14:
    if i2 == 750 goto LN4;
    if i2 == 800 goto LN3;
    if i2 == 900 goto LN2;
    goto LN1;

Surely, the compiler optimizes the code, because it first chooses 700 to compare, instead of the beginning case value 100 in the switch block. Although this switch is implemented with ten if-then statements for nine conditions, it actually applies the binary search mechanism. The following shows the equivalent decision tree of above code, a Binary Search Tree that you may be familiar with:

For all internal nodes colored yellow in circle, we make the comparisons, while all leave nodes in oval represent the successful ending. Each failure goes to LN1 by default. The inorder traversal sequence for comparison nodes is 100, 200, 250, 500, 700, 750, 800, and 900; while the sequence for all leaves in the inorder traversal is exactly LN9, LN8, LN7, LN6, LN5, LN4, LN3, and LN2, as ordered in the previous table. Essentially, this is a binary search algorithm with the complexity of O(log n). When i=1, it will pass through the first six comparisons and then reach default at LN1. But when i=500, just four comparisons are required.

As we know, the prerequisite of binary search is that the input data is sorted. I wonder if this is just because I use the ascending case values in switch in switch3.cpp. The curiosity brings out the following disordered conditions in switch4.cpp.

C++

int main4()
{
    int i =1;

    switch (i)
    {
    case 750: f2(); break;
    case 700: f2(); break;
    case 250: f2(); break;
    case 500: f1(); break;
    case 800: f2(); break;
    case 900: f1(); break;
    case 100: f1(); break;
    case 200: f2(); break;

        default: f3(); 
    }

    return 0;
}

To my surprise, the same binary search strategy appears in code of switch4.asm, with exact the same decision tree as shown above. Only difference is that the labels are renumbered - that's quite reasonable, because we have just reordered them! You can examine the attached switch4.asm for details.

This experiment definitely brings us some hint to find out how the compiler does its magic to sort the case condition values. A sorting algorithm is over O(log n) and not worthwhile to have it at runtime. Notice that sorting is not visible from generated assemblies. This means that sorting is not contained in the assembly instructions which will execute at runtime. Also notice that all the condition values are constant and available before the compiler translates the C++ to machine code. Now it’s reasonable to think of the preprocessor; the sorting with all known case values could be simply done in compilation. This is why the translated assembly code only reflects the binary search without sorting code. The static sorting behavior (as opposed to dynamic behavior at runtime), could be implemented with a macro procedure, assembly directives and operators. Such an preprocessing example can be found in References at the end of the article.

The Hybrid

At this moment, I can show an example of combination of a jump table and the binary search as in switch5.cpp.

C++

int main5()
{
    int i =1;

    switch (i)
    {
        case 3: f1(); break;
        case 5: f2(); break;
        case 6: f2(); break;
        case 7: f2(); break;
        case 100: f1(); break;
        case 200: f2(); break;
        case 250: f2(); break;
        case 700: f2(); break;
        case 750: f2(); break;
        case 900: f2(); break;

        default: f3(); 
    }

    return 0;
}

The following is how this switch block is implemented in the corresponding switch5.asm:

ASM

; 5    :     int i =1;

    mov    DWORD PTR _i$[ebp], 1

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    cmp    DWORD PTR tv64[ebp], 100        ; 00000064H
    jg     SHORT $LN16@main5
    cmp    DWORD PTR tv64[ebp], 100        ; 00000064H
    je     $LN7@main5
    mov    ecx, DWORD PTR tv64[ebp]
    sub    ecx, 3
    mov    DWORD PTR tv64[ebp], ecx
    cmp    DWORD PTR tv64[ebp], 4
    ja     $LN1@main5
    mov    edx, DWORD PTR tv64[ebp]
    jmp    DWORD PTR $LN18@main5[edx*4]

$LN16@main5:
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    jg     SHORT $LN17@main5
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    je     SHORT $LN4@main5
    cmp    DWORD PTR tv64[ebp], 200        ; 000000c8H
    je     SHORT $LN6@main5
    cmp    DWORD PTR tv64[ebp], 250        ; 000000faH
    je     SHORT $LN5@main5
    jmp    SHORT $LN1@main5
$LN17@main5:
    cmp    DWORD PTR tv64[ebp], 750        ; 000002eeH
    je     SHORT $LN3@main5
    cmp    DWORD PTR tv64[ebp], 900        ; 00000384H
    je     SHORT $LN2@main5
    jmp    SHORT $LN1@main5

It consists of two parts, a one-level jump table first and then the binary search code. Using the previous notations of i, i2 and label naming, let $LN18@main5 as a table and $LN1@main5 as default. This snippet does this:

    i2 = i;
    if i2 > 100 goto LN16;
    if i2 == 100 goto LN7;    // go case 100
                              // now i2 < 100
    i2 = i2-3;                // map to zero-based index
    if i2 > 4 goto LN1;       // go default over 7 
    goto table[4*i2];         // go jump table

LN16:                         // binary search
    if i2 > 700 goto LN17;
    if i2 == 700 goto LN4;    // go case 700
    if i2 == 200 goto LN6;    // go case 200
    if i2 == 250 goto LN5;    // go case 250
    goto LN1;                 // go default
LN17:
    if i2 == 750 goto LN3;    // go case 750
    if i2 == 900 goto LN2;    // go default    
    goto LN1;

where the table is defined as:

ASM

$LN18@main5:
    DD    LN11    ; index 0, go case 3
    DD    LN1     ; index 1 for value 4, go default
    DD    LN10    ; index 2, go case 5
    DD    LN9     ; index 3, go case 6
    DD    LN8     ; index 4, go case 7

An interesting change is to remove the case 6 in switch5.cpp. Just because of this, the hybrid becomes the combination of a two-level jump table and the binary search. For details, look at switch6.cpp and switch6.asm in downloadable samples. If you try to add extra cases in some order, you can find two individual jump tables combined with the binary search.

Summary

By scrutinizing the above examples, we expose something that you may not know about the switch at runtime. To analyze a C/C++ program in Visual Studio, we can have both static and dynamic analyses. For this article, we make use of the assembly listings generated by the compiler. On the other hand, we can also monitor binary executions at runtime with the VS Disassembly window in debug, where the labels and procedures in the listing are translated to memory addresses. This way, you can trace registers and memory allocations to understand data endianness, stack frames, etc. Here, for our purpose, the assembly listing seems sufficient and easy to indicate a jump from here to there. The downloadable zip file contains seven examples, with both the .cpp source code and the .asm listings in VS 2010. The same listings can be generated by an earlier version of VS.

This article does not involve hot technologies like .NET or C#. Assembly Language is comparatively traditional without much sensation these days. However, in applications, different types of assembly languages play an important role in new device development. Academically, Assembly Language Programming is considered as a demanding course in Computer Science. I am teaching Assembly Language Programming, CSCI 241, at Fullerton College, where the concept of high-level language interfacing with low-level execution is an essential part in teaching. In particular, one of the main topics that students are interested in is how C/C++ or Java runs on a machine, represented by low-level data structures and algorithms. Therefore, I hope this article could also serve as one of the basic problem-solving examples for students. Any comments and suggestions are welcome.

Acknowledgment

I would like to sincerely thank wtwhite, Charvak Karpe, and Harold Aptroot for their valuable discussions about the binary search here. Without them, the section "Using Binary Search" would not be like it as today.

References

From MSDN: Microsoft Macro Assembler Reference
War on Theism: x86 Instruction Set
Textbook: Assembly Language for Intel-Based Computers

History

9 August, 2010 -- Original version posted
8 September, 2010 -- Modified and rewrote some sections:
- Replaced the section "The Worst Case of the switch Statement" with "Using Binary Search". Added the Binary Search Tree image.
- Added the section "The Hybrid"
- Rewrote the section "More Questions and beyond"
- Adjusted the paragraphs between first two sections
- Added the explanation for a negative index in the section "The Two-Level Jump Table"
28 September, 2010 -- Article updated

Clarified the unanswered question about sort prerequisite for the binary search.
Added Acknowledgement and References

19 December, 2012 -- Incorporated the memory discussion example from Christian Scholze

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)