Introduction
What is the performance difference between C# and native C++? C# is a much quicker environment for developing usable utility apps, but it is generally accepted that for true performance, C++ is the way to go. This article investigates that premise, and I expect the additional effort of developing in C++ to yield significant runtime performance benefits.
Disclaimer
- The program is not representative of a typical application; it is meant to be a computationally intensive test case.
- I tested with the default release settings throughout, which hampers C++ more than C#. This will be addressed in part 2.
- There's a disproportionate amount of time spent in the square root library function; I'll change this to an approximation in part 2 (e.g. Carmack's approach), or remove it as it's not strictly required.
- The code is native C++ talking to Win32, not CLR C++.
- None of the rendering or window handling is timed; just the Mandelbrot creation.
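For reference, the kind of square root approximation mentioned above could look something like the following. This is a minimal sketch of the well-known fast inverse square root trick from the Quake III Arena source with one Newton-Raphson step; the helper names `fast_inv_sqrt` and `fast_sqrt` are hypothetical, and nothing here is taken from the test projects.

```cpp
#include <cstdint>
#include <cstring>

// Approximate 1/sqrt(x) for positive, finite x.
// Hypothetical helper: the name and its use here are assumptions.
inline float fast_inv_sqrt(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);    // read the float's bit pattern safely
    bits = 0x5f3759dfu - (bits >> 1);       // magic-constant initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);       // back to a float

    return y * (1.5f - 0.5f * x * y * y);   // one Newton-Raphson refinement
}

// Approximate sqrt(x) as x * (1/sqrt(x)); returns 0 for x <= 0.
inline float fast_sqrt(float x)
{
    return x > 0.0f ? x * fast_inv_sqrt(x) : 0.0f;
}
```

With a single refinement step the relative error stays well under 1%, which is usually enough for an escape test, but it only applies to single precision; a double precision build would need a different constant or a different method.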
Using the Code
I used Visual Studio 2013 for this project, but there's nothing special in the projects that should stop it running in Visual Studio 2012. The C# project is compiled against .NET4.5, and the C++ has all the default optimization settings. The code in both languages does the following:
- Creates a simple borderless window
- Generates a 640x640x256 Mandelbrot set 20 times
- Displays the last generated result as the background image of the window (to make sure it is calculated correctly)
- Waits for a mouse click
- Displays a message box containing the average time taken to generate a Mandelbrot set in milliseconds
- Closes
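The timing part of the steps above can be sketched as follows. This is an illustration only, assuming `std::chrono::steady_clock` as the timer; the helper name `average_milliseconds` is hypothetical, and the actual projects may well use a different timing API.

```cpp
#include <cassert>
#include <chrono>

// Calls generate() `runs` times and returns the mean wall-clock time per call
// in milliseconds. Only the calls themselves sit inside the timed region, so
// window creation and rendering would not be included.
// Hypothetical helper: the name and signature are illustrative.
template <typename Generate>
double average_milliseconds(Generate generate, int runs = 20)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;

    const clock::time_point start = clock::now();
    for (int i = 0; i < runs; ++i)
        generate();
    const clock::time_point stop = clock::now();

    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / runs;
}
```

Passing a lambda that fills the image buffer, for example `average_milliseconds([&] { /* generate the set */ })`, yields the figure that would be shown in the message box.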
The code is not meant to be optimal; it is meant to be as simple and as performance stressing as possible. The actual Mandelbrot routine itself is identical in both C++ and C#.
The window width and height, the number of iterations, and whether to use floats or doubles are all easily configurable with a cursory glance at the code.
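To give an idea of the kind of routine being timed, here is a minimal escape-time sketch with the precision selected by a template parameter. It is not the routine from the projects: the plotted region, the names, the one-byte-per-pixel output, and the iteration cap of 255 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Escape-time Mandelbrot over the region [-2, 1] x [-1.5, 1.5].
// Real is float or double; the result holds one iteration count per pixel.
// Region, names, and output layout are assumptions for illustration.
template <typename Real>
std::vector<std::uint8_t> mandelbrot(int width, int height, int max_iterations = 255)
{
    assert(width > 0 && height > 0 && max_iterations > 0 && max_iterations <= 255);
    std::vector<std::uint8_t> image(static_cast<std::size_t>(width) * height);

    for (int py = 0; py < height; ++py) {
        const Real ci = Real(-1.5) + Real(3) * py / height;
        for (int px = 0; px < width; ++px) {
            const Real cr = Real(-2) + Real(3) * px / width;
            Real zr = 0, zi = 0;
            int n = 0;
            // Iterate z = z^2 + c until |z| exceeds 2 or the budget runs out.
            // The sqrt is not strictly required: zr*zr + zi*zi > 4 is equivalent.
            while (n < max_iterations && std::sqrt(zr * zr + zi * zi) <= Real(2)) {
                const Real next_zr = zr * zr - zi * zi + cr;
                zi = Real(2) * zr * zi + ci;
                zr = next_zr;
                ++n;
            }
            image[static_cast<std::size_t>(py) * width + px] = static_cast<std::uint8_t>(n);
        }
    }
    return image;
}
```

Switching between `mandelbrot<float>(640, 640)` and `mandelbrot<double>(640, 640)` changes only the arithmetic type, which mirrors the single and double precision settings described above.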
The Mandelbrot image is smaller than the smallest L2 cache of any of the processors tested, to try to avoid any cache complexity.
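The arithmetic behind that claim can be checked at compile time. The sketch below assumes one byte per pixel (256 levels fit in 8 bits) and reads the smallest cache figure quoted in the results, "L2:.5", as 0.5 MB; both are assumptions, and the constant names are hypothetical.

```cpp
#include <cstddef>

// Working-set estimate under the assumptions stated above.
const std::size_t kImageBytes = 640u * 640u * 1u;  // 409,600 bytes (400 KiB)
const std::size_t kSmallestL2 = 512u * 1024u;      // 524,288 bytes (0.5 MiB)

// Fails to compile if the image would not fit in the smallest L2 cache.
static_assert(kImageBytes < kSmallestL2, "image should fit in the smallest L2 cache");
```

A wider pixel format would change the picture: at four bytes per pixel the same image would need about 1.6 MB and would no longer fit in the smaller caches.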
There are three configurations to compile for the C# project: Any CPU, x86, and x64.
There are two configurations to compile for the C++ project: Win32 and x64.
The projects are configured to output a different executable for each configuration, so the five binaries will not overwrite each other.
I then ran all five executables with both single and double precision settings on five different machines to gauge performance. I also tried on my Surface Pro 1 (Core i5), and my Server 2012 (Core i3) machines, but the results were too inconsistent.
All the machines run Windows Update, so all had .NET 4.5 installed and ran the C# versions out of the box. Only my machine had the VS2013 CRT redist installed, so I had to install it on all the other machines.
Caveats:
- The machines were not 'clean' (they had background processes running, it wasn't a clean OS install, etc.); they are all machines that are in use. However, all the tests were done at the same time, so they were all equally 'unclean'.
- The tests were run several times to validate the results; the timings were normally within 2%-3% of each other.
Results
The times are the average number of milliseconds taken to create the Mandelbrot set, so lower is quicker.
| Size | Precision | Language | Config | Xeon E5-1620 v2 @ 3.7GHz (L2:10) | Celeron 450 @ 2.2GHz (L2:.5) | Xeon E31245 @ 3.3GHz (L2:8) | Celeron 743 @ 1.3GHz (L2:1) | Xeon E5420 @ 2.5GHz (L2:12) |
|---|---|---|---|---|---|---|---|---|
| 640x640 | Float | C# | AnyCPU | 105 | 790 | 136 | 363 | 186 |
| 640x640 | Float | C# | x86 | 107 | 791 | 136 | 363 | 185 |
| 640x640 | Float | C# | x64 | 122 | 579 | 135 | 346 | 176 |
| 640x640 | Float | C++ | x86 | 150 | 874 | 158 | 933 | 473 |
| 640x640 | Float | C++ | x64 | 116 | 288 | 117 | 370 | 188 |
| 640x640 | Double | C# | AnyCPU | 98 | 570 | 136 | 331 | 168 |
| 640x640 | Double | C# | x86 | 97 | 570 | 135 | 331 | 169 |
| 640x640 | Double | C# | x64 | 121 | 581 | 135 | 346 | 177 |
| 640x640 | Double | C++ | x86 | 137 | 848 | 147 | 894 | 450 |
| 640x640 | Double | C++ | x64 | 135 | 584 | 136 | 391 | 198 |
Points of Interest
The results are not what I expected at all, even with this very synthetic test.
- The C++ x64 was always quicker than its x86 equivalent, and sometimes very much quicker.
- The double precision version was slightly quicker overall, except in the case of C++ x64 code.
- The AnyCPU config has the 'Prefer x86' setting, so I would expect it to perform the same as the Win32 version. This is indeed the case.
- C# was quickest overall in 80% of the tests, and the AnyCPU config beat both C++ configurations in all of those cases (even though the C# x64 was quickest in two cases).
- The Xeon processors had performance roughly proportional to their clock speeds, as one would expect.
- The Celeron 450 at 2.2GHz ran C# significantly more slowly than the Celeron 743 at 1.3GHz. This is very confusing.
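The 80% figure can be reproduced from the 640x640 results by comparing the best C# time with the best C++ time for each machine and precision. The sketch below does exactly that; the type and function names are hypothetical, and the numbers are copied from the table above.

```cpp
#include <algorithm>

// One machine/precision test. C# times: AnyCPU, x86, x64; C++ times: x86, x64.
struct Timing { int csharp[3]; int cpp[2]; };

// The five machines for float, then the same five machines for double,
// in the order they appear in the results table.
const Timing kTimings[10] = {
    {{105, 107, 122}, {150, 116}}, {{790, 791, 579}, {874, 288}},
    {{136, 136, 135}, {158, 117}}, {{363, 363, 346}, {933, 370}},
    {{186, 185, 176}, {473, 188}},
    {{ 98,  97, 121}, {137, 135}}, {{570, 570, 581}, {848, 584}},
    {{136, 135, 135}, {147, 136}}, {{331, 331, 346}, {894, 391}},
    {{168, 169, 177}, {450, 198}},
};

// Counts the tests in which the fastest C# build beat the fastest C++ build.
inline int count_csharp_wins()
{
    int wins = 0;
    for (const Timing& t : kTimings) {
        const int best_cs  = *std::min_element(t.csharp, t.csharp + 3);
        const int best_cpp = *std::min_element(t.cpp, t.cpp + 2);
        if (best_cs < best_cpp)
            ++wins;
    }
    return wins;
}
```

C# comes out ahead in 8 of the 10 tests; the two exceptions are the single precision runs on the Celeron 450 and the Xeon E31245, where the x64 C++ build wins.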
I did further comparisons of the Celeron processors with smaller Mandelbrot sets.
Results
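| Size | Precision | Language | Config | Celeron 450 @ 2.2GHz (L2:.5) | Celeron 743 @ 1.3GHz (L2:1) |
|---|---|---|---|---|---|
| 400x400 | Float | C# | AnyCPU | 309 | 142 |
| 400x400 | Float | C# | x86 | 358 | 145 |
| 400x400 | Float | C# | x64 | 225 | 135 |
| 400x400 | Float | C++ | x86 | 343 | 359 |
| 400x400 | Float | C++ | x64 | 114 | 145 |
| 400x400 | Double | C# | AnyCPU | 226 | 128 |
| 400x400 | Double | C# | x86 | 225 | 128 |
| 400x400 | Double | C# | x64 | 225 | 135 |
| 400x400 | Double | C++ | x86 | 336 | 356 |
| 400x400 | Double | C++ | x64 | 231 | 153 |
| 200x200 | Float | C# | AnyCPU | 77 | 35 |
| 200x200 | Float | C# | x86 | 77 | 35 |
| 200x200 | Float | C# | x64 | 56 | 33 |
| 200x200 | Float | C++ | x86 | 86 | 89 |
| 200x200 | Float | C++ | x64 | 28 | 37 |
| 200x200 | Double | C# | AnyCPU | 55 | 32 |
| 200x200 | Double | C# | x86 | 55 | 32 |
| 200x200 | Double | C# | x64 | 56 | 34 |
| 200x200 | Double | C++ | x86 | 84 | 89 |
| 200x200 | Double | C++ | x64 | 58 | 39 |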
Points of Interest
- C# was consistently about twice as quick on the 1.3GHz machine as on the 2.2GHz machine.
- The x64 C++ was always quicker than the x86 C++.
- Double precision x86 C++ tends to be a fraction quicker than the single precision. This is consistent with the previous tests.
- Double precision x64 C++ tends to be slower than the single precision. This is also consistent with the previous tests.
Conclusions
This does not show that C# is quicker than C++: there isn't a wide enough sampling of processors, the test is not generic enough, and the testing procedure is too ad hoc. However, it does show that C# is potentially a performance competitor, and if different tests show similar results, C++ will become even less desirable to use.
Additional data points and/or critique would be greatly appreciated!
Further Work
- Experiment with C++ optimization settings to see what difference they make (will be addressed in part 2)
- Explain why the Penryn Celeron outperforms the Conroe Celeron, which has nearly twice the clock speed.
- Run other tests that better emulate a real world application.
- Find Core i3, i5, and i7 processors in desktop machines to compare against.
History
- 5th January, 2015: First created
- 8th January, 2015: Added notes, added explanation of timings. Removed the smiley.