Native C++ and C# - Which is the Fastest? (Part 1)

John J. Scott

6 Jan 2015, Ms-PL

A tip that compares the performance of a simple Mandelbrot generator in C# against native C++

Download source - 12.9 MB

Introduction

What is the performance difference between C# and native C++? C# is a much quicker environment for developing usable utility apps, but it is generally accepted that for true performance, C++ is the way to go. This article investigates that premise; I expect the additional effort of developing in C++ to yield significant runtime performance benefits.

Disclaimer

The program is not representative of a typical application; it is meant to be a computationally intensive test case.
I opted to test with the default release settings, which hampers C++ more than C#. This will be addressed in part 2.
There's a disproportionate amount of time spent in the square root library function; I'll change this to an approximation in part 2 (e.g. Carmack's approach), or remove it as it's not strictly required.
The code is native C++ talking to Win32, not CLR C++.
None of the rendering or window handling is timed; just the Mandelbrot creation.
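
For readers unfamiliar with the square root approximation mentioned above, here is a minimal sketch of the widely circulated fast inverse square root trick, which is what "Carmack's approach" is usually taken to mean. It is an illustration only, not the code planned for part 2. It assumes 32-bit IEEE 754 floats and positive, finite inputs, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Approximate 1/sqrt(x) for positive, finite x using the bit-level
// initial guess followed by one Newton-Raphson refinement step.
// Assumes float is a 32-bit IEEE 754 value.
inline float fast_inv_sqrt(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // read the float's bit pattern safely
    bits = 0x5f3759dfu - (bits >> 1);      // magic-constant initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson step
}

// Approximate sqrt(x) as x * (1/sqrt(x)). Hypothetical helper name.
inline float approx_sqrt(float x)
{
    return x * fast_inv_sqrt(x);
}
```

With a single refinement step the result is typically accurate to a fraction of a percent, which is usually enough for an escape test but not for general numeric work.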

Using the Code

I used Visual Studio 2013 for this project, but there's nothing special in the projects that should stop it running in Visual Studio 2012. The C# project is compiled against .NET 4.5, and the C++ project uses the default optimization settings. The code in both languages does the following:

Creates a simple borderless window
Generates a 640x640x256 Mandelbrot set 20 times
Displays the last generated result as the background image of the window (to make sure it is calculated correctly)
Waits for a mouse click
Displays a message box containing the average time taken to generate a Mandelbrot set in milliseconds
Closes
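
The generate-and-average steps above can be sketched as follows. This is a hedged illustration, not the code from the download: it assumes std::chrono::steady_clock as the timer (the article does not say which timer the projects use), and average_ms is a hypothetical name.

```cpp
#include <chrono>

// Run `generate` `runs` times and return the mean wall-clock time
// per run in milliseconds. Only the call to `generate` is timed.
template <typename Generate>
double average_ms(Generate generate, int runs = 20)
{
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        generate();                        // the work being measured
        const auto stop = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

Rendering and window handling would sit outside the callable passed in, so they do not contribute to the average.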

The code is not meant to be optimal; it is meant to be as simple and performance stressing as possible. The actual Mandelbrot routine itself is identical in both C++ and C#.

The window width and height, the number of iterations, and whether to use floats or doubles are easily configurable with a cursory glance at the code.
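
To make those knobs concrete, here is a minimal escape-time Mandelbrot sketch with the width, height, iteration count and precision as parameters. It is not the routine from the download: the viewport, the one-byte-per-pixel output and the name mandelbrot are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Escape-time Mandelbrot over [-2, 1) x [-1.5, 1.5) (an assumed viewport).
// Real is float or double. Each pixel stores its iteration count in one
// byte, so max_iter should not exceed 255 in this sketch.
template <typename Real>
std::vector<std::uint8_t> mandelbrot(int width, int height, int max_iter)
{
    std::vector<std::uint8_t> image(static_cast<std::size_t>(width) * height);
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            const Real cr = Real(-2.0) + Real(3.0) * px / width;
            const Real ci = Real(-1.5) + Real(3.0) * py / height;
            Real zr = 0, zi = 0;
            int iter = 0;
            // std::sqrt mirrors the square root call noted in the disclaimer;
            // comparing zr*zr + zi*zi against 4 would avoid it entirely.
            while (iter < max_iter && std::sqrt(zr * zr + zi * zi) <= Real(2.0)) {
                const Real next_zr = zr * zr - zi * zi + cr;
                zi = Real(2.0) * zr * zi + ci;
                zr = next_zr;
                ++iter;
            }
            image[static_cast<std::size_t>(py) * width + px] =
                static_cast<std::uint8_t>(iter);
        }
    }
    return image;
}
```

Switching between single and double precision is then a matter of calling mandelbrot<float>(640, 640, 255) or mandelbrot<double>(640, 640, 255).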

The Mandelbrot image is smaller than the smallest L2 cache among the processors tested, to try to avoid any cache complexity.
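
The cache claim can be checked with a little arithmetic. The sketch below assumes one byte per pixel (reading the 256 in 640x640x256 as 256 possible values) and assumes the smallest L2 figure in the results, listed as .5 for the Celeron 450, means 0.5 MB. Both are assumptions, and the names are hypothetical.

```cpp
#include <cstddef>

// Size of the image buffer in bytes. Assumption: one byte per pixel by default.
constexpr std::size_t image_bytes(std::size_t width, std::size_t height,
                                  std::size_t bytes_per_pixel = 1)
{
    return width * height * bytes_per_pixel;
}

// Assumption: the Celeron 450's "L2:.5" means 0.5 MB.
constexpr std::size_t kSmallestL2Bytes = 512 * 1024;

static_assert(image_bytes(640, 640) == 409600,
              "640x640 at one byte per pixel is 400 KB");
static_assert(image_bytes(640, 640) < kSmallestL2Bytes,
              "the image fits in the smallest L2 cache");
```

Under those assumptions the image takes 400 KB against 512 KB of cache. At four bytes per pixel it would no longer fit, so the claim depends on the pixel format.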

There are three configurations to compile for the C# project: Any CPU, Win32 (x86), and x64.

There are two configurations to compile for the C++ project: Win32 and x64.

The projects are configured to output a different executable depending on the configuration, so the five binaries will not overwrite each other.

I then ran all five executables with both single and double precision settings on five different machines to gauge performance. I also tried on my Surface Pro 1 (Core i5), and my Server 2012 (Core i3) machines, but the results were too inconsistent.

All the machines run Windows Update, so all had .NET 4.5 installed and ran the C# versions out of the box. Only my machine had the VS2013 CRT redist installed, so I had to install it on all the other machines.

Caveats:

The machines were not 'clean' (they had background processes running, it wasn't a clean OS install, etc.); they are all machines that are in use. However, all the tests were done at the same time, so they were all equally 'unclean'.
The tests were run several times to validate the results; repeated runs normally agreed to within 2%-3%.

Results

The times are the average number of milliseconds taken to create the Mandelbrot set, so lower is quicker.

L2 is the cache size in MB. C++ has no AnyCPU build, shown as "-".

                              Xeon E5-1620 v2       Celeron 450           Xeon E31245           Celeron 743           Xeon E5420
                              3.7GHz (L2:10)        2.2GHz (L2:0.5)       3.3GHz (L2:8)         1.3GHz (L2:1)         2.5GHz (L2:12)
Size     Precision  Language  AnyCPU  x86   x64     AnyCPU  x86   x64     AnyCPU  x86   x64     AnyCPU  x86   x64     AnyCPU  x86   x64
640x640  Float      C#        105     107   122     790     791   579     136     136   135     363     363   346     186     185   176
640x640  Float      C++       -       150   116     -       874   288     -       158   117     -       933   370     -       473   188
640x640  Double     C#        98      97    121     570     570   581     136     135   135     331     331   346     168     169   177
640x640  Double     C++       -       137   135     -       848   584     -       147   136     -       894   391     -       450   198

Points of Interest

The results are not what I expected at all, even with this very synthetic test.

The C++ x64 was always quicker than its x86 equivalent, and sometimes very much quicker.
The double precision version was slightly quicker overall, except in the case of C++ x64 code.
The AnyCPU config has the 'Prefer x86' setting, so I would expect it to perform the same as the Win32 version. This is indeed the case.
C# was quickest overall in 80% of the tests (8 of 10), and the AnyCPU config matched or beat both C++ configurations in all of those cases (even though the C# x64 was quickest in two cases).
The Xeon processors had performance roughly proportional to their clock speeds, which is the expected result.
The Celeron 450 at 2.2GHz ran C# significantly more slowly than the Celeron 743 at 1.3GHz. This is very confusing.
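
The 80% figure can be reproduced from the 640x640 table. The sketch below copies the ten rows (five machines, float then double) and counts how often the fastest C# build beats the fastest C++ build. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <array>

struct Result {
    int cs_anycpu, cs_x86, cs_x64;  // C# times in ms
    int cpp_x86, cpp_x64;           // C++ times in ms
};

// Ten tests copied from the 640x640 table: five machines, float then double.
constexpr std::array<Result, 10> kResults = {{
    {105, 107, 122, 150, 116}, {790, 791, 579, 874, 288},
    {136, 136, 135, 158, 117}, {363, 363, 346, 933, 370},
    {186, 185, 176, 473, 188},
    { 98,  97, 121, 137, 135}, {570, 570, 581, 848, 584},
    {136, 135, 135, 147, 136}, {331, 331, 346, 894, 391},
    {168, 169, 177, 450, 198},
}};

// Count the tests in which the fastest C# build beats the fastest C++ build.
inline int count_csharp_wins()
{
    int wins = 0;
    for (const Result& r : kResults) {
        const int best_cs  = std::min({r.cs_anycpu, r.cs_x86, r.cs_x64});
        const int best_cpp = std::min(r.cpp_x86, r.cpp_x64);
        if (best_cs < best_cpp) ++wins;
    }
    return wins;
}
```

count_csharp_wins() returns 8 for this data. The two exceptions are the single precision runs on the Celeron 450 and the Xeon E31245, where C++ x64 is quickest.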

I did further comparisons of the Celeron processors with smaller Mandelbrot sets.

Results

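"-" marks the missing C++ AnyCPU build.

                              Celeron 450           Celeron 743
                              2.2GHz (L2:0.5)       1.3GHz (L2:1)
Size     Precision  Language  AnyCPU  x86   x64     AnyCPU  x86   x64
400x400  Float      C#        309     358   225     142     145   135
400x400  Float      C++       -       343   114     -       359   145
400x400  Double     C#        226     225   225     128     128   135
400x400  Double     C++       -       336   231     -       356   153
200x200  Float      C#        77      77    56      35      35    33
200x200  Float      C++       -       86    28      -       89    37
200x200  Double     C#        55      55    56      32      32    34
200x200  Double     C++       -       84    58      -       89    39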

Points of Interest

C# was consistently about twice as quick on the 1.3GHz machine as on the 2.2GHz machine.
The x64 C++ was always quicker than the x86 C++.
Double precision x86 C++ tends to be a fraction quicker than the single precision. This is consistent with the previous tests.
Double precision x64 C++ tends to be slower than the single precision. This is also consistent with the previous tests.

Conclusions

This does not show that C# is quicker than C++: there isn't a wide enough sampling of processors, the test is not generic enough, and the testing procedure is too ad hoc. However, it does show that C# is potentially a performance competitor, and if different tests show similar results, C++ will become even less desirable to use.

Additional data points and/or critique would be greatly appreciated!

Further Work

Experiment with C++ optimization settings to see what difference they make (will be addressed in part 2).
Explain why the Penryn Celeron outperforms the Conroe Celeron that has nearly twice the clock speed.
Run other tests that better emulate a real world application.
Find Core i3, i5, and i7 processors in desktop machines to compare against.

History

5th January, 2015: First created
8th January, 2015: Added notes, added explanation of timings. Removed the smiley.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)