
Performance Tests: Precise Run Time Measurements with System.Diagnostics.Stopwatch

28 Feb 2010 · BSD License · 3 min read
How the .NET Framework Stopwatch class can be used to measure and compare algorithm run times with high accuracy

Introduction

Everybody who does performance optimization sooner or later stumbles over the Stopwatch class in the System.Diagnostics namespace. And everybody notices that measurements of the same function on the same computer can differ by 25%-30% in run time. This article shows how single-threaded test programs must be designed to get an accuracy of 0.1%-0.2% out of the Stopwatch class. With this accuracy, algorithms can be tested and compared.

Background

Modern CPUs have multiple cores, large caches, instruction pipelines, and many other features that affect the run time of an algorithm in a particular test scenario. White-box testing techniques, such as an attached debugger or profiler, disturb the cache lines, pipelines, etc. on the CPU. The real run time is hidden, so an algorithm optimized for these modern superscalar CPUs can appear slower (because it executes more instructions) than a non-optimized one while a profiler is attached. Black-box testing, that is, run-time measurement without an attached debugger or profiler, reveals the real performance of an algorithm and completes its performance analysis.

Setting Up the Test Scenario

The most important thing is to prevent the test thread from migrating between CPU cores or processors. A migration invalidates the warm per-core caches and has a huge performance impact on the test. This can be done by setting the ProcessorAffinity mask of the process:

C#
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Use only the second core 
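
The affinity value is a bit mask: bit n enables logical processor n, so IntPtr(2) (binary 10) selects the second one. Hard-coding the mask fails on a single-core machine, so here is a hedged sketch, not part of the original sample, that pins the process to the highest-numbered logical processor instead:

```csharp
using System;
using System.Diagnostics;

class AffinityDemo
{
    static void Main()
    {
        // Bit n of the mask enables logical processor n. Select the
        // highest-numbered one, which is often less busy than core 0.
        long mask = 1L << (Environment.ProcessorCount - 1);
        Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(mask);
        Console.WriteLine("Pinned to affinity mask 0x" + mask.ToString("X"));
    }
}
```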

To use the chosen CPU core as exclusively as possible, we must keep other threads from being scheduled on it. We raise our process and thread priority to achieve this:

C#
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Thread.CurrentThread.Priority = ThreadPriority.Highest;

And last but not least, we need a warm-up phase in our tests. On my system, the results are stable after a warm-up phase of 1,000-1,500 milliseconds. We can use the stopwatch itself to control the warm-up (at least 1,200 ms here):

C#
stopwatch.Start();
while (stopwatch.ElapsedMilliseconds < 1200)
{
    result = TestFunction(seed, count); 
}
stopwatch.Stop(); 
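
A further source of jitter, not addressed in the sample above, is the garbage collector: a collection that starts inside the measured interval inflates that run. A hedged sketch, an addition rather than part of the original sample, that settles the managed heap right before starting the stopwatch:

```csharp
using System;

class GcPrep
{
    static void Main()
    {
        // Force a full collection and drain finalizers so that a GC pause
        // does not land inside the measured interval. This step is an
        // addition; the original sample does not include it.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine("Heap settled; start measuring now.");
    }
}
```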

Here is the complete sample:

C#
using System;
using System.Diagnostics;
using System.Threading;

namespace PreciseMeasure
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch stopwatch = new Stopwatch();

            // Prevents the JIT compiler from optimizing the TestFunction calls away
            long seed = Environment.TickCount;
            long result = 0;
            int count = 100000000;

            Console.WriteLine("20 Tests without correct preparation");
            Console.WriteLine("Warmup");
            for (int repeat = 0; repeat < 20; ++repeat)
            {
                stopwatch.Reset();
                stopwatch.Start();
                result ^= TestFunction(seed, count);
                stopwatch.Stop();
                Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks +
                    " mS: " + stopwatch.ElapsedMilliseconds);
            }

            // Use only the second core or processor for the test
            Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
            // Prevents "normal" processes from interrupting this process
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
            // Prevents "normal" threads from interrupting this thread
            Thread.CurrentThread.Priority = ThreadPriority.Highest;

            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("20 Tests with correct preparation");
            Console.WriteLine("Warmup");
            stopwatch.Reset();
            stopwatch.Start();
            // A warm-up of 1,000-1,500 ms stabilizes the CPU cache and pipeline
            while (stopwatch.ElapsedMilliseconds < 1200)
            {
                result = TestFunction(seed, count); // Warmup
            }
            stopwatch.Stop();

            for (int repeat = 0; repeat < 20; ++repeat)
            {
                stopwatch.Reset();
                stopwatch.Start();
                result ^= TestFunction(seed, count);
                stopwatch.Stop();
                Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks +
                    " mS: " + stopwatch.ElapsedMilliseconds);
            }
            // Prevents optimizations (current compilers do not analyze the
            // data flow that deeply, but we never know)
            Console.WriteLine(result);
        }

        public static long TestFunction(long seed, int count)
        {
            long result = seed;
            for (int i = 0; i < count; ++i)
            {
                result ^= i ^ seed; // Some useless bit operations
            }
            return result;
        }
    }
}

Results

Without Correct Preparation

  1. Ticks: 1580367 mS: 572 <-- highest Value
  2. Ticks: 1577003 mS: 571
  3. Ticks: 1576140 mS: 571
  4. Ticks: 1560964 mS: 565
  5. Ticks: 1351663 mS: 489
  6. Ticks: 1248383 mS: 452
  7. Ticks: 1115361 mS: 404
  8. Ticks: 1112813 mS: 403
  9. Ticks: 1113112 mS: 403
  10. Ticks: 1112012 mS: 402 <-- lowest Value
  11. Ticks: 1330444 mS: 482
  12. Ticks: 1558493 mS: 564
  13. Ticks: 1501097 mS: 543
  14. Ticks: 1517796 mS: 549
  15. Ticks: 1542712 mS: 558
  16. Ticks: 1574959 mS: 570
  17. Ticks: 1483975 mS: 537
  18. Ticks: 1390578 mS: 503
  19. Ticks: 1546904 mS: 560
  20. Ticks: 1349507 mS: 488

The run time varies between 402 ms and 572 ms. That is a difference of 170 ms, or 42%. These results are useless for comparing algorithms.
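
The spread can be computed directly from the samples. A small sketch using the 20 millisecond values from the table above:

```csharp
using System;
using System.Linq;

class SpreadDemo
{
    static void Main()
    {
        // Milliseconds from the 20 unprepared runs in the table above
        int[] ms = { 572, 571, 571, 565, 489, 452, 404, 403, 403, 402,
                     482, 564, 543, 549, 558, 570, 537, 503, 560, 488 };
        int min = ms.Min(), max = ms.Max();
        // Relative spread against the best (lowest) run
        double spread = 100.0 * (max - min) / min;
        Console.WriteLine($"min={min} ms, max={max} ms, spread={spread:F0}%");
        // Prints: min=402 ms, max=572 ms, spread=42%
    }
}
```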

With Correct Preparation

  1. Ticks: 1110518 mS: 402
  2. Ticks: 1110540 mS: 402
  3. Ticks: 1110543 mS: 402
  4. Ticks: 1110684 mS: 402 <-- highest Value
  5. Ticks: 1110508 mS: 402
  6. Ticks: 1110553 mS: 402
  7. Ticks: 1110600 mS: 402
  8. Ticks: 1110433 mS: 402 <-- lowest Value
  9. Ticks: 1110509 mS: 402
  10. Ticks: 1110508 mS: 402
  11. Ticks: 1110489 mS: 402
  12. Ticks: 1110568 mS: 402
  13. Ticks: 1110503 mS: 402
  14. Ticks: 1110566 mS: 402
  15. Ticks: 1110625 mS: 402
  16. Ticks: 1110474 mS: 402
  17. Ticks: 1110571 mS: 402
  18. Ticks: 1110448 mS: 402
  19. Ticks: 1110555 mS: 402
  20. Ticks: 1110495 mS: 402

All 20 samples round to the same 402 ms and can only be distinguished by the tick count (the value of the high-resolution performance counter). The difference between the highest and lowest sample is 251 ticks, or 0.02%. On my system, the Stopwatch frequency is 2,760,029 ticks per second, so the difference between test runs is only 0.09 ms. That is really good and can be used to measure and compare algorithm run times.
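
A tick is 1/Stopwatch.Frequency seconds, so a tick delta converts to time directly. A minimal sketch using the figures measured above; the frequency is machine-specific, so on your hardware you would read Stopwatch.Frequency instead of the constant:

```csharp
using System;
using System.Diagnostics;

class TickDemo
{
    static void Main()
    {
        // 251 ticks and 2,760,029 ticks/s are the values measured above;
        // replace the constant with Stopwatch.Frequency on your own machine.
        long deltaTicks = 251;
        double seconds = (double)deltaTicks / 2760029;
        Console.WriteLine($"{seconds * 1000.0:F2} ms"); // Prints: 0.09 ms
        Console.WriteLine("High resolution timer: " + Stopwatch.IsHighResolution);
    }
}
```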

Points of Interest

One very important point should be kept in mind: the best (lowest) value without preparation is still not as good as the worst value with preparation. CPU context switches and core migrations have a huge impact on the run time of applications.

Links and Additional Resources

This article is one of the performance tests that can be found on benchmarks.codeplex.com. They are part of the research for the "Second WAF" web application framework.

History

  • 28th February, 2010: Initial post

License

This article, along with any associated source code and files, is licensed under The BSD License