Abstract
This article presents an unsophisticated class for comparing the raw speed of two programming design alternatives. It is not a scientific benchmarking exercise, but a very simple way of running two coded processes to gauge which is probably faster, and approximately how much faster. The class allows the comparison with a minimum of infrastructure coding; it is not rocket science.
Introduction
In the process of writing some imaging filter methods, I continually found myself in the position of having to optimise my code for raw speed. Although the approach I had taken was fundamentally quite an efficient one, the need to regularly process more than six million pixels made speed paramount, sometimes at the expense of transparency and good programming practice. The simple process of converting a 24-bit image to 8-bit indexed grayscale, then finding edges meant 20 million pixel manipulations. One microsecond gained per pixel saved 20 seconds of processing time.
Routine practices soon became critical decisions, for example: do I protect a class member/field (like a pixel array) and access it via a read-write property, or do I make it public and allow read-write directly to the data? Until I knew what the time penalty was for access via a property, I had no more than a strong suspicion that reading-writing directly would be slightly faster, and even less idea of how much faster it might be.
Eventually I had such a backlog of outstanding comparisons to make that I wrote a simple abstract class that enabled me to get through them as quickly as possible with sufficiently scientific results that I could choose which options to use.
Some surprising results
Well, they were to me anyway...
I was not interested in a rigorous scientific comparison that would withstand serious academic scrutiny; I was merely interested in making design choices that would result in a shorter wait for an imaging filter to execute. Some of the differences in the tests I ran were a little more pronounced than I expected, and some were quite surprising.
Example - property versus direct member access
For example, direct, unprotected read access to class members is about five times as efficient as the same access via a property. I did not test write access; the test outcome was sufficient evidence for me to make several class members public.
Later I discovered, to my surprise, that if the "Release" version of the executable is run, the outcome is indeterminate - one pass gives property access the advantage, another direct access. My conclusion is that either will do, and my assumption is that the optimiser builds similar code for both in the Release binary. I have included two executables in the download, SpeedTests_Debug.exe and SpeedTests_Release.exe, so you can see the disparity for yourself.
Example - recursion versus stack-and-loop
Another example: a simple recursive method surprisingly seemed ten times as efficient as pushing a value on a .NET Stack instance and looping. I did not test a hand-crafted stack, which may have led to different results; I just went with the recursive method.
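The two alternatives being compared can be illustrated with a minimal sketch. This is not the code from the download: the module name, the function names, and the choice of the generic Stack(Of Integer) are assumptions made for illustration, and the work done (summing the integers from n down to 1) is deliberately trivial.

```vbnet
Imports System
Imports System.Collections.Generic

Module RecursionVersusStackSketch

    ' Alternative A: plain recursion. Sums the integers n, n-1, ..., 1.
    Function SumRecursive(ByVal n As Integer) As Long
        If n <= 0 Then Return 0
        Return n + SumRecursive(n - 1)
    End Function

    ' Alternative B: an explicit stack and a loop doing the same work.
    ' Stack(Of Integer) is an assumption; the article only says ".NET Stack".
    Function SumWithStack(ByVal n As Integer) As Long
        Dim pending As New Stack(Of Integer)()
        Dim total As Long = 0
        pending.Push(n)
        While pending.Count > 0
            Dim current As Integer = pending.Pop()
            If current > 0 Then
                total += current
                pending.Push(current - 1)
            End If
        End While
        Return total
    End Function

    Sub Main()
        Console.WriteLine(SumRecursive(100))   ' prints 5050
        Console.WriteLine(SumWithStack(100))   ' prints 5050
    End Sub

End Module
```

Both functions return the same value, so each could serve as the body of one of the two tests. Note that recursion is limited by the call stack depth, which an explicit stack is not.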
Class SpeedTestsAB
The speed test runs a rudimentary control method (SpeedTestControl), which is simply an empty method that provides an overhead metric. This overhead is subtracted from the total time of each test to give a net running time.
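The idea of subtracting a control measurement can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the download: the module, the method names ControlMethod, TestMethod, and TimeLoop, and the use of System.Diagnostics.Stopwatch and the Action delegate are all choices made here for the example.

```vbnet
Imports System
Imports System.Diagnostics

Module ControlOverheadSketch

    ' Hypothetical field read by the test method.
    Public f_SomeInteger As Integer = 123456

    ' Stand-in for SpeedTestControl: deliberately empty, so timing it
    ' measures only the loop and call overhead.
    Sub ControlMethod()
    End Sub

    ' Stand-in for SpeedTestA: the code whose cost is of interest.
    Sub TestMethod()
        Dim xInt As Integer = f_SomeInteger
    End Sub

    ' Runs the given method the requested number of times
    ' and returns the total elapsed time.
    Function TimeLoop(ByVal method As Action, ByVal repetitions As Integer) As TimeSpan
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To repetitions
            method()
        Next
        watch.Stop()
        Return watch.Elapsed
    End Function

    Sub Main()
        Const Repetitions As Integer = 10000000
        Dim totalControl As TimeSpan = TimeLoop(AddressOf ControlMethod, Repetitions)
        Dim totalA As TimeSpan = TimeLoop(AddressOf TestMethod, Repetitions)
        ' Net time: total time minus the overhead measured by the control.
        Dim netA As TimeSpan = totalA - totalControl
        Console.WriteLine("Control: {0}  Total A: {1}  Net A: {2}", totalControl, totalA, netA)
    End Sub

End Module
```

Because the control and the test run through the same loop and the same call mechanism, that shared cost cancels out in the subtraction. On a noisy machine the net time of a very cheap test can come out close to zero or even negative, which is one reason to use a large number of repetitions.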
The abstract class SpeedTestsAB defines four abstract/MustOverride methods:
| Method | Description |
|---|---|
| SetUpTestA | Initialisation for SpeedTestA, for example (C#): base.f_DescribeA = "Access a public field directly."; |
| SetUpTestB | Initialisation for SpeedTestB, for example (VB): Me.f_DescribeB = "Access a field via a property." |
| SpeedTestA | The speed-test A code to execute. |
| SpeedTestB | The speed-test B code to execute. |
In addition, there are methods and properties that enable simple reporting of the results.
| Property | Description |
|---|---|
| TotalTimeA, TotalTimeB, TotalTimeControl | The total times taken for each test. |
| NetTimeA, NetTimeB | The net time taken for each test: TotalTime minus TotalTimeControl. |
| Repetitions | The number of repetitions. |

| Method | Description |
|---|---|
| Results | Returns a very simple results report string. |
| AppendResults | Appends the result string to a file. |
| ShowResults | Shows the result string in a MessageBox. |
| WriteResults | Writes the result string to a file or stream. |
The resulting output looks like:
Test results:
10,000,000 repetitions.
Test A: Access a public field directly.
Test B: Access a field via a property.
00:00:00.0937500 hh:mm:ss.ff Equivalent Elapsed Time Control Process.
00:00:00.1406250 hh:mm:ss.ff Total Elapsed Time Process A.
00:00:00.1718750 hh:mm:ss.ff Total Elapsed Time Process B.
00:00:00.0468750 hh:mm:ss.ff Net Elapsed Time Process A.
00:00:00.0625000 hh:mm:ss.ff Net Elapsed Time Process B.
Net Unit Processing Time A: 4.688 nanosecs
Net Unit Processing Time B: 6.250 nanosecs
75.000% Percentage: Process A divided by Process B.
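The derived figures in the report follow from simple arithmetic on the net times. The sketch below reproduces them from the values shown above; the module and function names are hypothetical and this is not the class's own reporting code. Times are built from ticks (one tick is 100 nanoseconds) to avoid any rounding when constructing the TimeSpan values.

```vbnet
Imports System

Module ResultArithmeticSketch

    ' Net time is the total time minus the control time.
    Function NetTime(ByVal total As TimeSpan, ByVal control As TimeSpan) As TimeSpan
        Return total - control
    End Function

    ' Net time per repetition in nanoseconds (one tick = 100 ns).
    Function UnitNanoseconds(ByVal net As TimeSpan, ByVal repetitions As Long) As Double
        Return net.Ticks * 100.0 / repetitions
    End Function

    ' Process A divided by process B, as a percentage.
    Function PercentageAOverB(ByVal netA As TimeSpan, ByVal netB As TimeSpan) As Double
        Return 100.0 * netA.Ticks / netB.Ticks
    End Function

    Sub Main()
        Const Repetitions As Long = 10000000
        Dim control As TimeSpan = TimeSpan.FromTicks(937500)    ' 00:00:00.0937500
        Dim totalA As TimeSpan = TimeSpan.FromTicks(1406250)    ' 00:00:00.1406250
        Dim netA As TimeSpan = NetTime(totalA, control)         ' 00:00:00.0468750
        Dim netB As TimeSpan = TimeSpan.FromTicks(625000)       ' 00:00:00.0625000, as reported

        Console.WriteLine(UnitNanoseconds(netA, Repetitions))   ' 4.6875 nanoseconds
        Console.WriteLine(UnitNanoseconds(netB, Repetitions))   ' 6.25 nanoseconds
        Console.WriteLine(PercentageAOverB(netA, netB))         ' 75 percent
    End Sub

End Module
```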
Using SpeedTestsAB
Create a class that inherits clsBaseSpeedTestAB, e.g.:
Public Class clsSpeedTestAB_Properties
Inherits clsBaseSpeedTestAB
Create a code section that defines properties, methods, and data required to run the tests, e.g.:
#Region "[=== SPEED TEST COMPONENTS ===]"
Protected f_SomeInteger As Int32 = 123456
Public Property SomeInteger() As Int32
Get
Return Me.f_SomeInteger
End Get
Set(ByVal value As Int32)
Me.f_SomeInteger = value
End Set
End Property
#End Region
Override the methods SetUpTestA and SetUpTestB with, at least, the description of the tests, e.g.:
Protected Overrides Sub SetUpTestB()
Me.f_DescribeB = "Access a field via a property."
End Sub
Override the methods SpeedTestA and SpeedTestB with the code to be tested, e.g.:
Protected Overrides Sub SpeedTestA()
Dim xInt As Int32
xInt = Me.f_SomeInteger
End Sub
Protected Overrides Sub SpeedTestB()
Dim xInt As Int32
xInt = Me.SomeInteger
End Sub
Instantiate the class somewhere and call the RunTest method, e.g.:
Private Sub Button1_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) Handles Button1.Click
' The constructor argument is the number of repetitions.
Dim xTest As New clsSpeedTestAB_Properties(10000000)
xTest.RunTest()
xTest.ShowResults()
End Sub
The downloads
The downloads are in four packs:
| Pack | Contents |
|---|---|
| SpeedTests_Src_CS.zip | C# source code for the demonstration, including clsBaseSpeedTestAB.cs, clsSpeedTestAB_Delegates.cs, clsSpeedTestAB_Properties.cs, clsSpeedTestAB_Recursion.cs, and Form1.cs. Please note that the C# source code is the disassembler output from Lutz Roeder's .NET Reflector v5.1, and not the result of my porting the source code. The disassembly introduced several errors, which I have tried to fix, but some may remain. |
| SpeedTests_Src_VB.zip | VB.NET source code for the demonstration, including clsBaseSpeedTestAB.vb, clsSpeedTestAB_Delegates.vb, clsSpeedTestAB_Properties.vb, clsSpeedTestAB_Recursion.vb, and Form1.vb. |
| SpeedTests_EXE.zip | The two compiled speed-test executables, one built in Debug mode (SpeedTests_Debug.exe) and one in Release mode (SpeedTests_Release.exe). |
| SpeedTests_Article.zip | This article. |
Points to note
This is not an example of proper scientific speed benchmarking, but is only a mechanism for making a choice as to which code design to use when raw speed is of primary importance.
There is often quite a difference between the efficiency/speed of code which is compiled in Debug mode when compared to the same code compiled in Release mode. Assuming that the final version of your software will be compiled in Release mode, I recommend speed testing in that mode. It is of interest, however, to compare execution speed in the two modes.
Sample test results
(In Debug mode.)
These results are from an earlier incarnation of the class and were actively used in making image coding design decisions. The differences may disappear when tested with a Release compiled version.
Speed test: If, ElseIf compared with Select Case
Public Overrides Sub SpeedTestA()
If Me.f_Int32 = 0 Then
Me.f_Int32 = 1
ElseIf Me.f_Int32 = 1 Then
Me.f_Int32 = 1
ElseIf Me.f_Int32 = 2 Then
Me.f_Int32 = 2
ElseIf Me.f_Int32 = 3 Then
Me.f_Int32 = 3
ElseIf Me.f_Int32 = 4 Then
Me.f_Int32 = 4
ElseIf Me.f_Int32 = 5 Then
Me.f_Int32 = 5
ElseIf Me.f_Int32 = 6 Then
Me.f_Int32 = 6
ElseIf Me.f_Int32 = 7 Then
Me.f_Int32 = 7
ElseIf Me.f_Int32 = 8 Then
Me.f_Int32 = 8
ElseIf Me.f_Int32 = 9 Then
Me.f_Int32 = 9
End If
End Sub
Public Overrides Sub SpeedTestB()
Select Case Me.f_Int32
Case 0
Me.f_Int32 = 0
Case 1
Me.f_Int32 = 1
Case 2
Me.f_Int32 = 2
Case 3
Me.f_Int32 = 3
Case 4
Me.f_Int32 = 4
Case 5
Me.f_Int32 = 5
Case 6
Me.f_Int32 = 6
Case 7
Me.f_Int32 = 7
Case 8
Me.f_Int32 = 8
Case 9
Me.f_Int32 = 9
End Select
End Sub
Protected f_Int32 As Int32
Test results:
600,000,000 repetitions.
Test A: Use if, elseif.
Test B: Use select case.
00:00:07.4270833 hh:mm:ss.ff Equivalent Elapsed Time Control Process.
00:00:10.0312500 hh:mm:ss.ff Total Elapsed Time Process A.
00:00:09.2968750 hh:mm:ss.ff Total Elapsed Time Process B.
00:00:02.6041667 hh:mm:ss.ff Net Elapsed Time Process A.
00:00:01.8697917 hh:mm:ss.ff Net Elapsed Time Process B.
Net Unit Processing Time A: 2.604 secs
Net Unit Processing Time B: 1.870 secs
139.276% Percentage: Process A divided by Process B.
Conclusion
In this test, If, ElseIf took approximately 40% longer than Select Case, so Select Case may be the faster choice.
Speed test: If, ElseIf compared with nested IIF
Public Overrides Sub SpeedTestA()
If Me.f_Int32 = 0 Then
Me.f_Int32 = 1
ElseIf Me.f_Int32 = 1 Then
Me.f_Int32 = 1
ElseIf Me.f_Int32 = 2 Then
Me.f_Int32 = 2
ElseIf Me.f_Int32 = 3 Then
Me.f_Int32 = 3
ElseIf Me.f_Int32 = 4 Then
Me.f_Int32 = 4
ElseIf Me.f_Int32 = 5 Then
Me.f_Int32 = 5
ElseIf Me.f_Int32 = 6 Then
Me.f_Int32 = 6
ElseIf Me.f_Int32 = 7 Then
Me.f_Int32 = 7
ElseIf Me.f_Int32 = 8 Then
Me.f_Int32 = 8
ElseIf Me.f_Int32 = 9 Then
Me.f_Int32 = 9
End If
End Sub
Public Overrides Sub SpeedTestB()
IIf(Me.f_Int32 = 0, Me.f_Int32 = 0 _
, IIf(Me.f_Int32 = 1, Me.f_Int32 = 1 _
, IIf(Me.f_Int32 = 2, Me.f_Int32 = 2 _
, IIf(Me.f_Int32 = 3, Me.f_Int32 = 3 _
, IIf(Me.f_Int32 = 4, Me.f_Int32 = 4 _
, IIf(Me.f_Int32 = 5, Me.f_Int32 = 5 _
, IIf(Me.f_Int32 = 6, Me.f_Int32 = 6 _
, IIf(Me.f_Int32 = 7, Me.f_Int32 = 7 _
, IIf(Me.f_Int32 = 8, Me.f_Int32 = 8 _
, IIf(Me.f_Int32 = 9, Me.f_Int32 = 9 _
, Me.f_Int32 = 9))))))))))
End Sub
Protected f_Int32 As Int32 = -1
Test results:
100,000,000 repetitions.
Test A: Use if, elseif.
Test B: Use nested IIF.
00:00:01.2343750 hh:mm:ss.ff Equivalent Elapsed Time Control Process.
00:00:03.2812500 hh:mm:ss.ff Total Elapsed Time Process A.
00:00:39.7500000 hh:mm:ss.ff Total Elapsed Time Process B.
00:00:02.0468750 hh:mm:ss.ff Net Elapsed Time Process A.
00:00:38.5156250 hh:mm:ss.ff Net Elapsed Time Process B.
Net Unit Processing Time A: 2.047 secs
Net Unit Processing Time B: 38.516 secs
5.314% Percentage: Process A divided by Process B.
Conclusion
Do not use IIF.
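One likely contributor to this result is that IIf is an ordinary function taking Object arguments, so both the true part and the false part are evaluated (and value types boxed) on every call, at every level of nesting. The sketch below demonstrates the evaluation behaviour and contrasts it with the short-circuiting If operator, which is available from Visual Basic 2008 onwards. The module, the Counted helper, and the function names are hypothetical; the article's tests did not include the If operator.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module IIfVersusIfOperatorSketch

    Private f_Calls As Integer = 0

    ' Hypothetical helper that counts how often it is evaluated.
    Function Counted(ByVal value As Integer) As Integer
        f_Calls += 1
        Return value
    End Function

    ' IIf is a function: both value arguments are evaluated before the call.
    Function CallsMadeByIIf(ByVal condition As Boolean) As Integer
        f_Calls = 0
        Dim result As Object = IIf(condition, Counted(1), Counted(2))
        Return f_Calls
    End Function

    ' The If operator evaluates only the branch that is selected.
    Function CallsMadeByIfOperator(ByVal condition As Boolean) As Integer
        f_Calls = 0
        Dim result As Integer = If(condition, Counted(1), Counted(2))
        Return f_Calls
    End Function

    Sub Main()
        Console.WriteLine(CallsMadeByIIf(True))          ' prints 2
        Console.WriteLine(CallsMadeByIfOperator(True))   ' prints 1
    End Sub

End Module
```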
Speed test: Compare Shift-Right 4 (X >> 4) with Divide by 16
Public Overrides Sub SpeedTestA()
Dim xInt As Int32 = _
CInt((((((((Me.f_Int >> 4) >> 4) >> 4) >> 4) >> 4) >> 4) >> 4) >> 4)
End Sub
Public Overrides Sub SpeedTestB()
Dim xInt As Int32 = _
CInt(Me.f_Int / 16 / 16 / 16 / 16 / 16 / 16 / 16 / 16)
End Sub
Protected f_Int As Int32 = 123456
Test results:
100,000,000 repetitions.
Test A: Shift-Right 4.
Test B: Divide by 16.
00:00:01.2500000 hh:mm:ss.ff Equivalent Elapsed Time Control Process.
00:00:01.3593750 hh:mm:ss.ff Total Elapsed Time Process A.
00:00:20.9843750 hh:mm:ss.ff Total Elapsed Time Process B.
00:00:00.1093750 hh:mm:ss.ff Net Elapsed Time Process A.
00:00:19.7343750 hh:mm:ss.ff Net Elapsed Time Process B.
Net Unit Processing Time A: 109.375 millisecs
Net Unit Processing Time B: 19,734.375 millisecs
0.554% Percentage: Process A divided by Process B.
Conclusion
In this test, shifting was about 180 times as fast as dividing (0.11 seconds against 19.73 seconds net).
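When reading this result, it helps to know that the VB "/" operator performs floating-point division even on Integer operands and returns a Double, which CInt then rounds back to an Integer. The integer division operator is "\". The sketch below shows the three forms side by side; the module and function names are hypothetical, and integer division was not part of the test above, so no claim is made here about its speed.

```vbnet
Imports System

Module ShiftVersusDivideSketch

    ' Arithmetic shift right by four bits.
    Function ShiftRightFour(ByVal value As Integer) As Integer
        Return value >> 4
    End Function

    ' "/" yields a Double; CInt rounds it to the nearest Integer
    ' (midpoints go to the nearest even number).
    Function DivideFloating(ByVal value As Integer) As Integer
        Return CInt(value / 16)
    End Function

    ' "\" is integer division and truncates toward zero.
    Function DivideInteger(ByVal value As Integer) As Integer
        Return value \ 16
    End Function

    Sub Main()
        ' 123456 is an exact multiple of 16, so all three agree.
        Console.WriteLine(ShiftRightFour(123456))   ' prints 7716
        Console.WriteLine(DivideFloating(123456))   ' prints 7716
        Console.WriteLine(DivideInteger(123456))    ' prints 7716

        ' For other values the rounding differs: 24 / 16 = 1.5.
        Console.WriteLine(ShiftRightFour(24))       ' prints 1
        Console.WriteLine(DivideFloating(24))       ' prints 2
        Console.WriteLine(DivideInteger(24))        ' prints 1
    End Sub

End Module
```

So the two tested methods are interchangeable only where the rounding difference does not matter, and a fairer like-for-like comparison with the shift would probably use "\" rather than "/".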
History