Big O Algorithm Analyzer for .NET

dawright

4.88/5 (20 votes)

16 Feb 2010CPOL10 min read

119.7K

1.7K

A heurisitc graphing tool to help discover 'Big O Notation' function thru infinite asymptotic's and instrumentation.

This Developer centered article with an index rating 5 out of 5

Article Index Rating:
- Developer            (5)     of 5      (Intermediate)
- Architect             (1.67) of 5     (Beginner)
- Graphic Designer   (3.33) of 5     (Intermediate)

Download Article_Demo.zip - 64.83 KB

Download Article_Src.zip - 110.18 KB

Introduction
Mathematical Approach for finding Big 'O'
About the Code
About the Examples
Points of Interest
Prototype Research
Revision history
Conclusion and next phase

Introduction

After doing some research on 'Big O Notation' I read some posts from some developers explaining that you have to do 'Big O Equations' in your head. Perhaps they were pulling some ones leg? I don't know. But it gave me the idea to create this tool, which can help to find the 'Big O function' for a given algorithm. The application uses C# Reflections to gain access to the target .net assembly. Infinite asymptotic's and instrumentation are used to graph the function in relationship to time. When using infinite asymptotic's it is necessary to create a method or property which takes an integer as input. The integer value is the infinity point, the point at which the function evaluates to an infinite quantity. This point can be time, value, etc. In the examples Fibonacci functions are evaluated.

Here is a graph of Two linear recursive Fib's:

Here is the original prototype graphing an exponential recursive Fib:

Image 1 - Exponential Fibonacci

I used Derek Bartram's WPFGraph in the Demo which is not licensed for commercial or non profit. If you need to use this tool for commercial or non profit, you can extend the functionality of the prototype. There are a few minor glitches with my implementation of Derek's WPFGraph. Major problem is with the performance scaling. In my prototype I use modulus scaling which is very exacting. I tried to use the same scaling with Derek's graph, however the performance grid lines don't exactly match the points on the grid. If you need exact grid lines and fast performance extend the prototype.

Image 2 - Example of the prototype displaying O(n log n) Fibonacci

The application is broiler plate, this should give you the ability to extend the features to your own specification. Here is the same graph using Derek's graph:

Image 3 - Modified graph with grid lines and power graph

Another minor glitch is the time base jump at the start of the graph line. Graphing with high infinity values improves this, however a threading fix is probably necessary. I have a request into MSDN for a fix. I will post the fix once I get it.

The tool uses heuristic to find the 'best guess' Big O Function:

Image 4 - The heuristic graph can also be displayed

And allows the user to review the details:

Image 5 - The calculated results from the heuristic

Special thanks to Derek for all his hard work! Great Graph!

Mathematical Approach:

The mathematical approach to fining the Big 'O' is to mark up the algorithm using function indexes. The is the 'it takes brains' method of finding the solution the developers were discussing in the online help thread. Here is an example I found on the internet which illustrates the 'mark-up' and calculation.

int foo(int n)
{
  int p = 1;	//c1	  x	1
  int i = 1;	//c1	  x	1
  while(i < n)	//c2	  x	n
  {
    int j = 1;	//c1	  x	(n - 1)
    
    while(j < i)//c2	  x	((1/2)n^2 - (3/2)n + 2)
    {
      p = p * j;//(c1+c3) x	((1/2)n^2 - (3/2)n + 1)
      j = j + 1;//(c1+c4) x	((1/2)n^2 - (3/2)n + 1)
    }
    i = i + 1;	//(c2+c4) x	(n - 1)
  }
  return p;	//c5	  x	1
}

Where:
c1 = Assignment
c2 = Comparison
c3 = Multiplication
c4 = Addition
c5 = Return

The mathematical equation is simplified:

Step 1:
(c1 + 1/2c2 + 1/2c3 + 1/2c3)n^2 +
(-c1 - 1/2c2 - 3/2c3 - 1/2c4)n +
(2c1 + 2c2 + c3 + c5)

Step 2:
(c1 + 1/2c2 + 1/2c3 + 1/2c3)n^2

Step 3:
n^2

Therefore the Big 'O' for this foo algorithm is n^2 or quadratic.

The full example can be viewed at:

Big O Notation - CS Animated

About the Code:

The application uses the Model, Viewer, Presenter; Factory, and Publisher / Subscriber patterns. The viewer is the XAML Window which allows for easy design by designers, the presenter is implemented in the xaml.cs of the window. The factory and publisher-subscriber pattern is implemented in the AnalysisFactory. The MVP pattern is a good stepping stone in prototype development as it is easy to make custom abstractions with out sacrificing code base.

About the Examples:

The examples featured in this demo use the Fibonacci sequence to illustrate different methods of calculating the Fibonacci number for a given index. The examples are to be used with the infinite asymptotic approach in the future this article will be updated with examples using the instrumented approach.

The asymptotic is a 'running time' asymptotic not an 'operation count' asymptotic. This is important to understand. Mathematics and computer code can both be represented using Big 'O' notation for operation count which are evaluated differently then currently featured in this article. In the future an addition will decompile the reference code and calculate both the operational count and the running time. The running time does offer some 'real world' values which would not be discovered suing the operation counting method. For example: The code under test has an algorithm running in log time and at the surface the algorithm might appear to be linear. The operating system, and the application background process can also affect real world results.

To create a simple asymptotic test all that is required is that the method invocation be publicly visible and that it take a single argument which is the number of iterations to run. The code performing the logic uses .NET Reflections to gain access to the assembly and display the public methods.

Here are some examples of the Fibonacci algorithms used in the example:

Figure 1 - Closed Form Fib Class

class fib
{
    double n = 0;
    int n2 = 0;
    public double next
    {
        get
        {
            return n;
        }

        set
        {
            n = Math.Floor(((Math.Pow(fib.golden(),
                Convert.ToDouble(value))) -
                Math.Pow((1 - fib.golden()),
                Convert.ToDouble(value))) / Math.Sqrt(5));
        }
    }

    private static double golden()
    {
        return (1 + Math.Sqrt(5)) / 2;

    }
}

Figure 2 - Closed Form Test Code O(1)

public static double ClosedFormFib(double n)
{
    fib f = new fib();
    f.next = n;
    return f.next;
}

Figure 3 - Linear Fib - Actually O(n log n)

The author who posted this algorithm incorrectly marked it as a linear solution when in fact it runs in log time.

public static double LinerFib(double n)
{
    double previous = -1;
    double result = 1;
    double sum = 0;

    for (double i = 0; i <= n; ++i)
    {
        sum = result + previous;
        previous = result;
        result = sum;
    }
    return result;
}

Figure 4 - Linear Recursive Fib - Again O(n log n)

public static double LinerRecursiveFib(double n)
{
    return Fib(n, 1, 1);
}

private static double Fib(double n, double n1, double n2)
{
    if (n == 1 || n == 2)
    {
        return 1;
    }
    else if (n == 3)
    {
        return n1 + n2;
    }
    else
    {
        return Fib(n - 1, n1 + n2, n1);
    }
}

Figure 5 - Exponential Fib - O(n^2)

public static int ExponentialRecursiveFib(double n)
{
    if (n == 1 || n == 2)
        return 1;
    else
        return ExponentialRecursiveFib(n - 1) +
               ExponentialRecursiveFib(n - 2);
}

At this point you may be thinking 'well this is good if the algorithm takes an integer as input for loading. What about other algorithms which don't such as sorting etc.' To answer this question first the data type and algorithm must first be considered. For example a string array bubble sort. For testing this algorithm using asymptotic approach I would create a constructor in my class which will be tested. Something like:

Figure 6 - Testing non-numeric data types and algorithms

ArrayList al = new ArrayList();

void LoadMe(){ LoadMe(1000); }

void LoadMe(double d)
{
  //create random string elements and add to string array
  //add each element to the ArrayList.
}

public static double TestBubbleSort(double n)
{
  //Get each string array from ArrayList then call the bubble sort
  //on the string array.
}

This method will work for loading and testing with out having to custom instrument the test code. The code of data load the ArrayList should be incurred at start up, however the size must be hard coded into the test code.

Points of Interest:

Interesting things about heuristics

The heuristics uses the Meta.Numerics.Statistics class library to find a strong correlation between the items in two series. Series 1 (test results) contains the values of the test run in actual form and value. Series 2 (control group) is a series of values from the equation in form of the following:

Figure 7 - Big 'O' function series generation

//O(n log n)
static double nLogn(int n)
{
    double r = Math.Log(n) * n;

    if (r.ToString() == "NaN" ||
        r.ToString().IndexOf("Infinity")
        != -1)
        return 0;
    else
        return r;
}

public static DecoratedMarker nLognLine(int n)
{
    double[] ret = new double[n];

    for (int y = 0; y < n; y++)
        ret[y] = Functions.nLogn(y);

    return new DecoratedMarker(ret, "O(n log n)", null);
}

The above is computed for each Big 'O' function under test. A function library can be found in the solution and is easily updated. The exact statistical formula used is 'Person's Rio'. More information on Person's Rio can be found on other wiki's. Other formulas could also be used. Some ideas I have considered are: Finding the difference under the arch of the curve formula, correlation of ratios, etc. The statistical package placed in its own class so additional packages can easily be loaded or overloaded.

Interesting things about the graph

I really don't totally understand the magic of modulus. I do value it's power though. I had to tinker with the modulus equations to get the scaling functions to work in the prototype. Perhaps I need to brush up on modulus. Here are the modulus functions for scaling:

((dm.Point[1] * PointScale % infinityPoint) / 10) * PointRullerDivision; //Gets the vertical scaling.

(aggT * TimeScale % totTime) / 10 * TimeScaleDivision; //Gets the horizontal scaling.

PointScale is calculated by dividing 100 by infinityPoint (this is an inverse function), making the function 100 base. PointRullerDivision is calculated by dividing the graph Canvas by 100 (compliment to inverse function). The horizontal (x axis, time) is caculate the same way, however the aggregate time ; aggT, is used in place of the point index dm.Point[1]. Total time is calculated using a LINQ to Object lambda function (also pure magic as far as I know).

Math.Round(dms.Sum(p => p.Point[0]),0)

I thought it was interesting that after refactoring the prototype to use Derek's graph the modulus functions still worked. Derek's scaling calculates the scale by dividing sections of the scale in a recursive loop. I did run into a small problem with a rounding error after refactoring. Sometimes it's the little things that make a big difference. Rounding the totTime to Zero decimal places and not rounding the aggT causes a displacement in the performance graph where the x axis grid lines do not match a point on the graph, but will not work with Derek's graph unless the total time is rounded. The prototype works perfectly.

Another point of interest is Derek's grid points. If you get the rendered geometry of one of his grid points, the bounds evaluates to infinity. I tried all methods of getting the position to correct the rounding error. Nothing worked! Must be magic. So the featured application has many workarounds for the integration of the WPFGraph.

The Power Graph

The power graph is a what I think of as a clever way to visualize the performance of the algorithm under test. The formula used is only a weighted metric.

Pseud Code

If unit n-1 is two times n
 then color value = color value - 10

this does of coarse have a dependency on the grid lines which are equally spaced vertically in increments of 10 and then asymmetrically spaced on the horizontal according to where the vertical line and the new point cross x,y axis. This produces a visual which has does not distort the mathematical formula of the graph Ie. Linear, Log, or Exponential.

Conclusion and the Next Phase:

In conclusion I discovered this research is unique to what I have done and demonstrated. While it is currently causing controversy I believe it is a significant advancement in where the 'Rubber Meets the Road'. Mathematical proofs for algorithms can be useful, but when engineering is concerned the proof doesn't meet the pudding! This is often the case in theory and development and engineering. It is my hope that this tool will become useful to mathematicians and engineers for establishing real and live estimations of how an algorithm runs in real time. I plan to complete this prototype and add features that allow an algorithm to be written with out the use of Visual Studio, right in the application it self. I will also be converting the MVP pattern to MVVM. And take a further look at the heuristic model being used, which is what seems to be causing controversy. If anyone with expertise would like to join me in this venture just let me know. The final product will be posted on The CodePlex, and have follow up articles here on The Code Project to demonstrate it's usefulness, and areas that it is not acceptable for use. That is it for now this article is a wrap!

External Links:

Prototype Research

- Graphic Designer (3.33)

- Developer (5)

- Architect (1.67)

Revision History:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)