Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / CUDA

Cudafy Me: Part 1 of 4

5.00/5 (1 vote)
12 Oct 2012CPOL5 min read 12.1K  
These posts are meant to inspire you to enter into the world of graphics processor programming.

As seen in GPU Science and CUDA Week in Review

These posts are meant to inspire you to enter into the world of graphics processor programming.

CUDALibs

Posts in This Series

  • Part 1 – Introduction & Single Threaded CPU Solution
  • Part 2 – Multi-Threaded CPU Solution
  • Part 3 – Massively Parallel GPGPU Solution
  • Part 4 – Discussion

    Full Source Code

    http://cudafytsp.codeplex.com/

    Introduction

    It may come as a surprise to many that we only use 10% of our brain capacity, regardless of what Snopes says. It is also true that we use only 10% of the computing power of our PCs. This is because our programs are not written to utilize our computer’s “Cudabral Cortex” (commonly called the GPGPU). It is time for us as developers to unleash 100% of our computer’s potential by tapping the GPGPU. By doing this, we can in some cases increase the performance of our programs by a factor of 10 or even 100.

    Yet your hesitation to consider using the GPGPU may come from an innate fear of the learning curve involved. Yet the learning curve has recently been straightened out by a wonderful library called Cudafy. Cudafy allows methods written in C# to be executed on the GPGPU. This series of posts will meander down the path of creating a massively parallel program and explain some of the details along the way. After reading these posts, you will see the world as a massively parallel phenomena just begging to be modeled.

    The Problem

    Before diving into GPGPU programming, I want to first present the challenge we will be solving. It is called the Traveling Salesman Problem. Our program will be given the coordinates of several cities. Our task is to find the shortest path that connects all the cities. To keep the problem simple we will assume a flat earth, that 1 degree of latitude equals 1 degree of longitude, and that we will not constrain which city we start in or end up at. To further simplify the task, we will use a brute force method. That is, we will test the length of every permutation of cities to find the shortest one. So, if we are given 5 cities, we will be computing the length of 120 (5!) different paths; if we are given 10 cities, we will be testing 3628800 (10!) paths.

    The Method

    We will review code that solves this problem in three ways:

    1. A single-threaded application on the CPU,
    2. A multi-threaded application on a multi-core CPU, and
    3. A massively parallel application on the GPGPU.

    As you examine the CPU-based solutions, you will no doubt notice that parameters are passed to functions that seem unneeded and that the algorithm could be optimized in many ways. These may be legitimate concerns on your part, especially if you are itching to challenge my inevitable benchmark comparisons or to delve into the world of heuristic algorithms. Yet, my goal is not to compare a performance-crafted CPU solution with a performance-crafted GPGPU solution. Instead, my goal is to show the same problem being solved in exactly the same way across all three solutions computing platforms. Where you take it from here is your business.

    PathFromRoutePermutation

    Here is a function that will be used in all three solutions. It calculates the sequence for given a permutation number. This function will be called for each permutation to retrieve the order in which the cities could be visited by our salesman.

    public static float PathFromRoutePermutation(int permutations, int permutation, int cities, int[,] paths, int pathIndex)
    {
        for (var i = 0; i < cities; i++)
        {
            paths[pathIndex, i] = i;
        }

        for (int remaining = cities, divisor = permutations; remaining > 0; remaining--) // http://www.daniweb.com/software-development/cpp/code/274075/all-permutations-non-recursive
        {
            divisor /= remaining;
            var index = (permutation / divisor) % remaining;

            var swap = paths[pathIndex, index];
            paths[pathIndex, index] = paths[pathIndex, remaining - 1];
            paths[pathIndex, remaining - 1] = swap;
        }

        return 0;
    }

    This function takes a permutation number and fills in the paths array with the sequence of city indexes that make up the path. For example, for 5 cities, permutation 0, yields the path (1,2,3,4,0). Permutation 40 yields the path (4,0,3,2,1):

    Untitled     Untitled

    Ignore the fact that this function returns a 0 and that the parameter “paths” is provided as a two-dimensional array (instead of a single dimensional array).

    FindPathDistance

    Here is another function that will be used in all three solutions. It calculates the distance for given a permutation number. This function will be called for each permutation to retrieve the length that could be traveled by our salesman.

    public static float FindPathDistance(int permutations, int permutation, int cities, float[] latitudes, float[] longitudes, int[,] paths, int pathIndex)
    {
        PathFromRoutePermutation(permutations, permutation, cities, paths, pathIndex);

        float distance = 0;
        var city = paths[pathIndex, 0];
        var previousLatitude = latitudes[city];
        var previousLongitude = longitudes[city];

        for (var i = 1; i < cities; i++)
        {
            city = paths[pathIndex, i];
            var latitude = latitudes[city];
            var longitude = longitudes[city];
            distance += (float)Math.Sqrt(Square(latitude - previousLatitude) * Square(longitude - previousLongitude));
            previousLatitude = latitude;
            previousLongitude = longitude;
        }

        return distance;
    }

    This function takes a permutation number, and then armed with the order in which the cities could be visited from PathFromRoutePermutation, computes (and returns) the total distance of the given permutation.

    The Single-Threaded CPU Solution

    Finally, here is the code that iterates through all the permutations and finds the one with the shortest distance as a single threaded CPU solution.

    private Answer CpuTsp()
    {
        var bestDistance = float.MaxValue;
        var bestPermutation = -1;
        var stopWatch = Stopwatch.StartNew();

        for (var permutation = 0; permutation < _permutations; permutation++)
        {
            var path = new int[1, Cities];
            var distance = FindPathDistance(_permutations, permutation, Cities, _latitudes, _longitudes, path, 0);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                bestPermutation = permutation;
            }
        }

        return new Answer { Distance = bestDistance, Permutation = bestPermutation, Milliseconds = stopWatch.ElapsedMilliseconds };
    }

    Next Step

    Our next step is to write a method that takes advantage of the code we have already written and runs it in parallel on a multi-core CPU.

    >>> Part 2 – Multi-Threaded CPU Solution

  • License

    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)