
C# Native Interop: Methods and Performance

19 Feb 2013
Introduction to C#/C++ interoperability, and a performance evaluation.

Introduction

C# is a great language. It allows you to be extremely productive and efficient by eliminating the need for manual memory management, and by providing fast compile times, an extensive standard library, and various other handy features. However, for applications requiring heavy number crunching, its performance can be less than adequate. In this article, I will show you how C# can call into C++ functions when necessary, and analyze the performance of doing so.

The Problem

Given a large number of random rectangles lying between (0, 0) and (2, 2), let us find the percentage of these rectangles that lie between (0, 0) and (1, 1). We will solve this problem using brute force in order to stress the CPU. The code for this algorithm is pretty straightforward. Basically, we generate four random numbers in the interval (0, 2) for each rectangle, and assign them to the corner points. Then we count how many of these rectangles lie in the desired interval. All tests are carried out with 10 million rectangles on an Intel Core 2 Quad Q6600 with 4 GB of RAM.
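To make the problem concrete, here is a minimal C++ sketch of the brute-force approach described above. It is not the code used for the measurements in this article. The helper names (makeRandomBoxes, insideUnitSquare, percentInside) and the fixed seed are assumptions chosen purely for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Two corner points of a rectangle, matching the fields used in this article.
struct BBox {
    float x1, y1, x2, y2;
};

// Hypothetical helper: fills `count` rectangles with corner coordinates drawn
// uniformly from the interval [0, 2). The fixed default seed is an assumption
// made here so that repeated runs see the same data.
std::vector<BBox> makeRandomBoxes(std::size_t count, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> coord(0.0f, 2.0f);
    std::vector<BBox> boxes(count);
    for (BBox& b : boxes) {
        b.x1 = coord(rng);
        b.y1 = coord(rng);
        b.x2 = coord(rng);
        b.y2 = coord(rng);
    }
    return boxes;
}

// True when both corners lie strictly between (0, 0) and (1, 1).
bool insideUnitSquare(const BBox& b) {
    return b.x1 > 0 && b.x1 < 1 && b.y1 > 0 && b.y1 < 1 &&
           b.x2 > 0 && b.x2 < 1 && b.y2 > 0 && b.y2 < 1;
}

// Brute force: test every rectangle and report the share that qualifies.
float percentInside(const std::vector<BBox>& boxes) {
    if (boxes.empty()) {
        return 0.0f;  // nothing to count, and avoids dividing by zero
    }
    std::size_t hits = 0;
    for (const BBox& b : boxes) {
        if (insideUnitSquare(b)) {
            ++hits;
        }
    }
    return 100.0f * static_cast<float>(hits) / static_cast<float>(boxes.size());
}
```

Calling percentInside(makeRandomBoxes(10000000)) would exercise the same amount of work as the tests here. Since each of the four coordinates falls below 1 with probability 1/2, roughly 1/16 (6.25%) of the rectangles should qualify.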

C# Reference Solution

This is all C#, and is used as a reference to measure relative performance. For 10 million rectangles, this method takes around 146 ms.

Interop Take 1: Marshaling

This is the easiest way to do interop. The C# side code for this is:

[DllImport("DllFuncs.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "nativef")]
public static extern float getPercentBBMarshal(
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] BBox[] boxes, int size);

Basically, this tells the runtime to look for a function named nativef using the calling convention cdecl in the native library called "DllFuncs.dll". It also tells the runtime to transparently convert the C# array of BBoxes into a C++ array. We need to pass the size, as arrays in C++ are unaware of their length. The corresponding C++ function is this:

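struct BBox
{
    float x1, y1, x2, y2; // Corner points of the rectangle

    // True when both corners lie strictly between (0, 0) and (1, 1).
    bool isValid() const
    {
        return (x1 < 1) && (x2 < 1) && (y1 < 1) && (y2 < 1) && (x1 > 0) &&
            (x2 > 0) && (y1 > 0) && (y2 > 0);
    }
};

// extern "C" keeps the C++ compiler from mangling the exported name, so the
// runtime can find the entry point "nativef" named in the DllImport attribute.
extern "C" __declspec(dllexport) float __cdecl nativef(BBox * boxes, int size)
{
    if (size <= 0)
        return 0.0f; // Avoid dividing by zero for an empty array

    int sum = 0;
    for (int i = 0; i < size; i++)
    {
        sum += boxes[i].isValid(); // true counts as 1, false as 0
    }
    return (float)sum / (float)size * 100;
}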

Marshaling Performance

Measuring the performance of this function with a timer, we see an interesting result. The native function takes 341 ms for 10 million elements, which is more than twice the time taken by the C# equivalent! Moreover, for 1000 elements, marshaling takes 239.3 ms, which is way above the 0.054 ms taken by pure C#. Clearly, marshaling adds a huge overhead, whose relative importance diminishes as the amount of work grows. To know where this overhead comes from, we need to know how marshaling works:

  • Allocate a C++ equivalent of the C# array passed to the function.
  • Copy values for the C# array to the C++ array.
  • Call the C++ function.
  • Copy the return value of the C++ function into the C# equivalent.
  • Return control to the C# assembly.
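The steps above can be imitated in plain C++ to see where the extra work goes. This is only a sketch of the steps as listed, not the runtime's actual marshaler, whose internals may differ. The function callWithCopy is a hypothetical name, and nativef here is a local stand-in for the exported function.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

struct BBox {
    float x1, y1, x2, y2;
};

// Local stand-in for the exported native function from this article.
float nativef(const BBox* boxes, int size) {
    if (size <= 0) {
        return 0.0f;
    }
    int sum = 0;
    for (int i = 0; i < size; i++) {
        const BBox& b = boxes[i];
        sum += (b.x1 < 1) && (b.x2 < 1) && (b.y1 < 1) && (b.y2 < 1) &&
               (b.x1 > 0) && (b.x2 > 0) && (b.y1 > 0) && (b.y2 > 0);
    }
    return static_cast<float>(sum) / static_cast<float>(size) * 100;
}

// Hypothetical helper that mimics the listed marshaling steps.
float callWithCopy(const BBox* managedBoxes, int size) {
    if (size <= 0) {
        return 0.0f;
    }
    const std::size_t count = static_cast<std::size_t>(size);

    // Step 1: allocate a separate native buffer of the same length.
    std::unique_ptr<BBox[]> nativeCopy(new BBox[count]);

    // Step 2: copy every element into that buffer.
    std::memcpy(nativeCopy.get(), managedBoxes, sizeof(BBox) * count);

    // Step 3: call the native function on the copy.
    const float result = nativef(nativeCopy.get(), size);

    // Steps 4 and 5: the result goes back to the caller by value; the
    // temporary buffer is released when nativeCopy goes out of scope.
    return result;
}
```

Both callWithCopy and a direct call to nativef return the same percentage. The difference is the allocation and the memcpy, which grow linearly with the number of rectangles.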

Now, we can easily see the reason for bad performance. We are essentially allocating 10 million rectangles and copying the values of each of these. At 16 bytes per rectangle (four 4-byte floats), that is allocating and moving around 160 MB of data! No wonder performance is horrible. You may be asking yourself why the whole exercise of copying is necessary in the first place. There are two reasons for this:

  • The memory layout of a structure in C# may not match that of the same structure in C++. Hence, a simple pointer assignment may not do the trick, as C++ may interpret this memory area differently than C#.
  • The garbage collector in C# is free to move data physically in memory, in order to do compacting garbage collection. Hence, a pointer passed from C# to C++ may not be valid by the time control reaches C++, as the GC may have already moved the underlying memory to another physical location.
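On the C++ side, the layout that the first point refers to can be pinned down at compile time. The checks below are a sketch that assumes a platform with 4-byte floats; they say nothing about how the C# side lays out its copy of the structure. The helper bufferBytes is a hypothetical name used to reproduce the 160 MB figure.

```cpp
#include <cstddef>
#include <type_traits>

struct BBox {
    float x1, y1, x2, y2;
};

// A standard-layout struct has its members placed in declaration order.
static_assert(std::is_standard_layout<BBox>::value,
              "BBox must be standard layout to be shared across the boundary");

// Assumption: float is 4 bytes, as on mainstream desktop platforms.
static_assert(sizeof(float) == 4, "expected 4-byte floats");

// Four floats, no padding between them: 16 bytes in total.
static_assert(sizeof(BBox) == 16, "unexpected padding in BBox");
static_assert(offsetof(BBox, x1) == 0, "x1 should come first");
static_assert(offsetof(BBox, y1) == 4, "y1 should follow x1");
static_assert(offsetof(BBox, x2) == 8, "x2 should follow y1");
static_assert(offsetof(BBox, y2) == 12, "y2 should come last");

// Hypothetical helper: bytes needed for `count` rectangles.
constexpr std::size_t bufferBytes(std::size_t count) {
    return count * sizeof(BBox);
}

// 10 million rectangles occupy 160,000,000 bytes, i.e. around 160 MB.
static_assert(bufferBytes(10000000) == 160000000, "size arithmetic");
```

If any of these assertions failed on a given compiler, passing a raw pointer between the two languages would read the wrong fields, which is exactly the risk described in the first point.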

So, is it possible to get around these problems? Let's find out!

Interop Take 2: Direct pointer access

  • The memory layout problem is simple to deal with. It's just a matter of telling the runtime to lay out the structure in memory just like C++ would. C++ lays out the data sequentially using certain alignment rules, which can be mimicked by C# using the following:

    [StructLayout(LayoutKind.Sequential)]
    struct BBox
    {
        public float x1, y1, x2, y2; // Corner points of the rectangle
    }

  • The garbage collector problem can be taken care of using the fixed statement. This statement makes sure that memory is not moved around by the GC for the lifetime of the statement. The fixed statement can only be used from unsafe functions in assemblies compiled with the /unsafe option.

    [DllImport("DllFuncs.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern unsafe float nativef(IntPtr p, int size);

    public static unsafe float getPercentBBInterop(BBox[] boxes)
    {
        float result;
        // Pin the array so the GC cannot move it during the native call
        fixed (BBox* p = boxes)
        {
            result = nativef((IntPtr)p, boxes.Length);
        }

        return result;
    }

Pointer Access Performance

The function returns in 115 ms, which is around 26% faster than the C# equivalent. The performance gain is likely to increase with the complexity of the functions delegated to native code.
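The timing code itself is not shown in this article. As a rough sketch of how the native side alone could be timed, the helpers below use std::chrono. The names elapsedMilliseconds and bestOfMilliseconds are hypothetical, and the numbers quoted in this article were not necessarily produced this way.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double elapsedMilliseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: runs `work` at least once and up to `runs` times,
// returning the shortest time observed.
template <typename Work>
double bestOfMilliseconds(Work&& work, int runs) {
    double best = elapsedMilliseconds(work);  // always measure one run
    for (int i = 1; i < runs; ++i) {
        best = std::min(best, elapsedMilliseconds(work));
    }
    return best;
}
```

A call such as bestOfMilliseconds([&] { nativef(boxes, size); }, 5) would time five runs of the native function. Taking the best of several runs is one way to reduce the influence of one-time costs on the first call, though whether that matters for the figures above is not something this sketch can tell.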

Performance Numbers

Charts displaying the performance with respect to various numbers of elements processed are shown below:

Sample Code

The source code for this is hosted on GitHub:

Make sure to check it out.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
