Reading Unmanaged Data Into Structures

8 May 2008 | CPOL | 9 min read
In this article, we will look into reading data from an unmanaged array of bytes into a managed data structure. We will use multiple approaches to optimize the process.

Introduction

We live in a less than perfect world. If the world were perfect, we could spend our entire time writing managed code and interacting with managed components. (Well, if it were really perfect, we could be sipping tropical cocktails on the ocean beach, but I digress.)

The world that we live in often requires interaction with unmanaged components. A lot has been written on the subject of P/Invoke, COM interop, and C++/CLI -- the trifecta of interoperability solutions. In this article, we will look into an apparently trivial scenario. All we are going to do is copy data from one place to another.

Problem Statement

Assume that you have been given an array of bytes containing some structured information. For example, it might be a data structure corresponding to some kind of network packet. You need to parse this array of bytes into a representation that will make it easier to interpret. Unfortunately, there isn't just a single data structure you need -- there's a variety of information types coming as arrays of bytes.

The classical C-style solution to this kind of problem is to define a data structure and cast the raw memory into an instance of that structure. Since data structures in C are sequential and well-aligned, and since memory manipulation is one of the primary language traits, this all boils down to something as trivial as:

C++
typedef struct t_Packet {
    short Source;
    short Destination;
    int Checksum;
} Packet;

Packet FromRawBytesToPacket(unsigned char* rawBytes) {
    return *(Packet*)rawBytes;
}
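The cast above takes for granted that rawBytes is suitably aligned for a Packet and that at least sizeof(Packet) bytes are available. As a minimal sketch of a more defensive variant, the bytes can be copied instead of reinterpreted. The helper name TryReadPacket and the explicit length parameter are assumptions made for illustration; they are not part of the original code, and the fixed-width field types assume a 16-bit short and a 32-bit int:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same layout as the article's Packet, spelled with fixed-width types
// (assumption: short is 16 bits and int is 32 bits on the target platform).
struct Packet {
    std::int16_t Source;
    std::int16_t Destination;
    std::int32_t Checksum;
};

static_assert(sizeof(Packet) == 8, "Packet is expected to contain no padding");

// Hypothetical helper (not from the article): copies the bytes rather than
// casting the pointer, so rawBytes does not have to be aligned for Packet.
// Returns false when the buffer is missing or too short to hold a Packet.
bool TryReadPacket(const unsigned char* rawBytes, std::size_t length, Packet& out) {
    if (rawBytes == nullptr || length < sizeof(Packet)) {
        return false;
    }
    std::memcpy(&out, rawBytes, sizeof(Packet));
    return true;
}
```

The field values still depend on the byte order of the machine that produced the buffer, exactly as they do with the cast.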

This works perfectly in C and C++. But we can't use pointers in C#, can we? Take a couple of minutes to think about this problem. It's a mini-problem. It's a kindergarten problem for a C++ developer. How could it possibly be difficult in managed code?

Remember that one of the primary benefits of a managed environment is that, well, it's managed. You don't take care of memory allocation and deallocation, and as a consequence, the burden of dealing with pointers is removed from your shoulders. With that burden, some additional minor things are removed. This scenario is one of them.

Solution Attempt 1: BinaryReader Approach

Assuming full knowledge about the specific structure of the data, we can use BinaryReader -- a class perfectly suited for reading primitive types from a binary form. Here's a first attempt at a solution:

C#
using System.IO;

struct Packet
{
    public short Source;
    public short Destination;
    public int Checksum;
}

static Packet ReadUsingBinaryReader(byte[] data)
{
    Packet packet;
    // Wrap the array in a read-only stream; BinaryReader decodes little-endian values.
    using (BinaryReader reader = new BinaryReader(new MemoryStream(data, false)))
    {
        // The reads must follow the exact order of the fields in the buffer.
        packet.Source = reader.ReadInt16();
        packet.Destination = reader.ReadInt16();
        packet.Checksum = reader.ReadInt32();
    }
    return packet;
}

This doesn't look so bad, now does it? However, considering that I might need a hundred structures like this one, and the structures are going to change every couple of weeks, maintaining this kind of solution becomes a nightmare. (As a side note, it's always possible to automate the process by generating the necessary code from the structure definition. It's not trivial, because structures can be nested recursively, but doable, and I will leave it as an exercise for the reader.)

Unfortunately, maintainability is not the only problem with this code. The performance is not spectacular either. Reading 1,000,000 instances of this trivial data structure takes 490 milliseconds on my test machine. That's quite a lot of time, and it caps our throughput at about 2 million messages per second. Sometimes it's a lot -- at other times, it isn't.

Solution Attempt 2: A Generic Approach

A generic approach to this problem requires a generic method that accepts an array of bytes and returns an instance of a generic type parameter. However, in order to do that, we need a mechanism that will automatically read the binary representation of our structure fields. The framework happens to have a mechanism handy for doing just that, in a generic fashion, as part of the System.Runtime.InteropServices.Marshal class.

The Marshal class has an interesting method called Marshal.PtrToStructure, which seems to be a good match for our scenario. The documentation discusses marshaling from an unmanaged block of memory, but why would our raw byte array be any worse than an unmanaged block of memory? All we need to do is figure out a way to comply with the signature: object PtrToStructure(IntPtr, Type). The IntPtr there is annoying -- it means we have to find the memory address of our byte array.

The intrinsic facility for finding the memory address of a managed object is the GCHandle structure. Why are there special precautions we need to take when obtaining the address of a managed object? Well, the primary precaution is that the managed object can move in memory! During garbage collection, it's perfectly natural for a managed object to be shifted around, and we certainly can't have that happening if we need a stable memory address for our object. This is alleviated by allocating a GCHandleType.Pinned handle, which will ensure the object isn't shifted in memory by the garbage collector.

Eventually, we come up with the following implementation:

C#
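using System;
using System.Runtime.InteropServices;

static T ReadUsingMarshalSafe<T>(byte[] data) where T : struct
{
    // Pin the array so the garbage collector cannot move it while we hold its address.
    GCHandle gch = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        return (T)Marshal.PtrToStructure(gch.AddrOfPinnedObject(), typeof(T));
    }
    finally
    {
        // Always release the handle, even if marshaling throws.
        gch.Free();
    }
}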

Note that if the GCHandle is not explicitly freed, we will have a memory leak. Since the handle is a value type, it doesn't have a finalizer or any implicit mechanism for unpinning and freeing the memory.

This is a generic solution, and one that works for any blittable data structure -- not only our Packet as defined earlier. However, its performance characteristics are below par -- 850 milliseconds for a million objects, almost twice as slow as the BinaryReader solution.

On the one hand, we have a specifically-tailored solution which gives us the best performance. On the other hand, we have a generic solution which is almost two times slower. What can possibly be improved?

Solution Attempt 3: Unsafe Non-Generic Approach

Well, it appears that we can use pointers from C# after all. Unsafe code is not one of the well-known or best-advertised features of the CLR, but there is nothing inherent in its design or implementation to prevent us from directly accessing memory via pointers.

To begin with, we need to compile our project with the /unsafe switch. Its Visual Studio equivalent is under the Project Properties, Build, Allow unsafe code checkbox. Next, whenever we use a pointer, we will need to wrap the code using it in an unsafe block. These are minor nuisances, however -- let's take a look at a possible solution:

C#
static Packet ReadUsingPointer(byte[] data)
{
    unsafe
    {
        fixed (byte* packet = &data[0])
        {
            return *(Packet*)packet;
        }
    }
}

Is that all? Yes, that's all! The fixed statement makes sure the byte array is pinned and its address is available, and the single return statement inside the block casts the byte* around to obtain a Packet instance.

What about performance? That's where this solution really shines: 13 milliseconds for a million instances, 65 times faster than Marshal.PtrToStructure and 37 times faster than BinaryReader! The only setback is that this code is not generic, but what stops us from changing this fact?

Solution Attempt 4: Unsafe Generic Approach

A naive attempt at a generic solution using C# pointers would be something along the following lines:

C#
static T ReadUsingPointer<T>(byte[] data) where T : struct
{
    unsafe
    {
        fixed (byte* packet = &data[0])
        {
            return *(T*)packet;
        }
    }
}

Unfortunately, this doesn't compile, complaining that our code "Cannot take the address of, get the size of, or declare a pointer to a managed type ('T')". What's wrong with our code? Just one thing -- we are assuming that T is a type that we can declare a pointer to. And not every type is that kind of pointer-friendly type. Specifically, the only types we are allowed to declare a pointer to are:

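  • Primitive value types (the integer and floating-point types, char, bool), as well as enums and pointer types -- but never string or any other reference type
  • Structures containing only such types
  • Structures containing only structures containing only . . . (recursively)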

Why does the compiler complain then? Because we are writing a generic method, and the compiler must be satisfied that any type T that our users might use will be a type that we are allowed to point at. But since the version of C# this article was written against has no generic constraint for expressing that (the unmanaged constraint only arrived much later, in C# 7.3), the compiler is not going to accept our code as it is.

Now we stand at a crossroads. If we're looking for a specific solution for a specific data structure, then this might be enough. We don't need a generic method. But if we have multiple data structures, we might at least attempt to improve upon the Marshal.PtrToStructure approach, armed with our knowledge of pointers:

C#
static T ReadUsingMarshalUnsafe<T>(byte[] data) where T : struct
{
    unsafe
    {
        fixed (byte* p = &data[0])
        {
            return (T) Marshal.PtrToStructure(new IntPtr(p), typeof(T));
        }
    }
}

Why is this any better than using a GCHandle? Because internally, the code generated for the fixed statement is more efficient than using a GCHandle. Specifically, the emitted IL will contain a so-called pinned pointer, which is a short-circuit pinning mechanism.

Indeed, the performance improvement is significant: 555 milliseconds for a million instances, only slightly slower than the BinaryReader approach.

We might be ready to give up at this point. Either embrace pointers and accept the lack of genericity, or use Marshal.PtrToStructure and accept the poor performance. But remember what we had in mind in the beginning, the C-style solution to this problem? If we can't cleanly solve this in C#, why don't we cleanly solve it in... C++/CLI?

Solution Attempt 5: C++/CLI Approach

C++/CLI is a managed programming language based on C++. In fact, it's a set of extensions to the standard C++ syntax, so everything you know about C++ is true for C++/CLI. This is neither the time nor the place to elaborate on C++/CLI, but its primary usage scenario is interoperability. One of my blog articles elaborates on the usefulness of C++/CLI as a bridging mechanism between native and managed code, in both directions.

However, we aren't really interested in the interoperability scenario. We're looking at improving the performance of our solution. As it appears, C++/CLI can be used for performance reasons where C# doesn't give us the necessary facilities for the fastest possible solution. What we are going to do is write the actual copying code in C++/CLI, and call it from C#. Since even C++/CLI won't let us declare a direct pointer to a generic parameter, we will outsmart it and use memcpy:

C++
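#include <string.h>   // memcpy

public ref class Reader abstract sealed
{
public:
    generic <typename T> where T : value class
    static T Read(array<System::Byte>^ data)
    {
        T value;

        // pin_ptr yields native pointers; pinning keeps the managed array
        // from being moved by the garbage collector during the copy.
        pin_ptr<System::Byte> src = &data[0];
        pin_ptr<T> dst = &value;

        // sizeof(T) is the fast option; the slower alternative is
        // System::Runtime::InteropServices::Marshal::SizeOf(T::typeid).
        memcpy((void*)dst, (void*)src, sizeof(T));

        return value;
    }
};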

This appears to be very different from your common C# code, but it's not that different after all. Some highlights:

  • An abstract and sealed class is the equivalent of a static class
  • A generic method is just that, and note that you can still use the C++ template mechanism
  • pin_ptr is the intrinsic for generating a pinned pointer, similarly to the C# fixed keyword

Note that we could use either Marshal.SizeOf or the built-in sizeof operator. If the structure had any elements without straightforward marshaling (such as characters, which take two bytes in managed memory but are marshaled as a single byte by default), the two sizes would differ and sizeof might have been inaccurate -- but it's significantly faster.
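For comparison, here is a rough sketch of the same memcpy idea in plain native C++, where the "pointer-friendly type" requirement can be stated as a compile-time check. The names TryRead and Header are hypothetical and exist only for this illustration; this is not the C++/CLI Reader class, just its native cousin under the assumption that the caller also passes the buffer length:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical sample structure used only to exercise the template below.
struct Header {
    std::uint16_t Kind;
    std::uint16_t Length;
};

// Hypothetical native counterpart of a generic reader (not from the article).
// The static_assert rejects, at compile time, any T that cannot safely be
// filled in byte by byte (for example, a type with a user-defined copy
// constructor or virtual functions).
template <typename T>
bool TryRead(const unsigned char* data, std::size_t length, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be read with memcpy");
    if (data == nullptr || length < sizeof(T)) {
        return false;  // buffer missing or too short for one T
    }
    std::memcpy(&out, data, sizeof(T));
    return true;
}
```

Here sizeof(T) is the only size in play, because the source bytes and the destination object share one native layout.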

How is the performance? With no significant optimizations, this code performs at 60 milliseconds for a million instances. That's blazingly fast, even though it is still more than 4 times slower than the trivial C# pointer manipulation in the non-generic case. The primary cost factor here is the interop transition -- after all, we're calling from C# through C++/CLI to the hand-coded assembly implementation of memcpy. The relative cost of this transition will be smaller if the structure is bigger, or if we can introduce an implementation that performs multiple reads on the native side. But even as is, it's clear that we have a solution that is both fast and generic. It's not the fastest one, but it's significantly better than the other generic solutions.
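The idea of performing multiple reads on the native side could look roughly like the following native C++ sketch, which decodes every complete record in a buffer with a single call and a single memcpy. ReadAll and Sample are hypothetical names introduced for illustration, and the sketch assumes the buffer holds records of one type packed back to back; it has not been measured against the numbers above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical record type used only to exercise the template below.
struct Sample {
    std::uint16_t Id;
    std::uint16_t Value;
};

// Hypothetical batch reader (not from the article): one call decodes as many
// complete T records as fit in the buffer, so any per-call overhead is paid
// once per buffer instead of once per record. Trailing bytes that do not
// form a complete record are ignored.
template <typename T>
std::vector<T> ReadAll(const unsigned char* data, std::size_t length) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be read with memcpy");
    std::vector<T> records;
    if (data == nullptr) {
        return records;
    }
    const std::size_t count = length / sizeof(T);
    records.resize(count);
    if (count > 0) {
        std::memcpy(records.data(), data, count * sizeof(T));
    }
    return records;
}
```

Whether the batching pays off depends on how often the caller really has several records available at once.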

Consolidation

Summing this up, we have looked at multiple solutions for a problem that seemed really simple at first -- copying a chunk of data from raw memory into a data structure. I feel that taking a language and a framework to its limits, and then crossing these limits, provides the best insight into what's possible in your code.
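To put the measurements quoted throughout the article side by side, the arithmetic behind the throughput and speedup figures can be written down as two small helpers. The function and constant names are made up for this sketch; the timings themselves are the ones reported above for one million instances on the author's test machine:

```cpp
#include <cassert>
#include <cmath>

// Timings from the article, in milliseconds per 1,000,000 instances.
const double kBinaryReaderMs = 490.0;   // Attempt 1: BinaryReader
const double kMarshalSafeMs = 850.0;    // Attempt 2: GCHandle + PtrToStructure
const double kPointerMs = 13.0;         // Attempt 3: unsafe pointer cast
const double kMarshalUnsafeMs = 555.0;  // Attempt 4: fixed + PtrToStructure
const double kCppCliMs = 60.0;          // Attempt 5: C++/CLI memcpy

// Items processed per second when `count` items take `milliseconds`.
double ItemsPerSecond(double count, double milliseconds) {
    return count / (milliseconds / 1000.0);
}

// How many times faster the `fastMs` timing is than the `slowMs` timing.
double Speedup(double slowMs, double fastMs) {
    return slowMs / fastMs;
}
```

For example, ItemsPerSecond(1000000, kBinaryReaderMs) comes out at roughly 2.04 million, which is the "about 2 million messages per second" figure, and Speedup(kMarshalSafeMs, kPointerMs) is a little over 65.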

History

  • Version 1 - 8th May, 2008

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)