
Handling of Large Byte Arrays

9 Feb 2009 · LGPL3 · 4 min read
Allocation and copy of large byte[]

Introduction

This article compares different allocation and copy methods for large byte[] buffers in managed code.

Background

Sometimes you have to deal with frequently allocated and copied large byte arrays. This is especially true in video processing: a single RGB24 frame of 320x240 pixels needs a byte[] of 320*240*3 = 230,400 bytes. Choosing the right memory allocation and copying strategy may be vital for your project.

Using the Code

In my current project, I have to handle hundreds of uncompressed RGB24-Frames on multi core servers in real time. To be able to choose the best architecture for my project, I compared different memory allocations and copy mechanisms.

Because I know how difficult it is to measure well, I decided to do a really simple test that yields a rough but comparable result: I simply run a loop for 10 seconds and count the number of iterations.
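The measuring idea can be sketched as a small helper. This is an illustrative sketch, not the article's actual harness; the names `CountLoops` and the one-second duration are my own choices:

```csharp
using System;

class LoopCounter
{
    // Run a body in a loop for a fixed duration and count the iterations.
    static long CountLoops(Action body, TimeSpan duration)
    {
        long end = DateTime.UtcNow.Ticks + duration.Ticks;
        long count = 0;
        while (DateTime.UtcNow.Ticks < end)
        {
            body();
            count++;
        }
        return count;
    }

    static void Main()
    {
        // For example, measure managed allocation of one RGB24 frame for one second:
        long n = CountLoops(() => { byte[] buf = new byte[230400]; },
                            TimeSpan.FromSeconds(1));
        Console.WriteLine(n);
    }
}
```

The absolute numbers are meaningless on their own; only the ratios between the variants matter.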

Allocation

Looking around, I found 5 different methods to allocate large byte arrays:

  • new byte[]
  • Marshal.AllocHGlobal()
  • Marshal.AllocCoTaskMem()
  • CreateFileMapping() (shared memory)
  • stackalloc byte[]

new byte[]

Here is a typical loop showing the new byte[]:

C#
private static void newbyte()
{
    Console.Write("new byte[]: ");
    long start = DateTime.UtcNow.Ticks;
    int i = 0;
    while ((start + duration) > DateTime.UtcNow.Ticks)
    {
        byte[] buf = new byte[bufsize];
        i++;
    }
    Console.WriteLine(i);
}

new byte[] is completely managed code.

Marshal.AllocHGlobal()

Allocates memory from the unmanaged memory of the process.

C#
IntPtr p = Marshal.AllocHGlobal(bufsize);
Marshal.FreeHGlobal(p);

Marshal.AllocHGlobal() returns an IntPtr, but it still does not require unsafe code. However, when you want to access the allocated memory, you usually need unsafe code.
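For occasional access you can stay in safe code via the Marshal class; for bulk access, an unsafe pointer is the usual choice. A minimal sketch (the 16-byte size is arbitrary):

```csharp
using System;
using System.Runtime.InteropServices;

class HGlobalAccess
{
    static void Main()
    {
        IntPtr p = Marshal.AllocHGlobal(16);
        try
        {
            // Safe but slow: one Marshal call per byte
            for (int i = 0; i < 16; i++)
                Marshal.WriteByte(p, i, (byte)i);
            byte b = Marshal.ReadByte(p, 5);

            // Fast bulk access needs an unsafe pointer:
            unsafe
            {
                byte* pb = (byte*)p.ToPointer();
                pb[0] = 42;
            }
            Console.WriteLine(b);
        }
        finally
        {
            Marshal.FreeHGlobal(p);   // unmanaged memory is never garbage-collected
        }
    }
}
```

Note the try/finally: unlike new byte[], anything you allocate with Marshal.AllocHGlobal() leaks unless you free it yourself.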

Marshal.AllocCoTaskMem()

Allocates a block of memory of specified size from the COM task memory allocator.

C#
IntPtr p = Marshal.AllocCoTaskMem(bufsize);
Marshal.FreeCoTaskMem(p);

The same need for unsafe code applies as with Marshal.AllocHGlobal().

CreateFileMapping()

To use shared memory in a managed code project, I wrote my own little helper class around the CreateFileMapping() family of functions.

Using shared memory is quite simple:

C#
using (SharedMemory mem = new SharedMemory("abc", bufsize, true))
{
    // use mem
}

mem exposes a void* to the buffer and a length property. From inside another process, you can access the same memory by simply passing false to the constructor (together with the same name).

SharedMemory uses unsafe code.
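The full class ships with the article's download. A stripped-down sketch of the underlying Win32 calls could look like this; the class and member names are illustrative, not the article's actual code, and error handling is omitted:

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative sketch only; check every return value in real code.
class SharedMemorySketch : IDisposable
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern IntPtr CreateFileMapping(IntPtr hFile, IntPtr lpAttributes,
        uint flProtect, uint dwMaxSizeHigh, uint dwMaxSizeLow, string lpName);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern IntPtr OpenFileMapping(uint dwDesiredAccess, bool bInheritHandle,
        string lpName);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr MapViewOfFile(IntPtr hMap, uint dwDesiredAccess,
        uint dwOffsetHigh, uint dwOffsetLow, UIntPtr dwNumberOfBytesToMap);

    [DllImport("kernel32.dll")] static extern bool UnmapViewOfFile(IntPtr lpBase);
    [DllImport("kernel32.dll")] static extern bool CloseHandle(IntPtr hObject);

    const uint PAGE_READWRITE = 0x04;
    const uint FILE_MAP_ALL_ACCESS = 0x000F001F;
    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    readonly IntPtr hMap;
    public IntPtr Buffer { get; private set; }
    public int Length { get; private set; }

    public SharedMemorySketch(string name, int size, bool create)
    {
        // INVALID_HANDLE_VALUE backs the mapping by the paging file, not a disk file
        hMap = create
            ? CreateFileMapping(INVALID_HANDLE_VALUE, IntPtr.Zero,
                                PAGE_READWRITE, 0, (uint)size, name)
            : OpenFileMapping(FILE_MAP_ALL_ACCESS, false, name);
        Buffer = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, (UIntPtr)(uint)size);
        Length = size;
    }

    public void Dispose()
    {
        UnmapViewOfFile(Buffer);
        CloseHandle(hMap);
    }
}
```

Because both processes open the mapping by name, the second process sees exactly the bytes the first one wrote; the cost shown in the benchmarks below is the kernel round-trip per create/open, not the access itself.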

stackalloc byte[]

Allocates a byte[] on the stack. It is therefore freed automatically when you return from the current method. Using the stack may lead to stack overflows if you don't use it wisely.

C#
unsafe static void stack()
{
    byte* buf = stackalloc byte[bufsize];
}

Using stackalloc requires using unsafe, too.

Test Results

I don't want to go into single-core vs. multi-core, NUMA vs. non-NUMA architectures and so on. Therefore I just present some interesting results. Feel free to run the test on your own machines!

Debug/Release

Running the test in Debug and Release offers dramatic differences in the number of loops in 10 seconds:

Release
new byte[]:            425340907   100%
Marshal.AllocHGlobal:   19680751     5%
Marshal.AllocCoTaskMem: 21062645     5%
stackalloc:            341525631    80%
SharedMemory:             792007   0.2% 
Debug
new byte[]:                71004   0.3%
Marshal.AllocHGlobal:   22660829    89%
Marshal.AllocCoTaskMem: 25557756   100%
stackalloc:               558497     2%
SharedMemory:             785470     3%

As you can see, new byte[] and stackalloc byte[] depend dramatically on the debug/release switch, while the other three do not. This may be because those three are mainly handled in the kernel.

new byte[] and stackalloc byte[] are the fastest options in release mode and the slowest in debug mode. But remember that the garbage collector also has to handle every new byte[].

PC/Server

These two runs were done on my PC (Intel dual-core, Vista x64). So let's compare it to a typical server (dual Xeon quad-core, Windows Server 2008 64-bit) in release mode:

                         Server      Workstation
new byte[]:            553541729      425340907   
Marshal.AllocHGlobal:   26460746       19680751 
Marshal.AllocCoTaskMem: 28294494       21062645 
stackalloc:            466980755      341525631    
SharedMemory:             817317         792007    

Because the test is single-threaded, the number of cores does not matter. Remember, the garbage collector runs on its own thread.

32Bit/64Bit

Let's compare 32bit to 64bit (release):

                         x86-32bit   x64-64bit
new byte[]:             1046577767   516441931
Marshal.AllocHGlobal:     21034715    25152330
Marshal.AllocCoTaskMem:   23467574    27787971
stackalloc:               83956017   416630753
SharedMemory:               728858      793750 

Marshal.* and SharedMemory are a little faster on x64. new byte[] is up to twice as fast on x86 as on x64. And stackalloc byte[] is five times faster on x64 than on x86. I didn't expect this result!
The same pattern holds on my server.

Conclusion

So think twice before you decide which allocation method and target-platform you choose!

MemCopy

And now let's look at some memcopy variants. I use the same algorithm to measure: one thread runs a loop copying one byte[] to another for 10 seconds, and I count the number of copies.

  • Array.Copy()
  • Marshal.Copy()
  • Kernel32.dll CopyMemory()
  • Buffer.BlockCopy()
  • MemCopyInt()
  • MemCopyLong()
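For reference, here is how the built-in variants are invoked. The buffer names are illustrative; the CopyMemory import targets kernel32's RtlMoveMemory, which is what the CopyMemory macro maps to in the Windows headers:

```csharp
using System;
using System.Runtime.InteropServices;

class CopyVariants
{
    // CopyMemory is a macro over RtlMoveMemory in the Windows headers,
    // so the P/Invoke declaration uses that entry point:
    [DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory")]
    static extern void CopyMemory(IntPtr dest, IntPtr src, UIntPtr count);

    static void Main()
    {
        byte[] src = new byte[230400], dst = new byte[230400];
        src[100] = 42;

        Array.Copy(src, 0, dst, 0, src.Length);        // managed, type-checked
        Buffer.BlockCopy(src, 0, dst, 0, src.Length);  // managed, byte-oriented

        IntPtr p = Marshal.AllocHGlobal(src.Length);
        Marshal.Copy(src, 0, p, src.Length);           // managed -> unmanaged
        Marshal.Copy(p, dst, 0, src.Length);           // unmanaged -> managed
        Marshal.FreeHGlobal(p);

        Console.WriteLine(dst[100]);
    }
}
```

Array.Copy and Buffer.BlockCopy work entirely on managed arrays; Marshal.Copy and CopyMemory are what you need once one side of the copy lives in unmanaged memory.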
Release/Debug
                                  Release    Debug
Array.Copy:                       360741    361740
Marshal.Copy:                     360680    359712
Kernel32NativeMethods.CopyMemory: 361314    358927
Buffer.BlockCopy:                 375440    374004
OwnMemCopyInt:                    217736     33833
OwnMemCopyLong:                   295372     54601

As expected, only my own MemCopy was a lot slower in debug mode. Let's take a look at it:

C#
static readonly int sizeOfInt = Marshal.SizeOf(typeof(int));
static public unsafe void MemCopy(IntPtr pSource, IntPtr pDest, int Len)
{
    unchecked
    {
        int size = sizeOfInt;
        int count = Len / size;
        int rest = Len % size;   // bytes left over after the int-sized blocks
        int* ps = (int*)pSource.ToPointer(), pd = (int*)pDest.ToPointer();
        // Loop over the buffer in blocks of 4 bytes,
        // copying an integer (4 bytes) at a time:
        for (int n = 0; n < count; n++)
        {
            *pd = *ps;
            pd++;
            ps++;
        }
        // Complete the copy by moving any bytes that weren't moved in blocks of 4:
        if (rest > 0)
        {
            byte* ps1 = (byte*)ps;
            byte* pd1 = (byte*)pd;
            for (int n = 0; n < rest; n++)
            {
                *pd1 = *ps1;
                pd1++;
                ps1++;
            }
        }
    }
}

Even with unchecked unsafe code, the built-in copy functions perform much faster than doing the copy in a loop yourself, especially in debug mode.

32Bit/64Bit
                                   32Bit    64Bit
Array.Copy:                       230788   360741    
Marshal.Copy:                     460061   360680
Kernel32NativeMethods.CopyMemory: 365850   361314
Buffer.BlockCopy:                 368212   375440 
OwnMemCopyInt:                    218438   217736  
OwnMemCopyLong:                   286321   295372      

In 32-bit x86 code, Marshal.Copy is significantly faster than in 64-bit code, while Array.Copy is much slower in 32-bit than in 64-bit. My own memcopy loop uses 32-bit integers and therefore runs at the same speed on both. The kernel method is not affected by this setting either.

Conclusion

It is a good idea to use the built-in memcopy functions.

Points of Interest

Try the source on your machine and compare the results.

History

  • 9th February, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)