Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Binary operations on byte arrays, with parallelism and pointers

0.00/5 (No votes)
11 Feb 2012 1  
The basic binary operations (AND, OR, XOR, NOT, ShiftLeft, ShiftRight) applied to byte arrays, made faster thanks to the use of parallelism combined with the use of pointers.

Introduction

The Binary Operations extension functions apply to byte arrays, to provide an easy and hopefully fast way to use the basic binary operators. The operators AND, OR, XOR, NOT, Shift Left, Shift Right are provided.

The provided functions rely on the System.Threading.Tasks.Parallel library and on many unsafe zones where pointers are used to access sequentially the elements of the byte arrays.

Background

In the beginning, I was using a simple for cycle that repeated the same binary operation on all the elements of the arrays, like in the following statement:

public static byte[] Bin_And_noParallel_noPointers(this byte[] ba, byte[] bt)
    {
    int longlen = Math.Max(ba.Length, bt.Length);
    int shortlen = Math.Min(ba.Length, bt.Length);
    byte[] result = new byte[longlen];
    for (int i = 0; i < shortlen; i++)
        {
        result[i] = (byte)(ba[i] & bt[i]);
        }
    return result;
    }

Then a question popped into my mind: "how can I make it faster?" I then tried three solutions (which are implemented in the code at the end of the article): parallelism, use of pointers, and substitute the pointers to bytes with pointers to integers.

Using the code

The provided class was written using C# 4.0 (in the .NET environment). Once you can access its namespace, this class extends the functionality of the byte array (byte[]) with these six functions:

public static byte[] Bin_And(this byte[] ba, byte[] bt)
public static byte[] Bin_Or(this byte[] ba, byte[] bt)
public static byte[] Bin_Xor(this byte[] ba, byte[] bt)
public static byte[] Bin_Not(this byte[] ba)
public static byte[] Bin_ShiftLeft(this byte[] ba, int bits)
public static byte[] Bin_ShiftRight(this byte[] ba, int bits)

These functions are extended methods and are to be used in the way explained in the following example, where a binary 'AND' operation is executed, between two byte arrays:

byte[] MyFirstByteArray, MySecondByteArray;
// fill the arrays here, for example by loading them from files
// (System.IO.File.ReadAllBytes) or filling them manually or with random bytes
byte[] result = MyFirstByteArray.Bin_And(MySecondByteArray);

If you are going to try these functions, remember to check the box 'Allow unsafe code' in Project -> Properties -> Build, since the pointers are used in usafe zones...

I've tested these operations by filling the byte arrays with literature texts (like the 'Divine Comedy' from the famous Italian poet Dante Alighieri, who wrote it around the years 1308 and 1321; so there shouldn't be copyright problems) longer than 500,000 bytes, and it worked fine.

Explanation of the code

As I mentioned before, during my journey to improve the simple binary functions, I tried three different strategies to make them faster: parallelism, pointers, and integers instead of bytes.

At first I tried using the simple System.Threading.Tasks.Parallel.For and the execution speed increased a bit, but not so much.

Then I tried the pointer approach: accessing the array through the index (as, for example, in MyArray[175]) is slower than accessing the same data through a pointer that is moved sequentially from the first to the last byte. This second approach gave good results, similar, if not even faster, than the parallel approach.

Then... why not try and combine them?

Well, using the Parallel.For was a bad idea, since the overall speed dropped very much.

The final solution was a bit more complicated, but the increase of speed (especially with large arrays) has been worth the while: look hereinafter to the sample code for the 'AND' binary operation:

public static byte[] Bin_And(this byte[] ba, byte[] bt)
    {
    int lenbig = Math.Max(ba.Length, bt.Length);
    int lensmall = Math.Min(ba.Length, bt.Length);
    byte[] result = new byte[lenbig];
    int ipar = 0;
    object o = new object();
    System.Action paction = delegate()
        {
        int actidx;
        lock (o)
            {
            actidx = ipar++;
            }
        unsafe
            {
            fixed (byte* ptres = result, ptba = ba, ptbt = bt)
                {
                byte* pr = (byte*)ptres;
                byte* pa = (byte*)ptba;
                byte* pt = (byte*)ptbt;
                pr += actidx; pa += actidx; pt += actidx;
                while (pr < ptres + lensmall)
                    {
                    *pr = (byte)(*pt & *pa);
                    pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
                    }
                }
            }
        };

    System.Action[] actions = new Action[paralleldegree];
    for (int i = 0; i < paralleldegree; i++)
        {
        actions[i] = paction;
        }
    Parallel.Invoke(actions);
    return result;
    }

In this function, a delegate (containing the code to execute the AND operation) is declared, and then the same delegate is invoked a number of times equal to the paralleldegree variable (which is a class level variable assigned inside the static constructor (as you can see in the complete class code, at the end of the article) and contains the number of processors available for the current machine). The various instances of the delegate are, of course, invoked in a parallel manner.

Inside the delegate, pointers are declared to the input and output byte arrays, and then, inside the while cycle, they are moved to the next element at each cycle. The core of the function is the simple line *pr = (*pt & *pa);, where the AND binary operation is executed between the input values (*pt and *pa) putting the result in the output (*pr).

Now, if paralleldegree is 1 (i.e., no parallelism), this delegate is quite easy to understand.

But let's assume, for example, paralleldegree is 4. This creates the problem that the four instances of the delegate would execute exactly the same code four times, resulting in a very slow execution time. So, the idea, when paralleldegree is 4 (or any number greater than 1), is to make each delegate start from a different byte of the input byte arrays, and then, inside the while cycle, jump to the next 4 bytes, at every cycle, thus calculating the result without collisions with the other delegates running in parallel.

In this case, the first delegate would calculate the bytes 0, 4, 8, 12, etc; the second delegate would calculate the bytes 1, 5, 9, 13, etc., the third 2, 6, 10, 14 ..., and the fourth 3, 7, 11, 15... The initial lock statement serves exactly the purpose of making the delegates start from sequential points, thanks to the ipar variable (function level) that is increased at each call (four times when paralleldegree is 4), and the actidx variable (delegate level), which is assigned only once (for every delegate) and acts something like the 'initial offset' of the pointers, for every delegate (as can be seen in the line pr += actidx; pa += actidx; pt += actidx;).

Using the parallelism together with the pointers resulted in a very good increase in execution speed (by the waw: to analyze the execution speed, I used to put the function to be tested inside a loop, and executing it 100 or 1000 times, measuring the time with the System.Diagnostics.Stopwatch class); but a further improvement can be done, by replacing the byte pointers (byte*) with uint pointers (uint*): uint is four times bigger than byte, so every cycle would be four times shorter. Well, the speed increased again; not four times, but 'only' two times. So, this last improvement was, in my opinion, worth the while.

Full class code

using System;
using System.Threading.Tasks;

namespace MySpace
    {
    public static class BinOps
        {
        private const int bitsinbyte = 8;
        private static readonly int paralleldegree;
        private static readonly int uintsize;
        private static readonly int bitsinuint;
        static BinOps()
            {
            paralleldegree = Environment.ProcessorCount;
            uintsize = sizeof(uint) / sizeof(byte); // really paranoid, uh ?
            bitsinuint = uintsize * bitsinbyte;
            }

        public static byte[] Bin_And(this byte[] ba, byte[] bt)
            {
            int lenbig = Math.Max(ba.Length, bt.Length);
            int lensmall = Math.Min(ba.Length, bt.Length);
            byte[] result = new byte[lenbig];
            int ipar = 0;
            object o = new object();
            System.Action paction = delegate()
            {
                int actidx;
                lock (o)
                    {
                    actidx = ipar++;
                    }
                unsafe
                    {
                    fixed (byte* ptres = result, ptba = ba, ptbt = bt)
                        {
                        uint* pr = (uint*)ptres;
                        uint* pa = (uint*)ptba;
                        uint* pt = (uint*)ptbt;
                        pr += actidx; pa += actidx; pt += actidx;
                        while (pr < ptres + lensmall)
                            {
                            *pr = (*pt & *pa);
                            pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
                            }
                        }
                    }
            };
            System.Action[] actions = new Action[paralleldegree];
            for (int i = 0; i < paralleldegree; i++)
                {
                actions[i] = paction;
                }
            Parallel.Invoke(actions);
            return result;
            }
            
        public static byte[] Bin_Or(this byte[] ba, byte[] bt)
            {
            int lenbig = Math.Max(ba.Length, bt.Length);
            int lensmall = Math.Min(ba.Length, bt.Length);
            byte[] result = new byte[lenbig];
            int ipar = 0;
            object o = new object();
            System.Action paction = delegate()
                {
                    int actidx;
                    lock (o)
                        {
                        actidx = ipar++;
                        }
                    unsafe
                        {
                        fixed (byte* ptres = result, ptba = ba, ptbt = bt)
                            {
                            uint* pr = (uint*)ptres;
                            uint* pa = (uint*)ptba;
                            uint* pt = (uint*)ptbt;
                            pr += actidx; pa += actidx; pt += actidx;
                            while (pr < ptres + lensmall)
                                {
                                *pr = (*pt | *pa);
                                pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
                                }
                            uint* pl = ba.Length > bt.Length ? pa : pt;
                            while (pr < ptres + lenbig)
                                {
                                *pr = *pl;
                                pr += paralleldegree; pl += paralleldegree;
                                }
                            }
                        }
                };
            System.Action[] actions = new Action[paralleldegree];
            for (int i = 0; i < paralleldegree; i++)
                {
                actions[i] = paction;
                }
            Parallel.Invoke(actions);

            return result;
            }
            
        public static byte[] Bin_Xor(this byte[] ba, byte[] bt)
            {
            int lenbig = Math.Max(ba.Length, bt.Length);
            int lensmall = Math.Min(ba.Length, bt.Length);
            byte[] result = new byte[lenbig];
            int ipar = 0;
            object o = new object();
            System.Action paction = delegate()
                {
                    int actidx;
                    lock (o)
                        {
                        actidx = ipar++;
                        }
                    unsafe
                        {
                        fixed (byte* ptres = result, ptba = ba, ptbt = bt)
                            {
                            uint* pr = ((uint*)ptres) + actidx;
                            uint* pa = ((uint*)ptba) + actidx;
                            uint* pt = ((uint*)ptbt) + actidx;
                            while (pr < ptres + lensmall)
                                {
                                *pr = (*pt ^ *pa);
                                pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
                                }
                            uint* pl = ba.Length > bt.Length ? pa : pt;
                            while (pr < ptres + lenbig)
                                {
                                *pr = *pl;
                                pr += paralleldegree; pl += paralleldegree;
                                }
                            }
                        }
                };
            System.Action[] actions = new Action[paralleldegree];
            for (int i = 0; i < paralleldegree; i++)
                {
                actions[i] = paction;
                }
            Parallel.Invoke(actions);

            return result;
            }
            
        public static byte[] Bin_Not(this byte[] ba)
            {
            int len = ba.Length;
            byte[] result = new byte[len];
            
            int ipar = 0;
            object o = new object();
            System.Action paction = delegate()
            {
                int actidx;
                lock (o)
                    {
                    actidx = ipar++;
                    }
                unsafe
                    {
                    fixed (byte* ptres = result, ptba = ba)
                        {
                        uint* pr = (uint*)ptres;
                        uint* pa = (uint*)ptba;
                        pr += actidx; pa += actidx;
                        while (pr < ptres + len)
                            {
                            *pr = ~(*pa);
                            pr += paralleldegree; pa += paralleldegree;
                            }
                        }
                    }
            };
            System.Action[] actions = new Action[paralleldegree];
            for (int i = 0; i < paralleldegree; i++)
                {
                actions[i] = paction;
                }
            Parallel.Invoke(actions);
            return result;
            }
            
        public static byte[] Bin_ShiftLeft(this byte[] ba, int bits)
            {
            int ipar = 0;
            object o = new object();

            int len = ba.Length;
            if (bits >= len * bitsinbyte) return new byte[len];
            int shiftbits = bits % bitsinuint;
            int shiftuints = bits / bitsinuint;
            byte[] result = new byte[len];

            if (len > 1)
                {
                // first uint is shifted without carry from previous byte (previous byte does not exist)
                unsafe
                    {
                    fixed (byte* fpba = ba, fpres = result)
                        {
                        uint* pres = (uint*)fpres + shiftuints;
                        uint* pba = (uint*)fpba;
                        *pres = *pba << shiftbits;
                        }
                    }
                System.Action paction = delegate()
                    {
                        int actidx;
                        lock (o)
                            {
                            actidx = ipar++;
                            }
                        unsafe
                            {
                            fixed (byte* fpba = ba, fpres = result)
                                {
                                // pointer to results; shift the bytes in the result
                                // (i.e. move left the pointer to the result)
                                uint* pres = (uint*)fpres + shiftuints + actidx + 1;
                                // pointer to original data, second byte
                                uint* pba1 = (uint*)fpba + actidx + 1;
                                if (shiftbits == 0)
                                    {
                                    while (pres < fpres + len)
                                        {
                                        *pres = *pba1;
                                        pres += paralleldegree; pba1 += paralleldegree;
                                        }
                                    }
                                else
                                    {
                                    // pointer to original data, first byte
                                    uint* pba2 = (uint*)fpba + actidx;
                                    while (pres < fpres + len)
                                        {
                                        *pres = *pba2 >> (bitsinuint - shiftbits) | *pba1 << shiftbits;
                                        pres += paralleldegree; pba1 += paralleldegree; pba2 += paralleldegree;
                                        }
                                    }
                                }
                            };

                    };
                System.Action[] actions = new Action[paralleldegree];
                for (int i = 0; i < paralleldegree; i++)
                    {
                    actions[i] = paction;
                    }
                Parallel.Invoke(actions);
                }

            return result;
            }
            
        public static byte[] Bin_ShiftRight(this byte[] ba, int bits)
            {
            int ipar = 0;
            object o = new object();
            int len = ba.Length;
            if (bits >= len * bitsinbyte) return new byte[len];
            int ulen = len / uintsize + 1 - (uintsize - (len % uintsize)) / uintsize;
            int shiftbits = bits % bitsinuint;
            int shiftuints = bits / bitsinuint;
            byte[] result = new byte[len];

            if (len > 1)
                {
                unsafe
                    {
                    fixed (byte* fpba = ba, fpres = result)
                        {
                        uint* pres = (uint*)fpres + ulen - shiftuints - 1;
                        uint* pba = (uint*)fpba + ulen - 1;
                        *pres = *pba >> shiftbits;
                        }
                    }
                System.Action paction = delegate()
                    {
                        int actidx;
                        lock (o)
                            {
                            actidx = ipar++;
                            }
                        unsafe
                            {
                            fixed (byte* fpba = ba, fpres = result)
                                {
                                // pointer to results; shift the bytes in the result
                                // (i.e. move left the pointer to the result)
                                uint* pres = (uint*)fpres + actidx;
                                // pointer to original data, first useful byte
                                uint* pba1 = (uint*)fpba + shiftuints + actidx;
                                if (shiftbits == 0)
                                    {
                                    while (pres < ((uint*)fpres) + ulen - shiftuints - 1)
                                        {
                                        *pres = *pba1;
                                        // increment pointers to next position
                                        pres += paralleldegree; pba1 += paralleldegree;
                                        }
                                    }
                                else
                                    {
                                    // pointer to original data, second useful byte
                                    uint* pba2 = (uint*)fpba + shiftuints + actidx + 1;
                                    while (pres < ((uint*)fpres) + ulen - shiftuints - 1)
                                        {
                                        // Core shift operation
                                        *pres = (*pba1 >> shiftbits | *pba2 << (bitsinuint - shiftbits));
                                        // increment pointers to next position
                                        pres += paralleldegree; pba1 += paralleldegree; pba2 += paralleldegree;
                                        }
                                    }
                                }
                            };
                    };
                System.Action[] actions = new Action[paralleldegree];
                for (int i = 0; i < paralleldegree; i++)
                    {
                    actions[i] = paction;
                    }
                Parallel.Invoke(actions);
                }
            return result;
            }
        }
    }

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here