ByteArrayBuilder - A StringBuilder for Bytes

26 Oct 2013 | CPOL | 3 min read
StringBuilder is a very useful and memory efficient way to concatenate strings, but there is no obvious similar class for byte arrays. This class adds that, and provides a flexible binary data storage medium at the same time.

Introduction

StringBuilders are an excellent way of concatenating strings without the memory overhead involved in repeatedly constructing and allocating intermediate string values. Consider the naive approach:

C#
string result = "";
foreach (string s in myListOfStrings)
    {
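    // Parentheses are needed: + binds tighter than ?: so without them
    // the string s would only be appended in the ";" case.
    result += (s.StartsWith("A") ? "," : ";") + s;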
    }

This may look fine, but each time round the loop a new string is created, and the existing content is copied into it before the new data is added - strings are immutable in .NET, remember?

C#
StringBuilder sb = new StringBuilder();
foreach (string s in myListOfStrings)
    {
    sb.Append(s.StartsWith("A") ? "," : ";");
    sb.Append(s);
    }
string result = sb.ToString();

This looks more complex, but it doesn't allocate any more memory than it needs to - the StringBuilder expands when it is full, and the data is only copied into the new (larger) space at that point. This is a lot more efficient in terms of memory, and a whole load more efficient in processing time.

But there isn't an obvious class that does this for bytes, is there?

Background

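Well, yes, there is a class which does this for bytes - but you might not have thought of it. It's a MemoryStream. Created with the default constructor, it starts out empty, allocates at least 256 bytes on the first write (an implementation detail of current versions), and then expands in the same way as a StringBuilder does. But...it's not quite as obvious: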

C#
MemoryStream ms = new MemoryStream();
foreach (byte[] b in myListOfByteArrays)
    {
    ms.Write(b, 0, b.Length);
    }
byte[] result = new byte[ms.Length];
Array.Copy(ms.GetBuffer(), result, ms.Length);
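
As a side note on the snippet above: GetBuffer returns the whole internal buffer, including any unused capacity, which is why the copy has to be limited to ms.Length. MemoryStream.ToArray performs the same trimmed copy in a single call. The following is only a minimal sketch to illustrate the difference; the class and method names are made up for this example and are not part of the article's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MemoryStreamConcatSketch
    {
    // Hypothetical helper: joins byte arrays using a MemoryStream alone.
    public static byte[] Concat(IEnumerable<byte[]> parts)
        {
        using (MemoryStream ms = new MemoryStream())
            {
            foreach (byte[] b in parts)
                {
                ms.Write(b, 0, b.Length);
                }
            // ToArray() copies only the Length bytes actually written.
            return ms.ToArray();
            }
        }

    // Returns how many unused bytes the internal buffer holds after one write.
    public static int SpareCapacity(byte[] data)
        {
        using (MemoryStream ms = new MemoryStream())
            {
            ms.Write(data, 0, data.Length);
            // GetBuffer() exposes the whole buffer, not just the written part.
            return ms.GetBuffer().Length - (int)ms.Length;
            }
        }
    }
```

Concat returns exactly the bytes written, while SpareCapacity shows how much longer the raw buffer can be than the data it holds.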

What we want is more like this:

C#
ByteArrayBuilder bab = new ByteArrayBuilder();
foreach (byte[] b in myListOfByteArrays)
    {
    bab.Append(b);
    }
byte[] result = bab.ToArray();

This document presents a class designed to do just that, with some useful features added.

(In fact, that code won't do exactly what you want with the class I present here. You need a slight change to how you use it:

C#
ByteArrayBuilder bab = new ByteArrayBuilder();
foreach (byte[] b in myListOfByteArrays)
    {
    bab.Append(b, false);
    }
byte[] result = bab.ToArray();

I will explain why later.)

The ByteArrayBuilder class encapsulates a MemoryStream and provides a number of methods to add and extract data from it.

Using the Code

Firstly, note that unlike StringBuilder, ByteArrayBuilder implements IDisposable, because it contains a MemoryStream, which also implements it. It is a good idea to dispose of large objects when you are finished with them anyway, so wrap the builder in a using block:

C#
byte[] result;
using (ByteArrayBuilder bab = new ByteArrayBuilder())
    {
    foreach (byte[] b in myListOfByteArrays)
        {
        bab.Append(b, false);
        }
    result = bab.ToArray();
    } 
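
To make the ownership idea concrete, here is a minimal sketch of a class that owns a MemoryStream and passes Dispose on to it. It is a hypothetical miniature for illustration only, not the article's ByteArrayBuilder; the names StoreOwnerSketch and IsDisposed are assumptions.

```csharp
using System;
using System.IO;

// Hypothetical miniature of the wrapper pattern; not the article's class.
public sealed class StoreOwnerSketch : IDisposable
    {
    private readonly MemoryStream store = new MemoryStream();

    public bool IsDisposed { get; private set; }

    public long Length
        {
        get { return store.Length; }
        }

    public void Append(byte b)
        {
        store.WriteByte(b);
        }

    // Release the owned stream; safe to call more than once.
    public void Dispose()
        {
        if (!IsDisposed)
            {
            store.Dispose();
            IsDisposed = true;
            }
        }
    }
```

A using block calls Dispose automatically when control leaves the block, even if an exception is thrown inside it.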

There are three constructors provided:

C#
/// <summary>
/// Create a new, empty builder ready to be filled.
/// </summary>
public ByteArrayBuilder()
    {
    }
/// <summary>
/// Create a new builder from a set of data
/// </summary>
/// <param name="data">Data to preset the builder from</param>
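public ByteArrayBuilder(byte[] data)
    {
    // Copy the data into the existing expandable store instead of wrapping
    // the array: new MemoryStream(data) is fixed-size, so later appends
    // would fail, and GetBuffer() throws on a stream created that way.
    store.Write(data, 0, data.Length);
    store.Position = 0;
    }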
/// <summary>
/// Create a new builder from the Base64 string representation of an
/// existing instance.
/// The Base64 representation can be retrieved using the ToString override
/// </summary>
/// <param name="base64">Base64 string representation of an
/// existing instance.</param>
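public ByteArrayBuilder(string base64)
    {
    // Decode, then copy into the existing expandable store; wrapping the
    // array with new MemoryStream(data) would make the store fixed-size.
    byte[] data = Convert.FromBase64String(base64);
    store.Write(data, 0, data.Length);
    store.Position = 0;
    }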

The parameterless version creates a new, empty ByteArrayBuilder ready for you to add fresh data. The other two constructors allow you to recreate an instance from the data you got from an earlier one - either in the form of a raw byte array, or as a Base64 string (which can be transferred more easily with text-based transfer mechanisms than raw binary data). You can get the data to feed these constructors from the ToArray and ToString methods:

C#
/// <summary>
/// Returns the builder as an array of bytes
/// </summary>
/// <returns></returns>
public byte[] ToArray()
    {
    // MemoryStream.ToArray copies exactly Length bytes and, unlike
    // GetBuffer, also works when the stream wraps an existing array.
    return store.ToArray();
    }
/// <summary>
/// Returns a text based (Base64) string version of the current content
/// </summary>
/// <returns></returns>
public override string ToString()
    {
    return Convert.ToBase64String(ToArray());
    }

This allows you to create complex binary save files, for example, and load them later to recover the data (yes, I know you can use a BinaryFormatter, but there are times when that isn't appropriate or even possible).
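
The Base64 round trip behind ToString and the string constructor can be sketched on its own with the framework's Convert class. The wrapper class and method names below are hypothetical and exist only for this illustration.

```csharp
using System;

public static class Base64RoundTripSketch
    {
    // Encode raw bytes as text, as the ToString override does.
    public static string ToText(byte[] data)
        {
        return Convert.ToBase64String(data);
        }

    // Recover the original bytes, as the string constructor does.
    public static byte[] FromText(string base64)
        {
        return Convert.FromBase64String(base64);
        }
    }
```

For example, ToText(new byte[] { 1, 2, 3 }) gives "AQID", and FromText("AQID") returns the same three bytes. Base64 uses four characters for every three bytes, so the text form is roughly a third larger than the binary data.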

The ByteArrayBuilder.Append method has a number of overloads to allow you to add various datatypes to the store:

C#
public void Append(bool b)
public void Append(byte b)
public void Append(byte[] b, bool addLength = true)
public void Append(char c)
public void Append(char[] c, bool addLength = true)
public void Append(DateTime dt)
public void Append(decimal d)
public void Append(double d)
public void Append(float f)
public void Append(Guid g)
public void Append(int i)
public void Append(long l)
public void Append(short i)
public void Append(string s, bool addLength = true)
public void Append(uint ui)
public void Append(ulong ul)
public void Append(ushort us)

Note that the variable-length versions (byte[], char[] and string) all have an optional bool parameter, addLength. It defaults to true, which writes the length of the value before the actual data so that the value can later be extracted in the same form as it went in. Setting it to false writes only the raw bytes with no length prefix, which allows for easy transfer to other equipment where the fields have fixed lengths and length information is not necessary. This is also why the concatenation example earlier passes false: with the default, each array's length would be mixed into the result.
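
The effect of the length prefix can be sketched as follows. This is not the article's implementation: the class name is made up, and the encoding of the length as a 4-byte int via BitConverter is an assumption chosen for illustration.

```csharp
using System;
using System.IO;

public static class LengthPrefixSketch
    {
    // Hypothetical stand-in for Append(byte[] b, bool addLength = true).
    // Assumption: the length is written as a 4-byte int via BitConverter.
    public static void Append(MemoryStream store, byte[] b, bool addLength = true)
        {
        if (addLength)
            {
            byte[] length = BitConverter.GetBytes(b.Length);
            store.Write(length, 0, length.Length);
            }
        store.Write(b, 0, b.Length);
        }

    // Builds a store from the parts and returns its content.
    public static byte[] Build(byte[][] parts, bool addLength)
        {
        using (MemoryStream store = new MemoryStream())
            {
            foreach (byte[] part in parts)
                {
                Append(store, part, addLength);
                }
            return store.ToArray();
            }
        }
    }
```

With two parts of 2 and 3 bytes, Build(parts, false) produces the 5 raw bytes, while Build(parts, true) produces 13 bytes under this assumption: the same data plus a 4-byte length in front of each part.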

With the lengths in place, each value can be read back from the store in the order it was written. Again, there are a number of methods to do this:

C#
public bool GetBool()
public byte GetByte()
public byte[] GetByteArray()
public char GetChar()
public char[] GetCharArray()
public DateTime GetDateTime()
public decimal GetDecimal()
public double GetDouble()
public float GetFloat()
public Guid GetGuid()
public int GetInt()
public long GetLong()
public short GetShort()
public string GetString()
public uint GetUint()
public ulong GetUlong()
public ushort GetUshort() 
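
Reading values back depends on consuming them in the same order and with the same sizes as they were written. The sketch below shows that idea with a plain MemoryStream; it is not the article's code, and the names, the 4-byte length prefix and the UTF-8 string encoding are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadBackSketch
    {
    // Reads exactly count bytes from the store or throws if the data runs out.
    private static byte[] ReadBytes(MemoryStream store, int count)
        {
        byte[] data = new byte[count];
        int read = store.Read(data, 0, count);
        if (read != count)
            {
            throw new EndOfStreamException("Not enough data left in the store.");
            }
        return data;
        }

    // Hypothetical counterpart of GetInt(): 4 bytes via BitConverter.
    public static int GetInt(MemoryStream store)
        {
        return BitConverter.ToInt32(ReadBytes(store, 4), 0);
        }

    // Hypothetical counterpart of GetString(): length prefix, then UTF-8 bytes.
    public static string GetString(MemoryStream store)
        {
        int length = GetInt(store);
        return Encoding.UTF8.GetString(ReadBytes(store, length));
        }

    // Writes an int and a length-prefixed string, rewinds, and reads them back.
    public static string RoundTrip(int number, string text)
        {
        using (MemoryStream store = new MemoryStream())
            {
            byte[] textBytes = Encoding.UTF8.GetBytes(text);
            store.Write(BitConverter.GetBytes(number), 0, 4);
            store.Write(BitConverter.GetBytes(textBytes.Length), 0, 4);
            store.Write(textBytes, 0, textBytes.Length);
            store.Position = 0; // rewind before reading
            return GetInt(store) + ":" + GetString(store);
            }
        }
    }
```

RoundTrip(42, "hello") returns "42:hello". Without the length prefix, GetString would have no way of knowing where the string ends.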

Example

Suppose you have a card index system, and each Card consists of a hierarchical list of Lines. In this case, both the Card and CardLine data have independent version information so that future versions can read older data.

The Card class ToBytes method may need to save a variety of information:

C#
/// <summary>
/// Convert this card into a byte stream.
/// </summary>
/// <returns></returns>
public byte[] ToBytes()
    {
    using (ByteArrayBuilder bab = new ByteArrayBuilder())
        {
        bab.Append(cardDataVersion);
        bab.Append(Guid);
        bab.Append(Text);
        bab.Append(Created);
        bab.Append(Modified);
        bab.Append(_Lines.Count);
        foreach (CardLine child in _Lines)
            {
            bab.Append(child.ToBytes());
            }
        return bab.ToArray();
        }
    }

The CardLine.ToBytes method is similar:

C#
/// <summary>
/// Convert this line into a byte stream.
/// </summary>
/// <returns></returns>
public byte[] ToBytes()
    {
    using (ByteArrayBuilder bab = new ByteArrayBuilder())
        {
        bab.Append(lineDataVersion);
        bab.Append(Guid);
        bab.Append(HasPrompt);
        bab.Append(Prompt);
        bab.Append(HasValue);
        bab.Append(Value);
        bab.Append(Created);
        bab.Append(Modified);
        bab.Append(_Children.Count);
        foreach (CardLine child in _Children)
            {
            bab.Append(child.ToBytes());
            }
        return bab.ToArray();
        }
    }

Then, when you want to load the card back again:

C#
/// <summary>
/// Create a Card from existing data
/// </summary>
/// <param name="cardData"></param>
public Card(byte[] cardData)
    {
    using (ByteArrayBuilder bab = new ByteArrayBuilder(cardData))
        {
        short dataVersion = bab.GetShort();
        if (dataVersion > cardDataVersion)
            {
            // What? Too risky to open - I could discard data.
            throw new ApplicationException("Cannot open data: " +
                                           "it was saved with a more advanced version of this program\nExpected version " +
                                           cardDataVersion +
                                           ", but found " +
                                           dataVersion);
            }
        LoadData(bab, dataVersion);
        }
    }
/// <summary>
/// Load the data into this instance.
/// Loads using the appropriate data version.
/// </summary>
/// <param name="bab"></param>
/// <param name="dataVersion"></param>
private void LoadData(ByteArrayBuilder bab, short dataVersion)
    {
    if (dataVersion == 1) LoadDataVersion1(bab);
    else
        {
        throw new ApplicationException("Cannot open data: " +
                                       "it was saved with an unknown version of this program\nExpected version " +
                                       cardDataVersion +
                                       ", but found " +
                                       dataVersion);
        }
    }
/// <summary>
/// Load version 1 data
/// </summary>
/// <param name="bab"></param>
private void LoadDataVersion1(ByteArrayBuilder bab)
    {
    Guid = bab.GetGuid();
    Text = bab.GetString();
    Created = bab.GetDateTime();
    Modified = bab.GetDateTime();
    int childrenCount = bab.GetInt();
    _Lines = new List<CardLine>();
    while (childrenCount-- > 0)
        {
        _Lines.Add(new CardLine(bab.GetByteArray()));
        }
    }

The CardLine code looks very similar:

C#
/// <summary>
/// Create a Card line from existing data
/// </summary>
/// <param name="lineData"></param>
public CardLine(byte[] lineData)
    {
    using (ByteArrayBuilder bab = new ByteArrayBuilder(lineData))
        {
        short dataVersion = bab.GetShort();
        if (dataVersion > lineDataVersion)
            {
            // What? Too risky to open - I could discard data.
            throw new ApplicationException("Cannot open data: " +
                                           "it was saved with a more advanced version of this program\nExpected version " +
                                           lineDataVersion +
                                           ", but found " +
                                           dataVersion);
            }
        LoadData(bab, dataVersion);
        }
    }
/// <summary>
/// Load the data into this instance.
/// Loads using the appropriate data version.
/// </summary>
/// <param name="bab"></param>
/// <param name="dataVersion"></param>
private void LoadData(ByteArrayBuilder bab, short dataVersion)
    {
    if (dataVersion == 1) LoadDataVersion1(bab);
    else
        {
        throw new ApplicationException("Cannot open data: " +
                                       "it was saved with an unknown version of this program\nExpected version " +
                                       lineDataVersion +
                                       ", but found " +
                                       dataVersion);
        }
    }
/// <summary>
/// Load version 1 data
/// </summary>
/// <param name="bab"></param>
private void LoadDataVersion1(ByteArrayBuilder bab)
    {
    Guid = bab.GetGuid();
    HasPrompt = bab.GetBool();
    Prompt = bab.GetString();
    HasValue = bab.GetBool();
    Value = bab.GetString();
    Created = bab.GetDateTime();
    Modified = bab.GetDateTime();
    int childrenCount = bab.GetInt();
    _Children = new List<CardLine>();
    while (childrenCount-- > 0)
        {
        _Children.Add(new CardLine(bab.GetByteArray()));
        }
    }
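
The version check used by both classes can be reduced to a small stand-alone sketch. Everything here is hypothetical (the class name, the single int payload, the exception type); it only illustrates the pattern of writing a two-byte version first and dispatching on it when loading. Note that the version must be written as a short if it is read back as one: writing an int would put four bytes in the header.

```csharp
using System;
using System.IO;

public static class VersionedDataSketch
    {
    // Hypothetical current format version, declared as short so that it
    // occupies exactly the two bytes that are read back on load.
    public const short CurrentVersion = 1;

    // Writes a two-byte version header followed by a single int payload.
    public static byte[] Save(short version, int payload)
        {
        using (MemoryStream store = new MemoryStream())
            {
            store.Write(BitConverter.GetBytes(version), 0, 2);
            store.Write(BitConverter.GetBytes(payload), 0, 4);
            return store.ToArray();
            }
        }

    // Reads the header first, then picks the loader for that version.
    public static int Load(byte[] data)
        {
        short version = BitConverter.ToInt16(data, 0);
        if (version > CurrentVersion)
            {
            throw new InvalidDataException("Data was saved by a newer version.");
            }
        if (version == 1)
            {
            return BitConverter.ToInt32(data, 2);
            }
        throw new InvalidDataException("Unknown data version " + version + ".");
        }
    }
```

Load(Save(1, 1234)) returns 1234, while data saved with version 2 is rejected rather than being misread.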

History

  • 26th October, 2013: Original version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)