A More Powerful BinaryReader/Writer

Ray K

23 Jan 2019 | CPOL | 8 min read

Extending BinaryReader/Writer to support a different byte order, string and date formats, and advanced navigation in binary files

Introduction

This article will discuss how to extend the functionality of the standard .NET BinaryReader and BinaryWriter classes to support a lot of new, generally applicable and useful features.

The API discussed in this article has been wrapped into a NuGet package, so you can easily use it in your existing .NET 4.5 projects.

A separate wiki for the NuGet package can be found in the GitLab repository. It focuses more on the implementer's side, while this article also discusses how the package evolved and what I had to consider when writing its internals.

Background

Whenever I wrote a library to handle loading and saving a custom binary file format - sometimes complex proprietary formats of other companies - I used the standard .NET BinaryReader and BinaryWriter classes to parse the binary data out of these files.

However, the more complex the formats got, the more I missed some pretty generic tasks and features in these .NET classes. In particular, I was looking for the following functionality:

Handle data stored in a different byte order than the one of the executing machine.
Understand strings not stored in the .NET format, for example, 0-terminated strings.
Simple reading and storing of repetitive data types (e.g., loading 12 Int32s), without writing for loops over and over.
One-off reading of strings with a different encoding than the one specified for the whole reader / writer instance.
Easier navigation around in the file, like temporarily seeking to an offset, then seeking back, or aligning to a block size.

At first, I wrote extension methods appending new features to the existing BinaryReader or BinaryWriter classes. However, this was insufficient for implementing the behavior of reading data in a byte order different from that of the system executing the code. This eventually led me to create new classes named BinaryDataReader and BinaryDataWriter, inheriting from the .NET ones.

Let's see how I realized the different aspects listed above, and have a look at how to use them from the implementer's side.

Implementation & Usage

Byte Order

Implementing the support for reading multi-byte data in a different byte order than the one of the machine which is executing the code required a lot of changes to the standard .NET class.

Remember that .NET does not fix the endianness of all the data it works with. BitConverter in particular converts in the byte order of the machine running the code, so the same bytes can be interpreted as big or little endian depending on where you run it.

So at first, it was important to detect the system's byte order correctly. This is trivial: simply check the System.BitConverter.IsLittleEndian field:

ByteOrder systemByteOrder = BitConverter.IsLittleEndian ? ByteOrder.LittleEndian : ByteOrder.BigEndian;

As you can see, I also introduced a ByteOrder enumeration whose values, by the way, map to the byte order mark seen in files, in case this comes in handy for you:

public enum ByteOrder : ushort
{
    BigEndian = 0xFEFF,
    LittleEndian = 0xFFFE
}
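
Because the enumeration values equal the byte order mark, a mark read from a file can be cast to the enumeration directly. The following is only a minimal sketch of that idea, assuming a format that stores the two bytes FE FF or FF FE at the current stream position; ReadByteOrderMark and ByteOrderMarkSketch are hypothetical names and not part of the package:

```csharp
using System;
using System.IO;

public enum ByteOrder : ushort
{
    BigEndian = 0xFEFF,
    LittleEndian = 0xFFFE
}

public static class ByteOrderMarkSketch
{
    // Hypothetical helper: reads a 2-byte byte order mark and maps it to the enumeration.
    public static ByteOrder ReadByteOrderMark(Stream stream)
    {
        int first = stream.ReadByte();
        int second = stream.ReadByte();
        if (first < 0 || second < 0)
        {
            throw new EndOfStreamException("Stream ended before the byte order mark.");
        }

        // Combine the bytes in file order, so FE FF yields 0xFEFF and FF FE yields 0xFFFE.
        ushort mark = (ushort)((first << 8) | second);
        if (!Enum.IsDefined(typeof(ByteOrder), mark))
        {
            throw new InvalidDataException("Unknown byte order mark.");
        }
        return (ByteOrder)mark;
    }
}
```

The bytes are combined by hand instead of going through BitConverter, so the result does not depend on the byte order of the executing machine.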

To make the reader / writer respect the byte order of the data, I introduced a ByteOrder property which accepts one of those enumeration values. Its setter checks whether the system runs in a different byte order than the one set, so that read / written bytes are reversed if required:

public ByteOrder ByteOrder
{
    get
    {
        return _byteOrder;
    }
    set
    {
        _byteOrder = value;
        _needsReversion = _byteOrder != ByteOrder.GetSystemByteOrder();
    }
}

Then, I had to override every Read* (in case of the BinaryDataReader) or Write(T) (in case of the BinaryDataWriter) method to respect the _needsReversion boolean:

public override Int32 ReadInt32()
{
    if (_needsReversion)
    {
        byte[] bytes = base.ReadBytes(sizeof(int));
        Array.Reverse(bytes);
        return BitConverter.ToInt32(bytes, 0);
    }
    else
    {
        return base.ReadInt32();
    }
}

This worked out really well with the help of all the BitConverter.ToXxx methods, allowing me to retrieve the bytes of multi-byte values or convert bytes to multi-byte values. Decimal values were an oddball however, requiring some manual work of byte conversion, with the code from here taken as the basis.
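
The writing direction mirrors the ReadInt32 override shown above. As a rough sketch only, written as a static helper around a plain BinaryWriter rather than as the actual Write(Int32) override of BinaryDataWriter (WriteReversed and ReversedWriteSketch are hypothetical names), the reversal could look like this:

```csharp
using System;
using System.IO;

public static class ReversedWriteSketch
{
    // Writes an Int32 with its bytes in the opposite order of what BitConverter
    // produces on the executing machine.
    public static void WriteReversed(BinaryWriter writer, int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        Array.Reverse(bytes);
        writer.Write(bytes);
    }
}
```

In a writer class, this path would only be taken when the reversion flag is set; otherwise the value can be passed to the base Write(int) unchanged.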

Usage

By default, the BinaryDataReader/Writer uses the system byte order.

Simply assign a ByteOrder enumeration value to the ByteOrder property of a BinaryDataReader/Writer - even between calls to Read* or Write(T) - to switch the byte order whenever you want:

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    int intInSystemOrder = reader.ReadInt32();

    reader.ByteOrder = ByteOrder.BigEndian;
    int intInBigEndian = reader.ReadInt32();

    reader.ByteOrder = ByteOrder.LittleEndian;
    int intInLittleEndian = reader.ReadInt32();
}

Repetitive Data Types

When working with 3D file formats, I often had to read transformation matrices, which are 16 floats, one after another. Of course, I could write a highly specific "ReadMatrix" method, but I wanted to keep it reusable and added methods like ReadSingles(int) / Write(T[]). The read methods take the count of values you want to read, then internally just run a for loop and call the method with the corresponding singular name.

public Int32[] ReadInt32s(int count)
{
    return ReadMultiple(count, ReadInt32);
}

private T[] ReadMultiple<T>(int count, Func<T> readFunc)
{
    T[] values = new T[count];
    for (int i = 0; i < values.Length; i++)
    {
        values[i] = readFunc.Invoke();
    }
    return values;
}
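
The writing side can follow the same pattern. The sketch below is an assumption of how such a helper might look, written against a plain BinaryWriter; WriteMultiple, WriteInt32s and WriteMultipleSketch are hypothetical names, not the package's API:

```csharp
using System;
using System.IO;

public static class WriteMultipleSketch
{
    // Hypothetical counterpart to ReadMultiple: calls the single-value write once per element.
    public static void WriteMultiple<T>(T[] values, Action<T> writeAction)
    {
        for (int i = 0; i < values.Length; i++)
        {
            writeAction(values[i]);
        }
    }

    public static void WriteInt32s(BinaryWriter writer, int[] values)
    {
        // The explicit type argument selects the Write(int) overload of the method group.
        WriteMultiple<int>(values, writer.Write);
    }
}
```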

Usage

Simply call any of the Read*s() methods with the number of values you want to have returned as an array, or the Write(T[]) methods with the array you want to write to the stream:

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    int[] fortyFatInts = reader.ReadInt32s(40);
}

Different String Formats

Strings can be stored in different binary representations, and by default, the BinaryReader/Writer classes only support storing them with a length prefix, written as a 7-bit encoded integer holding the length in bytes.

Most of the file formats I worked with used 0-terminated strings, i.e., strings without any length prefix which simply end as soon as a byte of the value 0 has been read. So I added overloads for the ReadString (or Write(string)) methods to which you can pass a value of the BinaryStringFormat enumeration, supporting the following representations:

ByteLengthPrefix: The string has a prefix of an unsigned byte, determining the number of the following characters.
WordLengthPrefix: The string has a prefix of a signed two-byte value (i.e., Int16 / short), determining the number of the following characters.
DwordLengthPrefix: The string has a prefix of a signed four-byte value (i.e., Int32 / int), determining the number of the following characters.
ZeroTerminated: The string has no prefix, but ends on the first encountered byte with the value 0.
NoPrefixOrTermination: The string has neither a prefix nor a postfix, and the length must be known to read it.
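
To illustrate the ZeroTerminated representation, here is a minimal sketch of how such a string could be read with a plain BinaryReader. It assumes a single 0 byte as terminator, which suits ASCII or UTF-8 but not encodings like UTF-16; ReadZeroTerminated and ZeroTerminatedSketch are hypothetical names, and the package's own implementation may differ:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ZeroTerminatedSketch
{
    // Collects bytes until the first 0 byte, then decodes them with the given encoding.
    // ReadByte throws an EndOfStreamException if the stream ends before a terminator.
    public static string ReadZeroTerminated(BinaryReader reader, Encoding encoding)
    {
        List<byte> bytes = new List<byte>();
        byte current = reader.ReadByte();
        while (current != 0)
        {
            bytes.Add(current);
            current = reader.ReadByte();
        }
        return encoding.GetString(bytes.ToArray());
    }
}
```

The terminator is consumed but not returned, so the stream is positioned on the first byte after the string.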

Usage

Whenever you want to read or write a string in a specific format, use the corresponding method overloads:

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
using (BinaryDataWriter writer = new BinaryDataWriter(stream))
{
    string magicBytes = reader.ReadString(4); // No prefix or termination just needs to know the length
    if (magicBytes != "RIFF")
    {
        throw new InvalidOperationException("Not a RIFF file.");
    }

    string zeroTerminated = reader.ReadString(BinaryStringFormat.ZeroTerminated);
    string netString = reader.ReadString();
    string anotherNetString = reader.ReadString(BinaryStringFormat.DwordLengthPrefix);

    writer.Write("RAW", BinaryStringFormat.NoPrefixOrTermination);
}

Because NoPrefixOrTermination requires you to know the number of characters you want to read, such strings are read through an overload taking the length instead of a BinaryStringFormat. You cannot pass NoPrefixOrTermination to the ReadString overloads accepting a BinaryStringFormat enumeration value.

One-time String Encoding

The default .NET BinaryReader/Writer classes allow you to specify a string encoding in the constructor, but they don't allow you - for example - to write a one-off ASCII string for an instance you created with UTF8 encoding. I added overloads to override this encoding by simply passing it to the ReadString or Write(string) calls.

The encoding of a standard .NET reader or writer cannot be changed at runtime. In fact, it cannot even be retrieved after creating the instances. My inherited classes remember that encoding though and it can be queried - but not set - through the Encoding property.
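
Conceptually, a one-off encoding just means decoding the raw bytes yourself instead of letting the reader use its own encoding. The following sketch shows this for a string with a four-byte length prefix. Note the assumption: it treats the prefix as a byte count, which only equals the character count for single-byte encodings. ReadDwordPrefixedString and OneOffEncodingSketch are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

public static class OneOffEncodingSketch
{
    // Reads an Int32 prefix (assumed to be a byte count), then decodes that many bytes
    // with the encoding passed in, ignoring the encoding the reader was created with.
    public static string ReadDwordPrefixedString(BinaryReader reader, Encoding encoding)
    {
        int byteCount = reader.ReadInt32();
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
        {
            throw new EndOfStreamException("Stream ended inside the string data.");
        }
        return encoding.GetString(bytes);
    }
}
```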

Usage

Simply pass the one-off encoding to the ReadString or Write(string) methods:

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream, Encoding.ASCII))
using (BinaryDataWriter writer = new BinaryDataWriter(stream, Encoding.ASCII))
{
    string unicodeString = reader.ReadString(BinaryStringFormat.DwordLengthPrefix, Encoding.UTF8);
    string asciiString = reader.ReadString();
    Console.WriteLine(reader.Encoding.CodePage);
}

Different Date / Time Formats

Strings are not the only type with several binary representations; DateTime instances can also be stored in different, common ways. These mostly differ in the point in time at which the tick with index 0 happened and in how granular those ticks are, which also determines the minimum and maximum DateTime value. Right now, the API only supports a few representations, which the BinaryDateTimeFormat enumeration lists:

CTime: The time_t format of the C standard library
NetTicks: The default .NET DateTime format
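
As an illustration of what the CTime conversion involves, here is a small sketch. It assumes the common convention of time_t counting whole seconds since 1970-01-01 00:00:00 UTC; the width of the stored value differs between formats, so the sketch works on a long that has already been read. CTimeSketch, FromCTime and ToCTime are hypothetical names:

```csharp
using System;

public static class CTimeSketch
{
    // Tick 0 of the assumed time_t convention.
    private static readonly DateTime Epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Converts seconds since the epoch into a UTC DateTime.
    public static DateTime FromCTime(long seconds)
    {
        return Epoch.AddSeconds(seconds);
    }

    // Converts a DateTime into whole seconds since the epoch.
    // Values of unspecified kind are treated as local time by ToUniversalTime.
    public static long ToCTime(DateTime value)
    {
        return (long)(value.ToUniversalTime() - Epoch).TotalSeconds;
    }
}
```

A tick here is a full second, whereas a .NET DateTime tick is 100 nanoseconds, so any fraction of a second is lost when converting to this format.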

Usage

As you might have guessed, this enumeration can be used like the string methods accepting a BinaryStringFormat. The new methods are ReadDateTime(BinaryDateTimeFormat) / Write(DateTime, BinaryDateTimeFormat):

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    DateTime cTime = reader.ReadDateTime(BinaryDateTimeFormat.CTime);
}

Advanced Stream Navigation

Another common task not covered at all by the default BinaryReader/Writer classes is temporarily seeking to another position, fetching or storing data there, then going back to the previous position.

I implemented temporary seeking with the using / IDisposable pattern. When you call TemporarySeek(long), it returns an instance of the SeekTask class, which immediately moves the stream position to the one you specified. Once the task is disposed, the stream returns to the position it had before the seek.

public class SeekTask : IDisposable
{
    public SeekTask(Stream stream, long offset, SeekOrigin origin)
    {
        Stream = stream;
        PreviousPosition = stream.Position;
        Stream.Seek(offset, origin);
    }

    public Stream Stream { get; private set; }

    /// <summary>
    /// Gets the absolute position to which the <see cref="Stream"/> will be rewound after this task is
    /// disposed.
    /// </summary>
    public long PreviousPosition { get; private set; }

    /// <summary>
    /// Rewinds the <see cref="Stream"/> to its previous position.
    /// </summary>
    public void Dispose()
    {
        Stream.Seek(PreviousPosition, SeekOrigin.Begin);
    }
}
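
To show how a method like TemporarySeek can hand out such a disposable object, here is a self-contained sketch written as an extension method for the plain BinaryReader. This is an assumption for illustration only: in the package, TemporarySeek is a member of the reader / writer classes, and TemporarySeekSketch and Rewinder are hypothetical names:

```csharp
using System;
using System.IO;

public static class TemporarySeekSketch
{
    // Seeks immediately and returns an object which rewinds the stream when disposed.
    public static IDisposable TemporarySeek(this BinaryReader reader, long offset, SeekOrigin origin)
    {
        return new Rewinder(reader.BaseStream, offset, origin);
    }

    private sealed class Rewinder : IDisposable
    {
        private readonly Stream _stream;
        private readonly long _previousPosition;

        public Rewinder(Stream stream, long offset, SeekOrigin origin)
        {
            _stream = stream;
            _previousPosition = stream.Position; // Remember where to return to.
            _stream.Seek(offset, origin);
        }

        public void Dispose()
        {
            _stream.Seek(_previousPosition, SeekOrigin.Begin);
        }
    }
}
```

Since the rewind happens in Dispose, a using block also restores the position when an exception is thrown while reading at the temporary offset.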

Usage

Using TemporarySeek is much easier than the class above looks. Simply call it together with a using block, like in this example:

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    int offset = reader.ReadInt32();
    using (reader.TemporarySeek(offset, SeekOrigin.Begin))
    {
        byte[] dataAtOffset = reader.ReadBytes(128);
    }
    int dataAfterOffsetValue = reader.ReadInt32();
}

This first reads an offset from the file itself, then seeks to that offset read to fetch 128 more bytes from there. After that, the using block ends and the stream returns to the position after the offset originally read.

Of course, you can also use absolute offsets to seek to; this was just a common example as seen in many file formats.

Block Alignment

Several file formats are highly optimized to be loaded quickly by the hardware they will run on, and thus organize their data in blocks of a specific size in bytes. Some finicky calculation is required to seek to the start of the next block from the current position, but the BinaryDataReader/Writer classes simply wrap it for you in one call to which you pass the size of your blocks.

/// <summary>
/// Aligns the reader to the next given byte multiple.
/// </summary>
/// <param name="alignment">The byte multiple.</param>
public void Align(int alignment)
{
    Seek((-Position % alignment + alignment) % alignment);
}
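
The expression inside Align is easier to follow with concrete numbers. The sketch below isolates the same calculation in a hypothetical GetPadding helper (not part of the package) and walks through a few positions. In C#, the % operator keeps the sign of the dividend, which is why alignment is added before taking the remainder a second time:

```csharp
using System;

public static class AlignmentSketch
{
    // Returns the number of bytes to skip to reach the next multiple of alignment.
    public static long GetPadding(long position, int alignment)
    {
        // Example: position 0x204, alignment 0x200
        //   -0x204 % 0x200 = -0x4
        //   -0x4 + 0x200   = 0x1FC
        //   0x1FC % 0x200  = 0x1FC, so the next block starts at 0x204 + 0x1FC = 0x400
        return (-position % alignment + alignment) % alignment;
    }
}
```

A position which already sits on a block boundary yields a padding of 0, so aligning twice in a row does not skip a whole block.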

Usage

Let's say you know your file is organized in blocks of 0x200 bytes size. Use Align as follows:

using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    string header = reader.ReadString(4);

    reader.Align(0x200); // Seek to the start of the next block of 0x200 bytes size.
}

Shortcuts to Stream Properties

Some important stream properties or methods like Length, Position or Seek are a little buried in the default .NET BinaryReader/Writer, since you have to access them through BaseStream there. My classes forward those properties and you can directly access them on the reader / writer instance.

/// <summary>
/// Gets or sets the position within the current stream. This is a shortcut to the base stream Position
/// property.
/// </summary>
public long Position
{
    get { return BaseStream.Position; }
    set { BaseStream.Position = value; }
}
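
Length and Seek can be forwarded in the same way. The following sketch shows this on a minimal class deriving from BinaryReader; ForwardingReaderSketch is a hypothetical name, and the members in the package may be declared differently:

```csharp
using System;
using System.IO;

public class ForwardingReaderSketch : BinaryReader
{
    public ForwardingReaderSketch(Stream input)
        : base(input)
    {
    }

    // Shortcut to the base stream Length property.
    public long Length
    {
        get { return BaseStream.Length; }
    }

    // Shortcut to the base stream Position property.
    public long Position
    {
        get { return BaseStream.Position; }
        set { BaseStream.Position = value; }
    }

    // Shortcut to the base stream Seek method, returning the new absolute position.
    public long Seek(long offset, SeekOrigin origin)
    {
        return BaseStream.Seek(offset, origin);
    }
}
```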

Usage

This should be straightforward, so let's make a 'random' example:

Random random = new Random();
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataWriter writer = new BinaryDataWriter(stream))
{
    while (writer.Position < 0x4000) // Directly accessing 'Position' here.
    {
        writer.Write(random.Next());
    }
}

Points of Interest

Optimizing the performance of the reader and writer is surely of high priority, and I did the best I could without using unsafe code. Maybe someone knows ways to optimize it even further; I'm eager to learn the 'tricks' which can speed up binary data handling!

Don't forget to check out the NuGet package if you want to start using the discussed features right away (note that the API has changed quite a bit over the years, and you can find the new documentation here).

History

2016-09-18: Initially published
2019-01-17: Update link to new NuGet package

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)