Introduction
This article discusses how to extend the functionality of the standard .NET BinaryReader and BinaryWriter classes to support a number of new, generally applicable and useful features.
The API discussed in this article has been wrapped into a NuGet package so that you can easily use it in your existing .NET 4.5 projects.
A separate wiki for the NuGet package can be found in its GitLab repository and focuses more on the implementer's side, while this article also discusses how the package evolved and what I had to consider when writing its internals.
Background
Whenever I wrote a library to load and save a custom binary file format (sometimes complex proprietary formats of other companies), I used the standard .NET BinaryReader and BinaryWriter classes to parse the binary data out of these files.
However, the more complex the formats got, the more I missed fairly generic features in these .NET classes. In particular, I was looking for the following functionality:
- Handle data stored in a byte order different from that of the executing machine.
- Understand strings not stored in the .NET format, for example, 0-terminated strings.
- Simple reading and storing of repetitive data types (e.g., loading 12 Int32s) without writing for loops over and over.
- One-time reading of strings with an encoding different from the one specified for the whole reader / writer instance.
- Easier navigation in the file, like temporarily seeking to an offset and then seeking back, or aligning to a block size.
At first, I wrote extension methods adding new features to the existing BinaryReader and BinaryWriter classes. However, this was insufficient for reading data in a byte order different from that of the system executing the code: extension methods cannot change the behavior of existing methods such as ReadInt32, and they have no natural place to store per-instance state like the desired byte order. This eventually led me to create new classes named BinaryDataReader and BinaryDataWriter, inheriting from the .NET ones.
Let's see how I implemented each of the aspects listed above, and how to use them from the implementer's side.
Implementation & Usage
Byte Order
Implementing support for reading multi-byte data in a byte order different from that of the executing machine required a lot of changes to the standard .NET classes.
Keep in mind that BitConverter, which the code below relies on, works in the byte order of the machine it is running on, so the same bytes may be interpreted as big or little endian depending on where the code runs.
So first, it was important to detect the system's byte order correctly. This is trivial by checking the System.BitConverter.IsLittleEndian field:
ByteOrder systemByteOrder = BitConverter.IsLittleEndian ? ByteOrder.LittleEndian : ByteOrder.BigEndian;
As you can see, I also introduced a ByteOrder enumeration whose values, by the way, map to the byte order mark seen in files, in case this comes in handy for you:
public enum ByteOrder : ushort
{
    BigEndian = 0xFEFF,
    LittleEndian = 0xFFFE
}
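To illustrate that mapping, here is a small sketch, assuming the first two bytes of a file are combined in big-endian order (most significant byte first): the resulting ushort then equals one of the ByteOrder values. The BomHelper class and its DetectByteOrder method are hypothetical and not part of the package.

```csharp
public enum ByteOrder : ushort
{
    BigEndian = 0xFEFF,
    LittleEndian = 0xFFFE
}

public static class BomHelper
{
    // Hypothetical helper: combines the first two bytes most significant first
    // and returns the matching ByteOrder, or null if they are not a byte order mark.
    public static ByteOrder? DetectByteOrder(byte[] firstTwoBytes)
    {
        if (firstTwoBytes == null || firstTwoBytes.Length < 2)
        {
            return null;
        }
        ushort mark = (ushort)((firstTwoBytes[0] << 8) | firstTwoBytes[1]);
        if (mark == (ushort)ByteOrder.BigEndian || mark == (ushort)ByteOrder.LittleEndian)
        {
            return (ByteOrder)mark;
        }
        return null;
    }
}
```

A file starting with the bytes FE FF is thus detected as big endian, and one starting with FF FE as little endian.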
To make the reader / writer respect this, I introduced a ByteOrder property which accepts one of those enumeration values. Its setter checks whether the system runs in a byte order different from the one set, so that the read / written bytes can be reversed if required:
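public ByteOrder ByteOrder
{
    get
    {
        return _byteOrder;
    }
    set
    {
        _byteOrder = value;
        // Bytes must be reversed whenever the requested order differs from the machine's order.
        ByteOrder systemByteOrder = BitConverter.IsLittleEndian ? ByteOrder.LittleEndian : ByteOrder.BigEndian;
        _needsReversion = _byteOrder != systemByteOrder;
    }
}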
Then, I had to override every Read* (in case of the BinaryDataReader) or Write(T) (in case of the BinaryDataWriter) method to respect the _needsReversion boolean:
public override Int32 ReadInt32()
{
    if (_needsReversion)
    {
        // Read the raw bytes, flip them into the machine's order, then convert.
        byte[] bytes = base.ReadBytes(sizeof(int));
        Array.Reverse(bytes);
        return BitConverter.ToInt32(bytes, 0);
    }
    else
    {
        return base.ReadInt32();
    }
}
This worked out really well with the help of the BitConverter methods: GetBytes retrieves the bytes of a multi-byte value, and the ToXxx methods convert bytes back into multi-byte values. Decimal values were an oddball, however, since BitConverter has no overloads for them; they required some manual byte conversion, with the code from here taken as the basis.
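Since BitConverter has no overloads for decimal, one possible approach (not necessarily the one from the linked code) uses decimal.GetBits, which returns the value as four Int32s, together with the decimal(int[]) constructor for the way back. A minimal sketch with the hypothetical DecimalBytes class, which lays the four integers out in the machine's byte order via BitConverter:

```csharp
using System;

public static class DecimalBytes
{
    // Splits a decimal into 16 bytes: the four Int32s returned by decimal.GetBits
    // (low, middle and high 32 bits of the 96-bit integer, then the flags holding
    // sign and scale), each converted with BitConverter in the machine's byte order.
    public static byte[] GetBytes(decimal value)
    {
        int[] parts = decimal.GetBits(value);
        byte[] bytes = new byte[16];
        for (int i = 0; i < 4; i++)
        {
            Buffer.BlockCopy(BitConverter.GetBytes(parts[i]), 0, bytes, i * 4, 4);
        }
        return bytes;
    }

    // Reverses GetBytes by rebuilding the four Int32s and passing them
    // to the decimal(int[]) constructor.
    public static decimal ToDecimal(byte[] bytes, int startIndex)
    {
        int[] parts = new int[4];
        for (int i = 0; i < 4; i++)
        {
            parts[i] = BitConverter.ToInt32(bytes, startIndex + i * 4);
        }
        return new decimal(parts);
    }
}
```

To support the other byte order, one could reverse the bytes of each of the four Int32s, just like ReadInt32 does for a single value.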
Usage
By default, the BinaryDataReader/Writer uses the system byte order.
To switch the byte order whenever you want, even between calls to Read* or Write(T), simply assign a ByteOrder enumeration value to the ByteOrder property of a BinaryDataReader/Writer:
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    // Default: the byte order of the executing machine.
    int intInSystemOrder = reader.ReadInt32();
    // Subsequent reads interpret multi-byte values as big endian...
    reader.ByteOrder = ByteOrder.BigEndian;
    int intInBigEndian = reader.ReadInt32();
    // ...and from here on as little endian.
    reader.ByteOrder = ByteOrder.LittleEndian;
    int intInLittleEndian = reader.ReadInt32();
}
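As a side note, the reversal logic depends on the system byte order only because BitConverter interprets bytes in the machine's order. A different technique, not the one used by the package, assembles values with bit shifts instead; shifts operate on values rather than on memory layout, so the result is the same on every machine. A minimal sketch with the hypothetical EndianHelper class:

```csharp
public static class EndianHelper
{
    // Combines four bytes, most significant first, into an Int32.
    public static int ToInt32BigEndian(byte[] bytes, int startIndex)
    {
        return (bytes[startIndex] << 24)
            | (bytes[startIndex + 1] << 16)
            | (bytes[startIndex + 2] << 8)
            | bytes[startIndex + 3];
    }

    // Combines four bytes, least significant first, into an Int32.
    public static int ToInt32LittleEndian(byte[] bytes, int startIndex)
    {
        return bytes[startIndex]
            | (bytes[startIndex + 1] << 8)
            | (bytes[startIndex + 2] << 16)
            | (bytes[startIndex + 3] << 24);
    }
}
```

On .NET Core 2.1 and later (not the .NET 4.5 targeted here), System.Buffers.Binary.BinaryPrimitives offers ready-made methods such as ReadInt32BigEndian and ReadInt32LittleEndian for the same purpose.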
Repetitive Data Types
When working with 3D file formats, I often had to read transformation matrices, which consist of 16 floats stored one after another. Of course, I could write a highly specific "ReadMatrix" method, but I wanted to keep it reusable, so I added methods like ReadSingles, to which you pass the number of values you want to read, and Write(T[]). Internally, they just run a for loop calling the method with the corresponding singular name.
public Int32[] ReadInt32s(int count)
{
    return ReadMultiple(count, ReadInt32);
}
private T[] ReadMultiple<T>(int count, Func<T> readFunc)
{
    // Call the single-value read method count times and collect the results.
    T[] values = new T[count];
    for (int i = 0; i < values.Length; i++)
    {
        values[i] = readFunc.Invoke();
    }
    return values;
}
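The writing side can mirror this helper. The following sketch is written as extension methods on the standard BinaryWriter so that it runs on its own; the ArrayWriter class and its WriteInt32s and WriteMultiple methods are hypothetical names, not the package's Write(T[]) overloads.

```csharp
using System;
using System.IO;

public static class ArrayWriter
{
    // Writes every element of the array with BinaryWriter.Write(int).
    public static void WriteInt32s(this BinaryWriter writer, int[] values)
    {
        WriteMultiple<int>(values, writer.Write);
    }

    // Generic loop shared by all array overloads: calls writeFunc once per element.
    private static void WriteMultiple<T>(T[] values, Action<T> writeFunc)
    {
        for (int i = 0; i < values.Length; i++)
        {
            writeFunc.Invoke(values[i]);
        }
    }
}
```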
Usage
Simply call any of the Read*s() methods with the number of values you want returned as an array, or the Write(T[]) methods with the array you want to write to the stream:
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    int[] fortyFatInts = reader.ReadInt32s(40);
}
Different String Formats
Strings can be stored in different binary representations, and by default, the BinaryReader/Writer classes only support storing them with a variable-length prefix giving their length in bytes, encoded as an integer seven bits at a time.
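For comparison, this 7-bit encoding stores seven bits of the length per byte, least significant group first, and sets the high bit of every byte except the last. A sketch of decoding such a prefix from a byte array (the SevenBitLength class is hypothetical):

```csharp
using System;

public static class SevenBitLength
{
    // Decodes a length stored seven bits at a time, least significant group first;
    // a set high bit (0x80) means another byte follows. Also reports how many
    // bytes the prefix occupied.
    public static int Decode(byte[] data, int startIndex, out int bytesUsed)
    {
        int result = 0;
        int shift = 0;
        bytesUsed = 0;
        while (true)
        {
            // An Int32 needs at most five groups of seven bits.
            if (shift > 28)
            {
                throw new FormatException("Length prefix is longer than 5 bytes.");
            }
            byte b = data[startIndex + bytesUsed];
            bytesUsed++;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                return result;
            }
            shift += 7;
        }
    }
}
```

A string of 200 single-byte characters, for example, gets the two-byte prefix 0xC8 0x01, while any length below 128 fits into a single byte.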
Most of the file formats I worked with used 0-terminated strings, i.e., strings without any length prefix that simply end as soon as a byte with the value 0 has been read. So I added overloads of the ReadString (and Write(string)) methods to which you can pass a value of the BinaryStringFormat enumeration, which supports the following representations:
- ByteLengthPrefix: The string has a prefix of an unsigned byte, determining the number of the following characters.
- WordLengthPrefix: The string has a prefix of a signed two-byte value (i.e., Int16 / short), determining the number of the following characters.
- DwordLengthPrefix: The string has a prefix of a signed four-byte value (i.e., Int32 / int), determining the number of the following characters.
- ZeroTerminated: The string has no prefix, but ends on the first encountered byte with the value 0.
- NoPrefixOrTermination: The string has neither a prefix nor a terminator, so its length must be known to read it.
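As a rough sketch of how two of these formats could be read on top of the standard BinaryReader (the StringFormats class is hypothetical, not the package's implementation): it assumes a single-byte encoding such as ASCII, because with encodings like UTF-16 a 0 byte can occur inside a character, and byte counts no longer match character counts.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StringFormats
{
    // ZeroTerminated: collect bytes until the first 0 byte,
    // which is consumed but not part of the result.
    public static string ReadZeroTerminated(BinaryReader reader, Encoding encoding)
    {
        List<byte> bytes = new List<byte>();
        byte b;
        while ((b = reader.ReadByte()) != 0)
        {
            bytes.Add(b);
        }
        return encoding.GetString(bytes.ToArray());
    }

    // ByteLengthPrefix: one unsigned byte holds the count, then that many bytes
    // follow (with a single-byte encoding, characters and bytes coincide).
    public static string ReadByteLengthPrefixed(BinaryReader reader, Encoding encoding)
    {
        byte length = reader.ReadByte();
        return encoding.GetString(reader.ReadBytes(length));
    }
}
```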
Usage
Whenever you want to read or write a string in one of these formats, use the corresponding method overloads:
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
using (BinaryDataWriter writer = new BinaryDataWriter(stream))
{
    // NoPrefixOrTermination: read a known number of characters.
    string magicBytes = reader.ReadString(4);
    if (magicBytes != "RIFF")
    {
        throw new InvalidOperationException("Not a RIFF file.");
    }
    string zeroTerminated = reader.ReadString(BinaryStringFormat.ZeroTerminated);
    // Standard .NET format.
    string netString = reader.ReadString();
    string anotherNetString = reader.ReadString(BinaryStringFormat.DwordLengthPrefix);
    writer.Write("RAW", BinaryStringFormat.NoPrefixOrTermination);
}
Since NoPrefixOrTermination requires you to know the number of characters you want to read, such strings are read with the overload taking that length (like ReadString(4) above) instead of a BinaryStringFormat. You cannot pass NoPrefixOrTermination to the ReadString overloads that accept a BinaryStringFormat enumeration value.
One-time String Encoding
The default .NET BinaryReader/Writer classes allow you to specify a string encoding in the constructor, but they don't allow you to, for example, write a one-off ASCII string with an instance you created with UTF8 encoding. I added overloads that override this encoding when you simply pass one to the ReadString or Write(string) calls.
The encoding of a standard .NET reader or writer cannot be changed at runtime. In fact, it cannot even be retrieved after creating the instance. My derived classes remember that encoding, though, and it can be queried, but not set, through the Encoding property.
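A sketch of this idea, assuming a class derived from BinaryReader that simply stores the encoding passed to its constructor. The EncodingAwareReader class and its ReadString(int, Encoding) overload, which decodes a given number of raw bytes, are hypothetical and not the package's API:

```csharp
using System.IO;
using System.Text;

// Hypothetical reader: BinaryReader does not expose the encoding it was
// constructed with, so the derived class keeps its own reference.
public class EncodingAwareReader : BinaryReader
{
    // Mirrors BinaryReader(Stream), which uses UTF-8.
    public EncodingAwareReader(Stream input)
        : this(input, new UTF8Encoding())
    {
    }

    public EncodingAwareReader(Stream input, Encoding encoding)
        : base(input, encoding)
    {
        Encoding = encoding;
    }

    // Read-only, since the base class cannot switch encodings after construction.
    public Encoding Encoding { get; private set; }

    // One-off decoding: reads byteCount raw bytes and decodes them with the given
    // encoding, leaving the instance-wide Encoding untouched.
    public string ReadString(int byteCount, Encoding encoding)
    {
        return encoding.GetString(ReadBytes(byteCount));
    }
}
```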
Usage
Simply pass the one-off encoding to the ReadString or Write(string) methods:
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream, Encoding.ASCII))
using (BinaryDataWriter writer = new BinaryDataWriter(stream, Encoding.ASCII))
{
    // One-off UTF8 string, although the reader was created with ASCII.
    string unicodeString = reader.ReadString(BinaryStringFormat.DwordLengthPrefix, Encoding.UTF8);
    // Uses the instance-wide ASCII encoding again.
    string asciiString = reader.ReadString();
    // Prints the code page of the encoding passed to the constructor.
    Console.WriteLine(reader.Encoding.CodePage);
}
Different Date / Time Formats
Not only strings have different binary representations; DateTime instances can also be stored in several common ways. These mostly differ in the point in time at which the tick with index 0 happened and in how granular those ticks are, which also determines the minimum and maximum DateTime value. Right now, the API supports only a few representations, which the BinaryDateTimeFormat enumeration lists:
- CTime: The time_t format of the C standard library
- NetTicks: The default .NET DateTime format
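To show how the two formats differ in their tick 0 and tick granularity, here is a conversion sketch with the hypothetical DateTimeFormats class. It assumes the common POSIX interpretation of time_t as whole seconds since 1970-01-01 00:00:00 UTC (the C standard itself leaves the encoding of time_t unspecified) and, for NetTicks, the value of DateTime.Ticks, i.e., 100-nanosecond intervals since 0001-01-01. How many bytes a file uses for either value is left open here.

```csharp
using System;

public static class DateTimeFormats
{
    // Assumed tick 0 of the CTime format: the Unix epoch.
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // CTime: whole seconds relative to 1970-01-01 UTC.
    public static DateTime FromCTime(long seconds)
    {
        return UnixEpoch.AddSeconds(seconds);
    }

    // Integer tick arithmetic avoids floating-point rounding; fractional seconds are truncated.
    public static long ToCTime(DateTime value)
    {
        return (value.ToUniversalTime() - UnixEpoch).Ticks / TimeSpan.TicksPerSecond;
    }

    // NetTicks: 100-nanosecond ticks relative to 0001-01-01, as in DateTime.Ticks.
    public static DateTime FromNetTicks(long ticks)
    {
        return new DateTime(ticks);
    }
}
```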
Usage
As you might have guessed, this enumeration is used just like the BinaryStringFormat accepted by the string methods. The new methods are ReadDateTime(BinaryDateTimeFormat) / Write(DateTime value, BinaryDateTimeFormat):
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    DateTime cTime = reader.ReadDateTime(BinaryDateTimeFormat.CTime);
}
Advanced Stream Navigation
Another common task not covered at all by the default BinaryReader/Writer classes is temporarily seeking to another position, reading or writing data there, and then going back to the previous position.
I implemented temporary seeking with the using / IDisposable pattern. When you call TemporarySeek with an offset and a SeekOrigin, it returns an instance of the SeekTask class, which immediately moves the stream position to the one you specified. When that instance is disposed, it moves the stream back to the position it had before the seek.
using System;
using System.IO;
// Seeks on construction and restores the previous position on Dispose,
// so a using block can bracket a temporary detour in the stream.
public class SeekTask : IDisposable
{
    public SeekTask(Stream stream, long offset, SeekOrigin origin)
    {
        Stream = stream;
        PreviousPosition = stream.Position;
        Stream.Seek(offset, origin);
    }
    public Stream Stream { get; private set; }
    public long PreviousPosition { get; private set; }
    public void Dispose()
    {
        Stream.Seek(PreviousPosition, SeekOrigin.Begin);
    }
}
Usage
Using TemporarySeek is much simpler than the class above might suggest. Simply call it in a using statement, as in this example:
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    int offset = reader.ReadInt32();
    using (reader.TemporarySeek(offset, SeekOrigin.Begin))
    {
        byte[] dataAtOffset = reader.ReadBytes(128);
    }
    // Back at the position right after the offset value.
    int dataAfterOffsetValue = reader.ReadInt32();
}
This first reads an offset from the file itself, then seeks to that offset to fetch 128 bytes from there. When the using block ends, the stream returns to the position right after the offset value read originally.
Of course, you can also seek to offsets you already know instead of reading them from the file; this was just a common pattern seen in many file formats.
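To try the pattern with the standard BinaryReader, a TemporarySeek method can be sketched as an extension method that hands the reader's BaseStream to the SeekTask class shown earlier. The extension method is a hypothetical stand-in for the member of the article's classes, and SeekTask is repeated so the sketch compiles on its own:

```csharp
using System;
using System.IO;

// Repeated from above: seeks on construction, restores the position on Dispose.
public class SeekTask : IDisposable
{
    public SeekTask(Stream stream, long offset, SeekOrigin origin)
    {
        Stream = stream;
        PreviousPosition = stream.Position;
        Stream.Seek(offset, origin);
    }

    public Stream Stream { get; private set; }

    public long PreviousPosition { get; private set; }

    public void Dispose()
    {
        Stream.Seek(PreviousPosition, SeekOrigin.Begin);
    }
}

public static class TemporarySeekExtensions
{
    // Hypothetical extension method wrapping the reader's underlying stream.
    public static SeekTask TemporarySeek(this BinaryReader reader, long offset, SeekOrigin origin)
    {
        return new SeekTask(reader.BaseStream, offset, origin);
    }
}
```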
Block Alignment
Several file formats are highly optimized to be loaded quickly by the hardware they will run on, and thus organize their data in blocks of a specific size in bytes. Seeking from the current position to the start of the next block requires some finicky calculation, but the BinaryDataReader/Writer classes wrap it in a single call to which you pass the size of your blocks.
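public void Align(int alignment)
{
    // Seek forward, relative to the current position, by the number of bytes
    // missing to the next multiple of alignment (0 if already aligned).
    Seek((-Position % alignment + alignment) % alignment);
}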
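The expression in Align can be checked in isolation. In C#, % keeps the sign of the dividend, so -Position % alignment is zero or negative; adding alignment and taking the remainder again yields the number of bytes (0 to alignment - 1) up to the next block boundary. A standalone sketch operating on a plain Stream, with the hypothetical Alignment class:

```csharp
using System.IO;

public static class Alignment
{
    // Bytes from position to the next multiple of alignment (0 if already aligned).
    public static long PaddingTo(long position, int alignment)
    {
        return (-position % alignment + alignment) % alignment;
    }

    // Moves the stream forward to the next block boundary.
    public static void Align(Stream stream, int alignment)
    {
        stream.Seek(PaddingTo(stream.Position, alignment), SeekOrigin.Current);
    }
}
```

For example, at position 5 with 4-byte blocks the padding is 3, and at position 8 it is 0.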
Usage
Let's say you know your file is organized in blocks of 0x200 bytes. Use Align as follows:
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataReader reader = new BinaryDataReader(stream))
{
    string header = reader.ReadString(4);
    // After 4 single-byte characters, this skips to the next block start at 0x200.
    reader.Align(0x200);
}
Shortcuts to Stream Properties
Some important stream properties or methods like Length, Position or Seek are a little buried in the default .NET BinaryReader/Writer, since you have to access them through BaseStream there. My classes forward those members, so you can access them directly on the reader / writer instance.
public long Position
{
    get { return BaseStream.Position; }
    set { BaseStream.Position = value; }
}
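The remaining members can be forwarded the same way. A sketch with the hypothetical ForwardingReader class; the single-argument Seek moving relative to the current position is an assumption here, chosen because Align above relies on such relative seeking:

```csharp
using System.IO;

// Hypothetical reader forwarding common stream members from BaseStream.
public class ForwardingReader : BinaryReader
{
    public ForwardingReader(Stream input)
        : base(input)
    {
    }

    public long Length
    {
        get { return BaseStream.Length; }
    }

    public long Position
    {
        get { return BaseStream.Position; }
        set { BaseStream.Position = value; }
    }

    // Assumption: a single-argument Seek moves relative to the current position.
    public long Seek(long offset)
    {
        return BaseStream.Seek(offset, SeekOrigin.Current);
    }

    public long Seek(long offset, SeekOrigin origin)
    {
        return BaseStream.Seek(offset, origin);
    }
}
```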
Usage
This should be straightforward, so let's make a 'random' example:
Random random = new Random();
using (MemoryStream stream = new MemoryStream(...))
using (BinaryDataWriter writer = new BinaryDataWriter(stream))
{
    // Keep writing random Int32 values until the first 0x4000 bytes are filled.
    while (writer.Position < 0x4000)
    {
        writer.Write(random.Next());
    }
}
Points of Interest
Optimizing the performance of the reader and writer was certainly a high priority, and I did the best I could without resorting to unsafe code. If you know ways to optimize it even further, I'm eager to learn the 'tricks' that can speed up binary data handling!
Don't forget to check out the NuGet package if you want to start using the discussed features right away (note that the API has changed quite a bit over the years, and you can find the new documentation here).
History
- 2016-09-18: Initially published
- 2019-01-17: Update link to new NuGet package