Introduction
Many algorithms such as those used in data compression deal with variable-length data. Variable-length data is the data that cannot be fully expressed as a primitive data type. For instance, the .NET Framework has the following primitive data types:
Primitive data type | C# equivalent | Nominal storage allocation (bits) |
System.Byte | byte | 8 |
System.SByte | sbyte | 8 |
System.Boolean | bool | 16 |
System.Char | char | 16 |
System.UInt16 | ushort | 16 |
System.Int16 | short | 16 |
System.UInt32 | uint | 32 |
System.Int32 | int | 32 |
System.UInt64 | ulong | 64 |
System.Int64 | long | 64 |
System.Single | float | 32 |
System.Double | double | 64 |
Variable-length data, on the other hand, has no set nominal storage allocation defined for it. Each data element can contain any number of bits such as 1 bit, 12 bits, 23 bits etc.
The .NET Framework provides two classes to help read and write variable-length data: the BitVector32
and BitArray
classes.
The BitVector32
class (namespace: System.Collections.Specialized
) stores boolean values and small integers in 32 bits of memory, which is optimal for x86 32-bit processing. However, its major drawbacks are that it can only be used to store 32 bits of data and its structure tends to be overly abstracted for the simple tasks it was designed to perform.
The BitArray
class (namespace: System.Collections
) manages a compact array of bit values, which are represented as booleans. Its internal buffer is an array of 32-bit integers. As with the BitVector32
class, this is optimal for x86 32-bit processing. However, its buffer is fixed in length and cannot be expanded as required once it has been instanced. Another issue with the BitArray
class is that it primarily has methods for reading and writing one bit at a time, which proves to be inefficient with large amounts of data.
This article focuses on the BitStream
class which combines, I believe, the best elements of the BitVector32
and BitArray
classes and also addresses their weaknesses. The BitStreamSample solution and the BitStream
class are written in C#, which I compiled successfully with v1.1 of the .NET Framework using Visual Studio.NET 2003.
An expandable stream
The internal buffer of the BitStream
class is an array of 32-bit unsigned integers:
private uint [] _auiBitBuffer;
This buffer expands automatically, depending on the needs of the calling application.
The internal buffer stores multi-byte data in big-endian byte format. For example, the System.Int32
value of 65500 is stored in the internal buffer as:
Byte offset | Bits in byte |
00 | 00000000 |
01 | 00000000 |
02 | 11111111 |
03 | 11011100 |
The BitStream
class has three constructors:
public BitStream();
public BitStream(long capacity);
public BitStream(Stream bits);
The capacity
parameter in the second constructor represents the approximated final length of the internal buffer in bits. Closer approximations lead to faster Write operations. The reason for this is because there are fewer memory re-allocations necessary to expand the BitStream
’s internal buffer at run-time.
Writing to the BitStream
There are several overloaded Write
methods in the BitStream
class. These range from writing a single bit to the BitStream
:
public virtual void Write(bool bit);
to writing an array of 64-bit elements to the BitStream
:
public virtual void Write(ulong [] bits, int offset, int count);
public virtual void Write(long [] bits, int offset, int count);
public virtual void Write(decimal [] bits, int offset, int count);
There are also methods that allow you to write a select number of bits from a primitive data type to the BitStream
:
public virtual void Write(byte bits, int bitIndex, int count);
public virtual void Write(sbyte bits, int bitIndex, int count);
public virtual void Write(char bits, int bitIndex, int count);
public virtual void Write(ushort bits, int bitIndex, int count);
public virtual void Write(short bits, int bitIndex, int count);
public virtual void Write(uint bits, int bitIndex, int count);
public virtual void Write(int bits, int bitIndex, int count);
public virtual void Write(ulong bits, int bitIndex, int count);
public virtual void Write(long bits, int bitIndex, int count);
public virtual void Write(float bits, int bitIndex, int count);
public virtual void Write(double bits, int bitIndex, int count);
In the above methods, the bits
parameter specifies the bits to write data from.
The bitIndex
parameter specifies the little-endian bit index to begin writing from.
And the count
parameter specifies the maximum number of bits to write.
For example, the byte value 217 can be represented in binary notation as:
Bit Index | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Bits | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
Here, bit zero is the LSB (Least Significant Bit) and bit 7 is the MSB (Most Significant Bit).
The statement to write bits 1 to 6 of the byte value 217 to the BitStream
is:
Write((byte)217, 1, 6);
Please refer to the BitStream class reference for further information on writing to a BitStream.
Reading from the BitStream
As with writing to the BitStream
, there are several overloaded Read
methods in the BitStream
class. These range from reading a single bit:
public virtual int Read(out bool bit);
to reading an array of 64-bit elements from the BitStream
:
public virtual int Read(ulong [] bits, int offset, int count);
public virtual int Read(long [] bits, int offset, int count);
public virtual int Read(double [] bits, int offset, int count);
There are also methods that allow reading a select number of bits to a primitive data type from the BitStream
:
public virtual int Read(out byte bits, int bitIndex, int count);
public virtual int Read(out sbyte bits, int bitIndex, int count);
public virtual int Read(out char bits, int bitIndex, int count);
public virtual int Read(out ushort bits, int bitIndex, int count);
public virtual int Read(out short bits, int bitIndex, int count);
public virtual int Read(out uint bits, int bitIndex, int count);
public virtual int Read(out int bits, int bitIndex, int count);
public virtual int Read(out ulong bits, int bitIndex, int count);
public virtual int Read(out long bits, int bitIndex, int count);
public virtual int Read(out float bits, int bitIndex, int count);
public virtual int Read(out double bits, int bitIndex, int count);
In the above methods, the bits
parameter contains the bits between bitIndex
and (bitIndex + count - 1
) which are replaced by the bits read from the current BitStream
.
The bitIndex
parameter specifies the bit index at which to begin reading.
And the count
parameter specifies the maximum number of bits to read.
The return value specifies the number of bits written into the primitive data type. This can be less than the number of bits requested if that number of bits are not currently available, or zero if the end of the current BitStream
is reached before any bits are read.
The main reason for designing this group of Read
methods to use out
parameters was for the sake of consistency. All the Read
methods in the BitStream
class are designed so that they do not throw an exception when reading past the end of the stream. Instead, they return the actual number of bits read. Consequently, all the Read
methods in the BitStream
class return a single primitive data type defining the actual number of bits read.
For example, the statements to read 7 bits from the BitStream
and store it in a byte
value, beginning at bit index 1, are:
byte bytValue;
int iBitsRead = Read(out bytValue, 1, 7);
Please refer to the BitStream class reference for further information on reading from a BitStream.
Other methods
The BitStream
class contains methods to perform logical operations:
public virtual BitStream And(BitStream bits);
public virtual BitStream Or(BitStream bits);
public virtual BitStream Xor(BitStream bits);
public virtual BitStream Not();
And methods to perform both left and right bit shifts on the entire stream:
public virtual BitStream ShiftLeft(long count);
public virtual BitStream ShiftRight(long count);
There are also several overloaded ToString
methods which display the contents of the BitStream
and the primitive data types in binary notation:
public override string ToString();
public static string ToString(bool bit);
public static string ToString(byte bits);
public static string ToString(char bits);
public static string ToString(sbyte bits);
public static string ToString(ushort bits);
public static string ToString(short bits);
public static string ToString(uint bits);
public static string ToString(int bits);
public static string ToString(ulong bits);
public static string ToString(long bits);
public static string ToString(float bits);
public static string ToString(double bits);
These methods can prove to be useful during debugging.
Detailed documentation of these and other methods can be found in the BitStream class reference.
The BitStreamSample application
The BitStreamSample application gives you the ability to take the BitStream
class for a test run.
It contains a BitStream properties panel on the left that displays Capacity, Length, Position (read/write), the last number of bits written to the BitStream
and the last number of bits read from the BitStream
. It also displays the contents of the BitStream
’s internal buffer in binary notation. These fields are automatically updated as data is read from and written to the BitStream
.
Use the Write tab panel to write to the BitStream
. It contains a list of primitive data types. Each data type has a modifiable value, bit index and count, and a write button () associated with writing.
Use the Read tab panel to read from the BitStream
. It also contains the same list of primitive data types as those found in the Write tab. Each data type has a modifiable bit index and count, and a read button () associated with reading. The value field is there for display purposes only.
Notes: The "Bit" type in both the Read and Write tab panels is actually a System.Boolean
which is converted to a System.Int32
type for clearer display and easier input. The "Char" type accepts a single character when writing to the BitStream
and displays a single character when reading from the BitStream
.
History
- November 16th, 2005:
- November 23rd, 2005:
- Added comments in the article to clarify that the
BitStream
stores multi-byte data in big-endian byte format.
- Added the class constructor that initializes a new instance of the
BitStream
class with the bits provided by the specified stream.
- November 24th, 2005:
- Added
public virtual byte [] ToByteArray();
method.
- The
public override int ReadByte();
and public override void WriteByte(byte value);
methods are now supported by the BitStream
class.
- November 25th, 2005:
- November 27th, 2005:
- Added
public virtual void WriteTo(Stream bits);
to write the contents of the current bit stream to another stream.
- December 01st, 2005:
- Fixed the problem with
public virtual void Write(ulong bits, int bitIndex, int count)
and public virtual int Read(out ulong bits, int bitIndex, int count)
methods.