Motivation
Since the code example below seems to cause some confusion, here is a short motivation. Let's assume we want to read properties of an MP3 file, for example to build a bitrate histogram of a file that is encoded with a variable bitrate. An MP3 file consists of a sequence of frames (on the order of 10,000 for a 4 or 5 minute song). Each frame starts with a 4-byte header (see Wikipedia article), which contains information such as the bitrate, from which (together with the sample rate and padding bit, also stored in the header) the length of the data block that follows can be computed. To build the histogram, we need a 4-byte buffer: read the header, then skip to the next header.
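For illustration, here is a minimal sketch of decoding such a header. It assumes MPEG-1 Layer III frames only (the usual case for MP3 files) and ignores free-format bitrates, ID3 tags and the other MPEG versions; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper: decodes MPEG-1 Layer III frame headers only.
static class Mp3FrameHeader
{
    // Bitrates in kbps, indexed by the 4-bit bitrate index (MPEG-1 Layer III).
    // Index 0 ("free format") and 15 ("bad") are rejected below.
    static readonly int[] BitratesKbps =
        { 0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0 };

    // Sample rates in Hz, indexed by the 2-bit sample rate index (3 is reserved).
    static readonly int[] SampleRatesHz = { 44100, 48000, 32000, 0 };

    public static bool TryParse(byte[] h, out int bitrateKbps, out int frameLength)
    {
        bitrateKbps = 0;
        frameLength = 0;
        if (h == null || h.Length < 4) return false;
        // 11-bit frame sync: all bits set.
        if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0) return false;
        int version = (h[1] >> 3) & 0x03; // 3 = MPEG-1
        int layer = (h[1] >> 1) & 0x03;   // 1 = Layer III
        if (version != 3 || layer != 1) return false;
        int bitrate = BitratesKbps[(h[2] >> 4) & 0x0F];
        int sampleRate = SampleRatesHz[(h[2] >> 2) & 0x03];
        int padding = (h[2] >> 1) & 0x01;
        if (bitrate == 0 || sampleRate == 0) return false;
        bitrateKbps = bitrate;
        // Frame length in bytes, including the 4-byte header itself.
        frameLength = 144 * bitrate * 1000 / sampleRate + padding;
        return true;
    }
}
```

For the header bytes FF FB 90 00 (128 kbps, 44.1 kHz, no padding) this yields a 417-byte frame. A histogram loop would read 4 bytes, call TryParse, count bitrateKbps and then skip frameLength - 4 bytes to reach the next header, which is exactly the "read a few bytes, skip the rest" pattern measured below.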
Make sure to read the Remarks section at the bottom of this tip. If you care about performance, you should probably avoid seeking altogether. But since I've seen a lot of code that uses the Position property for seeking, I thought it was worth a tip.
Performance Tests
The values of N and SKIP in the code below are chosen deliberately to illustrate the performance differences, even for small files.
When implementing this "read some bytes, then skip a few" behavior, you might find yourself writing code like the following:
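using System;
using System.Diagnostics;
using System.IO;
// The members below belong inside a static class (declaration omitted for brevity).
private const int N = 4;    // bytes to read per iteration (4, like an MP3 frame header)
private const int SKIP = 3; // bytes to skip after each read
private static void Test1(string file)
{
    var s = new Stopwatch();
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        int count;
        byte[] buffer = new byte[N];
        s.Start();
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            // Skip via the Position property (reads Position, then sets it).
            stream.Position += SKIP;
        }
        s.Stop();
        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}
// Sums buffer[offsetStart] .. buffer[offsetEnd - 1]; offsetEnd is exclusive.
private static int Checksum(byte[] buffer, int offsetStart, int offsetEnd)
{
    int sum = 0;
    for (int i = offsetStart; i < offsetEnd; i++)
    {
        sum += buffer[i];
    }
    return sum;
}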
This seems straightforward, but with just one change you can greatly improve performance. Let's see what happens if we use stream.Seek instead of stream.Position:
private static void Test2(string file)
{
    var s = new Stopwatch();
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        long position = 0; // not needed by this variant, which seeks relative to the current position
        byte[] buffer = new byte[N];
        int count;
        s.Start();
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            position += (N + SKIP);
            // Skip by seeking relative to the current position.
            stream.Seek(SKIP, SeekOrigin.Current);
        }
        s.Stop();
        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}
That single change improves performance by a factor of about 4, which is impressive.
As a last test, let's see what happens if we seek not from the current position but from the beginning of the stream:
private static void Test3(string file)
{
    var s = new Stopwatch();
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        long position = 0;
        byte[] buffer = new byte[N];
        int count;
        s.Start();
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            position += (N + SKIP);
            // Seek to an absolute offset. Assumes each Read filled the whole
            // buffer (Stream.Read may in principle return fewer bytes).
            stream.Seek(position, SeekOrigin.Begin);
        }
        s.Stop();
        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}
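All three variants are meant to visit exactly the same bytes (the sample output below shows identical checksums); only the cost of the skip differs. A small self-contained check of that equivalence is sketched here. It uses a MemoryStream purely for convenience, so it says nothing about FileStream timings, and the class name and the variant parameter are hypothetical.

```csharp
using System;
using System.IO;

static class SkipVariants
{
    const int N = 4;
    const int SKIP = 3;

    // Sums the first N bytes of every (N + SKIP)-byte stride.
    // variant 1: Position += SKIP, 2: Seek(SKIP, Current), 3: Seek(absolute, Begin).
    public static long Sum(Stream stream, int variant)
    {
        byte[] buffer = new byte[N];
        long hash = 0;
        long position = 0;
        int count;
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            for (int i = 0; i < count; i++) hash += buffer[i];
            position += N + SKIP;
            switch (variant)
            {
                case 1: stream.Position += SKIP; break;
                case 2: stream.Seek(SKIP, SeekOrigin.Current); break;
                case 3: stream.Seek(position, SeekOrigin.Begin); break;
                default: throw new ArgumentOutOfRangeException(nameof(variant));
            }
        }
        return hash;
    }
}
```

Seeking past the end of the stream is allowed here; the next Read simply returns 0 and ends the loop, just as in the tests above.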
This yields another, smaller improvement (a factor of about 1.2).
Here's a sample output from calling the test functions on a 6.27 MB file (I made sure to call Test1 twice, the first call serving as a warm-up and making sure the file gets cached):
File size: 6581961 bytes
Elapsed : 12654 ms
Checksum : 100462446
File size: 6581961 bytes
Elapsed : 3184 ms
Checksum : 100462446
File size: 6581961 bytes
Elapsed : 2668 ms
Checksum : 100462446
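To put these timings in perspective: with N + SKIP = 7, each test performs one data-returning Read per 7-byte stride. A small sketch of that arithmetic (class and method names are hypothetical):

```csharp
using System;

static class CostPerIteration
{
    // Number of loop iterations that return data: one per (n + skip)-byte
    // stride, rounded up for a trailing partial stride.
    public static long Iterations(long fileSize, int n, int skip)
    {
        int stride = n + skip;
        return (fileSize + stride - 1) / stride;
    }

    // Average wall-clock cost of one read/skip pair, in microseconds.
    public static double MicrosecondsPerIteration(long elapsedMs, long iterations)
    {
        return elapsedMs * 1000.0 / iterations;
    }
}
```

For the 6581961-byte file, Iterations(6581961, 4, 3) returns 940281, so the three timings correspond to roughly 13.5, 3.4 and 2.8 µs per read/skip pair.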
Conclusion
In the above example, we get an overall speed-up factor of about 4.7 (12654 ms vs. 2668 ms). Results may vary on different PCs and .NET versions, but here are some rules you should follow when reading from .NET streams:
- Avoid setting the Position property. Always prefer the Seek method.
- Avoid reading properties in loops (like Position or Length).
- Prefer using SeekOrigin.Begin.
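These rules assume a seekable stream. Not every .NET stream supports seeking: network and compression streams, for example, report CanSeek == false. A minimal helper (the name is hypothetical) that uses a single relative Seek when it can and otherwise reads and discards the bytes:

```csharp
using System;
using System.IO;

static class StreamSkipping
{
    // Hypothetical helper: advances `stream` by `count` bytes.
    public static void SkipForward(Stream stream, long count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        if (stream.CanSeek)
        {
            // One relative Seek, no Position get/set round-trip. A caller that
            // tracks its own absolute offset could use SeekOrigin.Begin instead.
            stream.Seek(count, SeekOrigin.Current);
            return;
        }
        // Not seekable: read and discard the bytes instead.
        byte[] scratch = new byte[4096];
        while (count > 0)
        {
            int read = stream.Read(scratch, 0, (int)Math.Min(count, scratch.Length));
            if (read == 0) break; // end of stream reached early
            count -= read;
        }
    }
}
```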
Remarks
Say you want to seek to a particular time offset in an audio file. That's obviously a valid use case for seeking in a stream, but here it makes no difference whether you use stream.Position or stream.Seek, since it is just a single call. On the other hand, seeking in a tight loop, as in the examples above, will always degrade performance massively.
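For a constant-bitrate file, such a one-off seek can be sketched as follows. The byte offset is estimated as audioStart + seconds * bitrate / 8, where audioStart (the offset of the first frame, for example after an ID3v2 tag) is assumed to be known; the names are hypothetical, and for variable-bitrate files this linear estimate is only approximate.

```csharp
using System;
using System.IO;

static class TimeSeek
{
    // Estimated byte offset of a time position, assuming a constant bitrate.
    public static long EstimateOffset(long audioStart, double seconds, int bitrateKbps)
    {
        return audioStart + (long)(seconds * bitrateKbps * 1000 / 8);
    }

    // A single absolute seek: for one call, Seek vs. the Position setter hardly matters.
    public static void SeekToTime(Stream stream, long audioStart, double seconds, int bitrateKbps)
    {
        stream.Seek(EstimateOffset(audioStart, seconds, bitrateKbps), SeekOrigin.Begin);
    }
}
```

The estimated offset will usually land in the middle of a frame, so a real reader would then scan forward for the next frame sync before parsing headers again.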
So I guess my conclusion stays valid: if you do need to seek, prefer the Seek method. But as a result of the discussion with GravityPhazer (see comments), here is a solution that doesn't seek at all. It's a bit more involved, because a header can straddle the boundary between two successive buffer reads and its two parts have to be combined, but it pays off: runtime 50 ms.
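private static void Test4(string file)
{
    const int SIZE = 1024;
    var s = new Stopwatch();
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        byte[] buffer = new byte[SIZE];
        // Offset of the next N-byte window, relative to the start of the current buffer.
        int position = 0;
        int count, end;
        s.Start();
        while ((count = stream.Read(buffer, 0, SIZE)) > 0)
        {
            // If the previous window was cut off at the end of the last buffer,
            // its remaining (position - SKIP) bytes are at the start of this one.
            if (position > SKIP)
            {
                hash += Checksum(buffer, 0, Math.Min(position - SKIP, count));
            }
            while (position < count)
            {
                end = position + N;
                if (end > count) end = count;
                hash += Checksum(buffer, position, end);
                position += (N + SKIP);
            }
            // Rebase onto the next buffer. Subtracting count (instead of taking
            // position % SIZE) stays correct when Read returns fewer than SIZE bytes.
            position -= count;
        }
        s.Stop();
        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}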
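The buffer bookkeeping in Test4 can also be expressed as a small state machine that tracks how many bytes of the current window are still to be read and how many are still to be skipped. This avoids the position arithmetic and handles windows split across any number of reads. A minimal sketch under the same N/SKIP scheme, with hypothetical names, accepting any Stream:

```csharp
using System;
using System.IO;

static class StridedReader
{
    // Sums the first n bytes of every (n + skip)-byte stride without seeking,
    // reading the stream in large chunks and carrying state across reads.
    public static long StridedSum(Stream stream, int n, int skip, int chunkSize = 64 * 1024)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        if (skip < 0) throw new ArgumentOutOfRangeException(nameof(skip));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        byte[] buffer = new byte[chunkSize];
        long sum = 0;
        int takeLeft = n; // bytes still to consume in the current window
        int skipLeft = 0; // bytes still to skip after the current window
        int count;
        while ((count = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int i = 0;
            while (i < count)
            {
                if (takeLeft > 0)
                {
                    int take = Math.Min(takeLeft, count - i);
                    for (int k = 0; k < take; k++) sum += buffer[i + k];
                    i += take;
                    takeLeft -= take;
                    if (takeLeft == 0) skipLeft = skip;
                }
                else
                {
                    int s = Math.Min(skipLeft, count - i);
                    i += s;
                    skipLeft -= s;
                    if (skipLeft == 0) takeLeft = n;
                }
            }
        }
        return sum;
    }
}
```

Equivalently, the result is the sum of every byte whose offset modulo (n + skip) is less than n, which makes the function easy to check against a naive loop for any chunk size, including chunks smaller than a window.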