Improve Stream Reading Performance in C#

Christian Woltering


22 Jan 2016, MIT license


Improve reading performance of .NET streams using the Seek method instead of Position

Motivation

Because the code example below seems to cause some confusion, here is a short motivation. Suppose we want to read properties of an MP3 file, for example to build a bitrate histogram of a file encoded with a variable bitrate. An MP3 file consists of a sequence of frames (on the order of 10,000 for a 4 or 5 minute song). Each frame has a 4-byte header (see Wikipedia article), which contains information such as the bitrate and the length of the data block that follows the header. To build the histogram, we need a 4-byte buffer: read a header, skip to the next header, and repeat.

Make sure to read the remarks section at the bottom of this tip. If you care about performance, you should probably avoid seeking at all. But since I've seen a lot of code that uses the Position property for seeking, I thought it was worth a tip ...

Performance Tests

The values of N and SKIP in the code below are chosen deliberately to illustrate the performance differences, even for small file sizes.

When implementing that "read some bytes, then skip a few" behavior, you might find yourself writing code like the following:

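// The snippets in this tip are members of one class and need these namespaces:
// using System; using System.Diagnostics; using System.IO;

// Number of bytes to read.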
private const int N = 4;

// Number of bytes to skip.
private const int SKIP = 3;

private static void Test1(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        int count;

        byte[] buffer = new byte[N];

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);

            stream.Position += SKIP;
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

// Sums the bytes in buffer from offsetStart (inclusive) to offsetEnd (exclusive).
private static int Checksum(byte[] buffer, int offsetStart, int offsetEnd)
{
    int sum = 0;

    for (int i = offsetStart; i < offsetEnd; i++)
    {
        sum += buffer[i];
    }

    return sum;
}

This seems straightforward, but with just one change you can greatly improve performance. Let's see what happens if we use stream.Seek instead of stream.Position:

private static void Test2(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        byte[] buffer = new byte[N];

        int count;

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);

            // Skip ahead relative to the current position.
            stream.Seek(SKIP, SeekOrigin.Current);
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

In the sample run shown further down, this one change cut the runtime by roughly a factor of 4 (from 12654 ms to 3184 ms). That's impressive.

As a last test, let's see what happens if we don't seek from the current position, but from the beginning of the stream:

private static void Test3(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        long position = 0;

        byte[] buffer = new byte[N];

        int count;

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            position += (N + SKIP);

            stream.Seek(position, SeekOrigin.Begin);
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

Again, we see an improvement, though a smaller one (a factor of about 1.2).

Here's a sample output from calling the test functions on a 6.27 MB file. The three blocks are the results of Test1, Test2 and Test3, in that order (I called Test1 twice, using the first call as a warm-up and to make sure the file gets cached):

File size: 6581961 bytes
Elapsed  : 12654 ms
Checksum : 100462446

File size: 6581961 bytes
Elapsed  : 3184 ms
Checksum : 100462446

File size: 6581961 bytes
Elapsed  : 2668 ms
Checksum : 100462446
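
The speed-up factors follow directly from the elapsed times above. As a quick check of the arithmetic, here is a minimal sketch; the names SpeedUp, Factor and PrintSample are made up for illustration and are not part of the test program:

```csharp
using System;

public static class SpeedUp
{
    // Ratio of a slower elapsed time to a faster one.
    public static double Factor(long slowerMs, long fasterMs)
    {
        if (fasterMs <= 0) throw new ArgumentOutOfRangeException(nameof(fasterMs));

        return (double)slowerMs / fasterMs;
    }

    // Prints the three ratios for the sample run shown above.
    public static void PrintSample()
    {
        // Elapsed times in milliseconds, copied from the sample output.
        const long test1 = 12654, test2 = 3184, test3 = 2668;

        Console.WriteLine("Position vs. Seek (Current)   : {0:F2}", Factor(test1, test2));
        Console.WriteLine("Seek (Current) vs. Seek (Begin): {0:F2}", Factor(test2, test3));
        Console.WriteLine("Position vs. Seek (Begin)     : {0:F2}", Factor(test1, test3));
    }
}
```

For this run the ratios come out at roughly 3.97, 1.19 and 4.74. They describe a single run on one machine, so treat them as an indication rather than a benchmark.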

Conclusion

In the above example, we get an overall speed-up factor of about 4.7 (12654 ms versus 2668 ms). Results may vary on different PCs, but here are some rules you should follow when reading from .NET streams:

Avoid setting the Position property. Always prefer the Seek method.
Avoid reading properties in loops (like Position or Length).
Prefer using SeekOrigin.Begin.
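
As an illustration of the three rules above, here is a minimal sketch of a helper that keeps the offset in a local variable and seeks from the beginning of the stream. It is the same pattern as Test3, written against any seekable Stream so it can be tried on a MemoryStream. The names SkippingReader and SumEveryBlock are hypothetical.

```csharp
using System;
using System.IO;

public static class SkippingReader
{
    // Sums the bytes of every block of blockSize bytes, skipping 'skip' bytes
    // between blocks. Like Test3, this assumes Read returns a full block
    // except at the end of the stream.
    public static long SumEveryBlock(Stream stream, int blockSize, int skip)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new NotSupportedException("Stream must be seekable.");
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        if (skip < 0) throw new ArgumentOutOfRangeException(nameof(skip));

        var buffer = new byte[blockSize];
        long sum = 0;

        // Read Position once, before the loop, and track the offset locally.
        long position = stream.Position;
        int count;

        while ((count = stream.Read(buffer, 0, blockSize)) > 0)
        {
            for (int i = 0; i < count; i++)
            {
                sum += buffer[i];
            }

            // Seek to an absolute offset instead of setting Position.
            position += blockSize + skip;
            stream.Seek(position, SeekOrigin.Begin);
        }

        return sum;
    }
}
```

On a MemoryStream holding the bytes 0 through 19, SumEveryBlock(stream, 4, 3) sums the bytes 0..3, 7..10 and 14..17, which gives 102.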

Remarks

Say you want to seek to a particular time offset in an audio file. That is obviously a valid use case for seeking in a stream, but there it makes no difference whether you use stream.Position or stream.Seek, since it is just a single call. Seeking after every small read, as in the tests above, will on the other hand always degrade performance massively.

So my conclusion stands: if you seek, prefer the Seek method. But as a result of the discussion with GravityPhazer (see comments), here is a solution that doesn't seek at all. It's a bit more involved, because a frame that straddles two successive buffer reads has to be stitched together, but it pays off: the runtime drops to 50 ms.

private static void Test4(string file)
{
    const int SIZE = 1024;

    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        byte[] buffer = new byte[SIZE];

        int position = 0;
        int count, end;

        s.Start();

        // Fill the buffer.
        while ((count = stream.Read(buffer, 0, SIZE)) > 0)
        {
            if (position > SKIP)
            {
                // The previous frame straddled the buffer boundary: add the
                // rest of it, but never more than the bytes actually read.
                hash += Checksum(buffer, 0, Math.Min(position - SKIP, count));
            }

            // Process the buffer.
            while (position < count)
            {
                end = position + N;

                if (end > count) end = count;

                hash += Checksum(buffer, position, end);
                position += (N + SKIP);
            }

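            // Carry the offset of the next frame over into the next buffer.
            // Subtracting count stays correct even if Read returns fewer than SIZE bytes.
            position -= count;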
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}
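
Because the buffered version has to stitch frames across buffer boundaries, it is worth checking it against the simple seek-based loop on a small input. The following is a hypothetical harness, not part of the original test program: it restates both loops as methods over any Stream so they can be compared on a MemoryStream. The names ChecksumCheck, SeekSum and BufferedSum are made up for this sketch.

```csharp
using System;
using System.IO;

public static class ChecksumCheck
{
    // Reference: read n bytes, then seek past 'skip' bytes (the Test2 pattern).
    public static long SeekSum(Stream stream, int n, int skip)
    {
        var buffer = new byte[n];
        long sum = 0;
        int count;

        while ((count = stream.Read(buffer, 0, n)) > 0)
        {
            for (int i = 0; i < count; i++) sum += buffer[i];

            stream.Seek(skip, SeekOrigin.Current);
        }

        return sum;
    }

    // Seek-free: read chunks of 'size' bytes and walk through them in memory.
    public static long BufferedSum(Stream stream, int n, int skip, int size)
    {
        var buffer = new byte[size];
        long sum = 0;

        // Offset of the next frame, relative to the start of the current chunk.
        int position = 0;
        int count;

        while ((count = stream.Read(buffer, 0, size)) > 0)
        {
            if (position > skip)
            {
                // The last frame of the previous chunk was cut off: add the
                // rest of it, limited to the bytes actually read.
                int rest = Math.Min(position - skip, count);
                for (int i = 0; i < rest; i++) sum += buffer[i];
            }

            while (position < count)
            {
                int end = Math.Min(position + n, count);
                for (int i = position; i < end; i++) sum += buffer[i];

                position += n + skip;
            }

            // Make the offset relative to the next chunk.
            position -= count;
        }

        return sum;
    }
}
```

Running both methods over the same bytes with several chunk sizes, including sizes smaller than N + SKIP, should give identical sums. If they differ, the carry-over logic is the first place to look.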

License

This article, along with any associated source code and files, is licensed under The MIT License