Reading Multi-channel WAV Files with .NET 2.0 (C#)

ArcticEcho

15 Dec 2014 | GPL3 | 3 min read

This tip explains how to read WAV (wave) files containing multiple channels (2+) using .NET 2.0+ (in C#).

Introduction

This tip aims to provide you with the necessary knowledge (and code) for reading WAV files with multiple channels (typically more than 2, but the same principles apply to mono/stereo files) in C#, using .NET 2.0 or higher. It is assumed you have a basic understanding of a WAV file's header structure (if not, take a moment to read this) and a moderate knowledge of the .NET Framework and C#.

Now, we all know that a standard stereo WAV file is interleaved with the left channel first, and then the right. But what if a file has more than 2 channels? And how can we tell which channel is which? Thankfully, to overcome this issue, Microsoft has created a standard that covers up to 18 channels.

According to that standard, the file's "fmt " chunk needs to use the extended layout (described under the "Extensible Format" section), also known as "WAVE_FORMAT_EXTENSIBLE", which adds a "channel mask" field (dwChannelMask). This field is a 4-byte unsigned integer in which each set bit corresponds to one channel, therefore indicating which of the 18 channels are present within the file.

The Master Channel Layout

Below is the MCL, that is, the order in which existing channels should be interleaved, along with the corresponding bit value for each channel.

Order |  Bit  | Channel

 1.        0x1 Front Left
 2.        0x2 Front Right
 3.        0x4 Front Center
 4.        0x8 Low Frequency (LFE)
 5.       0x10 Back Left (Surround Back Left)
 6.       0x20 Back Right (Surround Back Right)
 7.       0x40 Front Left of Center
 8.       0x80 Front Right of Center
 9.      0x100 Back Center
10.      0x200 Side Left (Surround Left)
11.      0x400 Side Right (Surround Right)
12.      0x800 Top Center
13.     0x1000 Top Front Left
14.     0x2000 Top Front Center
15.     0x4000 Top Front Right
16.     0x8000 Top Back Left
17.    0x10000 Top Back Center
18.    0x20000 Top Back Right

For example, a channel mask of 0x63F would indicate that the file contains 8 channels: FL, FR, FC, LFE, BL, BR, SL & SR. (Please note, channel locations beyond this predefined set of 18 are considered "reserved"; you should not make any assumptions regarding the ordering of channels beyond these.)
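To make the 0x63F example concrete, here is a minimal sketch that walks the 18 defined bits in MCL order and lists the channels whose bit is set. The class name `ChannelMaskDemo` and the abbreviations for the "top" channels (TC, TFL and so on) are made up for illustration; only the bit positions come from the table above.

```csharp
using System;
using System.Collections.Generic;

static class ChannelMaskDemo
{
    // Abbreviations in Master Channel Layout order; index n corresponds to bit (1 << n).
    // The short names are illustrative only, not part of any specification.
    static readonly string[] Names =
    {
        "FL", "FR", "FC", "LFE", "BL", "BR", "FLoC", "FRoC", "BC",
        "SL", "SR", "TC", "TFL", "TFC", "TFR", "TBL", "TBC", "TBR"
    };

    // Returns the abbreviation of every channel whose bit is set, in interleave order.
    public static List<string> Decode(uint mask)
    {
        var present = new List<string>();

        for (var bit = 0; bit < Names.Length; bit++)
        {
            if ((mask & (1u << bit)) != 0)
            {
                present.Add(Names[bit]);
            }
        }

        return present;
    }
}
```

Calling `ChannelMaskDemo.Decode(0x63F)` yields the eight entries FL, FR, FC, LFE, BL, BR, SL and SR, in that order. Bits above the 18th are simply ignored by this sketch.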

Reading the Channel Mask

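Now, to read the mask of a standard WAV file, you must read the 40th through 43rd bytes (inclusive; assuming a base index of 0). These offsets assume that the "fmt " chunk immediately follows the 12-byte RIFF header, which is the usual layout. For example: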

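var bytes = new byte[44];

using (var stream = new FileStream("filepath...", FileMode.Open, FileAccess.Read))
{
    // Read may return fewer bytes than requested, so check the count.
    if (stream.Read(bytes, 0, bytes.Length) < bytes.Length)
    {
        throw new InvalidDataException("File is too short to contain a channel mask.");
    }
}

// WAV fields are little-endian; BitConverter uses the host byte order (little-endian on x86/x64).
var speakerMask = BitConverter.ToUInt32(bytes, 40);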
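Reading at a fixed offset only works when the "fmt " chunk comes directly after the RIFF header. As a hedged alternative, the sketch below walks the chunks until it finds "fmt ", checks that wFormatTag is 0xFFFE and only then reads the mask. `WavHeader` and `TryReadChannelMask` are hypothetical names, and the stream is assumed to be seekable.

```csharp
using System;
using System.IO;
using System.Text;

static class WavHeader
{
    const ushort WaveFormatExtensible = 0xFFFE;

    // Returns true and sets mask when the stream holds a WAVE_FORMAT_EXTENSIBLE "fmt " chunk.
    public static bool TryReadChannelMask(Stream stream, out uint mask)
    {
        mask = 0;
        if (stream.Length < 12) return false;

        var reader = new BinaryReader(stream); // BinaryReader always reads little-endian
        var riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
        reader.ReadUInt32(); // overall RIFF size, not needed here
        var wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (riff != "RIFF" || wave != "WAVE") return false;

        // Walk the sub-chunks until "fmt " turns up.
        while (stream.Position + 8 <= stream.Length)
        {
            var id = Encoding.ASCII.GetString(reader.ReadBytes(4));
            var size = reader.ReadUInt32();

            if (id == "fmt ")
            {
                // The extensible layout is 40 bytes long.
                if (size < 40 || stream.Position + 40 > stream.Length) return false;
                if (reader.ReadUInt16() != WaveFormatExtensible) return false;
                reader.ReadBytes(18); // nChannels through wValidBitsPerSample
                mask = reader.ReadUInt32();
                return true;
            }

            // Chunks are word-aligned: skip the payload plus a pad byte if the size is odd.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }

        return false;
    }
}
```

A `false` result covers both "not a WAV file" and "no channel mask present"; a real reader would probably want to tell those cases apart.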

Then you can check which channels exist. To do this, I'd suggest creating an enum (defined with [Flags]) that contains all the channels (and their respective values).

[Flags]
public enum Channels : uint
{
    FrontLeft = 0x1,
    FrontRight = 0x2,
    FrontCenter = 0x4,
    Lfe = 0x8,
    BackLeft = 0x10,
    BackRight = 0x20,
    FrontLeftOfCenter = 0x40,
    FrontRightOfCenter = 0x80,
    BackCenter = 0x100,
    SideLeft = 0x200,
    SideRight = 0x400,
    TopCenter = 0x800,
    TopFrontLeft = 0x1000,
    TopFrontCenter = 0x2000,
    TopFrontRight = 0x4000,
    TopBackLeft = 0x8000,
    TopBackCenter = 0x10000,
    TopBackRight = 0x20000
}

Finally, if you wish, you can populate a List<Channels> with all the channels that are present.

var foundChannels = new List<Channels>();

foreach (var ch in Enum.GetValues(typeof(Channels)))
{
    if ((speakerMask & (uint)ch) == (uint)ch)
    {
        foundChannels.Add((Channels)ch);
    } 
}

What If the Speaker Mask Doesn't Exist?

If the file's wFormatTag field (the field that is normally used for specifying the encoding of the audio data) is not set to 0xFFFE (WAVE_FORMAT_EXTENSIBLE), the header contains no channel mask and you will need to create the mask yourself! Based on the file's channel count, you will either have to guess which channels are used, or just blindly follow the MCL. In the code snippet below, we're doing a bit of both.

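static uint GetSpeakerMask(int channelCount)
{
    // Assume a setup of: FL, FR, FC, LFE, BL, BR, SL & SR.
    // Otherwise, MCL will use: FL, FR, FC, LFE, BL, BR, FLoC & FRoC.
    if (channelCount == 8)
    {
        return 0x63F;
    }

    // Otherwise follow MCL. Enum.GetValues returns the values in ascending order,
    // which matches the MCL order.
    var channels = (Channels[])Enum.GetValues(typeof(Channels));

    if (channelCount < 0 || channelCount > channels.Length)
    {
        throw new ArgumentOutOfRangeException("channelCount");
    }

    uint mask = 0;

    for (var i = 0; i < channelCount; i++)
    {
        // Combine flags with OR rather than addition.
        mask |= (uint)channels[i];
    }

    return mask;
}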
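Because the MCL bits are consecutive, "follow the MCL for the first n channels" amounts to setting the lowest n bits of the mask. The sketch below uses that shortcut in place of the enum lookup; `FallbackMask` and `ForChannelCount` are hypothetical names, and the 8-channel special case is the same guess as above.

```csharp
using System;

static class FallbackMask
{
    // Builds a channel mask for files that do not carry one.
    public static uint ForChannelCount(int channelCount)
    {
        if (channelCount < 0 || channelCount > 18)
        {
            throw new ArgumentOutOfRangeException("channelCount");
        }

        // Same guess as before: FL, FR, FC, LFE, BL, BR, SL & SR.
        if (channelCount == 8)
        {
            return 0x63F;
        }

        // The lowest channelCount bits are exactly the first channelCount MCL entries.
        return (1u << channelCount) - 1;
    }
}
```

For instance, a 2-channel file gets 0x3 (FL and FR) and a 6-channel file gets 0x3F (FL, FR, FC, LFE, BL and BR).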

Extracting the Samples

To actually read the samples of a particular channel, you follow the exact same process as if the file were stereo, that is, you increment your loop's counter by frame size (in bytes).

frameSize = (bitDepth / 8) * channelCount

And just like if you were reading the right channel of a stereo file, you also need to offset your loop's starting index. This is where things become more complicated, as you have to start reading data at the channel's order number (based on existing channels) minus one, times the byte depth.

What do I mean by "based on existing channels"? Well, you need to reassign the existing channels' order numbers starting from 1, incrementing the order for each channel that is present. For example, the channel mask 0x63F indicates that the FL, FR, FC, LFE, BL, BR, SL & SR channels are used, therefore the new channel order numbers for the respective channels would look like the following (note, the bit values are not and should not ever be changed):

Order | Bit | Channel

 1.     0x1  Front Left
 2.     0x2  Front Right
 3.     0x4  Front Center
 4.     0x8  Low Frequency (LFE)
 5.    0x10  Back Left (Surround Back Left)
 6.    0x20  Back Right (Surround Back Right)
 7.   0x200  Side Left (Surround Left)
 8.   0x400  Side Right (Surround Right)

You'll notice that FLoC, FRoC & BC are all missing, therefore the SL & SR channels "drop down" into the next lowest available order numbers (7 and 8), rather than keeping their default order numbers (10 and 11).
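The re-numbering can be sketched as counting how many set bits sit below the channel's own bit in the mask. That count is the channel's zero-based position within a frame (its order number minus one), and multiplying it by the byte depth gives the starting offset. `ChannelOffsets` and its method names are hypothetical, and `channelBit` is assumed to be a single bit from the table.

```csharp
using System;

static class ChannelOffsets
{
    // Zero-based position of channelBit among the set bits of mask, or -1 if the channel is absent.
    public static int IndexInFrame(uint mask, uint channelBit)
    {
        if ((mask & channelBit) == 0)
        {
            return -1;
        }

        var index = 0;

        // Every present channel with a lower bit value is interleaved before this one.
        for (uint bit = 1; bit < channelBit; bit <<= 1)
        {
            if ((mask & bit) != 0)
            {
                index++;
            }
        }

        return index;
    }

    // Byte offset of the channel's sample within one frame.
    public static int ByteOffsetInFrame(uint mask, uint channelBit, int bitDepth)
    {
        var index = IndexInFrame(mask, channelBit);

        if (index < 0)
        {
            throw new ArgumentException("Channel is not present in the mask.", "channelBit");
        }

        return index * (bitDepth / 8);
    }
}
```

With the mask 0x63F and 16-bit samples, Side Left (0x200) lands at index 6, so its samples start 12 bytes into each 16-byte frame, and Side Right (0x400) starts at byte 14.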

Example Code

So, taking into account all of the above, while using .NET 2.0, here's a fully functional example for retrieving the bytes of a single specified channel:

// Requires: using System; using System.Collections.Generic;
// fileAudioBytes holds the raw sample data only (no header).
// sampleStartIndex is inclusive, sampleEndIndex is exclusive; both count frames.
byte[] GetChannelBytes(byte[] fileAudioBytes, uint speakerMask,
    Channels channelToRead, int bitDepth, uint sampleStartIndex, uint sampleEndIndex)
{
    var channels = FindExistingChannels(speakerMask);
    var ch = GetChannelNumber(channelToRead, channels); // zero-based position within a frame
    var byteDepth = bitDepth / 8;
    var chOffset = ch * byteDepth;
    var frameBytes = byteDepth * channels.Length;
    var startByteIncIndex = (long)sampleStartIndex * frameBytes;
    var endByteIncIndex = (long)sampleEndIndex * frameBytes;
    var outputBytesCount = endByteIncIndex - startByteIncIndex;
    var outputBytes = new byte[outputBytesCount / channels.Length];
    var i = 0;

    // Move to the requested channel's first byte within the first frame.
    startByteIncIndex += chOffset;

    for (var j = startByteIncIndex; j < endByteIncIndex; j += frameBytes)
    {
        for (var k = j; k < j + byteDepth; k++)
        {
            // k is already an absolute index into the audio data, so use it directly;
            // subtracting the start index would always read from the first frame.
            outputBytes[i] = fileAudioBytes[k];
            i++;
        }
    }

    return outputBytes;
}

Channels[] FindExistingChannels(uint speakerMask)
{
    var foundChannels = new List<Channels>();

    foreach (var ch in Enum.GetValues(typeof(Channels)))
    {
        if ((speakerMask & (uint)ch) == (uint)ch)
        {
            foundChannels.Add((Channels)ch);
        }
    }

    return foundChannels.ToArray();
}

int GetChannelNumber(Channels input, Channels[] existingChannels)
{
    for (var i = 0; i < existingChannels.Length; i++)
    {
        if (existingChannels[i] == input)
        {
            return i;
        }
    }

    throw new KeyNotFoundException();
}
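Once a channel's bytes have been extracted, they usually need to be turned into sample values. The sketch below assumes 16-bit signed little-endian PCM, which is only one of the encodings a WAV file can carry; `SampleConversion` and `ToInt16Samples` are hypothetical names.

```csharp
using System;

static class SampleConversion
{
    // Converts the bytes of a single channel into 16-bit samples.
    // Assumes 16-bit signed little-endian PCM.
    public static short[] ToInt16Samples(byte[] channelBytes)
    {
        if (channelBytes.Length % 2 != 0)
        {
            throw new ArgumentException("Expected an even number of bytes.", "channelBytes");
        }

        var samples = new short[channelBytes.Length / 2];

        for (var i = 0; i < samples.Length; i++)
        {
            // Combine low and high byte explicitly so the result does not depend on host byte order.
            samples[i] = unchecked((short)(channelBytes[2 * i] | (channelBytes[2 * i + 1] << 8)));
        }

        return samples;
    }
}
```

Other bit depths (8-bit unsigned, 24-bit, 32-bit float) need their own conversion and are not covered here.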

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)