The code in this article was inspired by some questions on Windows Phone 7, but it's generic enough to be used on other .NET-based platforms. In the Windows Phone AppHub forums, there was a question about altering the volume of the WAVE file that the Microsoft Translator service returns. On StackOverflow, there was a question about mixing two WAVE files together. I started off working on a solution for the volume question, and when I stepped back to examine it, I realized I wasn't far from a solution for the other question. So I have both solutions implemented in the same code. In this first post, I'm showing what I needed to do to alter the volume of the WAVE stream that comes from the Microsoft Translator service.
I've kept the code generic enough that you can apply other algorithms to it if you want. I have some ideas for better handling of the sound data buffer that would allow large recordings to be manipulated without keeping the entire recording in memory, and would make it easier to change the length of a recording. But the code as presented demonstrates three things:
- Loading a WAVE file from a stream
- Altering the WAVE file contents in memory
- Saving WAVE files back to a stream
The code for saving a WAVE file is a modified version of code I demonstrated some time ago for writing a proper WAVE file from the content that comes from the microphone buffer.
Prerequisites
I'm making the assumption that you know what a WAVE file and a sample are. I am also assuming that you know how to use the Microsoft Translator web service.
Loading a Wave File
The format for WAVE files is pretty well documented. There's more than one encoding that can be used in WAVE files, but I'm concentrating on PCM-encoded WAVE files and will, for now, ignore all of the other possible encodings. The document that I used can be found here. There are a few deviations from the document that I found when dealing with real WAVE files, and I'll comment on those in a moment. In general, most of what you'll find in the header are 8-, 16-, and 32-bit integers and strings. I read the entire header into a byte array and extract the information from that byte array into the appropriate types. To extract a string from the byte array, you need to know its starting index and the number of characters it contains; you can then use Encoding.UTF8.GetString to extract it. If you understand how the numbers are encoded (little endian), decoding them is fairly easy. If you want a better understanding, see the Wikipedia article on the encoding.
| Integer Size | Extraction Code |
| --- | --- |
| 8-bit | `data[i]` |
| 16-bit | `(data[i]) \| (data[i+1] << 0x08)` |
| 32-bit | `(data[i]) \| (data[i+1] << 0x08) \| (data[i+2] << 0x10) \| (data[i+3] << 0x18)` |
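As a minimal sketch, the extraction formulas above can be wrapped in a few small helpers (the method names here are mine, not part of the article's code, and they assume a using System.Text; directive):

static string ReadString(byte[] data, int index, int count)
{
    // Header strings such as "RIFF" and "fmt " are plain ASCII, so UTF-8 decoding is safe.
    return Encoding.UTF8.GetString(data, index, count);
}

static int ReadInt16(byte[] data, int i)
{
    // Little endian: the low byte comes first.
    return (data[i]) | (data[i + 1] << 0x08);
}

static int ReadInt32(byte[] data, int i)
{
    return (data[i]) | (data[i + 1] << 0x08) |
           (data[i + 2] << 0x10) | (data[i + 3] << 0x18);
}

The header itself is laid out as follows: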
| Offset | Title | Size | Type | Description |
| --- | --- | --- | --- | --- |
| 0 | ChunkID | 4 | string(4) | Literal string "RIFF" |
| 4 | ChunkSize | 4 | int32 | Size of the entire file minus eight bytes |
| 8 | Format | 4 | string(4) | Literal string "WAVE" |
| 12 | SubChunk1ID | 4 | string(4) | Literal string "fmt " |
| 16 | SubChunk1Size | 4 | int32 | Size of the rest of this subchunk (16 for plain PCM) |
| 20 | AudioFormat | 2 | int16 | Should be 1 for PCM encoding |
| 22 | ChannelCount | 2 | int16 | 1 for mono, 2 for stereo, ... |
| 24 | SampleRate | 4 | int32 | Samples per second (e.g., 44100) |
| 28 | ByteRate | 4 | int32 | `SampleRate * ChannelCount * BitsPerSample / 8` |
| 32 | BlockAlign | 2 | int16 | `ChannelCount * BitsPerSample / 8` |
| 34 | BitsPerSample | 2 | int16 | 8, 16, 32, ... |
| 36 | ExtraParamSize | 2 | int16 | Only present when SubChunk1Size > 16 |
| 38 | ExtraParams | ExtraParamSize | byte[] | Only present when SubChunk1Size > 16 |
| 36+x | SubChunk2ID | 4 | string(4) | Literal string "data" |
| 40+x | SubChunk2Size | 4 | int32 | Size of the sample data in bytes |
| 44+x | data | SubChunk2Size | byte[SubChunk2Size] | The PCM sample data |

Here x is the number of extra parameter bytes (SubChunk1Size - 16, or zero when there are none).
The header will always be at least 44 bytes long, so I start off reading the first 44 bytes of the stream. SubChunk1Size will normally contain the value 16. If it's greater than 16, then the header is longer than 44 bytes and I read the rest. I've allowed for a header size of up to 64 bytes (which is much larger than anything I have encountered). A header larger than 44 bytes generally means that there are extra parameters at the end of SubChunk1. For what I'm doing, the contents of the extra parameters don't matter, but I still need to account for the space they consume to read the rest of the header properly.
To my surprise, the fields in the header are not always populated. Some audio editors leave some of them zeroed out. My first attempt to read a WAVE file was with a file that came from the open source audio editor Audacity. Among other fields, the BitsPerSample field was zeroed. I'm not sure whether this is allowed by the format or not; it certainly isn't in any of the spec sheets that I've found. When I encounter this, I assume a value of 16.
Regardless of whether a WAVE file contains 8-bit, 16-bit, or 32-bit samples, when it is read in I store the values in an array of doubles. I chose to do this because double works out better for some of the math operations I have in mind.
public void ReadWaveData(Stream sourceStream, bool normalizeAmplitude = false)
{
    // Allow for a header of up to 64 bytes (44 bytes plus any extra parameters).
    byte[] header = new byte[64];
    int bytesRead = sourceStream.Read(header, 0, 44);
    if (bytesRead != 44)
        throw new InvalidDataException(String.Format(
            "This can't be a wave file. It is only {0} bytes long!", bytesRead));
    int audioFormat = (header[20]) | (header[21] << 8);
    if (audioFormat != 1)
        throw new Exception("Only PCM Waves are supported (AudioFormat=1)");
    #region mostly useless code
    string chunkID = Encoding.UTF8.GetString(header, 0, 4);
    if (!chunkID.Equals("RIFF"))
    {
        throw new InvalidDataException(String.Format(
            "Expected a ChunkID of 'RIFF'. Received a chunk ID of {0} instead.", chunkID));
    }
    int chunkSize = (header[4]) | (header[5] << 8) |
                    (header[6] << 16) | (header[7] << 24);
    string format = Encoding.UTF8.GetString(header, 8, 4);
    if (!format.Equals("WAVE"))
    {
        throw new InvalidDataException(String.Format(
            "Expected a format of 'WAVE'. Received a format of {0} instead.", format));
    }
    string subChunkID = Encoding.UTF8.GetString(header, 12, 4);
    if (!subChunkID.Equals("fmt "))
    {
        throw new InvalidDataException(String.Format(
            "Expected a subchunk ID of 'fmt '. Received a subchunk ID of {0} instead.", subChunkID));
    }
    int subChunkSize = (header[16]) | (header[17] << 8) |
                       (header[18] << 16) | (header[19] << 24);
    #endregion
    // A SubChunk1Size larger than 16 means there are extra parameters to read past.
    if (subChunkSize > 16)
    {
        var bytesNeeded = subChunkSize - 16;
        if (bytesNeeded + 44 > header.Length)
            throw new InvalidDataException("The WAV header is larger than expected.");
        sourceStream.Read(header, 44, subChunkSize - 16);
    }
    ChannelCount = (header[22]) | (header[23] << 8);
    SampleRate = (header[24]) | (header[25] << 8) |
                 (header[26] << 16) | (header[27] << 24);
    #region Useless Code
    int byteRate = (header[28]) | (header[29] << 8) |
                   (header[30] << 16) | (header[31] << 24);
    int blockAlign = (header[32]) | (header[33] << 8);
    #endregion
    BitsPerSample = (header[34]) | (header[35] << 8);
    #region Useless Code
    string subchunk2ID = Encoding.UTF8.GetString(header, 20 + subChunkSize, 4);
    #endregion
    // SubChunk2Size sits at offset 40 + (subChunkSize - 16) = 24 + subChunkSize.
    var offset = 24 + subChunkSize;
    int dataLength = (header[offset + 0]) | (header[offset + 1] << 8) |
                     (header[offset + 2] << 16) | (header[offset + 3] << 24);
    // Some editors (Audacity, for one) leave BitsPerSample zeroed; assume 16.
    if (BitsPerSample == 0)
    {
        BitsPerSample = 16;
    }
    byte[] dataBuffer = new byte[dataLength];
    bytesRead = sourceStream.Read(dataBuffer, 0, dataBuffer.Length);
    Debug.Assert(bytesRead == dataLength);
    if (BitsPerSample == 8)
    {
        // 8-bit WAVE samples are unsigned (0-255) and centered at 128;
        // shift and scale them so they line up with the signed 16-bit range.
        byte[] unadjustedSoundData = new byte[dataBuffer.Length / (BitsPerSample / 8)];
        Buffer.BlockCopy(dataBuffer, 0, unadjustedSoundData, 0, dataBuffer.Length);
        SoundData = new double[unadjustedSoundData.Length];
        for (var i = 0; i < unadjustedSoundData.Length; ++i)
        {
            SoundData[i] = 256d * ((double)unadjustedSoundData[i] - 128d);
        }
    }
    else if (BitsPerSample == 16)
    {
        short[] unadjustedSoundData = new short[dataBuffer.Length / (BitsPerSample / 8)];
        Buffer.BlockCopy(dataBuffer, 0, unadjustedSoundData, 0, dataBuffer.Length);
        SoundData = new double[unadjustedSoundData.Length];
        for (var i = 0; i < unadjustedSoundData.Length; ++i)
        {
            SoundData[i] = (double)unadjustedSoundData[i];
        }
    }
    else if (BitsPerSample == 32)
    {
        int[] unadjustedSoundData = new int[dataBuffer.Length / (BitsPerSample / 8)];
        Buffer.BlockCopy(dataBuffer, 0, unadjustedSoundData, 0, dataBuffer.Length);
        SoundData = new double[unadjustedSoundData.Length];
        for (var i = 0; i < unadjustedSoundData.Length; ++i)
        {
            SoundData[i] = (double)unadjustedSoundData[i];
        }
    }
    // Wrap each interleaved channel in a PcmChannel for per-channel access.
    Channels = new PcmChannel[ChannelCount];
    for (int i = 0; i < ChannelCount; ++i)
    {
        Channels[i] = new PcmChannel(this, i);
    }
    if (normalizeAmplitude)
        NormalizeAmplitude();
}
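As a quick usage sketch (assuming PcmData also has a parameterless constructor; the example at the end of this post uses a constructor that takes a stream), loading a file from disk looks something like this:

// Hypothetical usage; assumes a parameterless PcmData constructor exists.
using (var fileStream = File.OpenRead("speech.wav"))
{
    var pcm = new PcmData();
    pcm.ReadWaveData(fileStream, true);   // true = normalize the amplitude
    Console.WriteLine("{0} Hz, {1} channel(s), {2} samples",
        pcm.SampleRate, pcm.ChannelCount, pcm.SoundData.Length);
}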
Mono vs Stereo
In a mono (single channel) file, the samples are ordered one after another; no mystery there. For stereo files, the data stream contains the first sample for channel 0, then the first sample for channel 1, then the second sample for channel 0, the second sample for channel 1, and so on. Every other sample belongs to the left channel or the right channel. The sample data is stored in memory in the same interleaved way, in an array called SoundData. To work exclusively with one channel or the other, there is also a property named Channels (an array of PcmChannel) that can be used to access a single channel.
public class PcmChannel
{
    internal PcmChannel(PcmData parent, int channel)
    {
        Channel = channel;
        Parent = parent;
    }

    protected PcmData Parent { get; set; }
    public int Channel { get; protected set; }

    // Number of samples in this one channel.
    public int Length
    {
        get { return Parent.SoundData.Length / Parent.ChannelCount; }
    }

    // Indexer that maps a channel-relative index onto the interleaved SoundData array.
    public double this[int index]
    {
        get { return Parent.SoundData[index * Parent.ChannelCount + Channel]; }
        set { Parent.SoundData[index * Parent.ChannelCount + Channel] = value; }
    }
}

// Relevant members of PcmData (excerpt).
public class PcmData
{
    public double[] SoundData { get; set; }
    public int ChannelCount { get; set; }
    public PcmChannel[] Channels { get; set; }
}
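For example, with a stereo recording you could attenuate just one channel through its PcmChannel and leave the other alone (a sketch; it assumes pcm already holds stereo data loaded with the code above):

// Halve the amplitude of channel 0 (the left channel in a stereo file).
var left = pcm.Channels[0];
for (int i = 0; i < left.Length; ++i)
{
    left[i] = left[i] * 0.5;
}
// Recalculate the adjustment factor and offset before writing the result out.
pcm.NormalizeAmplitude();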
Where's 24-bit Support?
Yes, 24-bit WAVE files do exist. I'm not supporting them (yet) because there's more code required to handle them, and most of the scenarios I have in mind use 8- and 16-bit files. Adding support for 32-bit files only took about five more lines of code. I'll be handling 24-bit files in forthcoming code.
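For reference, most of the extra work for 24-bit samples is in unpacking the 3-byte values by hand, since there's no 24-bit integral type to BlockCopy into. A rough sketch of that conversion (not part of this article's code) would slot into ReadWaveData something like this:

// Sketch only: unpack little-endian signed 24-bit samples into doubles.
if (BitsPerSample == 24)
{
    int sampleCount = dataBuffer.Length / 3;
    SoundData = new double[sampleCount];
    for (int i = 0; i < sampleCount; ++i)
    {
        int sample = dataBuffer[i * 3]
                   | (dataBuffer[i * 3 + 1] << 8)
                   | (dataBuffer[i * 3 + 2] << 16);
        // Sign-extend from 24 bits to 32 bits.
        if ((sample & 0x800000) != 0)
        {
            sample |= unchecked((int)0xFF000000);
        }
        SoundData[i] = (double)sample;
    }
}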
Altering the Sound Data
Changes made to the values in the SoundData[] array will alter the sound data. There are some constraints on how the data can be modified. Since I'm writing the result to a 16-bit WAVE file, the maximum and minimum values that can be written out are 32,767 and -32,768. The double data type has a range significantly larger than this. The properties AdjustmentFactor and AdjustmentOffset are used to alter the sound data when it is being prepared to be written back to a file; they apply a linear transformation to the sound data (remember y=mx+b?). Finding the right values for these is done for you by the NormalizeAmplitude method. Calling this method after you've altered your sound data will result in appropriate values being chosen. By default, the method tries to normalize the sound data to 99% of maximum amplitude. You can pass an argument between 0 and 1 to target some other amplitude.
public void NormalizeAmplitude(double percentMax = 0.99d)
{
    var max = SoundData.Max();
    var min = SoundData.Min();
    double rangeSize = max - min + 1;
    // Map [min, max] onto percentMax of the 16-bit range (y = mx + b).
    AdjustmentFactor = ((percentMax * (double)short.MaxValue) -
        percentMax * (double)short.MinValue) / (double)rangeSize;
    AdjustmentOffset = (percentMax * (double)short.MinValue) - (min * AdjustmentFactor);
    // Handy values to inspect in the debugger; they should land near the
    // top and bottom of the requested output range.
    int maxExpected = (int)(max * AdjustmentFactor + AdjustmentOffset);
    int minExpected = (int)(min * AdjustmentFactor + AdjustmentOffset);
}
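As a usage sketch, re-running the normalization with a lower target produces a quieter file when the data is written back out (this reuses the pcm object from earlier and the Write method covered in the next section):

// Scale the output so the loudest sample sits at roughly half of the 16-bit range.
pcm.NormalizeAmplitude(0.5);
using (Stream s = new FileStream("quiet.wav", FileMode.Create, FileAccess.Write))
{
    pcm.Write(s);
}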
Saving WAVE Data
To save the WAVE data, I'm using a variant of something I used to save the stream that comes from the microphone. The original form of that code had a bug that makes a difference when working with a stream that has multiple channels; the microphone produces a single-channel stream and wasn't impacted by the bug, but it's fixed here. The code for writing the WAVE produces a header from the parameters it is given, then writes out the WAVE data. The WAVE data must be converted from the double[] array to a byte[] array containing 16-bit integers in little-endian format.
public class PcmData
{
    public void Write(Stream destinationStream)
    {
        // Convert the samples to 16-bit values, applying the linear adjustment
        // and clamping anything that falls outside of the 16-bit range.
        byte[] writeData = new byte[SoundData.Length * 2];
        short[] conversionData = new short[SoundData.Length];
        for (int i = 0; i < SoundData.Length; ++i)
        {
            double sample = (SoundData[i] * AdjustmentFactor) + AdjustmentOffset;
            sample = Math.Min(sample, (double)short.MaxValue);
            sample = Math.Max(sample, (double)short.MinValue);
            conversionData[i] = (short)sample;
        }
        // Handy values to inspect in the debugger.
        int max = conversionData.Max();
        int min = conversionData.Min();
        Buffer.BlockCopy(conversionData, 0, writeData, 0, writeData.Length);
        WaveHeaderWriter.WriteHeader(destinationStream, writeData.Length,
            ChannelCount, SampleRate);
        destinationStream.Write(writeData, 0, writeData.Length);
    }
}
public class WaveHeaderWriter
{
    static byte[] RIFF_HEADER = new byte[] { 0x52, 0x49, 0x46, 0x46 };   // "RIFF"
    static byte[] FORMAT_WAVE = new byte[] { 0x57, 0x41, 0x56, 0x45 };   // "WAVE"
    static byte[] FORMAT_TAG = new byte[] { 0x66, 0x6d, 0x74, 0x20 };    // "fmt "
    static byte[] AUDIO_FORMAT = new byte[] { 0x01, 0x00 };              // PCM
    static byte[] SUBCHUNK_ID = new byte[] { 0x64, 0x61, 0x74, 0x61 };   // "data"
    private const int BYTES_PER_SAMPLE = 2;

    public static void WriteHeader(
        System.IO.Stream targetStream,
        int byteStreamSize,
        int channelCount,
        int sampleRate)
    {
        int byteRate = sampleRate * channelCount * BYTES_PER_SAMPLE;
        // BlockAlign is the size of one sample frame: channel count times bytes per sample.
        int blockAlign = channelCount * BYTES_PER_SAMPLE;

        targetStream.Write(RIFF_HEADER, 0, RIFF_HEADER.Length);
        targetStream.Write(PackageInt(byteStreamSize + 36, 4), 0, 4);   // ChunkSize
        targetStream.Write(FORMAT_WAVE, 0, FORMAT_WAVE.Length);
        targetStream.Write(FORMAT_TAG, 0, FORMAT_TAG.Length);
        targetStream.Write(PackageInt(16, 4), 0, 4);                    // SubChunk1Size for PCM
        targetStream.Write(AUDIO_FORMAT, 0, AUDIO_FORMAT.Length);
        targetStream.Write(PackageInt(channelCount, 2), 0, 2);
        targetStream.Write(PackageInt(sampleRate, 4), 0, 4);
        targetStream.Write(PackageInt(byteRate, 4), 0, 4);
        targetStream.Write(PackageInt(blockAlign, 2), 0, 2);
        targetStream.Write(PackageInt(BYTES_PER_SAMPLE * 8), 0, 2);     // BitsPerSample
        targetStream.Write(SUBCHUNK_ID, 0, SUBCHUNK_ID.Length);
        targetStream.Write(PackageInt(byteStreamSize, 4), 0, 4);        // SubChunk2Size
    }

    static byte[] PackageInt(int source, int length = 2)
    {
        // Pack an int into a 2- or 4-byte little-endian array.
        if ((length != 2) && (length != 4))
            throw new ArgumentException("length must be either 2 or 4", "length");
        var retVal = new byte[length];
        retVal[0] = (byte)(source & 0xFF);
        retVal[1] = (byte)((source >> 8) & 0xFF);
        if (length == 4)
        {
            retVal[2] = (byte)((source >> 0x10) & 0xFF);
            retVal[3] = (byte)((source >> 0x18) & 0xFF);
        }
        return retVal;
    }
}
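A quick way to sanity-check the writer is to emit a header for some known parameters into a MemoryStream and confirm it comes out at exactly 44 bytes (a small sketch, not part of the original code):

// 32,000 bytes of 16-bit mono samples at 16 kHz.
using (var ms = new MemoryStream())
{
    WaveHeaderWriter.WriteHeader(ms, 32000, 1, 16000);
    Debug.Assert(ms.Length == 44);   // the PCM header written by this class is always 44 bytes
}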
Using the Code
Once you've gotten the wave stream, only a few lines of code are needed to do the work. For the example program, I am downloading a spoken phrase from the Microsoft Translator service, amplifying it, and then writing the original and amplified versions to separate files.
static void Main(string[] args)
{
    PcmData pcm;
    // Ask the translator service for a spoken phrase and download the WAVE it returns.
    MicrosoftTranslatorService.LanguageServiceClient client = new LanguageServiceClient();
    string waveUrl = client.Speak(APP_ID, "this is a volume test", "en", "audio/wav", "");
    WebClient wc = new WebClient();
    var soundData = wc.DownloadData(waveUrl);
    // Load the WAVE data and normalize (amplify) it.
    using (var ms = new MemoryStream(soundData))
    {
        pcm = new PcmData(ms, true);
    }
    using (Stream s = new FileStream("amplified.wav", FileMode.Create, FileAccess.Write))
    {
        pcm.Write(s);
    }
    using (Stream s = new FileStream("original.wav", FileMode.Create, FileAccess.Write))
    {
        s.Write(soundData, 0, soundData.Length);
    }
}
The End Result
The code works as designed, but I found a few scenarios that can make it less effective. One is that not all phones have the same speaker frequency response; frequencies that come through loud and clear on one phone may sound quieter on another. The other is that a source file may have a single sample that reaches the maximum or minimum value even though the majority of the samples come nowhere near that amplitude. When this occurs, the spurious sample limits the amount of amplification that can be applied to the file. I opened an original and an amplified WAVE file in Audacity to compare the results, and I was pleased to see that the amplified WAVE really does look louder in its waveform display.
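One way to keep a single spurious peak from limiting the gain (not something the code above does) would be to normalize against a high percentile of the absolute sample values rather than the true maximum, and let the clamping in Write absorb the few samples that overshoot. A rough sketch of such a method added to PcmData (the name and percentile choice are my own, and it assumes the samples are centered around zero):

// Sketch: base the gain on the 99.5th percentile of |sample| so one stray
// spike doesn't dictate the amplification; Write() clamps any overshoot.
public void NormalizeAmplitudeRobust(double percentMax = 0.99d, double percentile = 0.995d)
{
    var magnitudes = SoundData.Select(Math.Abs).OrderBy(v => v).ToArray();
    double reference = magnitudes[(int)(percentile * (magnitudes.Length - 1))];
    if (reference <= 0)
        return;
    AdjustmentFactor = (percentMax * (double)short.MaxValue) / reference;
    AdjustmentOffset = 0d;
}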
Part 2 - Overlaying Wave Files
The other problem that this code can solve is combining wave files together in various ways. I'll be putting that up in the next post. Between now and then, I've got a presentation at the Windows Phone Developers Atlanta meeting this week (if you are in the Atlanta area, come on out!) and will get back to this code after the presentation.