Introduction
I've been working for a while on a voice chat program in C# and ran into a vexing problem. Uncompressed audio data simply wouldn't do for a chat program, yet all of the .NET voice compression solutions I could find were quite expensive. Speex, the royalty-free open-source voice codec, seemed the obvious choice, but I could not find a complete C# implementation of the library. A search on SourceForge turned up a few such projects, but all were incomplete and long abandoned. This is not a comforting fact for someone considering attempting the same feat.
Of course, an exceedingly simple solution was readily available the entire time, if only I had been able to see it. The Speex website provides two command-line utilities, speexenc and speexdec. As I was looking for a programmatic solution, it never occurred to me to even look at the syntax of these utilities. If I had, I would have seen how easy it would be to use them from .NET code.
What is Speex?
Speex is a royalty-free, open-source voice codec. It compresses audio data into a smaller format, which is advantageous for transmitting voice over the internet. Keep in mind that it is generally not efficient for non-voice data. This is because (according to Wikipedia) voice codecs work by eliminating frequencies that cannot be produced by the human voice and those that are inaudible to human ears. With fewer frequencies to represent, the audio data can be stored in a more compact form. Speex is usually used in VoIP programs and similar internet voice applications, but it can also be used simply to reduce the size of a file on your computer.
More information can be found at speex.org.
Using the Code
This code is quite simple to use. It contains an exception class, two structures for storing various data, and a class with two public methods: Encode and Decode. It also makes use of a heavily modified version of Sujoy G.'s clsWaveProcessor class from his article "Wave File Processor in C#". I've added a new field that contains the raw PCM data, removed all methods except for WaveHeaderIN, and edited that method to work on a stream instead of a filename. Here is the meat of the program, the Codec class.
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks; // Task.Run requires .NET 4.5 or later

public class Codec
{
    public EncodeReturn Encode(byte[] raw, int bytespersecond,
        int samplespersecond, bool stereo,
        short bitspersample, bool denoise, bool agc)
    {
        // Build the speexenc command line. "con con" names the console
        // as both the input file and the output file.
        string args = "-u " + (denoise ? "--denoise " : "")
            + (agc ? "--agc " : "")
            + "--bitrate " + (bytespersecond * 8) + " "
            + "--rate " + samplespersecond + " "
            + (stereo ? "--stereo " : "")
            + (bitspersample != 16 ? "--8bit " : "")
            + "con con";
        return new EncodeReturn(Run("speexenc", args, raw));
    }

    public DecodeReturn Decode(byte[] raw)
    {
        byte[] wave = Run("speexdec", "--force-uwb con con", raw);
        // Parse the WAV header from the decoded bytes.
        clsWaveProcessor cwp = new clsWaveProcessor();
        cwp.WaveHeaderIN(new MemoryStream(wave));
        return new DecodeReturn(cwp.RawPcmWaveData,
            (cwp.BitsPerSample / 8) * cwp.SampleRate,
            cwp.SampleRate, cwp.Channels != 1, cwp.BitsPerSample);
    }

    // Starts the utility with redirected streams, feeds it the input
    // and returns everything after the first line of its output.
    private static byte[] Run(string exe, string args, byte[] input)
    {
        ProcessStartInfo psi = new ProcessStartInfo(exe, args);
        psi.UseShellExecute = false;       // required for redirection
        psi.RedirectStandardInput = true;
        psi.RedirectStandardOutput = true;
        psi.CreateNoWindow = true;

        using (Process proc = Process.Start(psi))
        {
            // Drain stdout on another thread so that a full output pipe
            // cannot block the write to stdin below.
            Task<byte[]> reader = Task.Run(() =>
            {
                MemoryStream ms = new MemoryStream();
                proc.StandardOutput.BaseStream.CopyTo(ms);
                return ms.ToArray();
            });
            proc.StandardInput.BaseStream.Write(input, 0, input.Length);
            proc.StandardInput.Close();    // signal end of input
            byte[] all = reader.Result;
            proc.WaitForExit();
            if (proc.ExitCode != 0)
                throw new EncodeDecodeFailureException(proc.ExitCode);

            // Ignore the one line of text written before the file data.
            int start = Array.IndexOf(all, (byte)'\n') + 1;
            byte[] result = new byte[all.Length - start];
            Buffer.BlockCopy(all, start, result, 0, result.Length);
            return result;
        }
    }
}
The code is fairly straightforward. In each method, I start a process based on the user's needs. Note that I simplified Encode for my needs; you may wish to add more parameters. The most important thing to note here is that both the input and output files for speexenc and speexdec are set to con. This allows me to write the input PCM data, as well as read the output, without having to use temporary files.
Then I simply read and write the data. Note that I ignore one line of output data in each method. This is because, regardless of whether or not the output is redirected to the console, both utilities write one line of extra information before they begin to write the output file. This is why it's very important that verbose mode is left off.
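One thing to watch when skipping that line: StreamReader.ReadLine reads ahead into its own buffer, so bytes read afterwards from BaseStream may not start where the line ended. A safer approach is to capture the whole output as bytes and drop the first line from the array. The sketch below shows one way to do that. SkipFirstLine is a hypothetical helper name, and it assumes the extra line ends with a '\n' byte.

```csharp
using System;

public static class OutputHelper
{
    // Hypothetical helper: returns the bytes that follow the first '\n'.
    // If the data contains no newline, it is returned unchanged.
    public static byte[] SkipFirstLine(byte[] data)
    {
        int start = Array.IndexOf(data, (byte)'\n') + 1; // 0 when not found
        byte[] rest = new byte[data.Length - start];
        Buffer.BlockCopy(data, start, rest, 0, rest.Length);
        return rest;
    }
}
```

For example, passing the bytes of "Hi\n" followed by 1, 2, 3 returns an array holding just 1, 2, 3.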
Pretty much all the rest of the relevant info can be found in the comments.
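If you do want to add more parameters to the encoder, it may help to move the command-line construction into its own method. The following is only a sketch: SpeexArguments and BuildEncoderArguments are hypothetical names, and the quality and vbr parameters are my illustrative additions. They map to the --quality and --vbr options of speexenc; check the usage text of your version before relying on them.

```csharp
using System.Text;

public static class SpeexArguments
{
    // Hypothetical helper; quality and vbr are illustrative additions
    // that are not part of the original Encode method.
    public static string BuildEncoderArguments(int sampleRate, bool stereo,
        short bitsPerSample, bool denoise, bool agc, int? quality, bool vbr)
    {
        StringBuilder sb = new StringBuilder("-u ");
        if (denoise) sb.Append("--denoise ");
        if (agc) sb.Append("--agc ");
        if (vbr) sb.Append("--vbr ");
        if (quality.HasValue)
            sb.Append("--quality ").Append(quality.Value).Append(' ');
        sb.Append("--rate ").Append(sampleRate).Append(' ');
        if (stereo) sb.Append("--stereo ");
        if (bitsPerSample != 16) sb.Append("--8bit ");
        sb.Append("con con"); // console as input and output, as in the article
        return sb.ToString();
    }
}
```

Calling BuildEncoderArguments(32000, false, 16, true, false, 8, true) yields "-u --denoise --vbr --quality 8 --rate 32000 con con". Keeping the string construction separate from the process handling also makes it easy to check the arguments without starting speexenc.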
Note
I just want to mention that I plan to update this soon. The code has not been tested yet, and I plan to write a demo application.
History
Update: 1 September, 2007
- Made EncodeReturn and DecodeReturn members public.
- Added a section on Speex itself.