Speex in C#

1 Sep 2007
Using the Speex speech codec with the .NET framework
Screenshot - Hierarchy.png

Introduction

I've been working for a while on a voice chat program in C# and ran into a vexing problem. Uncompressed audio data simply wouldn't do for a chat program, yet all of the .NET voice compression solutions I could find were quite expensive. Speex, the royalty-free open-source voice codec, seemed the obvious choice, but my searches turned up no complete C# implementation of the library. A search on SourceForge did turn up a few such projects, but all were incomplete and long abandoned. This is not a comforting fact for someone considering attempting the same feat.

Of course the entire time an exceedingly simple solution was readily available, if only I had been able to see it. The Speex website provides two command-line utilities, speexenc and speexdec. As I was looking for a programmatic solution, it never occurred to me to even look at the syntax of these utilities. If I had, I would've seen how easy it would be to use the utilities from .NET code.

What is Speex?

Speex is a royalty-free, open-source voice codec released under a BSD-style license. It is used for compressing audio data into a smaller format, which is advantageous for transmitting voice over the internet. Keep in mind that it is generally not efficient for non-voice data. This is because (according to Wikipedia) voice codecs work by eliminating frequencies that cannot be made by the human voice and those that are inaudible to human ears. With a reduced number of available frequencies, the audio data can be stored in a more compact form. Speex is usually used in VoIP programs and similar internet voice applications, but it can also be used simply to reduce the size of a file on your computer.

More information can be found at speex.org.

Using the Code

This code is quite simple to use. It contains an exception class, two structures for storing various data, and a class with two methods: Encode and Decode. It also makes use of a heavily modified version of Sujoy G.'s clsWaveProcessor class from his article Wave File Processor in C#. I've added a new field that contains the raw PCM data, removed all methods except for WaveHeaderIN, and edited that method to work on a stream instead of a filename. Here is the meat of the program, the Codec class.

public class Codec
{
    public EncodeReturn Encode(byte[] raw, int bytespersecond,
        int samplespersecond, bool stereo,
        short bitspersample, bool denoise, bool agc)
    {
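        //Start speexenc with redirected standard streams; without
        //redirection, StandardInput and StandardOutput cannot be used
        ProcessStartInfo encInfo = new ProcessStartInfo("speexenc",
            "-u " + //Ultra wide-band
            (denoise ? "--denoise " : "") + //Denoise before encode
            (agc ? "--agc " : "") + //Adaptive gain control before encode
            "--bitrate " + bytespersecond * 8 + " " + //Set the bitrate
            "--rate " + samplespersecond + " " + //Set the sample rate
            (stereo ? "--stereo " : "") + //Set the channel count
            (bitspersample != 16 ? "--8bit " : "") + //Input is 8-bit samples
            "con con"); //Set console input and output
        encInfo.UseShellExecute = false;
        encInfo.RedirectStandardInput = true;
        encInfo.RedirectStandardOutput = true;
        Process encProc = Process.Start(encInfo);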

        //Write the raw audio data to encProc's StdIn, then close the
        //stream so that speexenc sees the end of its input
        encProc.StandardInput.BaseStream.Write(raw, 0, raw.Length);
        encProc.StandardInput.Close();

        //Read all output before waiting; waiting first can block forever
        //once the pipe fills. A pipe has no Length, so the data is
        //collected in a MemoryStream rather than a preallocated array
        Stream encOut = encProc.StandardOutput.BaseStream;
        MemoryStream outBuf = new MemoryStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = encOut.Read(chunk, 0, chunk.Length)) > 0)
            outBuf.Write(chunk, 0, n);

        //Wait for the process to finish
        encProc.WaitForExit();

        //Check for success
        if (encProc.ExitCode != 0)
            throw new EncodeDecodeFailureException(encProc.ExitCode);

        //In non-verbose mode, the first line of output is the only line of
        //non-audio data, so skip up to and including the first newline
        byte[] all = outBuf.ToArray();
        int start = Array.IndexOf(all, (byte)'\n') + 1;
        byte[] retB = new byte[all.Length - start];
        Array.Copy(all, start, retB, 0, retB.Length);

        //Create the return object
        EncodeReturn retVal = new EncodeReturn(retB);

        //And return it
        return retVal;

    }

    public DecodeReturn Decode(byte[] raw)
    {
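        //Create and start the decoding process with redirected streams
        ProcessStartInfo decInfo = new ProcessStartInfo("speexdec",
            "--force-uwb con con");
        decInfo.UseShellExecute = false;
        decInfo.RedirectStandardInput = true;
        decInfo.RedirectStandardOutput = true;
        Process decProc = Process.Start(decInfo);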

        //Write the encoded data to decProc's StdIn, then close the
        //stream so that speexdec sees the end of its input
        decProc.StandardInput.BaseStream.Write(raw, 0, raw.Length);
        decProc.StandardInput.Close();

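        //Read all output before waiting, to avoid blocking on a full pipe
        Stream decOut = decProc.StandardOutput.BaseStream;
        MemoryStream outBuf = new MemoryStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = decOut.Read(chunk, 0, chunk.Length)) > 0)
            outBuf.Write(chunk, 0, n);

        //Wait for the process to finish
        decProc.WaitForExit();

        //Check for success
        if (decProc.ExitCode != 0)
            throw new EncodeDecodeFailureException(decProc.ExitCode);

        //Skip the first line (everything up to the first newline)
        byte[] all = outBuf.ToArray();
        int start = Array.IndexOf(all, (byte)'\n') + 1;

        //Pass the remaining output to clsWaveProcessor
        clsWaveProcessor cwp = new clsWaveProcessor();

        //Process the header and the data
        cwp.WaveHeaderIN(new MemoryStream(all, start, all.Length - start));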

        //Create the output
        DecodeReturn dr = new DecodeReturn(cwp.RawPcmWaveData,
            ((cwp.BitsPerSample / 8) * cwp.SampleRate * cwp.Channels),
            cwp.SampleRate, cwp.Channels != 1, cwp.BitsPerSample);

        //Return the output
        return dr;
    }
} 
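
The listing refers to three supporting types that are not shown here. What follows is only a minimal sketch of what they might look like, inferred from how Codec constructs them. The member names (Data, BytesPerSecond, ExitCode, and so on) are assumptions and may differ from the ones in the download.

```csharp
using System;

// Hypothetical reconstruction: all member names below are assumptions
// inferred from the constructor calls in the Codec class.
public class EncodeDecodeFailureException : Exception
{
    // Exit code reported by speexenc or speexdec
    public readonly int ExitCode;

    public EncodeDecodeFailureException(int exitCode)
        : base("speexenc/speexdec exited with code " + exitCode)
    {
        ExitCode = exitCode;
    }
}

public struct EncodeReturn
{
    public byte[] Data; // Speex-encoded bytes

    public EncodeReturn(byte[] data)
    {
        Data = data;
    }
}

public struct DecodeReturn
{
    public byte[] Data; // Raw PCM samples
    public int BytesPerSecond;
    public int SamplesPerSecond;
    public bool Stereo;
    public short BitsPerSample;

    public DecodeReturn(byte[] data, int bytesPerSecond,
        int samplesPerSecond, bool stereo, short bitsPerSample)
    {
        Data = data;
        BytesPerSecond = bytesPerSecond;
        SamplesPerSecond = samplesPerSecond;
        Stereo = stereo;
        BitsPerSample = bitsPerSample;
    }
}
```

The fields of DecodeReturn mirror the parameters of Encode, so a decoded result can be passed straight back to the encoder if needed.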

The code is fairly straightforward. In each method, I start a process based on the user's needs. Note that I simplified Encode for my needs. You may wish to add more parameters. The most important thing to note here is that both the input and output files for speexenc and speexdec are set to con. This allows me to write the input PCM data, as well as read the output, without having to use temporary files.
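
If you do add more parameters, it may help to build the argument string in one place. The following is a rough sketch of that idea, not part of the original download. The SpeexArguments class and its ForEncoder method are hypothetical names, and the flags are the same ones used in Encode above.

```csharp
using System.Text;

// Hypothetical helper: the class and method names are assumptions.
public static class SpeexArguments
{
    // Builds the speexenc argument string from the options that
    // Codec.Encode accepts, emitting each switch only when requested.
    public static string ForEncoder(int bytesPerSecond, int samplesPerSecond,
        bool stereo, short bitsPerSample, bool denoise, bool agc)
    {
        StringBuilder args = new StringBuilder("-u "); // Ultra wide-band
        if (denoise) args.Append("--denoise ");
        if (agc) args.Append("--agc ");
        args.Append("--bitrate ").Append(bytesPerSecond * 8).Append(' ');
        args.Append("--rate ").Append(samplesPerSecond).Append(' ');
        if (stereo) args.Append("--stereo ");
        if (bitsPerSample != 16) args.Append("--8bit ");
        args.Append("con con"); // Console input and output
        return args.ToString();
    }
}
```

Keeping the string construction separate from Process.Start also makes it possible to check the generated command line without launching speexenc.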

Then I simply read and write the data. Note that I ignore one line of output data in each method. This is because, regardless of whether or not the output is redirected to the console, both utilities write one line of extra information before they begin to write the output file. This is why it's very important that verbose mode is left off.
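
One caveat with reading and writing strictly in sequence is that a child process can stall once its output pipe fills while the parent is still writing input. Below is a minimal sketch of one way around that, assuming a .NET version that has Task and Stream.CopyTo. ProcessPipe, Run, and SkipFirstLine are hypothetical names, and the line-skipping step simply follows the one-line behaviour described above.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: names are assumptions, not part of the download.
public static class ProcessPipe
{
    // Runs a command, feeds it `input` on stdin, and returns everything
    // it wrote to stdout.
    public static byte[] Run(string fileName, string arguments, byte[] input)
    {
        ProcessStartInfo info = new ProcessStartInfo(fileName, arguments);
        info.UseShellExecute = false;
        info.RedirectStandardInput = true;
        info.RedirectStandardOutput = true;

        using (Process proc = Process.Start(info))
        {
            // Drain stdout on another thread so the child never blocks
            // on a full pipe while its input is still being written.
            Task<byte[]> reader = Task.Run(() =>
            {
                MemoryStream buffer = new MemoryStream();
                proc.StandardOutput.BaseStream.CopyTo(buffer);
                return buffer.ToArray();
            });

            proc.StandardInput.BaseStream.Write(input, 0, input.Length);
            proc.StandardInput.Close(); // Signals end of input

            byte[] output = reader.Result;
            proc.WaitForExit();
            if (proc.ExitCode != 0)
                throw new InvalidOperationException(
                    fileName + " exited with code " + proc.ExitCode);
            return output;
        }
    }

    // Drops everything up to and including the first newline; returns
    // the data unchanged if it contains no newline.
    public static byte[] SkipFirstLine(byte[] data)
    {
        int start = Array.IndexOf(data, (byte)'\n') + 1;
        byte[] rest = new byte[data.Length - start];
        Array.Copy(data, start, rest, 0, rest.Length);
        return rest;
    }
}
```

With a helper like this, both Encode and Decode reduce to a call to Run followed by SkipFirstLine on the result.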

Pretty much all the rest of the relevant info can be found in the comments.

Note

I just want to mention that I plan to update this soon. The code has not been tested yet, and I plan to write a demo application.

History

Update: 1 September, 2007

  • Made EncodeReturn and DecodeReturn members public.
  • Added a section on Speex itself.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
