Sound recording and encoding in MP3 format.

rtybase


16 Nov 2006 | CPOL | 6 min read


An article describing the technique of recording sound from waveform-audio input devices and encoding it in MP3 format.

Introduction

Have you ever tried to write something that records sound from the sound card and encodes it in MP3 format? Not interesting? Then, to make things more interesting, have you ever tried to write an MP3 streaming internet radio server? I know, you'll say "What for? There are good, fairly standard implementations like Icecast or SHOUTcast." But still, have you ever tried, at least, to dig a little into how all of this works, or to write something similar just for fun? That is what this article is about. Of course, we can't cover every topic in one article; it would become tiresome. So I will split the topic into a few articles, and this one covers the recording and encoding process.

Background

Obviously, the first problem everyone encounters is the MP3 encoding itself. Writing an encoder that works properly is not an easy task, so I won't reinvent it and will use the LAME encoder (hosted on SourceForge), considered one of the best (one of, not the only one!). I am using version 3.97; those interested in the sources can download them from SourceForge (it's an open-source project). The corresponding "lame_enc.dll" is also included in the demo project (see the links at the top of this article).

The next problem is recording sound from the sound card. With a little searching on Google, MSDN, and CodeProject, you can find many articles on this topic. I am using the low-level waveform-audio API (see the Windows Media Platform SDK, e.g., waveInOpen(...), mixerOpen(...), etc.).
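Most of the buffer arithmetic later in this article follows from the PCM format the waveIn device is opened with. The sketch below is a portable illustration, not code from the demo: PcmFormat is a hypothetical struct that mirrors the relevant WAVEFORMATEX fields so it compiles without <windows.h>, and MakePcmFormat applies the standard PCM rules for the derived fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the PCM-related fields of the Win32
// WAVEFORMATEX structure (no <windows.h> needed for this sketch).
struct PcmFormat {
    std::uint16_t nChannels;        // 1 = mono, 2 = stereo
    std::uint32_t nSamplesPerSec;   // e.g. 44100
    std::uint16_t wBitsPerSample;   // 8 or 16
    std::uint16_t nBlockAlign;      // bytes per sample frame (all channels)
    std::uint32_t nAvgBytesPerSec;  // bytes of PCM data per second
};

// Fills in the derived fields using the PCM rules:
// block align = channels * bytes per sample,
// average byte rate = sample rate * block align.
PcmFormat MakePcmFormat(std::uint16_t channels, std::uint32_t rate,
                        std::uint16_t bits) {
    assert(bits == 8 || bits == 16);
    PcmFormat f{};
    f.nChannels = channels;
    f.nSamplesPerSec = rate;
    f.wBitsPerSample = bits;
    f.nBlockAlign = static_cast<std::uint16_t>(channels * bits / 8);
    f.nAvgBytesPerSec = rate * f.nBlockAlign;
    return f;
}
```

For 44.1 kHz, 16-bit stereo this gives a 4-byte block and 176,400 bytes per second, so a recorded buffer of N bytes holds N / 2 SHORT values.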

So, let's go with the details now.

MP3 Encoding

Download the "mp3_stream_src.zip" file containing the sources (see the link at the top of this article). Inside it, you will find the "mp3_simple.h" file (in the INCLUDE folder after unzipping). It contains the definition and implementation of the CMP3Simple class, a wrapper around the LAME API that I designed to make life a bit easier. I commented the code as much as possible, and I hope the comments are good enough. All we need to know at this point:

When instantiating a CMP3Simple object, we need to specify the bitrate at which to encode the sound samples, the expected sample rate of the input samples, and (if re-sampling is needed) the desired sample rate of the encoded sound:

C++

// Constructor of the class accepts only three parameters.
// Feel free to add more constructors with different parameters, 
// if a better customization is necessary.
//
// nBitRate - the bitrate (in kbps) at which to encode the raw (PCM) sound
// (e.g. 16, 32, 40, 48, ... 64, ... 96, ... 128, etc.); see the
// official LAME documentation for accepted values.
//
// nInputSampleRate - expected sample rate of the raw (PCM) input sound
// (e.g. 44100, 32000, 22050, etc.); see the official LAME documentation
// for accepted values.
//
// nOutSampleRate - requested sample rate of the encoded (MP3) output
// sound. If zero, the sound is not re-sampled
// (nOutSampleRate = nInputSampleRate).
CMP3Simple(unsigned int nBitRate, unsigned int nInputSampleRate = 44100,
           unsigned int nOutSampleRate = 0);
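The one rule the constructor comment spells out is that a zero nOutSampleRate disables re-sampling. A minimal sketch of that rule, plus how many output samples per channel re-sampling yields; Mp3Settings, ResolveSettings and ResampledCount are hypothetical names, not part of CMP3Simple:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the constructor's three parameters.
struct Mp3Settings {
    unsigned int bitRateKbps;
    unsigned int inSampleRate;
    unsigned int outSampleRate;  // already resolved, never 0
};

// Applies the documented rule: nOutSampleRate == 0 means
// "keep the input rate", i.e. no re-sampling.
Mp3Settings ResolveSettings(unsigned int nBitRate,
                            unsigned int nInputSampleRate = 44100,
                            unsigned int nOutSampleRate = 0) {
    assert(nBitRate > 0 && nInputSampleRate > 0);
    return Mp3Settings{nBitRate, nInputSampleRate,
                       nOutSampleRate == 0 ? nInputSampleRate : nOutSampleRate};
}

// Output samples per channel after re-sampling n input samples.
std::uint64_t ResampledCount(std::uint64_t n, const Mp3Settings& s) {
    return n * s.outSampleRate / s.inSampleRate;
}
```

With these defaults, ResolveSettings(128) keeps 44100 Hz, while ResolveSettings(64, 44100, 32000) turns each second of input (44,100 samples per channel) into 32,000 output samples per channel.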

Encoding itself is performed via CMP3Simple::Encode(...).

C++

// This method performs encoding.
//
// pSamples - pointer to the buffer containing the raw (PCM) sound to
// encode. Note that the buffer must be an array of SHORT (16-bit PCM
// stereo sound); for 8-bit mono PCM sound, it is better to first expand
// every byte to 16 bits.
//
// nSamples - number of SHORT elements in "pSamples". Not to be confused
// with the buffer size, which is (usually) expressed in bytes. See
// also the "MaxInBufferSize" method.
//
// pOutput - pointer to the buffer that receives the encoded (MP3) sound,
// as bytes. According to LAME, if pOutput is not cleared before the
// call, data already in pOutput will be mixed with the data encoded
// from pSamples.
//
// pdwOutput - pointer to a variable that receives the number of bytes
// written to "pOutput". See also the "MaxOutBufferSize" method.
BE_ERR Encode(PSHORT pSamples, DWORD nSamples, PBYTE pOutput, 
              PDWORD pdwOutput);
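The pSamples comment only hints at how to handle 8-bit mono input. One common interpretation is sketched below under that assumption (Mono8ToStereo16 is a hypothetical helper, not part of CMP3Simple): 8-bit PCM is unsigned with silence at 128, so each byte is re-centred around zero, scaled to 16 bits, and written to both channels.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Converts 8-bit unsigned mono PCM (silence = 128) into 16-bit signed,
// interleaved stereo PCM, i.e. the SHORT layout Encode() expects.
std::vector<std::int16_t> Mono8ToStereo16(const std::vector<std::uint8_t>& in) {
    std::vector<std::int16_t> out;
    out.reserve(in.size() * 2);
    for (std::uint8_t b : in) {
        // Re-centre around zero, then scale 8 bits up to 16 bits.
        auto s = static_cast<std::int16_t>((static_cast<int>(b) - 128) * 256);
        out.push_back(s);  // left channel
        out.push_back(s);  // right channel
    }
    assert(out.size() == in.size() * 2);
    return out;
}
```

The size of the returned vector is the SHORT count that would be passed as nSamples, and its data() pointer plays the role of pSamples.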

Recording from the soundcard

Similarly, after unzipping "mp3_stream_src.zip", you will find the "waveIN_simple.h" file in the INCLUDE folder. It contains the definitions and implementations of the CWaveINSimple, CMixer, and CMixerLine classes. These classes wrap a subset of the waveform-audio API functions. Why just a subset? Because (I am lazy sometimes) they encapsulate only the functionality associated with Wave In devices (recording); Wave Out devices (playback) are not covered (type "sndvol32 /r" in "Start->Run" to see what I mean). Check the comments I added to each class for a better picture of what they do. What we need to know at this point:

- One CWaveINSimple device has one CMixer, which has zero or more CMixerLine objects.
- Constructors and destructors of all these classes are declared "private" (by design):
  - Objects of the CWaveINSimple class cannot be instantiated directly; use the CWaveINSimple::GetDevices() and CWaveINSimple::GetDevice(...) static methods instead.
  - Objects of the CMixer class cannot be instantiated directly; use the CWaveINSimple::OpenMixer() method instead.
  - Objects of the CMixerLine class cannot be instantiated directly; use the CMixer::GetLines() and CMixer::GetLine(...) methods instead.
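The private-constructor rule above is a factory pattern: callers obtain shared instances through static accessors instead of constructing their own. A portable sketch of the idea follows; WaveInDeviceSketch is a hypothetical class with a hard-coded device list, whereas the real CWaveINSimple presumably enumerates devices through the waveIn API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CWaveINSimple: the constructor is private,
// so instances are reachable only through the static accessors.
class WaveInDeviceSketch {
public:
    static const std::vector<WaveInDeviceSketch*>& GetDevices() {
        // Built once, on first use, and shared by all callers.
        static std::vector<WaveInDeviceSketch*> devices = Enumerate();
        return devices;
    }

    // Mirrors the "throw const char*" error style used in the article.
    static WaveInDeviceSketch& GetDevice(const std::string& name) {
        for (WaveInDeviceSketch* d : GetDevices())
            if (d->m_name == name) return *d;
        throw "Device not found.";
    }

    const char* GetName() const { return m_name.c_str(); }

private:
    explicit WaveInDeviceSketch(std::string name) : m_name(std::move(name)) {}
    WaveInDeviceSketch(const WaveInDeviceSketch&) = delete;
    WaveInDeviceSketch& operator=(const WaveInDeviceSketch&) = delete;

    // Hard-coded for the sketch; a real implementation would query
    // the system for installed recording devices.
    static std::vector<WaveInDeviceSketch*> Enumerate() {
        return { new WaveInDeviceSketch("Realtek AC97 Audio") };
    }

    std::string m_name;
};
```

In this sketch the instances simply live until the process exits; releasing them explicitly is presumably what CWaveINSimple::CleanUp() is for in the article's classes.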

To capture and further process sound data, a class must inherit from the IReceiver abstract class and implement the IReceiver::ReceiveBuffer(...) method. An instance of this derived class is then passed to CWaveINSimple via CWaveINSimple::Start(IReceiver *pReceiver).

C++

// See CWaveINSimple::Start(IReceiver *pReceiver) below.
// Instances of any class extending "IReceiver" will be able 
// to receive raw (PCM) sound from an instance of the CWaveINSimple 
// and process sound via own implementation of the "ReceiveBuffer" method.
class IReceiver {
public:
    virtual void ReceiveBuffer(LPSTR lpData, DWORD dwBytesRecorded) = 0;
};
...
class CWaveINSimple {
private:
...
    // This method starts recording sound from the
    // WaveIN device. The passed object (derived from
    // IReceiver) is responsible for further
    // processing of the sound data.
    void _Start(IReceiver *pReceiver);
...
public:
...
    // Wrapper of the _Start() method, for the multithreading
    // version. This is the actual starter.
    void Start(IReceiver *pReceiver);
...
};
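To see the IReceiver contract in isolation, the sketch below redefines the interface portably and drives it with a fake source. CountingReceiver and FeedBuffers are hypothetical, and the LPSTR and DWORD typedefs only stand in for the Win32 ones. According to its comment, the article's Start() is the multithreaded entry point; this sketch delivers buffers synchronously for simplicity.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for the Win32 typedefs so the sketch builds without <windows.h>.
typedef char* LPSTR;
typedef unsigned long DWORD;

class IReceiver {
public:
    virtual ~IReceiver() {}
    virtual void ReceiveBuffer(LPSTR lpData, DWORD dwBytesRecorded) = 0;
};

// Hypothetical receiver that only counts what it is given.
class CountingReceiver : public IReceiver {
public:
    DWORD totalBytes = 0;
    int buffers = 0;

    void ReceiveBuffer(LPSTR /*lpData*/, DWORD dwBytesRecorded) override {
        totalBytes += dwBytesRecorded;
        ++buffers;
    }
};

// Hypothetical fake capture source: hands each buffer to the receiver
// in order, the way a recording loop would pass on filled buffers.
void FeedBuffers(IReceiver& r, std::vector<std::vector<char>>& buffers) {
    for (auto& b : buffers)
        r.ReceiveBuffer(b.data(), static_cast<DWORD>(b.size()));
}
```

The sketch gives IReceiver a virtual destructor, which the original interface does not declare; that only matters if a receiver is ever deleted through an IReceiver pointer.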

Let's see some examples.

Examples

How would we list all the Wave In devices in the system?

C++

const vector<CWaveINSimple*>& wInDevices = CWaveINSimple::GetDevices();
UINT i;

for (i = 0; i < wInDevices.size(); i++) {
    printf("%s\n", wInDevices[i]->GetName());
}

How would we list a Wave In device's lines (assuming strDeviceName is, e.g., "SoundMAX Digital Audio")?

C++

CWaveINSimple& WaveInDevice = CWaveINSimple::GetDevice(strDeviceName);
CHAR szName[MIXER_LONG_NAME_CHARS];
UINT j;

try {
    CMixer& mixer = WaveInDevice.OpenMixer();
    const vector<CMixerLine*>& mLines = mixer.GetLines();

    for (j = 0; j < mLines.size(); j++) {
        // Convert to the OEM code page so non-English line names print correctly in the console
        ::CharToOem(mLines[j]->GetName(), szName);
        printf("%s\n", szName);
    }

    mixer.Close();
}
catch (const char *err) {
    printf("%s\n",err);
}

How do we actually record and encode to MP3?

First, we define a class like this:

C++

class mp3Writer: public IReceiver {
private:
    CMP3Simple m_mp3Enc;
    FILE *f;

public:
    mp3Writer(unsigned int bitrate = 128,
              unsigned int finalSampleRate = 0):
        m_mp3Enc(bitrate, 44100, finalSampleRate) {
        f = fopen("music.mp3", "wb");
        if (f == NULL) throw "Can't create MP3 file.";
    }

    ~mp3Writer() {
        fclose(f);
    }

    virtual void ReceiveBuffer(LPSTR lpData, DWORD dwBytesRecorded) {
        BYTE  mp3Out[44100 * 4];
        DWORD dwOut = 0;

        // dwBytesRecorded is in bytes; Encode() expects a count of SHORTs.
        if (m_mp3Enc.Encode((PSHORT) lpData, dwBytesRecorded / 2,
                            mp3Out, &dwOut) == BE_ERR_SUCCESSFUL && dwOut > 0) {
            fwrite(mp3Out, dwOut, 1, f);
        }
    }
};

and then (assuming strLineName is, e.g., "Microphone"):

C++

try {
    CWaveINSimple& device = CWaveINSimple::GetDevice(strDeviceName);
    CMixer& mixer = device.OpenMixer();
    CMixerLine& mixerline = mixer.GetLine(strLineName);

    mixerline.UnMute();
    mixerline.SetVolume(0);
    mixerline.Select();
    mixer.Close();

    mp3Writer *mp3Wr = new mp3Writer();
    device.Start(mp3Wr);
    // Record until a key is pressed.
    while (!_kbhit()) ::Sleep(100);

    device.Stop();
    delete mp3Wr;
}
catch (const char *err) {
    printf("%s\n",err);
}

CWaveINSimple::CleanUp();
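The mp3Writer above uses a fixed 44100 * 4 byte output buffer. As a cross-check, the sketch below applies the worst-case estimate documented in LAME's lame.h for lame_encode_buffer() (1.25 * samples per channel + 7200 bytes). WorstCaseMp3Bytes and ShortsInBuffer are hypothetical helpers; the article's MaxOutBufferSize() method may compute its value differently.

```cpp
#include <cassert>
#include <cstddef>

// Worst-case MP3 output size from LAME's lame.h:
//   mp3buf_size = 1.25 * num_samples + 7200
// where num_samples counts samples per channel. Integer arithmetic,
// with the 1.25 factor rounded up.
std::size_t WorstCaseMp3Bytes(std::size_t nShorts, unsigned channels) {
    assert(channels == 1 || channels == 2);
    std::size_t perChannel = nShorts / channels;
    return (perChannel * 5 + 3) / 4 + 7200;
}

// A buffer of bytesRecorded bytes of 16-bit PCM holds bytesRecorded / 2
// SHORT values (this is the dwBytesRecorded / 2 passed to Encode()).
std::size_t ShortsInBuffer(std::size_t bytesRecorded) {
    return bytesRecorded / 2;
}
```

For one second of 44.1 kHz stereo (88,200 SHORTs) the estimate is 62,325 bytes, well under the 176,400 bytes of mp3Out, so the fixed buffer should be ample for typical waveIn buffer sizes.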

Remark 1

mixerline.SetVolume(0) is a tricky point. For some sound cards, SetVolume(0) gives the original (good) sound quality; for others, SetVolume(100) does. You may even find sound cards where SetVolume(15) gives the best quality. I have no general advice here; just experiment and check.
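For reference, a mixer volume control reports its own minimum and maximum (in Win32, the Bounds.dwMinimum and Bounds.dwMaximum members of MIXERCONTROL, typically 0 and 65535 for volume controls). A minimal sketch of mapping a 0..100 percentage onto such a range, assuming a linear mapping; whether CMixerLine::SetVolume() works this way is not shown in the article, and PercentToControlValue is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Maps a 0..100 percentage linearly onto a control's [minimum, maximum]
// range. Percentages outside 0..100 are clamped.
std::uint32_t PercentToControlValue(int percent, std::uint32_t minimum,
                                    std::uint32_t maximum) {
    assert(minimum <= maximum);
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    // 64-bit intermediate so span * percent cannot overflow.
    std::uint64_t span = static_cast<std::uint64_t>(maximum) - minimum;
    return static_cast<std::uint32_t>(minimum + span * percent / 100);
}
```

On a 0..65535 control, the SetVolume(15) case above would correspond to a raw value of 9830 under this assumed mapping.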

Remark 2

Almost every sound card has a "Wave Out Mix" or "Stereo Mix" mixer line (the name varies from card to card). Recording from such a line (mixerline.Select()) actually records everything going to the sound card's Wave Out (read "speakers"). So let Winamp or Windows Media Player play for a while, start the application to record at the same time, and you'll see the result.
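Since the loopback line's name differs between cards, a program can look for several candidate names in the list returned by CMixer::GetLines(). The sketch below works on plain strings so it runs anywhere; FindLoopbackLine is a hypothetical helper, and the candidate list in the usage note is only an example built from the names mentioned above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the first line whose name matches one of the candidates,
// trying candidates in order of preference; returns an empty string
// if none of them is present.
std::string FindLoopbackLine(const std::vector<std::string>& lines,
                             const std::vector<std::string>& candidates) {
    for (const auto& wanted : candidates)
        for (const auto& line : lines)
            if (line == wanted) return line;
    return std::string();
}
```

With the Realtek line list shown later in this article, FindLoopbackLine(lines, {"Wave Out Mix", "Stereo Mix"}) returns "Stereo Mix".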

Remark 3

Rather than calling:

mp3Writer *mp3Wr = new mp3Writer();

it is also possible to create the mp3Writer as follows (see the class definition above):

mp3Writer *mp3Wr = new mp3Writer(64, 32000);

This will produce the final MP3 at a 64 kbps bitrate and a 32 kHz sample rate.
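The bitrate alone fixes the size of a constant-bitrate MP3, which makes the effect of these parameters easy to estimate. A small sketch of the arithmetic (the helper names are hypothetical), using the MP3 convention that 1 kbps = 1000 bits per second:

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second of a constant-bitrate MP3 stream.
std::uint32_t Mp3BytesPerSecond(std::uint32_t kbps) {
    return kbps * 1000 / 8;
}

// Bytes per second of the raw PCM stream fed to the encoder.
std::uint32_t PcmBytesPerSecond(std::uint32_t rate, std::uint32_t channels,
                                std::uint32_t bits) {
    assert(bits % 8 == 0);
    return rate * channels * (bits / 8);
}
```

At 64 kbps, one minute of audio takes 480,000 bytes. Against the 176,400 bytes per second of 44.1 kHz, 16-bit stereo input, that is roughly 22:1 compression at 64 kbps and 11:1 at 128 kbps.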

Comments on using the demo application

The demo application (see the links at the top of this article) is a console application driven by command-line options. Running it without any options simply prints the usage guidelines, e.g.:

...>mp3_stream.exe
mp3_stream.exe -devices
        Will list WaveIN devices.

mp3_stream.exe -device=<device_name>
        Will list recording lines of the WaveIN <device_name> device.

mp3_stream.exe -device=<device_name> -line=<line_name> 
          [-v=<volume>] [-br=<bitrate>] [-sr=<samplerate>]
        Will record from the <line_name> 
        at the given voice <volume>, output <bitrate> (in Kbps)
        and output <samplerate> (in Hz).

        <volume>, <bitrate> and <samplerate> are optional parameters.
        <volume> - integer value between (0..100), defaults to 0 if not set.
        <bitrate> - integer value (16, 24, 32, .., 64, etc.), 
                        defaults to 128 if not set.
        <samplerate> - integer value (44100, 32000, 22050, etc.), 
                        defaults to 44100 if not set.
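A sketch of how options in this "-key=value" form can be parsed, using the defaults listed above. This is not the demo's actual parser; Options, TakeValue and ParseOptions are hypothetical, and values are converted without range checks.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical option set with the documented defaults:
// volume 0, bitrate 128 kbps, sample rate 44100 Hz.
struct Options {
    bool listDevices = false;
    std::string device, line;
    int volume = 0;
    unsigned bitrate = 128;
    unsigned samplerate = 44100;
};

// If arg starts with prefix, stores the remainder in value.
static bool TakeValue(const std::string& arg, const std::string& prefix,
                      std::string& value) {
    if (arg.compare(0, prefix.size(), prefix) != 0) return false;
    value = arg.substr(prefix.size());
    return true;
}

Options ParseOptions(int argc, const char* const argv[]) {
    assert(argc >= 1);  // argv[0] is the program name
    Options o;
    std::string v;
    for (int i = 1; i < argc; ++i) {
        std::string a = argv[i];
        if (a == "-devices") o.listDevices = true;
        else if (TakeValue(a, "-device=", v)) o.device = v;
        else if (TakeValue(a, "-line=", v)) o.line = v;
        else if (TakeValue(a, "-v=", v)) o.volume = std::atoi(v.c_str());
        else if (TakeValue(a, "-br=", v))
            o.bitrate = static_cast<unsigned>(std::strtoul(v.c_str(), nullptr, 10));
        else if (TakeValue(a, "-sr=", v))
            o.samplerate = static_cast<unsigned>(std::strtoul(v.c_str(), nullptr, 10));
    }
    return o;
}
```

Quoting an option such as "-device=Realtek AC97 Audio" on the command line keeps the spaces inside a single argv entry, which is why the examples below quote the device and line names.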

Executing the application with the "-devices" command line option will print the names of the Wave In devices currently installed in the system, e.g.:

...>mp3_stream.exe -devices
Realtek AC97 Audio

Executing the application with the "-device=<device_name>" command line option will list all the lines of the selected Wave In device, e.g.:

...>mp3_stream.exe "-device=Realtek AC97 Audio"
Mono Mix
Stereo Mix
Aux
TV Tuner Audio
CD Player
Line In
Microphone
Phone Line

Finally, the application starts recording (and encoding) sound from the selected Wave In device and line (the microphone in this example) when run with the following command-line options:

...>mp3_stream.exe "-device=Realtek AC97 Audio" -line=Microphone

Recording at 128Kbps, 44100Hz
from Microphone (Realtek AC97 Audio).
Volume 0%.

hit <ENTER> to stop ...

The recorded and encoded sound is saved in the "music.mp3" file, in the folder from which you executed the application.

If you want to record sound that is currently playing through the sound card's Wave Out (e.g., an AVI movie, a video DVD, etc.), you can run the application with the following options:

...>mp3_stream.exe "-device=Realtek AC97 Audio" "-line=Stereo Mix"

However, this line name may be specific to my configuration (see "Remark 2" above).

You can specify additional command line parameters, e.g.:

...>mp3_stream.exe "-device=Realtek AC97 Audio" 
        "-line=Stereo Mix" -v=100 -br=32 -sr=32000

This will set the line’s volume to 100% and produce the final MP3 at 32 kbps and 32 kHz.

Conclusion

In this article, I have summarized a couple of months I spent investigating MP3 encoding APIs and recording (capturing, actually) sound going to the sound card's speakers. I used all these techniques to implement an internet radio station (an MP3 streaming server). I found the topic very interesting and decided to share some of my code. In one of my next articles, I will try to cover some aspects of MP3 streaming and I/O Completion Ports, but until then, I have to clean up the existing code, comment it, and prepare the article :).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)