Introduction
Many applications need to output sounds of one sort or another, and C# offers the SoundPlayer class to help with this. SoundPlayer is designed for playing WAV files and includes some helpful features that make it easy to load WAVs from a variety of sources. However, when it comes to playing a continuous audio stream, such as might be produced by an audio synthesizer or a proprietary streaming protocol, SoundPlayer does not offer much help.
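For the discrete-clip case that SoundPlayer does handle well, usage is pleasingly brief. A minimal sketch (the file path is a placeholder, not from my application):

```csharp
using System.Media;

class ChimeDemo
{
    static void Main()
    {
        // SoundPlayer copes happily with self-contained WAV files...
        using (var player = new SoundPlayer(@"C:\sounds\chime.wav")) // placeholder path
        {
            player.PlaySync(); // blocks until the clip finishes
        }
        // ...it is the continuous-stream case where it falls short.
    }
}
```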
In my own (custom streaming) application, although I was able to encapsulate the continuous stream into discrete memory-based WAVs, I could not get SoundPlayer to produce an adequately smooth playback experience. To overcome this, I returned to the classic multimedia audio interface provided by Windows. While straightforward enough in some programming paradigms, C# presented some interesting (and, for me, quite new) challenges.
Incorporating DLLs into C# is remarkably simple and, after the few tweaks necessary to accommodate C#’s data types, I was quickly able to create an interface to the default audio device on my PC. The lines below were literally all that was necessary to get started.
[DllImport("winmm.dll")]
public static extern int waveOutOpen(out IntPtr hWaveOut, int uDeviceID, WaveFormat lpFormat,
WaveDelegate dwCallback, IntPtr dwInstance, int dwFlags);
However, the waveOut device requires that data remain valid in the application while being either played or queued for playing; you can’t just fire the data at the device and forget it. The principal reason for this is that you only actually send a header to the device itself; the header contains a pointer detailing where the data live.

Anyone familiar with managed objects will be nodding their head at this point but, as a newcomer to C#, it was a surprise to me. Managed objects are C#’s mechanism for tidying up memory that a program has used but no longer requires; in most languages the programmer has this responsibility, but in C# it is taken care of behind the scenes by the garbage collector. One side effect of this is that objects, words, bytes, and so forth, can (and do) move around in memory. If we apply a traditional programming paradigm to C#, we find that, sooner or later, our pointers are no longer pointing at the data structures we thought they were. Typically, in a C# program, you are either not allowed to do these kinds of things or you get a stern warning when you try. If you’re interfacing to external libraries, of course, you don’t get these reminders, because C# doesn’t know what the external libraries are going to do with the information.
In order to use the waveOut interface we therefore need to create some unmanaged objects. Essentially these are areas of reserved memory that the C# garbage collector won’t interfere with. For this interface we need, as a minimum, two fixed elements. The first is a buffer where all the raw audio samples and, for good measure, the wave headers will be stored. The second is a UInt32 which is used in the waveOut callback. Interestingly, my first suspicion that all was not well with my code was the result of this callback. I registered a pointer to a managed UInt32 with the waveOut interface so that I could determine how many audio frames were currently queued for playout. This worked consistently for a few seconds, then stopped. Eventually the program, while seeming initially to carry on, became singularly unhappy. So what was happening? The garbage collector was kicking in and moving my frame counter to a new location; the address registered with the waveOut device, however, remained unchanged. Consequently, each time the callback was triggered after garbage collection, some other unsuspecting variable was being modified instead.
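An alternative to allocating unmanaged memory is to pin a managed object so the garbage collector cannot relocate it. The sketch below is not the approach this article takes, but it illustrates the same problem being solved from the managed side, using GCHandle:

```csharp
using System;
using System.Runtime.InteropServices;

class PinDemo
{
    static void Main()
    {
        uint[] counter = new uint[1];
        // GCHandleType.Pinned stops the collector moving the array,
        // so the address below stays valid until the handle is freed.
        GCHandle handle = GCHandle.Alloc(counter, GCHandleType.Pinned);
        try
        {
            IntPtr stableAddress = handle.AddrOfPinnedObject();
            // An address like this could safely be handed to native code
            // (e.g. as the dwInstance parameter of waveOutOpen).
            Marshal.WriteInt32(stableAddress, 42);
            Console.WriteLine(counter[0]); // prints 42
        }
        finally
        {
            handle.Free(); // unpin so the GC can manage the array again
        }
    }
}
```

Pinning does constrain the garbage collector, so long-lived pins of large objects are generally discouraged; for buffers that live for the whole session, AllocHGlobal (as used below) is a reasonable choice.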
My two unmanaged structures (IntPtrs) are created as follows:
audioRawP = Marshal.AllocHGlobal(audioRawSize);
rFrames = Marshal.AllocHGlobal(4);
These will not now be touched by background garbage collection; audioRawP is an IntPtr to audioRawSize bytes of memory, while rFrames is an IntPtr to four bytes of memory which will be cast to a UInt32. Because we have created these objects ourselves, we also have to dispose of them ourselves. To do this, the class is declared as an IDisposable type:
public class woLib : IDisposable
When a class is declared as IDisposable, we must include a Dispose() method. This method is called when the object is destroyed and ensures that any unmanaged memory is tidied up. Without it, we would end up with potential memory leaks in our programs. My Dispose method is:
public void Dispose()
{
CloseWODevice();
if (audioRawP != (IntPtr)0)
{
Marshal.FreeHGlobal(audioRawP);
Marshal.FreeHGlobal(rFrames);
}
}
I ensure that the application has finished with the memory, check that memory has actually been allocated (since audioRawP is initialised to 0), then free it. Interestingly, when pointers are created in this way, they become more intuitive to work with if you come from a more traditional programming background.
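Relying on callers to invoke Dispose() can be fragile; the standard dispose pattern adds a finalizer as a safety net. A minimal sketch of that pattern (the class and names here are illustrative, not part of woLib):

```csharp
using System;
using System.Runtime.InteropServices;

public class UnmanagedBuffer : IDisposable
{
    private IntPtr buffer;

    public UnmanagedBuffer(int size)
    {
        buffer = Marshal.AllocHGlobal(size);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // finalizer no longer needed
    }

    // Finalizer: runs if the caller forgot to call Dispose().
    ~UnmanagedBuffer()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(buffer);
            buffer = IntPtr.Zero; // guard against a double free
        }
    }

    static void Main()
    {
        using (var buf = new UnmanagedBuffer(1024))
        {
            // buffer usable here
        }
        Console.WriteLine("disposed cleanly");
    }
}
```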
The final result is a class called woLib which provides a very simple interface to the waveOut device. After creation, only two methods are required to get going. My own application requires raw PCM and G.711, so the initialisation method
public void InitWODevice(UInt32 sf, UInt32 chans, UInt32 bps, bool G711)
includes a Boolean, G711. This feeds into the waveOut device through the wFormatTag field of the WaveFormat structure:
if (G711) wFmt.wFormatTag = (short)WaveFormats.mulaw; else wFmt.wFormatTag = (short)WaveFormats.Pcm;
Clearly it is a simple matter to include a more generic interface, if necessary. I’ve included a few typical formats in the WaveFormats enumeration
public enum WaveFormats
{
Unknown = 0,
Pcm = 1,
Adpcm = 2,
Float = 3,
alaw = 6,
mulaw = 7
}
in case they should prove helpful. Finally, I have used only the default audio interface on my PC. When the waveOut device is opened
waveOutOpen(out hwo, -1, wFmt, woDone, rFrames, CALLBACK_FUNCTION);
I specify -1 as the required device, which selects the default. If you have more than one audio device present, you can choose one by replacing -1 with a value from 0 to n-1 for n devices. The multimedia interface includes a call to find out how many devices are present on your system, should you wish to implement it; it is imported from the same DLL and is declared as int waveOutGetNumDevs().
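Enumeration might be sketched like this; the waveOutGetNumDevs import is an addition of mine, not part of the class below:

```csharp
using System;
using System.Runtime.InteropServices;

class DeviceCount
{
    // Same DLL as the rest of the interface; returns the number of
    // waveform-audio output devices present on the system.
    [DllImport("winmm.dll")]
    public static extern int waveOutGetNumDevs();

    static void Main()
    {
        int n = waveOutGetNumDevs();
        Console.WriteLine(n + " waveOut device(s) available");
        // Valid device IDs are then 0 .. n-1; -1 (WAVE_MAPPER) selects the default.
    }
}
```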
Using the code
To use the code, an object is first created. In my program this appears as
static woLib WaveOut = new woLib();
Next, the device must be initialised thus:
WaveOut.InitWODevice(48000, 2, 16, false);
In this case the waveOut interface will expect data delivered at 48 kHz with 4 bytes per sample: two bytes (16 bits) per channel, and two channels. The false at the end means that this isn’t G.711 (which would be 8 bits per sample in any case). If InitWODevice is called after the device has already been initialised, it will close itself and reopen with the new settings. Once this is accomplished, it remains only to send data to the interface.
WaveOut.SendWODevice(pPCM, 4096);
In this case, I have 4096 bytes located at pPCM, which is an IntPtr to my raw PCM data. This represents 1024 actual audio samples, based on my current settings, which will last for around 21ms. If I were feeding data from a disk file, I would monitor the number of queued buffers
UInt32 queued = GetQueued();
and test against a minimum and maximum threshold. My raw buffer size is set at
private const int audioRawSize = 1000000;
1000000 bytes, so I can fit around 240 (4096-byte) frames on the queue before overflowing. I’d probably aim at maintaining half this, which means I’d only need to check every two or so seconds. Clearly, if you’re accessing data from a disk you can use much larger data chunks; mine come from an AAC decoder stream and thus arrive in 1024-sample blocks. There is no particular reason that the raw audio buffer size is set to a fixed value; you can easily extend InitWODevice to allow you to specify the size of this buffer, should you need to.
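A disk-fed version of this throttling might be sketched as follows. The threshold and the ReadNextChunk helper are illustrative assumptions of mine, not part of the class:

```csharp
// Keep up to ~120 frames queued (about 2.5 seconds of audio at
// ~21 ms per 4096-byte frame), topping the queue up when it runs low.
const uint MaxQueued = 120;
bool playing = true;

while (playing)
{
    while (WaveOut.GetQueued() < MaxQueued)
    {
        uint bytes;
        IntPtr chunk = ReadNextChunk(out bytes); // hypothetical disk reader
        if (chunk == IntPtr.Zero) { playing = false; break; }
        WaveOut.SendWODevice(chunk, bytes);
    }
    // Half the queue drains in a little over a second, so polling at this
    // rate comfortably keeps the device fed.
    System.Threading.Thread.Sleep(1000);
}
```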
The complete class is included below.
public class woLib : IDisposable
{
[StructLayout(LayoutKind.Sequential)]
public struct WaveHdr
{
public IntPtr lpData;
public int dwBufferLength;
public int dwBytesRecorded;
public IntPtr dwUser;
public int dwFlags;
public int dwLoops;
public IntPtr lpNext;
public int reserved;
}
public enum WaveFormats
{
Unknown = 0,
Pcm = 1,
Adpcm = 2,
Float = 3,
alaw = 6,
mulaw = 7
}
[StructLayout(LayoutKind.Sequential)]
public class WaveFormat
{
public short wFormatTag;
public short nChannels;
public int nSamplesPerSec;
public int nAvgBytesPerSec;
public short nBlockAlign;
public short wBitsPerSample;
public short cbSize;
}
public const int CALLBACK_FUNCTION = 0x00030000;
public const int CALLBACK_NULL = 0x00000000;
public const int BUFFER_DONE = 0x3BD;
public delegate void WaveDelegate(IntPtr dev, int uMsg, IntPtr dwUser, IntPtr dwParam1, IntPtr dwParam2);
[DllImport("winmm.dll")]
public static extern int waveOutOpen(out IntPtr hWaveOut, int uDeviceID, WaveFormat lpFormat, WaveDelegate dwCallback, IntPtr dwInstance, int dwFlags);
[DllImport("winmm.dll")]
public static extern int waveOutReset(IntPtr hWaveOut);
[DllImport("winmm.dll")]
public static extern int waveOutRestart(IntPtr hWaveOut);
[DllImport("winmm.dll")]
public static extern int waveOutPrepareHeader(IntPtr hWaveOut, ref WaveHdr lpWaveOutHdr, int uSize);
[DllImport("winmm.dll")]
public static extern int waveOutWrite(IntPtr hWaveOut, ref WaveHdr lpWaveOutHdr, int uSize);
[DllImport("winmm.dll")]
public static extern int waveOutClose(IntPtr hWaveOut);
private UInt32 pFrames = 0;
private IntPtr rFrames = (IntPtr)0;
private IntPtr hwo = (IntPtr)0;
private const int audioRawSize = 1000000;
private IntPtr audioRawP = (IntPtr)0;
private UInt32 audioRawIndex = 0;
private WaveDelegate woDone = new WaveDelegate(WaveOutDone);
unsafe static void WaveOutDone(IntPtr dev, int uMsg, IntPtr dwUser, IntPtr dwParam1, IntPtr dwParam2)
{
// dwUser receives the dwInstance pointer registered with waveOutOpen
// (rFrames); declaring it as IntPtr keeps it intact on 64-bit systems.
if ((uMsg == BUFFER_DONE) && (dwUser != IntPtr.Zero))
{
try
{
(*(UInt32*)dwUser)++;
}
catch
{
}
}
}
public void InitWODevice(UInt32 sf, UInt32 chans, UInt32 bps, bool G711)
{
if (hwo != (IntPtr)0) CloseWODevice();
if (audioRawP == (IntPtr)0)
{
audioRawP = Marshal.AllocHGlobal(audioRawSize);
rFrames = Marshal.AllocHGlobal(4);
}
WaveFormat wFmt = new WaveFormat();
wFmt.nAvgBytesPerSec = (int)(sf * (bps / 8) * chans);
wFmt.nBlockAlign = (short)(chans * (bps / 8));
wFmt.nChannels = (short)chans;
wFmt.nSamplesPerSec = (int)sf;
wFmt.wBitsPerSample = (short)bps;
if (G711) wFmt.wFormatTag = (short)WaveFormats.mulaw; else wFmt.wFormatTag = (short)WaveFormats.Pcm;
wFmt.cbSize = 0;
waveOutOpen(out hwo, -1, wFmt, woDone, rFrames, CALLBACK_FUNCTION);
ResetWODevice();
}
public void ResetWODevice()
{
if (hwo != (IntPtr)0) waveOutReset(hwo);
pFrames = 0;
if (rFrames != (IntPtr)0)
unsafe
{
UInt32* i = (UInt32*)rFrames.ToPointer();
*i = 0;
}
audioRawIndex = 0;
}
public bool IsInit()
{
return (hwo != (IntPtr)0);
}
public UInt32 GetFrames()
{
return pFrames;
}
public UInt32 GetQueued()
{
if (rFrames != (IntPtr)0)
unsafe
{
UInt32* i = (UInt32*)rFrames.ToPointer();
if (pFrames > *i)
return (pFrames - *i);
else
return 0;
}
else return 0;
}
public void CloseWODevice()
{
ResetWODevice();
if (hwo != (IntPtr)0) waveOutClose(hwo);
hwo = (IntPtr)0;
}
public void SendWODevice(IntPtr data, UInt32 bytes)
{
if ((hwo != (IntPtr)0) && (audioRawP!=(IntPtr)0))
unsafe
{
UInt32 pwhs = (UInt32) sizeof(WaveHdr);
if ((bytes + audioRawIndex + pwhs) > audioRawSize)
audioRawIndex = 0;
if ((bytes + pwhs) < audioRawSize)
{
byte* bpi = (byte*)audioRawP;
byte* bpw = (byte*)&bpi[audioRawIndex];
byte* bpr = (byte*)&bpi[audioRawIndex + pwhs];
WaveHdr* pwh = (WaveHdr*)bpw;
pwh->lpData = (IntPtr)bpr;
pwh->dwBufferLength = (int)bytes;
pwh->dwUser = (IntPtr)0;
pwh->dwFlags = 0;
pwh->dwLoops = 0;
byte[] raw = new byte[bytes];
Marshal.Copy(data, raw, 0, (int)bytes);
IntPtr bpd = (IntPtr)bpr;
Marshal.Copy(raw, 0, bpd, (int)bytes);
audioRawIndex += (pwhs + bytes);
waveOutPrepareHeader(hwo, ref *pwh, (int)sizeof(WaveHdr));
waveOutWrite(hwo, ref *pwh, (int)sizeof(WaveHdr));
pFrames++;
}
}
}
public void Dispose()
{
CloseWODevice();
if (audioRawP != (IntPtr)0)
{
Marshal.FreeHGlobal(audioRawP);
Marshal.FreeHGlobal(rFrames);
}
}
}
Points of Interest
Interfacing to real devices through C# can produce a clash of paradigms; it has been helpful to have a practical, and hopefully useful, example with which to work through this shift in understanding. C# does impose some frustrations; for example, where data are copied from the source into the unmanaged buffers, an extra copy appears to be necessary.
byte[] raw = new byte[bytes];
Marshal.Copy(data, raw, 0, (int)bytes);
IntPtr bpd = (IntPtr)bpr;
Marshal.Copy(raw, 0, bpd, (int)bytes);
In a world of optimized applications this produces a slightly uncomfortable feeling. I experienced exactly the same issue when getting a memory-based image onto my screen: it is first copied into memory, then into a bitmap, and finally attached to a picture box object. It’s hard to say how many physical copies were actually involved, but it feels like the long way around. Heaven help me when I start unravelling DirectX!
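On more recent frameworks (.NET Framework 4.6 onwards) the intermediate managed array can be avoided with Buffer.MemoryCopy, which copies directly between two unmanaged pointers. A sketch of the idea, separate from the class above:

```csharp
using System;
using System.Runtime.InteropServices;

class DirectCopy
{
    static unsafe void Main()
    {
        IntPtr src = Marshal.AllocHGlobal(16);
        IntPtr dst = Marshal.AllocHGlobal(16);
        try
        {
            Marshal.WriteInt64(src, 0x1122334455667788);
            // Single unmanaged-to-unmanaged copy; no managed byte[] detour.
            // Arguments: source, destination, destination size, bytes to copy.
            Buffer.MemoryCopy(src.ToPointer(), dst.ToPointer(), 16, 8);
            Console.WriteLine(Marshal.ReadInt64(dst).ToString("X"));
        }
        finally
        {
            Marshal.FreeHGlobal(src);
            Marshal.FreeHGlobal(dst);
        }
    }
}
```

This requires compiling with unsafe code enabled, which the woLib class already assumes.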
Starting to get to grips with unmanaged objects was the real key to this project. It took quite a bit of internet research even to find out the right question to ask, never mind its answer! As a complete newcomer to both classes and C#, my code may not be brilliantly written; however, a newcomer’s perspective may well help other newcomers, since we’re all going to trip over the same issues.