Introduction
To record audio in Windows Store apps, the Windows Runtime provides the MediaCapture class, which gets you recording quickly. You are, however, limited to the output formats available through MediaEncodingProfile, and WAV is currently not one of them.
So to record to WAV, you need another solution and because you do not have access to the full .NET stack, your options are limited. WASAPI, included in Microsoft's Core Audio SDK, offers a solution.
Unfortunately, using WASAPI throws you out of the safe managed haven into the unmanaged COM part of the woods. If you are not used to dealing with unmanaged code in .NET, the drop can be deep and the learning curve to climb back out steep. C#'s dynamic type won't ease the pain either, because the COM interfaces involved don't work with it (they don't implement IDispatch).
Finally, writing the result to a WAV file also requires some low(er than normal) level code, which can also be an extra obstacle to overcome if you are not used to audio programming or its concepts.
This article is aimed at getting a C# developer who is a WASAPI novice up and running with a basic working solution.
Background
I wanted to try out an idea for a Windows Store app that deals with basic audio editing. For this, I wanted to use the WAV format for its lossless uncompressed characteristics and its compatibility with other audio software.
Since I consider WAV to be the default uncompressed audio format on Windows, I expected out of the box support for it in WinRT.
This is not the case, so I turned to NAudio as the solution. NAudio will do the heavy lifting, talking to the Core Audio SDK for you. Unfortunately, WinRT support in NAudio is in progress and not completed yet. It does include a working Windows Store app demo to record audio to WAV, but its built-in components to write the result to a WAV file are not available yet in WinRT.
I considered contributing the WinRT support I was looking for, but that would have required me to grasp a big part of the NAudio library in order to submit a patch that works nicely with the existing code base and its concepts.
Instead, as a starting point, I tried taking just what I needed out of NAudio to make a stripped-down, WinRT-compatible solution, but I quickly realized I didn't understand half of what was going on.
Finally, I accepted that I had to learn to deal directly with the Core Audio SDK and the basics of writing WAV files.
While learning, I discovered that the official channels mostly default to C++ on this topic, which introduces an extra barrier for a C# developer that goes beyond a mere syntax difference.
So with this article, I set out to piece together a solution that demonstrates the basics. Obviously, I cannot guarantee that no best practices are violated.
Prerequisites
- Basic knowledge of Windows Store app development is assumed.
- Understanding of the MVVM pattern is assumed. To avoid surprises when developing a more realistic solution, I did not oversimplify by stuffing everything from COM interop to UI logic into the code-behind.
Using the Code
Overview
The attached Visual Studio 2012 solution contains a Windows Store app project which demonstrates recording audio to WAV in an MVVM setup using the Core Audio SDK.
Notable namespaces:
- CoreAudio namespace: contains the COM interop logic to interact with the Core Audio SDK
- Services namespace: contains the business logic to record audio to a WAV file (to be honest, there's more than business logic in there, such as the specifics of writing a WAV file, which should be refactored out)
To just read the code without following this article, use the StartRecordingCommand in the ViewModels namespace as a starting point and follow the logical flow from there.
Capturing Audio via WASAPI
Select an Audio Device for Capturing
The goal is to get an IAudioCaptureClient to capture audio.
You get an IAudioCaptureClient through an IAudioClient. Both are part of the Core Audio SDK's WASAPI.
To get an IAudioClient, you use the Core Audio SDK's MMDevice API by activating an audio device.
using System;
using System.Runtime.InteropServices;

public class WindowsMultimediaDevice
{
    [DllImport("Mmdevapi.dll", ExactSpelling = true, PreserveSig = false)]
    public static extern void ActivateAudioInterfaceAsync(
        [In, MarshalAs(UnmanagedType.LPWStr)] string deviceInterfacePath,
        [In, MarshalAs(UnmanagedType.LPStruct)] Guid riid,
        [In] IntPtr activationParams,
        [In] IActivateAudioInterfaceCompletionHandler completionHandler,
        out IActivateAudioInterfaceAsyncOperation createAsync);
}
The above definition exposes the relevant method to call. You can find the native definition in the header file mmdeviceapi.h; if you install the Windows SDK for Windows 8.0, it is located in Windows Kits\8.0\Include\um\mmdeviceapi.h.
The unmanaged code in Mmdevapi.dll is exposed with the DllImport attribute; the library Mmdevapi.dll is assumed to be available by default on Vista and up. Also, since the unmanaged code uses different types, a conversion is necessary, which is done by marshalling using the MarshalAs attribute.
public void Start()
{
    _isRecording = true;
    var defaultAudioCaptureId = MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default);
    var completionHandler = new ActivateAudioInterfaceCompletionHandler(StartCapture);
    IActivateAudioInterfaceAsyncOperation createAsync;
    WindowsMultimediaDevice.ActivateAudioInterfaceAsync(
        defaultAudioCaptureId, new Guid(CoreAudio.Components.WASAPI.Constants.IID_IAudioClient),
        IntPtr.Zero, completionHandler, out createAsync);
}
Used parameters explained:
- The defaultAudioCaptureId is easy to get through the MediaDevice class provided by the Windows Runtime.
- The second parameter is the IID of the WASAPI COM interface we want to get, which is an IAudioClient in this case. The value for this IID can be found in the header file Windows Kits\8.0\Include\um\Audioclient.h.
- No activation parameters are required, so the COM equivalent of null (IntPtr.Zero) is passed.
- The completionHandler is the callback that will receive the IAudioClient, which is the goal. Its type is defined by the MMDevice API; view IActivateAudioInterfaceCompletionHandler for the details.
- createAsync is not used here, but passed to satisfy the method definition.
Start Capturing Audio
After calling ActivateAudioInterfaceAsync, in the ActivateAudioInterfaceCompletionHandler callback, use the activated IAudioClient to get an IAudioCaptureClient.
object audioCaptureClientInterface;
audioClient.GetService(new Guid(CoreAudio.Components.WASAPI.Constants.IID_IAudioCaptureClient),
    out audioCaptureClientInterface);
var audioCaptureClient = (IAudioCaptureClient)audioCaptureClientInterface;
var sleepMilliseconds = CalculateCaptureDelay(waveFormat, bufferSize);
audioClient.Start();
while (_isRecording)
{
    // Block until the delay elapses; a Task.Delay that is neither awaited
    // nor waited on returns immediately and would not pause the loop.
    Task.Delay(sleepMilliseconds).Wait();
    CaptureAudioBuffer(waveFormat, bufferSize, audioCaptureClient, sleepMilliseconds);
}
audioClient.Stop();
The actual audio capturing happens in the while loop. To be honest, the specifics are entirely based on an MSDN example in C++, using the NAudio Windows Store app demo as a help for bringing it to C#.
As I understand it, to optimize the process of capturing, a delay is executed on each pass to ensure the buffer can keep up. No point in hammering an empty buffer.
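The article does not show CalculateCaptureDelay. The MSDN capture sample sleeps for half the duration of the buffer between reads, so one plausible way to compute the delay is sketched below. The class and method names are hypothetical, and the sketch assumes the buffer size is expressed in audio frames.

```csharp
using System;

public static class CaptureTiming
{
    // Hypothetical helper: returns half the buffer duration in milliseconds,
    // so the buffer is roughly half full by the time it is read again.
    public static int HalfBufferDelayMilliseconds(int sampleRate, int bufferFrameCount)
    {
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException("sampleRate");
        if (bufferFrameCount < 0) throw new ArgumentOutOfRangeException("bufferFrameCount");

        // Duration of the whole buffer: frames divided by frames per second.
        long bufferDurationMs = 1000L * bufferFrameCount / sampleRate;
        return (int)(bufferDurationMs / 2);
    }
}
```

For a buffer of 48000 frames at 48000 Hz (one second of audio), this yields a delay of 500 milliseconds.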
Then, on each pass, the buffer is read for as long as there is something available (GetNextPacketSize > 0). The mix format of the audio device you're capturing with determines how to interpret the bytes in the buffer.
Finally, any subscribed clients are signaled through an event, with the captured buffer as an argument.
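To illustrate how the mix format drives the interpretation of the captured bytes, here is a minimal sketch that decodes a buffer into samples. It assumes the mix format is 32-bit IEEE float, which is common in shared mode but must be verified against the actual format of the device. The SampleDecoder class is hypothetical and not part of the attached solution.

```csharp
using System;

public static class SampleDecoder
{
    // Assumption: the buffer contains 32-bit IEEE float samples in the
    // machine's native byte order (little-endian on Windows hardware).
    public static float[] ToFloatSamples(byte[] buffer, int bytesRecorded)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (bytesRecorded < 0 || bytesRecorded > buffer.Length)
            throw new ArgumentOutOfRangeException("bytesRecorded");

        // Ignore any trailing bytes that do not form a complete sample.
        int sampleCount = bytesRecorded / sizeof(float);
        var samples = new float[sampleCount];

        // Copy the raw bytes straight into the float array.
        Buffer.BlockCopy(buffer, 0, samples, 0, sampleCount * sizeof(float));
        return samples;
    }
}
```

With a multi-channel format the samples are interleaved, so in a stereo stream the even indices belong to the left channel and the odd indices to the right channel.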
Writing WAV Files
Basically, a WAV file consists of a header in which the format details are specified, followed by the actual data. The different blocks are called chunks.
Create a WAV File to Store the Captured Audio
After getting a binary writer that points to a file path to output to, the file is prepared as a WAV file to write the captured audio in.
You can find this logic in WaveFileWriter.
private void WriteWavRiffHeader()
{
    _binaryWriter.Write("RIFF".ToCharArray());
    _binaryWriter.Write((uint)0); // placeholder for the RIFF size, updated when recording ends
    _binaryWriter.Write("WAVE".ToCharArray());
}
The header starts with the main chunk, which specifies that this is a WAV file. The length of the file is unknown at this point and therefore initialized as zero.
private void WriteWavFormatChunkHeader(WaveFormat waveFormat)
{
    _binaryWriter.Write("fmt ".ToCharArray());
    uint samplesPerSecond = (uint)waveFormat.SampleRate;
    ushort channels = (ushort)waveFormat.Channels;
    ushort bitsPerSample = (ushort)waveFormat.BitsPerSample;
    ushort blockAlign = (ushort)(channels * (bitsPerSample / 8)); // bytes per frame
    uint averageBytesPerSec = (samplesPerSecond * blockAlign);
    _binaryWriter.Write((uint)(18 + waveFormat.ExtraSize)); // chunk size: 18-byte WAVEFORMATEX plus extension
    unchecked { _binaryWriter.Write((short)0xFFFE); } // format tag 0xFFFE = WAVE_FORMAT_EXTENSIBLE
    _binaryWriter.Write(channels);
    _binaryWriter.Write(samplesPerSecond);
    _binaryWriter.Write(averageBytesPerSec);
    _binaryWriter.Write(blockAlign);
    _binaryWriter.Write(bitsPerSample);
    _binaryWriter.Write((short)waveFormat.ExtraSize); // size of the extension that follows
    _binaryWriter.Write(bitsPerSample); // valid bits per sample
    _binaryWriter.Write((uint)3); // channel mask: front left | front right
    byte[] subformat = new Guid(KsMedia.WAVEFORMATEX).ToByteArray();
    _binaryWriter.Write(subformat, 0, subformat.Length);
}
The next chunk, shown above, specifies the details of the WAV file, using the format of the activated IAudioClient.
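The derived fields in the format chunk are simple arithmetic on the sample rate, channel count and bit depth. The sketch below repeats that arithmetic in isolation; the WaveFormatMath class is hypothetical and exists only to make the numbers easy to check.

```csharp
using System;

public static class WaveFormatMath
{
    // Bytes occupied by one frame, i.e. one sample for every channel.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Bytes of audio data produced per second of recording.
    public static int AverageBytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * BlockAlign(channels, bitsPerSample);
    }
}
```

For example, a 48000 Hz stereo stream with 32 bits per sample has a block align of 2 * 4 = 8 bytes and an average of 48000 * 8 = 384000 bytes per second.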
private void WriteWavDataChunkHeader()
{
    _binaryWriter.Write("data".ToCharArray());
    _dataSizePosition = _fileStream.Position; // remember where the data size goes
    _binaryWriter.Write((uint)0); // placeholder for the data size, updated when recording ends
}
Finally, the last chunk header before the actual data marks where the data starts and holds its length, which is unknown at this point.
Write the Captured Audio to the Wave File
Writing the captured audio is straightforward: the received bytes are appended raw to the file.
public void Write(byte[] buffer, int bytesRecorded)
{
    _fileStream.Write(buffer, 0, bytesRecorded);
    _dataChunkSize += bytesRecorded;
}
When capturing is done and the last buffer is written, the headers are updated with the required length according to the specification.
private void UpdateWavRiffHeader()
{
    _binaryWriter.Seek(4, SeekOrigin.Begin);
    _binaryWriter.Write((uint)(_binaryWriter.BaseStream.Length - 8));
}
private void UpdateDataChunkHeader()
{
    _binaryWriter.Seek((int)_dataSizePosition, SeekOrigin.Begin);
    _binaryWriter.Write((uint)_dataChunkSize);
}
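To see the placeholder-then-patch approach end to end, the following sketch builds the same chunk layout in memory instead of in a file. It is a simplified illustration, not the WaveFileWriter from the attached solution: the WavSizePatching class is hypothetical, the format chunk body is passed in as opaque bytes, and the audio data is assumed to have an even length (RIFF chunks are word-aligned).

```csharp
using System;
using System.IO;
using System.Text;

public static class WavSizePatching
{
    // Writes the chunk headers with zeroed sizes first, then seeks back
    // and fills in the real sizes once all data has been written.
    public static byte[] BuildWithPatchedSizes(byte[] formatChunkBody, byte[] audioData)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.ASCII))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write((uint)0); // placeholder: RIFF size
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write((uint)formatChunkBody.Length);
            writer.Write(formatChunkBody);

            writer.Write(Encoding.ASCII.GetBytes("data"));
            long dataSizePosition = stream.Position;
            writer.Write((uint)0); // placeholder: data size
            writer.Write(audioData);

            // RIFF size = total length minus the "RIFF" id and the size field itself.
            writer.Seek(4, SeekOrigin.Begin);
            writer.Write((uint)(stream.Length - 8));

            // Data size = number of audio bytes that follow the data header.
            writer.Seek((int)dataSizePosition, SeekOrigin.Begin);
            writer.Write((uint)audioData.Length);

            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

With a 16-byte format chunk body and 8 bytes of audio, the result is 52 bytes long, the RIFF size at offset 4 is 44, and the data size at offset 40 is 8.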