Introduction
This small app captures the audio from the microphone of a Windows Phone device and displays it as a continuous waveform on the screen using the XNA framework.
A slightly modified version of the app (it changes color when you touch the screen) can be found in the Windows Phone Marketplace.
Background
In order to capture audio on a Windows Phone device, you need an instance of the default microphone (Microphone.Default), decide how often you want samples using the BufferDuration property, and hook up the BufferReady event. You then control the capturing with the Start() and Stop() methods.
The microphone delivers samples at a fixed rate of 16,000 Hz, i.e. 16,000 samples per second; the SampleRate property reports this value. According to the sampling theorem, this means you won't be able to capture audio of a higher frequency than 8,000 Hz (without distortion).
You are also limited when choosing a value for the BufferDuration property: it must be between 0.1 and 1 second (100 to 1000 ms) in 10 ms steps. In other words, you must choose a value of 100, 110, 120, ..., 990, or 1000 milliseconds.
When the microphone's BufferReady event fires, you should call the microphone.GetData(myBuffer) method to copy the samples from the microphone's internal buffer into a buffer that belongs to you. The recorded audio comes in the form of a byte array, but since the samples are actually signed 16-bit integers (i.e. integers in the range -32,768 to 32,767), you will probably need to do some conversion before you can process them.
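As a minimal sketch of that conversion (assuming myBuffer has just been filled by GetData), BitConverter can decode the little-endian samples; this is also what the app itself does later:

// Decode the raw byte buffer into signed 16-bit samples.
var samples = new short[myBuffer.Length / 2];
for (var i = 0; i < samples.Length; i++)
{
    samples[i] = BitConverter.ToInt16(myBuffer, i * 2);
}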
Using the code
The application keeps a fixed number of narrow images, here called "(image) slices", arranged in a linked list. The slices are rendered on the screen and smoothly moved from right to left. When the leftmost slice has gone off the screen, it is moved to the far right (still outside the screen) to create the illusion of an unlimited number of images.
Each slice holds the rendered samples from the content of one microphone buffer. When the buffer has been filled by the microphone, the rightmost slice (outside the screen) is rendered with the new samples and starts moving onto the screen.
The speed at which the slices move across the screen is tied to the buffer duration in such a way that the slices move a total of one slice width during the time the microphone captures the next buffer.
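With the values used here (an 800 pixel wide landscape screen, a 5 second screen window, and 100 ms buffers), that works out to 5,000 / 100 = 50 slices on screen, each ceil(800 / 50) = 16 pixels wide, moving at 16 px / 0.1 s = 160 pixels per second.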
Since each buffer of captured audio is rendered onto a texture as soon as it is received, there is no reason to keep any old buffer data. The application therefore keeps only one buffer in memory, which is reused over and over.
A flag is set each time the microphone buffer is ready. Since the BufferReady event is fired on the main thread, there is no need for any locking mechanism.
In the Update() method of the XNA app, the flag is checked to see whether new data has arrived, and if so, the samples are rendered to the slice next in line. In the Draw() method, the slices are drawn on the screen and moved slightly as time goes by.
Here's a description of the structure of the main Game class.
Some constants:
private const int LandscapeWidth = 800;
private const int LandscapeHeight = 480;
private const int SliceMilliseconds = 100;
Fields regarding the microphone and the captured data:
private readonly Microphone microphone;
private readonly byte[] microphoneData;
private readonly TimeSpan screenMilliseconds = TimeSpan.FromSeconds(5);
Choose a color that is almost transparent (the last of the four parameters; they are the red, green, blue and alpha components of the color). The reason is that many samples are drawn on top of each other, and keeping each individual sample almost see-through creates an interesting visual effect.
private readonly Color sampleColor = new Color(0.4f, 0.9f, 0.2f, 0.07f);
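Because RenderSamples draws the bars with BlendState.Additive (shown further down), the low alpha value scales each bar's contribution, and overlapping bars accumulate into brighter areas where the waveform is dense.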
The drawing objects. The white one-pixel texture does all the drawing; it is stretched into rectangles of the desired size.
private SpriteBatch spriteBatch;
private Texture2D whitePixelTexture;
The size of each image slice.
private int imageSliceWidth;
private int imageSliceHeight;
There's no need to keep a reference to the linked list itself, just the first and last nodes; the nodes keep references to their neighbors. The currentImageSlice is the one to draw on next.
private LinkedListNode<RenderTarget2D> firstImageSlice;
private LinkedListNode<RenderTarget2D> lastImageSlice;
private LinkedListNode<RenderTarget2D> currentImageSlice;
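Wrapping around the ends of the list is just a null-coalescing step on LinkedListNode; here is a minimal sketch of the idiom, using plain integers instead of render targets:

// Three slices standing in for the render targets.
var slices = new LinkedList<int>(new[] { 1, 2, 3 });
var current = slices.First;

// Advance, wrapping from the last node back to the first.
current = current.Next ?? slices.First;

// Walk backwards, wrapping from the first node to the last.
var previous = current.Previous ?? slices.Last;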
The speed of the slices moving across the screen.
private float pixelsPerSeconds;
In order to know how far the current samples should be moved, the application must keep track of when they appeared.
private float microphoneDataAppearedAtSeconds;
The signal that tells the Update()-method that there is new data to handle.
private bool hasNewMicrophoneData;
The number of samples rendered per pixel column.
private int samplesPerPixel;
Here's the constructor. In it the graphics mode is set and the microphone is wired up and asked to start listening.
public Waveform()
{
    new GraphicsDeviceManager(this)
    {
        PreferredBackBufferWidth = LandscapeWidth,
        PreferredBackBufferHeight = LandscapeHeight,
        IsFullScreen = true,
        SupportedOrientations =
            DisplayOrientation.Portrait |
            DisplayOrientation.LandscapeLeft |
            DisplayOrientation.LandscapeRight
    };

    Content.RootDirectory = "Content";
    TargetElapsedTime = TimeSpan.FromTicks(333333);
    InactiveSleepTime = TimeSpan.FromSeconds(1);

    microphone = Microphone.Default;
    microphone.BufferReady += MicrophoneBufferReady;
    microphone.BufferDuration = TimeSpan.FromMilliseconds(SliceMilliseconds);
    var microphoneDataLength = microphone.GetSampleSizeInBytes(microphone.BufferDuration);
    microphoneData = new byte[microphoneDataLength];
    microphone.Start();
}
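With these values, GetSampleSizeInBytes returns 16,000 samples/second × 0.1 seconds × 2 bytes per sample = 3,200 bytes.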
In XNA's LoadContent, nothing is actually loaded since the app does not depend on any pre-drawn images. The SpriteBatch is created, the white pixel texture is generated, and the image slices are initialized (as black images).
protected override void LoadContent()
{
    spriteBatch = new SpriteBatch(GraphicsDevice);

    whitePixelTexture = new Texture2D(GraphicsDevice, 1, 1);
    var white = new[] { Color.White };
    whitePixelTexture.SetData(white);

    CreateSliceImages();
}
CreateSliceImages calculates how many slices are needed to cover the entire screen (plus two, so there is room for movement). At the end of the method, the regular RenderSamples method is called to initialize all the images; since there is no data yet (all samples are zero), it generates black images.
private void CreateSliceImages()
{
    var imageSlicesOnScreenCount = (int)Math.Ceiling(screenMilliseconds.TotalMilliseconds / SliceMilliseconds);
    imageSliceWidth = (int)Math.Ceiling((float)LandscapeWidth / imageSlicesOnScreenCount);

    // The slice height is the landscape width (800), which is also the
    // screen height in portrait mode; Draw() scales the slices to fit.
    imageSliceHeight = LandscapeWidth;

    var imageSlices = new LinkedList<RenderTarget2D>();
    for (var i = 0; i < imageSlicesOnScreenCount + 2; i++)
    {
        var imageSlice = new RenderTarget2D(GraphicsDevice, imageSliceWidth, imageSliceHeight);
        imageSlices.AddLast(imageSlice);
    }

    firstImageSlice = imageSlices.First;
    lastImageSlice = imageSlices.Last;
    currentImageSlice = imageSlices.Last;

    pixelsPerSeconds = imageSliceWidth / (SliceMilliseconds / 1000f);

    var sampleCount = microphoneData.Length / 2;
    samplesPerPixel = (int)Math.Ceiling((float)sampleCount / imageSliceWidth);

    var slice = firstImageSlice;
    while (slice != null)
    {
        RenderSamples(slice.Value);
        slice = slice.Next;
    }
}
XNA's UnloadContent just cleans up what LoadContent created.
protected override void UnloadContent()
{
    spriteBatch.Dispose();
    whitePixelTexture.Dispose();

    var slice = firstImageSlice;
    while (slice != null)
    {
        slice.Value.Dispose();
        slice = slice.Next;
    }
}
The event handler for the microphone's BufferReady event. It copies the data from the microphone's buffer and raises the flag that new data has arrived.
private void MicrophoneBufferReady(object sender, EventArgs e)
{
    microphone.GetData(microphoneData);
    hasNewMicrophoneData = true;
}
XNA's Update method checks the phone's Back button to see if it's time to quit. After that, it checks the flag to see if new data has been recorded; if so, the new samples are rendered by calling the RenderSamples method.
protected override void Update(GameTime gameTime)
{
    if (GamePad.GetState(PlayerIndex.One).Buttons.Back == ButtonState.Pressed)
    {
        Exit();
    }

    if (hasNewMicrophoneData)
    {
        hasNewMicrophoneData = false;
        var currentSeconds = (float)gameTime.TotalGameTime.TotalSeconds;
        microphoneDataAppearedAtSeconds = currentSeconds;
        RenderSamples(currentImageSlice.Value);
        currentImageSlice = currentImageSlice.Next ?? firstImageSlice;
    }

    base.Update(gameTime);
}
XNA's Draw method takes care of drawing the rendered slices. It handles the two screen orientations, landscape and portrait, by scaling the images accordingly: in landscape mode the height of the images is squeezed, and in portrait mode the width is squeezed.
When everything is set up, the method iterates through the slices and renders them one by one on the screen, adjusted a bit along the X-axis to make up for the time that has passed.
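In portrait mode, for example, the viewport is 480 pixels wide, so screenWidthScale becomes 480 / 800 = 0.6 and each 16 pixel wide slice is drawn ceil(16 × 0.6) = 10 pixels wide, while the 800 pixel tall slices map one-to-one onto the 800 pixel tall screen.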
protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.Clear(Color.Black);

    var screenWidthScale = (float)GraphicsDevice.Viewport.Width / LandscapeWidth;
    var scaledWidth = (int)Math.Ceiling(imageSliceWidth * screenWidthScale);

    var currentSeconds = (float)gameTime.TotalGameTime.TotalSeconds;
    var secondsPassed = currentSeconds - microphoneDataAppearedAtSeconds;
    var drawOffsetX = secondsPassed * pixelsPerSeconds;
    if (drawOffsetX > scaledWidth)
    {
        drawOffsetX = scaledWidth;
    }

    try
    {
        spriteBatch.Begin();

        var imageSlice = currentImageSlice.Previous ?? lastImageSlice;
        var destinationRectangle = new Rectangle(
            (int)(GraphicsDevice.Viewport.Width + scaledWidth - drawOffsetX),
            0,
            scaledWidth,
            GraphicsDevice.Viewport.Height);

        while (destinationRectangle.X > -scaledWidth)
        {
            spriteBatch.Draw(imageSlice.Value, destinationRectangle, Color.White);
            destinationRectangle.X -= scaledWidth;
            imageSlice = imageSlice.Previous ?? lastImageSlice;
        }
    }
    finally
    {
        spriteBatch.End();
    }

    base.Draw(gameTime);
}
RenderSamples takes a RenderTarget2D as its argument, which is the texture to draw on. The routine iterates through the samples and renders them one by one as vertical bars.
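With the numbers from above, each slice covers 3,200 / 2 = 1,600 samples spread over 16 pixel columns, so samplesPerPixel becomes ceil(1,600 / 16) = 100, and every column gets 100 translucent bars drawn on top of each other.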
private void RenderSamples(RenderTarget2D target)
{
    try
    {
        GraphicsDevice.SetRenderTarget(target);
        GraphicsDevice.Clear(Color.Black);
        spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.Additive);

        var x = 0;
        var sampleCount = microphoneData.Length / 2;
        var sampleIndex = 0;
        var halfHeight = imageSliceHeight / 2;
        const float SampleFactor = 32768f;

        for (var i = 0; i < sampleCount; i++)
        {
            if ((i > 0) && ((i % samplesPerPixel) == 0))
            {
                x++;
            }

            var sampleValue = BitConverter.ToInt16(microphoneData, sampleIndex) / SampleFactor;
            var sampleHeight = (int)Math.Abs(sampleValue * halfHeight);
            var y = (sampleValue < 0)
                ? halfHeight
                : halfHeight - sampleHeight;

            var destinationRectangle = new Rectangle(x, y, 1, sampleHeight);
            spriteBatch.Draw(
                whitePixelTexture,
                destinationRectangle,
                sampleColor);

            sampleIndex += 2;
        }
    }
    finally
    {
        spriteBatch.End();
        GraphicsDevice.SetRenderTarget(null);
    }
}
You can download the solution file from here.