Introduction
When you're listening to your favorite song, a little visual entertainment makes it even more enjoyable. As a techie and an audio/music enthusiast, I like to see the technical details of everything, even my music. This interactive 3D spectrum analyzer not only provides an audio visualization that is appealing to the eye, but also shows how sounds change over time, helping us understand a little more about how audio works.
This project uses DirectX 9.0c to do the 3D rendering, and integrates with Windows Media Player. It's tested only on Vista Home Premium, but it should work on XP as long as you have Windows Media Player 11 and the DirectX 9.0c redistributables installed.
Getting Started
If you just want to install the binaries, you will need to make sure the DirectX 9.0c redistributables and Windows Media Player 11 are installed on your system. The attached installer should install everything else for you.
To build the project, you will need the DirectX 9.0c SDK and the Windows Platform SDK version 6.1. Your graphics card should support DirectX shader model 3 (vs_3_0 and ps_3_0). The DirectX includes and libraries should be in their appropriate search paths. I have the paths to the Windows SDK configured in the project file, so those paths should not have to be changed if you have installed the SDK in the default location. When building on Windows Vista, Visual Studio will try, and fail, to register WM3DSpectrum.dll as part of the build process. You'll need to run "regsvr32 WM3DSpectrum.dll" from an elevated (Administrator) command prompt to register WM3DSpectrum.dll on Vista.
If you want to make your own visualization from scratch, you can use the WMP SDK which is part of the Windows Platform SDK. A good overview of how to get started can be found here. Follow the directions carefully because the little details make a big difference.
Why DirectX 9.0c
While it's true that DirectX 10 makes a few things simpler, DirectX 10 is basically DirectX 9 with a little reorganization. DirectX 10 contains Microsoft's reorganization of the graphics pipeline for Windows Vista, and it also adds the DXGI framework. DXGI basically facilitates the use of the graphics processor without tying the DirectX device directly to a window. The GPUSpectogram project included with the DirectX 10 SDK shows an example of a windowless DirectX 10 device: it creates a bitmap spectrogram without associating the rendered bitmap with a window handle. Windows XP uses a simpler graphics pipeline that doesn't leave room for the features of DirectX 10 and DXGI. No wonder DirectX 10 only works with Windows Vista!
Another significant benefit of DirectX 10 is that you don't have to delete objects that live on the GPU when the user does something like change the screen resolution. In DirectX 9, such a change essentially makes you "lose" your DirectX device object because the hardware configuration changed. To recover after you lose the device, you start by calling TestCooperativeLevel() on the device object. If the return value is D3DERR_DEVICENOTRESET, you need to release all objects in the default pool (objects that exist on the GPU), call OnLostDevice() on things like fonts and sprites, call Reset() on the device, recreate the objects you need in the default pool, and then call OnResetDevice() on objects like fonts and sprites. I know it sounds complicated, but it really isn't that difficult. This project provides a sample of how to handle this issue with DirectX 9; DirectX 10 doesn't require you to do anything special when you "lose" the device.
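Here's a minimal sketch of that recovery sequence; the class, member, and helper names below are placeholders rather than the ones used in this project:

// Hypothetical per-frame check; m_pDevice, m_pFont, m_pSprite, and m_d3dpp are placeholders.
bool CRenderLoop::EnsureDevice()
{
    HRESULT hr = m_pDevice->TestCooperativeLevel();
    if( hr == D3DERR_DEVICELOST )
        return false;                     // Device is lost and can't be reset yet; skip this frame.
    if( hr == D3DERR_DEVICENOTRESET )
    {
        ReleaseDefaultPoolObjects();      // Release everything created in D3DPOOL_DEFAULT.
        m_pFont->OnLostDevice();          // Let the D3DX helpers drop their device references.
        m_pSprite->OnLostDevice();
        if( FAILED( m_pDevice->Reset( &m_d3dpp ) ) )
            return false;                 // Try again next frame.
        CreateDefaultPoolObjects();       // Recreate render targets, vertex buffers, etc.
        m_pFont->OnResetDevice();
        m_pSprite->OnResetDevice();
    }
    return true;                          // Safe to render.
}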
So still, “Why DirectX 9?” Well, as of the time this article was originally written (5/7/2009), the previous month's report from hitslink.com showed that Windows XP still had about 62% operating system market share, while Vista had only about 24%. These statistics were gathered from here. Vista and XP together hold an 86% share of all operating systems (62% + 24% = 86%), which means XP still accounts for about 72% of the combined Vista and XP market share (62% / 86% × 100 ≈ 72%). Although Vista is a great operating system, building the application for DirectX 10 would make it usable by a limited number of people: Vista users only.
It's funny that, after throwing out all of those statistics, I don't even have an XP machine to test this on. I've switched to Vista. :) Your feedback on how this works with Windows Media Player 11 and Windows XP would be useful.
A Picture is Worth a Thousand Words
I'm not going to go deep into sampling theory and other random DSP topics, but I must comment that one of the most difficult aspects of DSP programming is that you can't easily see how sound works. To do something like develop a new lossless audio compression format or a cool audio effect tool, you have to be so familiar with how audio works that you can essentially “see” sound.
An audio stream is made up of separate frequencies that, when combined, make a (hopefully harmonious) single sound. If you've ever seen the movie “Drumline”, you may be familiar with the band's motto: “One Band, One Sound!” So, how do the various sounds become one? Through close relationships that can be modeled with basic physics. This website lists the frequencies of the musical notes. You'll notice that the notes repeat C, D, E, F, G, A, B, C, D, E, F, G, A, B, and so on, where C is considered the starting point and each successive C has twice the frequency of the previous C. For example, C4 (middle C) is twice the frequency of C3: C4 is 261.63 Hz and C3 is 130.81 Hz. You'll see in the 3D spectrum analyzer that many audibly separate sounds are often packed into the lower frequencies. This is because octaves are spaced logarithmically; each octave covers twice the frequency range of the one below it, so most musical notes sit in a narrow band at the low end of a linear frequency scale.
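As a side note, the doubling per octave falls out of the standard equal-temperament formula. This little snippet is purely for intuition and isn't part of the project; it just reproduces the numbers above:

#include <cmath>
#include <cstdio>

// Frequency of the note n semitones away from A4 (440 Hz) in equal temperament.
double NoteFrequency( int semitonesFromA4 )
{
    return 440.0 * pow( 2.0, semitonesFromA4 / 12.0 );
}

int main()
{
    printf( "C4 = %.2f Hz\n", NoteFrequency( -9 ) );   // ~261.63 Hz (middle C)
    printf( "C3 = %.2f Hz\n", NoteFrequency( -21 ) );  // ~130.81 Hz, one octave lower
    return 0;
}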
Pictures of the Code
The design of this visualization isn't extraordinary, but it is object-oriented. The image below offers a little insight into the design.
Windows Media Player supports two modes for visualizations: windowed mode and non-windowed mode. I'm assuming that non-windowed mode is for when Media Player is being hosted as an ActiveX control. I haven't looked very deeply into non-windowed mode because it doesn't seem to be relevant to this project.
To support the windowed vs. non-windowed features transparently, there is an IRenderer interface that both modes can use. There is a CWindowedRenderer class and a CNonWindowedRenerer class, which do the rendering in windowed and non-windowed mode respectively through the IRenderer interface.
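A minimal sketch of what that abstraction might look like follows; the actual method signatures in the project may differ:

// Hypothetical shape of the abstraction; the project's real methods may be named differently.
struct IRenderer
{
    virtual HRESULT CreateDevice( RenderContext* pContext ) = 0;
    virtual HRESULT Render( TimedLevel* pLevels ) = 0;
    virtual void    Destroy() = 0;
    virtual ~IRenderer() {}
};

class CWindowedRenderer    : public IRenderer { /* renders into the visualization window */ };
class CNonWindowedRenerer  : public IRenderer { /* renders when WMP hosts the effect without a window */ };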
Most of the core rendering information is stored in the RenderContext structure. The only instance of the RenderContext structure is stored in the root WM3DSpectrum COM object, and a pointer to it is passed throughout the rendering hierarchy to all of the objects that need it.

If you want to add an additional 3D object to the scene, you can implement the IRenderable interface in a class and add that class to one of the vectors of renderable objects. You should add your new class to the renderables vector in the WM3DSpectrum constructor; all of the renderable objects are rendered in the order in which they appear in the vector.
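For example, a new scene object might look roughly like this (a sketch only; the real IRenderable methods and RenderContext members may be named differently):

// Hypothetical example of adding a new object to the scene.
class CMyGridLines : public IRenderable
{
public:
    virtual HRESULT Render( RenderContext* pContext )
    {
        // Issue draw calls through the device pointer held in the RenderContext
        // (whatever that member is actually called in the project).
        return S_OK;
    }
};

// In the WM3DSpectrum constructor:
//   renderables.push_back( new CMyGridLines() );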
This project currently supports 8 different visualizations. There are two color schemes, which I call “Rose Garden” and “City Lights”, and you can render either in solid mode or point mode. There are also two interpolation options, Linear and Smooth. Linear is really just a simple average that converts the 1024 separate frequencies supplied by Windows Media Player into 512 frequencies. The Smooth interpolator does the same thing as the Linear interpolator, but it additionally averages each frequency level with its neighboring levels. Here's the Linear interpolator's code:
void CLinearInterpolator::PrepareInterpolation( TimedLevel* pLevel )
{
    int x, y;
    const int xmax = 512;
    for( x = 0, y = 0; x < xmax; x++, y += 2 )
    {
        // Average each pair of adjacent frequency bins down to a single bin.
        m_LevelCacheL[x] = (unsigned char)(((int)(*pLevel).frequency[0][y] +
                           (int)(*pLevel).frequency[0][y + 1]) / 2);
        m_LevelCacheR[x] = (unsigned char)(((int)(*pLevel).frequency[1][y] +
                           (int)(*pLevel).frequency[1][y + 1]) / 2);
    }
}
and here is the Smooth interpolator's code:
void CSmoothInterpolator::PrepareInterpolation( TimedLevel* pLevel )
{
    int x, y;
    const int xmax = 512;
    for( x = 0, y = 0; x < xmax; x++, y += 2 )
    {
        // First pass: average each pair of adjacent frequency bins down to a single bin.
        m_LevelCacheL[x] = (unsigned char)(((int)(*pLevel).frequency[0][y] +
                           (int)(*pLevel).frequency[0][y + 1]) / 2);
        m_LevelCacheR[x] = (unsigned char)(((int)(*pLevel).frequency[1][y] +
                           (int)(*pLevel).frequency[1][y + 1]) / 2);
    }
    // Second pass: box-filter each bin with its neighbors (clamped at the edges).
    const int radius = 10;
    for( x = 0; x < xmax; x++ )
    {
        float sumL = 0.0f, sumR = 0.0f;
        int count = 0;
        for( y = x - radius; y < x + radius; y++ )
        {
            sumL += m_LevelCacheL[ max( 0, min( xmax - 1, y ) ) ];
            sumR += m_LevelCacheR[ max( 0, min( xmax - 1, y ) ) ];
            count++;
        }
        m_LevelCacheL[x] = (unsigned char)(sumL / (float)count);
        m_LevelCacheR[x] = (unsigned char)(sumR / (float)count);
    }
}
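For context, Windows Media Player hands the plug-in a TimedLevel structure on every frame, so a call site for either interpolator might look roughly like this (RenderWindowed() comes from the WMP effects interface; the member names here are placeholders):

// Sketch of where PrepareInterpolation would be called each frame.
STDMETHODIMP CWM3DSpectrum::RenderWindowed( TimedLevel* pLevels, BOOL fRequiredRender )
{
    if( pLevels != NULL )
    {
        m_pInterpolator->PrepareInterpolation( pLevels );  // fills m_LevelCacheL / m_LevelCacheR
        // ... append the 512 cached levels to the bottom of the ping-pong texture ...
    }
    return S_OK;
}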
You can select any combination of the visualization modes listed above from the right-click menu in Windows Media Player. Here are some additional images of the visualization:
Simplifying Things with DirectX
At any given moment, depending on whether you are using point mode or solid mode, there are either over a million vertices or over three million triangles being displayed in this visualization. The heights of all of the vertices need to be shifted and the new frequencies added on each frame. With so many vertices, continuously moving all of this memory around on the CPU could be a performance problem.

A simple way to speed things up is to use the GPU and Ping-Pong textures. You create two (or more) textures and render them into each other; when you do, you have the option of using shaders to do some GPGPU processing. This project doesn't need anything fancy, though; we just need the GPU to move memory around.

Since the only aspect of the vertices that changes is the height (the y position), we can use the Ping-Pong textures to hold a height map. We essentially build a spectrogram in a pair of Ping-Pong textures, using the alternate texture to do the memory movement. The image below illustrates this.
DirectX 9 doesn't let you directly read or write a texture that lives on the GPU, so you have to copy your GPU textures (textures in the default pool) over to a main memory texture (a texture in the system memory pool) before you can read or modify their contents. You can use GetRenderTargetData() to read texture data and UpdateSurface() to write texture data, but only through textures that exist in the system memory pool. You therefore need a swap texture: a texture in the system memory pool that you use to shuttle texture data to and from the GPU. The image below offers a visual of this scenario.

When copying data from Texture A to Texture B, for instance, we need to shift all of the data up, so we simply use a sprite object to copy the bottom 1023 rows of pixels of Texture A to the top 1023 rows of pixels of Texture B. We then pull the contents of Texture B into the swap texture with GetRenderTargetData(), write the newest frequency data into it, and push it back into the bottom of Texture B with UpdateSurface(). The result is two-dimensional, parallel memory-copy functionality that lets the visualization run with significantly fewer CPU cycles.
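Here is a rough sketch of that scroll-and-append step; the variable names, texture sizes, and exact sequencing are illustrative rather than copied from the project:

// Assumed 1024x1024 Ping-Pong textures: pTexA/pTexB live in D3DPOOL_DEFAULT,
// pTexSwap is the D3DPOOL_SYSTEMMEM texture used to reach the GPU copies.
void ScrollAndAppend( IDirect3DDevice9* pDev, ID3DXSprite* pSprite,
                      IDirect3DTexture9* pTexA, IDirect3DTexture9* pTexB,
                      IDirect3DTexture9* pTexSwap )
{
    IDirect3DSurface9 *pSurfB = NULL, *pSurfSwap = NULL;
    pTexB->GetSurfaceLevel( 0, &pSurfB );
    pTexSwap->GetSurfaceLevel( 0, &pSurfSwap );

    // 1. Shift: draw the bottom 1023 rows of A into the top 1023 rows of B.
    //    (The real code would save and restore the original render target.)
    pDev->SetRenderTarget( 0, pSurfB );
    pDev->BeginScene();
    pSprite->Begin( 0 );
    RECT src = { 0, 1, 1024, 1024 };              // skip the oldest row of A
    D3DXVECTOR3 pos( 0.0f, 0.0f, 0.0f );          // paste at the top of B
    pSprite->Draw( pTexA, &src, NULL, &pos, 0xFFFFFFFF );
    pSprite->End();
    pDev->EndScene();

    // 2. Append: pull B into system memory, write the newest row, push it back.
    pDev->GetRenderTargetData( pSurfB, pSurfSwap );
    // ... lock pSurfSwap and fill its bottom row with the new frequency data ...
    RECT bottom = { 0, 1023, 1024, 1024 };
    POINT dest  = { 0, 1023 };
    pDev->UpdateSurface( pSurfSwap, &bottom, pSurfB, &dest );

    pSurfSwap->Release();
    pSurfB->Release();
}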
Displaying the Data
Shaders sometimes seem intimidating because they're usually mathematically intense and, at first, a bit foreign. It's not really as difficult as it looks; getting started is probably the hardest part. Here's a quick ten-second tutorial: in DirectX 9, there are basically two types of shaders, the vertex shader and the pixel shader. The vertex shader is generally used to modify or produce vertices, and the pixel shader generally modifies or produces colors. Both can access global shader variables. The vertex shader executes first and can pass information on to the pixel shader. DirectX lets you define a vertex declaration object, which describes the usage type of each piece of data initially passed into the vertex shader. You use the SetVertexDeclaration() method to bind a vertex declaration to the device, and the SetStreamSource() method to bind your initial vertex data to the device. The data that can be passed into and out of the vertex shader, and possibly on to the pixel shader, is usually also defined in your shader code in a structure like this:
struct OutputVS
{
float4 posH :POSITION0;
float4 color :COLOR0;
};
posH, for example, is the name of a variable in the OutputVS structure, just as in normal C/C++ code, but you'll notice that extra names are appended to the members. The extra names (POSITION0 and COLOR0) are the usage identifiers. For vertex shader inputs, the usage identifiers correspond to your vertex declaration, and they tell DirectX how to treat the variable on the GPU. These usage identifiers are also used to define function parameters for shaders. For example, in the function definition:
OutputVS ColorVSRoseGarden(float3 posL : POSITION0)
{
...
}
posL is a function-local copy of the incoming vertex value that the POSITION0 usage identifier binds to this parameter; the same identifier on posH in the OutputVS structure marks the value the shader passes back out. The usage identifier is what creates the relationship between the value in the structure and the value that is passed into the shader. The number at the end of the usage identifier is an index that distinguishes multiple values of the same usage type, and it can run into multiple digits.
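On the C++ side, a vertex declaration that feeds a POSITION0 input like the one above might be set up roughly as follows (a sketch, not the project's exact declaration):

// Hypothetical vertex layout: one float3 position per vertex.
D3DVERTEXELEMENT9 elements[] =
{
    { 0, 0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    D3DDECL_END()
};

IDirect3DVertexDeclaration9* pDecl = NULL;
pDevice->CreateVertexDeclaration( elements, &pDecl );

// Bind the declaration and the vertex buffer before drawing.
pDevice->SetVertexDeclaration( pDecl );
pDevice->SetStreamSource( 0, pVertexBuffer, 0, 3 * sizeof(float) );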
The shader in this spectrum analyzer is fairly simple; it mainly maps data from the ping-pong textures into vertex heights, and it implements the color schemes. Texture data is normally read from the pixel shader, but we need to read the texture data from our vertex shader. We therefore have to use at least vertex shader version 3 (vs_3_0), and our texture format needs to be D3DFMT_A32B32G32R32F, since that's a format the tex2Dlod function deals with nicely.
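Setting that up from C++ looks roughly like this; the sizes and sampler states are illustrative, and this assumes the effect framework isn't already binding the vertex sampler for you:

// Hypothetical creation of a float render-target texture readable from vs_3_0.
IDirect3DTexture9* pHeightMap = NULL;
pDevice->CreateTexture( 1024, 1024, 1, D3DUSAGE_RENDERTARGET,
                        D3DFMT_A32B32G32R32F, D3DPOOL_DEFAULT,
                        &pHeightMap, NULL );

// Vertex texture fetch goes through the dedicated vertex sampler slots.
pDevice->SetTexture( D3DVERTEXTEXTURESAMPLER0, pHeightMap );
pDevice->SetSamplerState( D3DVERTEXTEXTURESAMPLER0, D3DSAMP_MINFILTER, D3DTEXF_POINT );
pDevice->SetSamplerState( D3DVERTEXTEXTURESAMPLER0, D3DSAMP_MAGFILTER, D3DTEXF_POINT );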
Conclusion
There you have it: a not-so-processor-intensive 3D spectrum analyzer that works with Windows Media Player. It would be nice to do continuous, overcomplete FFTs to give us a higher-resolution view of the audio, but I'll leave that for another article.
LICENSE
This project is licensed under a license that I wrote which allows the project to be used for educational purposes only. You'll find a copy of the license at the top of all of the source files in the project.
History
None yet. I guess the code is perfect! :)