Introduction
When you're listening to your favorite song, a little visual entertainment makes it even more enjoyable. As a techie and an audio/music enthusiast, I like to see the technical details of everything, even my music. This interactive 3D spectrum analyzer not only provides an audio visualization that is appealing to the eye, but also shows how sounds change over time, helping us understand a little more about how audio works.
This project uses DirectX 9.0c to do the 3D rendering, and integrates with Windows Media Player. It's tested only on Vista Home Premium, but it should work on XP as long as you have Windows Media Player 11 and the DirectX 9.0c redistributables installed.
Getting Started
If you just want to install the binaries, you will need to make sure the DirectX 9.0c redistributables and Windows Media Player 11 are installed on your system. The attached installer should install everything else for you.
To build the project, you will need the DirectX 9.0c SDK and the Windows Platform SDK version 6.1. Your graphics card should support DirectX shader model 3 (vs_3_0 and ps_3_0). The DirectX includes and libraries should be in their appropriate search paths. I have the paths to the Windows SDK configured in the project file, so those paths should not have to be changed if you have installed the SDK in the default location. When building on Windows Vista, Visual Studio will try, and fail, to register WM3DSpectrum.dll as part of the build process. You'll need to run "regsvr32 WM3DSpectrum.dll" from an elevated (Administrator) command prompt to register WM3DSpectrum.dll on Vista.
If you want to make your own visualization from scratch, you can use the WMP SDK which is part of the Windows Platform SDK. A good overview of how to get started can be found here. Follow the directions carefully because the little details make a big difference.
Why DirectX 9.0c
While it's true that DirectX 10 makes a few things simpler, DirectX 10 is basically DirectX 9 with a little reorganization. DirectX 10 contains Microsoft's reorganization of the graphics pipeline for Windows Vista, and it also adds the DXGI framework. DXGI basically facilitates the use of the graphics processor without tying the DirectX device directly to a window. The GPUSpectogram project included with the DirectX 10 SDK shows an example of a windowless DirectX 10 device: it creates a bitmap spectrogram without associating the rendered bitmap with a window handle. Windows XP uses a simpler graphics pipeline that doesn't leave room for the features of DirectX 10 and DXGI. No wonder DirectX 10 only works with Windows Vista!
Another significant benefit of DirectX 10 is that you don't have to delete objects that live on the GPU when the user does something like change the screen resolution. In DirectX 9, such a change essentially makes you "lose" your DirectX device object because the hardware configuration changed. To recover after you lose the device, you start by calling TestCooperativeLevel() on the device object. If the return value is D3DERR_DEVICENOTRESET, you need to release all objects in the default pool (objects that exist on the GPU), call OnLostDevice() on things like fonts and sprites, call Reset() on the device, recreate the objects you need in the default pool, and then call OnResetDevice() on objects like fonts and sprites. I know it sounds complicated, but it really isn't that difficult. This project provides a sample of how to handle this issue with DirectX 9; DirectX 10 doesn't require you to do anything special when you "lose" the device.
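Here's a minimal sketch of that recovery sequence; the class, member, and helper names below are placeholders rather than the ones used in this project:

// Hypothetical per-frame check; m_pDevice, m_pFont, m_pSprite, and m_d3dpp are placeholders.
bool CRenderLoop::EnsureDevice()
{
    HRESULT hr = m_pDevice->TestCooperativeLevel();
    if( hr == D3DERR_DEVICELOST )
        return false;                     // Device is lost and can't be reset yet; skip this frame.
    if( hr == D3DERR_DEVICENOTRESET )
    {
        ReleaseDefaultPoolObjects();      // Release everything created in D3DPOOL_DEFAULT.
        m_pFont->OnLostDevice();          // Let the D3DX helpers drop their device references.
        m_pSprite->OnLostDevice();
        if( FAILED( m_pDevice->Reset( &m_d3dpp ) ) )
            return false;                 // Try again next frame.
        CreateDefaultPoolObjects();       // Recreate render targets, vertex buffers, etc.
        m_pFont->OnResetDevice();
        m_pSprite->OnResetDevice();
    }
    return true;                          // Safe to render.
}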
So still, “Why DirectX 9?” Well, as of the time this article was originally written (5/7/2009), the previous month's report from hitslink.com showed that Windows XP still had about 62% operating system market share, while Vista had only about 24%. These statistics were gathered from here. Vista and XP together hold an 86% share of all operating systems (62% + 24% = 86%), which means XP still accounts for about 72% of the combined Vista and XP market share (62% / 86% × 100 ≈ 72%). Although Vista is a great operating system, building the application for DirectX 10 would make it usable by a limited number of people: Vista users only.
It's funny that, after throwing out all of those statistics, I don't even have an XP machine to test this on. I've switched to Vista. :) Your feedback on how this works with Windows Media Player 11 and Windows XP would be useful.
A Picture is Worth a Thousand Words
I'm not going to go deep into sampling theory and other random DSP topics, but I must comment that one of the most difficult aspects of DSP programming is that you can't easily see how sound works. To do something like develop a new lossless audio compression format or a cool audio effect tool, you have to be so familiar with how audio works that you can essentially “see” sound.
An audio stream is made up of separate frequencies that, when combined, make a (hopefully harmonious) single sound. If you've ever seen the movie “Drumline”, you may be familiar with the band's motto: “One Band, One Sound!” So, how do the various sounds become one? Through close relationships that can be modeled with basic physics. This website lists the frequencies of the musical notes. You'll notice that the notes repeat C, D, E, F, G, A, B, C, D, E, F, G, A, B, and so on, where C is considered the starting point and each successive C has twice the frequency of the previous C. For example, C4 (middle C) is twice the frequency of C3: C4 is 261.63 Hz and C3 is 130.81 Hz. You'll see in the 3D spectrum analyzer that many audibly separate sounds are often packed into the lower frequencies. This is because octaves are spaced logarithmically; each octave covers twice the frequency range of the one below it, so most musical notes sit in a narrow band at the low end of a linear frequency scale.
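As a side note, the doubling per octave falls out of the standard equal-temperament formula. This little snippet is purely for intuition and isn't part of the project; it just reproduces the numbers above:

#include <cmath>
#include <cstdio>

// Frequency of the note n semitones away from A4 (440 Hz) in equal temperament.
double NoteFrequency( int semitonesFromA4 )
{
    return 440.0 * pow( 2.0, semitonesFromA4 / 12.0 );
}

int main()
{
    printf( "C4 = %.2f Hz\n", NoteFrequency( -9 ) );   // ~261.63 Hz (middle C)
    printf( "C3 = %.2f Hz\n", NoteFrequency( -21 ) );  // ~130.81 Hz, one octave lower
    return 0;
}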
Pictures of the Code
The design of this visualization isn't extraordinary, but it is object-oriented. The image below offers a little insight into the design.
Windows Media Player supports two modes for visualizations: windowed mode and non-windowed mode. I'm assuming that non-windowed mode is for when Media Player is being hosted as an ActiveX control. I haven't looked very deeply into non-windowed mode because it doesn't seem to be relevant to this project.
To support the windowed vs. non-windowed features transparently, there is an IRenderer interface that both modes can use. There is a CWindowedRenderer class and a CNonWindowedRenerer class, which do the rendering in windowed and non-windowed mode respectively through the IRenderer interface.
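A minimal sketch of what that abstraction might look like follows; the actual method signatures in the project may differ:

// Hypothetical shape of the abstraction; the project's real methods may be named differently.
struct IRenderer
{
    virtual HRESULT CreateDevice( RenderContext* pContext ) = 0;
    virtual HRESULT Render( TimedLevel* pLevels ) = 0;
    virtual void    Destroy() = 0;
    virtual ~IRenderer() {}
};

class CWindowedRenderer    : public IRenderer { /* renders into the visualization window */ };
class CNonWindowedRenerer  : public IRenderer { /* renders when WMP hosts the effect without a window */ };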
Most of the core rendering information is stored in the RenderContext structure. The only instance of the RenderContext structure is stored in the root WM3DSpectrum COM object, and a pointer to it is passed throughout the rendering hierarchy to all of the objects that need it.

If you want to add an additional 3D object to the scene, you can implement the IRenderable interface in a class and add that class to one of the vectors of renderable objects. You should add your new class to the renderables vector in the WM3DSpectrum constructor; all of the renderable objects are rendered in the order in which they appear in the vector.
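For example, a new scene object might look roughly like this (a sketch only; the real IRenderable methods and RenderContext members may be named differently):

// Hypothetical example of adding a new object to the scene.
class CMyGridLines : public IRenderable
{
public:
    virtual HRESULT Render( RenderContext* pContext )
    {
        // Issue draw calls through the device pointer held in the RenderContext
        // (whatever that member is actually called in the project).
        return S_OK;
    }
};

// In the WM3DSpectrum constructor:
//   renderables.push_back( new CMyGridLines() );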
This project currently supports 8 different visualizations. There are two color schemes, which I call “Rose Garden” and “City Lights”, and you can render either in solid mode or point mode. There are also two interpolation options, Linear and Smooth. Linear is really just a simple average that converts the 1024 separate frequencies supplied by Windows Media Player into 512 frequencies. The Smooth interpolator does the same thing as the Linear interpolator, but it additionally averages each frequency level with its neighboring levels. Here's the Linear interpolator's code:
void CLinearInterpolator::PrepareInterpolation( TimedLevel* pLevel )
{
    int x, y;
    const int xmax = 512;
    for( x = 0, y = 0; x < xmax; x++, y += 2 )
    {
        // Average each pair of adjacent frequency bins down to a single bin.
        m_LevelCacheL[x] = (unsigned char)(((int)(*pLevel).frequency[0][y] +
                           (int)(*pLevel).frequency[0][y + 1]) / 2);
        m_LevelCacheR[x] = (unsigned char)(((int)(*pLevel).frequency[1][y] +
                           (int)(*pLevel).frequency[1][y + 1]) / 2);
    }
}
and here is the Smooth interpolator's code:
void CSmoothInterpolator::PrepareInterpolation( TimedLevel* pLevel )
{
    int x, y;
    const int xmax = 512;
    for( x = 0, y = 0; x < xmax; x++, y += 2 )
    {
        // First pass: average each pair of adjacent frequency bins down to a single bin.
        m_LevelCacheL[x] = (unsigned char)(((int)(*pLevel).frequency[0][y] +
                           (int)(*pLevel).frequency[0][y + 1]) / 2);
        m_LevelCacheR[x] = (unsigned char)(((int)(*pLevel).frequency[1][y] +
                           (int)(*pLevel).frequency[1][y + 1]) / 2);
    }
    // Second pass: box-filter each bin with its neighbors (clamped at the edges).
    const int radius = 10;
    for( x = 0; x < xmax; x++ )
    {
        float sumL = 0.0f, sumR = 0.0f;
        int count = 0;
        for( y = x - radius; y < x + radius; y++ )
        {
            sumL += m_LevelCacheL[ max( 0, min( xmax - 1, y ) ) ];
            sumR += m_LevelCacheR[ max( 0, min( xmax - 1, y ) ) ];
            count++;
        }
        m_LevelCacheL[x] = (unsigned char)(sumL / (float)count);
        m_LevelCacheR[x] = (unsigned char)(sumR / (float)count);
    }
}
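For context, Windows Media Player hands the plug-in a TimedLevel structure on every frame, so a call site for either interpolator might look roughly like this (RenderWindowed() comes from the WMP effects interface; the member names here are placeholders):

// Sketch of where PrepareInterpolation would be called each frame.
STDMETHODIMP CWM3DSpectrum::RenderWindowed( TimedLevel* pLevels, BOOL fRequiredRender )
{
    if( pLevels != NULL )
    {
        m_pInterpolator->PrepareInterpolation( pLevels );  // fills m_LevelCacheL / m_LevelCacheR
        // ... append the 512 cached levels to the bottom of the ping-pong texture ...
    }
    return S_OK;
}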
You can select any combination of the visualization modes listed above from the right-click menu in Windows Media Player. Here are some additional images of the visualization:
Simplifying Things with DirectX
At any given moment, depending on whether you are using point mode or solid mode, there are either over a million vertices or over three million triangles being displayed in this visualization. The heights of all of the vertices need to be shifted and the new frequencies added on each frame. With so many vertices, continuously moving all of this memory around on the CPU could be a performance problem.

A simple way to speed things up is to use the GPU and Ping-Pong textures. You create two (or more) textures and render them into each other; when you do, you have the option of using shaders to do some GPGPU processing. This project doesn't need anything fancy, though; we just need the GPU to move memory around.

Since the only aspect of the vertices that changes is the height (the y position), we can use the Ping-Pong textures to hold a height map. We essentially build a spectrogram in a pair of Ping-Pong textures, using the alternate texture to do the memory movement. The image below illustrates this.
DirectX 9 doesn't let you directly read or write a texture that lives on the GPU, so you have to copy your GPU textures (textures in the default pool) over to a main memory texture (a texture in the system memory pool) before you can read or modify their contents. You can use GetRenderTargetData() to read texture data and UpdateSurface() to write texture data, but only through textures that exist in the system memory pool. You therefore need a swap texture: a texture in the system memory pool that you use to shuttle texture data to and from the GPU. The image below offers a visual of this scenario.

When copying data from Texture A to Texture B, for instance, we need to shift all of the data up, so we simply use a sprite object to copy the bottom 1023 rows of pixels of Texture A to the top 1023 rows of pixels of Texture B. We then pull the contents of Texture B into the swap texture with GetRenderTargetData(), write the newest frequency data into it, and push it back into the bottom of Texture B with UpdateSurface(). The result is two-dimensional, parallel memory-copy functionality that lets the visualization run with significantly fewer CPU cycles.
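Here is a rough sketch of that scroll-and-append step; the variable names, texture sizes, and exact sequencing are illustrative rather than copied from the project:

// Assumed 1024x1024 Ping-Pong textures: pTexA/pTexB live in D3DPOOL_DEFAULT,
// pTexSwap is the D3DPOOL_SYSTEMMEM texture used to reach the GPU copies.
void ScrollAndAppend( IDirect3DDevice9* pDev, ID3DXSprite* pSprite,
                      IDirect3DTexture9* pTexA, IDirect3DTexture9* pTexB,
                      IDirect3DTexture9* pTexSwap )
{
    IDirect3DSurface9 *pSurfB = NULL, *pSurfSwap = NULL;
    pTexB->GetSurfaceLevel( 0, &pSurfB );
    pTexSwap->GetSurfaceLevel( 0, &pSurfSwap );

    // 1. Shift: draw the bottom 1023 rows of A into the top 1023 rows of B.
    //    (The real code would save and restore the original render target.)
    pDev->SetRenderTarget( 0, pSurfB );
    pDev->BeginScene();
    pSprite->Begin( 0 );
    RECT src = { 0, 1, 1024, 1024 };              // skip the oldest row of A
    D3DXVECTOR3 pos( 0.0f, 0.0f, 0.0f );          // paste at the top of B
    pSprite->Draw( pTexA, &src, NULL, &pos, 0xFFFFFFFF );
    pSprite->End();
    pDev->EndScene();

    // 2. Append: pull B into system memory, write the newest row, push it back.
    pDev->GetRenderTargetData( pSurfB, pSurfSwap );
    // ... lock pSurfSwap and fill its bottom row with the new frequency data ...
    RECT bottom = { 0, 1023, 1024, 1024 };
    POINT dest  = { 0, 1023 };
    pDev->UpdateSurface( pSurfSwap, &bottom, pSurfB, &dest );

    pSurfSwap->Release();
    pSurfB->Release();
}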
Displaying the Data
Shaders sometimes seem intimidating because they're usually mathematically intense and, at first, a bit foreign. It's not really as difficult as it looks; getting started is probably the hardest part. Here's a quick ten-second tutorial: in DirectX 9, there are basically two types of shaders, the vertex shader and the pixel shader. The vertex shader is generally used to modify or produce vertices, and the pixel shader generally modifies or produces colors. Both can access global shader variables. The vertex shader executes first and can pass information on to the pixel shader. DirectX lets you define a vertex declaration object, which describes the usage type of each piece of data initially passed into the vertex shader. You use the SetVertexDeclaration() method to bind a vertex declaration to the device, and the SetStreamSource() method to bind your initial vertex data to the device. The data that can be passed into and out of the vertex shader, and possibly on to the pixel shader, is usually also defined in your shader code in a structure like this:
struct OutputVS
{
float4 posH :POSITION0;
float4 color :COLOR0;
};
posH, for example, is the name of a variable in the OutputVS structure, just as in normal C/C++ code, but you'll notice that extra names are appended to the members. The extra names (POSITION0 and COLOR0) are the usage identifiers. For vertex shader inputs, the usage identifiers correspond to your vertex declaration, and they tell DirectX how to treat the variable on the GPU. These usage identifiers are also used to define function parameters for shaders. For example, in the function definition:
OutputVS ColorVSRoseGarden(float3 posL : POSITION0)
{
...
}
posL is a function-local copy of the incoming vertex value that the POSITION0 usage identifier binds to this parameter; the same identifier on posH in the OutputVS structure marks the value the shader passes back out. The usage identifier is what creates the relationship between the value in the structure and the value that is passed into the shader. The number at the end of the usage identifier is an index that distinguishes multiple values of the same usage type, and it can run into multiple digits.
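On the C++ side, a vertex declaration that feeds a POSITION0 input like the one above might be set up roughly as follows (a sketch, not the project's exact declaration):

// Hypothetical vertex layout: one float3 position per vertex.
D3DVERTEXELEMENT9 elements[] =
{
    { 0, 0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    D3DDECL_END()
};

IDirect3DVertexDeclaration9* pDecl = NULL;
pDevice->CreateVertexDeclaration( elements, &pDecl );

// Bind the declaration and the vertex buffer before drawing.
pDevice->SetVertexDeclaration( pDecl );
pDevice->SetStreamSource( 0, pVertexBuffer, 0, 3 * sizeof(float) );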
The shader in this spectrum analyzer is fairly simple; it mainly maps data from the ping-pong textures into vertex heights, and it implements the color schemes. Texture data is normally read from the pixel shader, but we need to read the texture data from our vertex shader. We therefore have to use at least vertex shader version 3 (vs_3_0), and our texture format needs to be D3DFMT_A32B32G32R32F, since that's a format the tex2Dlod function deals with nicely.
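Setting that up from C++ looks roughly like this; the sizes and sampler states are illustrative, and this assumes the effect framework isn't already binding the vertex sampler for you:

// Hypothetical creation of a float render-target texture readable from vs_3_0.
IDirect3DTexture9* pHeightMap = NULL;
pDevice->CreateTexture( 1024, 1024, 1, D3DUSAGE_RENDERTARGET,
                        D3DFMT_A32B32G32R32F, D3DPOOL_DEFAULT,
                        &pHeightMap, NULL );

// Vertex texture fetch goes through the dedicated vertex sampler slots.
pDevice->SetTexture( D3DVERTEXTEXTURESAMPLER0, pHeightMap );
pDevice->SetSamplerState( D3DVERTEXTEXTURESAMPLER0, D3DSAMP_MINFILTER, D3DTEXF_POINT );
pDevice->SetSamplerState( D3DVERTEXTEXTURESAMPLER0, D3DSAMP_MAGFILTER, D3DTEXF_POINT );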
Conclusion
There you have it: a not-so-processor-intensive 3D spectrum analyzer that works with Windows Media Player. It would be nice to do continuous, overcomplete FFTs to give us a higher-resolution view of the audio, but I'll leave that for another article.
LICENSE
This project is licensed under a license that I wrote which allows the project to be used for educational purposes only. You'll find a copy of the license at the top of all of the source files in the project.
History
None yet. I guess the code is perfect! :)