Introduction
I've been playing electric guitar for the last two years, but only a few months ago did I start taking lessons with a professional teacher. The level of complexity rose dramatically: from just trying to play some chords, I'm now playing difficult licks that are hard to get right when learning a new song. My guitar teacher uses an application to slow down the playback: Amazing Slow Downer.
It is commercial, and I have no plans to pay for it (it is surely affordable, but open source is better, and I can tweak it to my preferences).
An open source alternative that I found is called BestPractice. This utility works fine on Windows XP, but it crashes on Windows 7 Pro 64-bit. In addition, the playback quality could be better, the UI could be better designed (e.g., the volume bar has a tiny thumb button) and, most importantly, it is missing a key feature: presets. I downloaded the source code, but found that it was written in Borland C++ Builder, not quite the development environment I was looking forward to working in.
User interface and presets are worth discussing. A while ago, to get amazing sound effects out of my guitar, I purchased the Boss ME-70 (as shown below).
This piece of hardware is truly amazing, not just because of the great effects and sound quality, but because its UI design is brilliant. There are no menus; everything is laid out for immediate, intuitive use. The best part (for me) is the presets: you can manually change some settings (for example, amplifier emulation mode, compressor and delay), and once you're happy with the sound, you can keep it by simply saving it to a persistent preset. This is great for practice: you come back the next day and that preset is waiting for you. No need to set it all up from scratch.
I really wanted such presets in a playback practice tool: the ability to define a lick with a slow-down factor, perhaps a volume level, and maybe a loop that allows repetitive practice of that section. Once that "playback section" was properly defined, I wanted to be able to save it to a preset, just as I do with my Boss ME-70.
Previously, when I used BestPractice, I had to write down all the settings on paper and then manually 'dial' them in every time I started practicing. With Practice#, I wanted these settings to be persistent, and I also wanted to be able to save the same settings as two or more presets, each with a different slow-down (speed), so that a lick can be played, for example, at 70% when you start, 85% when you feel more confident and finally 100% for regular speed. So once I had my internal 'Business Requirements', it was time to move on to design and implementation.
Architecture
I decided to go with .NET 2.0, C# and Windows Forms because .NET and C# make development easy and fast (in particular for UI), Windows Forms is very powerful, and Visual Studio Express is free.
The architecture diagram shows the different layers of Practice#, from top to bottom:
- User Interface - Responsible for the user interface logic and for controlling the Core Audio Logic ("back-end") layer.
- Presets - A collection of persistent, user-defined settings such as tempo, pitch, volume, etc.
- Core Audio Logic - Contains all the audio processing logic that Practice# needs in order to function. Controls and coordinates the other frameworks and libraries (NAudio, SoundTouch, Vorbis#). Does not handle any user interface logic.
- NAudio - (3rd Party) An audio playback platform that handles all the lower-level APIs needed to play audio on the operating system.
- SoundTouchSharp - A managed C# interop wrapper for SoundTouch.
- SoundTouch - (3rd Party) An audio DSP library that provides time stretching (tempo changes) and pitch changes.
- Vorbis# - (3rd Party) A library that reads Ogg Vorbis files and decodes the compressed audio into uncompressed PCM samples that can be processed by NAudio.
- LibFlacSharp - A managed C# interop wrapper for libFlac.
- libFlac - (3rd Party) A C library that reads FLAC files and decodes the compressed audio into uncompressed PCM samples that can be processed by NAudio.
Design
Time Stretching
The biggest problem was how to stretch time: how to change the audio speed (or tempo) without changing its pitch. What exactly does that mean? If you take an audio file (e.g., MP3 or WAV) and simply play it faster by keeping only every second sample, it will run twice as fast as the original, but its pitch (the tone of the sound) will also double, i.e. go up one octave, sounding like a cartoon character. The same problem happens when you slow down an audio file in a 'naïve' brute-force way: the pitch goes down, e.g., an alto singer turns into a bass. That's not good; the sound should keep the same pitch, just slower.
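To make the 'naïve' speed-up concrete, here is a small sketch (the class and method names are illustrative, not part of Practice#). It keeps every second sample of a 440 Hz test tone and estimates the frequency by counting upward zero crossings. Played back at the original sample rate, the decimated tone lasts half as long and measures roughly 880 Hz, one octave higher.

```csharp
using System;

public static class NaiveSpeedUp
{
    // Keeps every second sample. At the same playback sample rate the result
    // takes half the time, and every frequency in it is doubled.
    public static float[] DropEverySecondSample(float[] input)
    {
        var output = new float[(input.Length + 1) / 2];
        for (int i = 0; i < output.Length; i++)
        {
            output[i] = input[i * 2];
        }
        return output;
    }

    // Generates a pure sine tone so the pitch change can be measured.
    public static float[] Sine(double frequencyHz, int sampleRate, int length)
    {
        var samples = new float[length];
        for (int i = 0; i < length; i++)
        {
            samples[i] = (float)Math.Sin(2 * Math.PI * frequencyHz * i / sampleRate);
        }
        return samples;
    }

    // Rough frequency estimate: upward zero crossings per second of audio.
    public static double EstimateFrequency(float[] samples, int sampleRate)
    {
        int crossings = 0;
        for (int i = 1; i < samples.Length; i++)
        {
            if (samples[i - 1] < 0 && samples[i] >= 0) crossings++;
        }
        return crossings * (double)sampleRate / samples.Length;
    }
}
```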
A somewhat similar problem is how to change the pitch of the audio without changing its speed. This use case is less useful than the previous one, but it is still handy in some situations, for example for matching keys. Both use cases, time stretching and pitch changing, are similar since each changes one parameter (time or pitch) without affecting the other (pitch or time, respectively).
The theory behind this problem is very interesting, but beyond the scope of this article. For those who wish to learn more about the theory behind time stretching and pitch changing, please refer to this page on DSPDimension. The topic is covered there in detail and quite comprehensively, including a comparison of different algorithms with audio examples.
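Pitch changes are usually expressed in semitones. In equal temperament, shifting by n semitones multiplies every frequency by 2^(n/12), so the doubled pitch of the naïve speed-up above is exactly +12 semitones (one octave). A minimal pair of helpers, with illustrative names:

```csharp
using System;

public static class PitchMath
{
    // Equal temperament: n semitones up multiplies every frequency by 2^(n/12).
    public static double SemitonesToRatio(double semitones)
    {
        return Math.Pow(2.0, semitones / 12.0);
    }

    // Inverse mapping: how many semitones a frequency ratio corresponds to.
    public static double RatioToSemitones(double ratio)
    {
        return 12.0 * Math.Log(ratio, 2.0);
    }
}
```

For example, a quarter-semitone step corresponds to a frequency ratio of about 1.0145, and -12 semitones halves every frequency.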
The basic requirements for a Time Stretching library are:
- It has to be open source and LGPL-licensed.
- It has to run on 32-bit and 64-bit Windows.
- The audio quality must be good.
- It must provide an adequate API that works properly with .NET. Managed code is preferred but not mandatory.
- It must have high performance (low CPU usage) and low latency. High latency is OK for batch utilities, but an interactive practice playback utility must have low latency.
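Why latency matters for an interactive tool: any audio already queued ahead of the sound card is heard before a new tempo setting takes effect, so the delay equals the duration of the queued audio. A minimal sketch of that arithmetic (the helper name is hypothetical):

```csharp
public static class LatencyMath
{
    // Delay (in milliseconds) introduced by audio that is already queued for output.
    // queuedFrames counts sample frames (one sample per channel); sampleRate is in Hz.
    public static double QueuedLatencyMs(long queuedFrames, int sampleRate)
    {
        return queuedFrames * 1000.0 / sampleRate;
    }
}
```

For example, three queued buffers of 4,096 frames at 44.1 kHz add about 279 ms before a tempo change is audible; these buffer counts and sizes are illustrative, not Practice#'s actual values.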
SoundTouch
The only candidate library that matched these requirements was SoundTouch. SoundTouch is an LGPL C++ library that provides an API for time stretching and pitch changing. Its quality is pretty good, as can be heard from these samples. The main challenge with this library was how to use it from a managed .NET application, since it is a native C++ DLL.
SoundTouchSharp
To integrate .NET with SoundTouch, I wrote a wrapper called SoundTouchSharp. Basically, SoundTouchSharp is a C# interop class that wraps the native SoundTouch C++ DLL and exposes the DLL's functions as a managed C# API. Note: SoundTouchSharp can be used outside of Practice# by any application that needs time stretching or pitch changing. If, for example, an ASP.NET web application needs that functionality, it can use SoundTouchSharp together with SoundTouch.
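For readers who have not written such a wrapper before, here is a minimal P/Invoke sketch. It assumes the plain-C functions exported by SoundTouch's own DLL build (declared in SoundTouchDLL.h); the DLL file name and the facade class name are assumptions, and this is not SoundTouchSharp's actual source.

```csharp
using System;
using System.Runtime.InteropServices;

// P/Invoke declarations for the C API of the SoundTouch DLL build.
// Assumption: the file name; 64-bit builds may ship under a different name.
public static class SoundTouchNative
{
    private const string Dll = "SoundTouch.dll";

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr soundtouch_createInstance();

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern void soundtouch_destroyInstance(IntPtr handle);

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern void soundtouch_setSampleRate(IntPtr handle, uint sampleRate);

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern void soundtouch_setChannels(IntPtr handle, uint numChannels);

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern void soundtouch_setTempo(IntPtr handle, float newTempo);

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern void soundtouch_putSamples(IntPtr handle, float[] samples, uint numSamples);

    [DllImport(Dll, CallingConvention = CallingConvention.Cdecl)]
    public static extern uint soundtouch_receiveSamples(IntPtr handle, float[] outBuffer, uint maxSamples);
}

// Hypothetical managed facade with methods shaped like the SoundTouchSharp API.
public sealed class SoundTouchFacadeSketch : IDisposable
{
    private IntPtr m_handle;

    public SoundTouchFacadeSketch(uint sampleRate, uint channels)
    {
        // Sample rate and channel count must be set before any samples are processed.
        m_handle = SoundTouchNative.soundtouch_createInstance();
        SoundTouchNative.soundtouch_setSampleRate(m_handle, sampleRate);
        SoundTouchNative.soundtouch_setChannels(m_handle, channels);
    }

    // 1.0 = original tempo, smaller = slower (e.g. 0.7 = 70% speed), larger = faster.
    public void SetTempo(float newTempo)
    {
        SoundTouchNative.soundtouch_setTempo(m_handle, newTempo);
    }

    // numSamples counts sample frames (one sample per channel), not individual floats.
    public void PutSamples(float[] pSamples, uint numSamples)
    {
        SoundTouchNative.soundtouch_putSamples(m_handle, pSamples, numSamples);
    }

    // Returns the number of sample frames written into pOutBuffer (0 = nothing ready).
    public uint ReceiveSamples(float[] pOutBuffer, uint maxSamples)
    {
        return SoundTouchNative.soundtouch_receiveSamples(m_handle, pOutBuffer, maxSamples);
    }

    public void Dispose()
    {
        if (m_handle != IntPtr.Zero)
        {
            SoundTouchNative.soundtouch_destroyInstance(m_handle);
            m_handle = IntPtr.Zero;
        }
    }
}
```

Because float[] is blittable, the marshaler pins the array and passes a pointer to its first element, so the native code writes directly into the managed output buffer without an extra copy.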
The main API methods are:
public void SetTempo(float newTempo)
public void PutSamples(float[] pSamples, uint numSamples)
public uint ReceiveSamples(float[] pOutBuffer, uint maxSamples)
The method SetTempo() sets the tempo (or speed) of the playback. It should be set before putting samples into the SoundTouch queue. The method PutSamples() puts samples into the SoundTouch queue. To receive these samples back as time-stretched (or pitch-changed) samples, the method ReceiveSamples() needs to be called. When the tempo is not 100% (i.e., regular speed), it is important to understand that the number of received samples differs from the number of samples put into the queue, due to the inherent nature of time stretching. Therefore, the client calling ReceiveSamples() needs to take this into account: ReceiveSamples() needs to be called until the internal buffer in SoundTouch has no more samples to return. One call of PutSamples() with X samples might require more than one call to ReceiveSamples(), depending on the tempo value set in SetTempo().
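The put-then-drain rule can be sketched as follows. The interface mirrors the three SoundTouchSharp signatures listed above, but ISampleStretcher and StretchPump are hypothetical names, and FakeHalfSpeedStretcher is a toy stand-in (it merely repeats each mono sample, which is NOT real time stretching) used only to show why a single ReceiveSamples() call is not enough.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface mirroring the three SoundTouchSharp methods.
public interface ISampleStretcher
{
    void SetTempo(float newTempo);
    void PutSamples(float[] pSamples, uint numSamples);
    uint ReceiveSamples(float[] pOutBuffer, uint maxSamples);
}

public static class StretchPump
{
    // Puts one chunk in, then keeps calling ReceiveSamples() until it returns 0,
    // because the output count differs from the input count when tempo != 1.0.
    public static List<float> PutAndDrain(ISampleStretcher stretcher, float[] input, float[] outBuffer)
    {
        var collected = new List<float>();
        stretcher.PutSamples(input, (uint)input.Length);
        uint received;
        do
        {
            received = stretcher.ReceiveSamples(outBuffer, (uint)outBuffer.Length);
            for (int i = 0; i < received; i++)
            {
                collected.Add(outBuffer[i]);
            }
        } while (received != 0);
        return collected;
    }
}

// Toy mono stand-in: emits every sample twice to mimic the sample count of 50% tempo.
public sealed class FakeHalfSpeedStretcher : ISampleStretcher
{
    private readonly Queue<float> m_pending = new Queue<float>();

    public void SetTempo(float newTempo)
    {
        // Ignored by this toy.
    }

    public void PutSamples(float[] pSamples, uint numSamples)
    {
        for (int i = 0; i < numSamples; i++)
        {
            m_pending.Enqueue(pSamples[i]);
            m_pending.Enqueue(pSamples[i]);
        }
    }

    public uint ReceiveSamples(float[] pOutBuffer, uint maxSamples)
    {
        uint n = 0;
        while (n < maxSamples && m_pending.Count > 0)
        {
            pOutBuffer[n++] = m_pending.Dequeue();
        }
        return n;
    }
}
```

With 5 input samples and a 3-sample output buffer, the loop needs four non-empty ReceiveSamples() calls (3 + 3 + 3 + 1 samples) before the fifth call returns 0.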
Audio Playback Framework
I had some experience with audio playback on Windows, but that was done in C++ and DirectSound, and generally speaking, dealing with DirectSound directly was a pain. My goal was to write a practice tool without wasting too much time on core technologies such as playing sound.
DirectSound is also unmanaged. Luckily, there is a very good managed library for audio processing and playback: NAudio. NAudio takes care of all the low-level APIs (like DirectSound) and provides a simple interface that is easy to use and also easy to extend. With NAudio, I managed to play sound files within a few minutes, but for dynamic time-stretched playback, that was not enough. The main requirement of an interactive practice tool is the ability to change the tempo on the fly. Therefore, a special audio processor was needed, one that can handle samples of different tempos.
Based on an idea that appeared on NAudio's discussion site, the AdvancedBufferedWaveProvider class was created. It manages a queue of audio buffers, each starting at a different time (CurrentTime). The AdvancedBufferedWaveProvider doesn't hold too many audio buffers in its queue; new audio buffers are added to the queue continuously, as needed, each processed with the current time-stretch parameters.
This technique allows the user to change the tempo (or pitch) on the fly and hear the change with low latency.
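The queue idea can be sketched without NAudio. TimedBufferQueue is a hypothetical simplification, not the actual AdvancedBufferedWaveProvider; only its AddSamples(..., TimeSpan) and GetQueueCount() shapes follow the calls used later in the ProcessAudio code. Each queued buffer remembers the source position it was produced from, so the reported CurrentTime follows what is actually being heard rather than what was last decoded.

```csharp
using System;
using System.Collections.Generic;

public sealed class TimedBufferQueue
{
    private sealed class TimedBuffer
    {
        public byte[] Data;
        public int ReadOffset;
        public TimeSpan SourceTime;
    }

    private readonly Queue<TimedBuffer> m_queue = new Queue<TimedBuffer>();
    private readonly object m_lock = new object();

    // Source position of the buffer currently being played.
    public TimeSpan CurrentTime { get; private set; }

    // Producer side (audio processing thread): enqueue a copy of the processed bytes.
    public void AddSamples(byte[] buffer, int offset, int count, TimeSpan sourceTime)
    {
        var data = new byte[count];
        Buffer.BlockCopy(buffer, offset, data, 0, count);
        lock (m_lock)
        {
            m_queue.Enqueue(new TimedBuffer { Data = data, SourceTime = sourceTime });
        }
    }

    public int GetQueueCount()
    {
        lock (m_lock) return m_queue.Count;
    }

    // Consumer side (playback): copies up to count bytes and pads with silence
    // when the queue runs dry, so the output device keeps getting data.
    public int Read(byte[] buffer, int offset, int count)
    {
        int written = 0;
        lock (m_lock)
        {
            while (written < count && m_queue.Count > 0)
            {
                TimedBuffer head = m_queue.Peek();
                CurrentTime = head.SourceTime;
                int n = Math.Min(count - written, head.Data.Length - head.ReadOffset);
                Buffer.BlockCopy(head.Data, head.ReadOffset, buffer, offset + written, n);
                head.ReadOffset += n;
                written += n;
                if (head.ReadOffset == head.Data.Length) m_queue.Dequeue();
            }
        }
        Array.Clear(buffer, offset + written, count - written);
        return count;
    }
}
```

Keeping the queue short is what keeps a tempo change audible quickly: only the few buffers already queued were processed with the old tempo.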
Ogg Vorbis
I really like open source products and open source formats. That is why it felt odd to have an LGPL utility that can only play back WAV (uncompressed and basically a Microsoft format) or MP3 (compressed but proprietary). Ogg Vorbis is compressed and free, so why not use it? Unfortunately, NAudio (as of the writing of this article) does not support playback of Ogg Vorbis files out of the box. I had to come up with a solution. Luckily, there is an LGPL library named Vorbis# (or csvorbis) that provides Ogg Vorbis playback support for managed code. Vorbis# is a port of the Jorbis Java library, which itself is a port of the original Xiph.Org Vorbis decoder written in C. To allow Vorbis# to work with NAudio, an assembly (project) was written: NAudioOggVorbis. NAudioOggVorbis encapsulates the Vorbis# code and also provides an NAudio Ogg Vorbis adapter class, OggVorbisFileReader, which plugs into NAudio by inheriting from NAudio's core abstract class WaveStream. NAudio takes care of the audio playback logic and commands OggVorbisFileReader to return buffers and/or change file positions when needed. OggVorbisFileReader delegates these requests to the Vorbis# class VorbisFile, which decodes the compressed Ogg Vorbis packets into uncompressed PCM samples. VorbisFile is a high-level wrapper of the decoding logic implemented by the many Vorbis# classes; using it as a single point of entry keeps the client code in OggVorbisFileReader nice and clean.
The OggVorbisFileReader code that uses VorbisFile to decode the next packet:
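public override int Read(byte[] sampleBuffer, int offset, int numBytes)
{
    int bytesRead = 0;
    // Serialize decoding with repositioning (seek) requests
    lock (m_repositionLock)
    {
        // VorbisFile.read() has no offset parameter, so when NAudio asks for
        // data at a non-zero offset, decode into a scratch buffer first
        byte[] target = (offset == 0) ? sampleBuffer : new byte[numBytes];
        bytesRead = m_vorbisFile.read(target, numBytes,
            _BIGENDIANREADMODE, _WORDREADMODE, _SGNEDREADMODE, null);
        if (offset != 0 && bytesRead > 0)
        {
            Array.Copy(target, 0, sampleBuffer, offset, bytesRead);
        }
    }
    return bytesRead;
}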
Once PCM packets are returned to NAudio, NAudio plays them as if they came from any other source, i.e., NAudio has no idea that the packets were originally Ogg Vorbis encoded, nor should it care. This approach proved quick and easy to implement: Ogg Vorbis files are played back (and slowed down) just like WAV and MP3 files. Mission accomplished.
FLAC
Encouraged by the success of adding Ogg Vorbis playback (and later WMA playback, the details of which I left out), I decided to go on and add support for one more file format (decoder): FLAC. Ogg Vorbis is a great open source audio format, but it is lossy. Support for the Free Lossless Audio Codec (AKA FLAC), which is compressed yet lossless, would be nice to have.
FLAC is emerging as 'THE' format for audiophiles: it offers a superb balance of quality and file size, and it is very fast to decode.
So, after some Googling and research, I stumbled upon a very nice demo (written by Stanimir Stoyanov) that shows how to decode FLAC files in C#. The demo decoded FLAC by calling the official libFlac C API through P/Invoke.
I took Stan's code, added some missing APIs needed for decoding (e.g., metadata, absolute seek) and refactored it into a new managed C# integration layer for the libFlac API: LibFlacSharp. I've put the decoder and encoder APIs together, though Practice# only uses the decoding API.
LibFlacSharp is a class that can be used by any C# client (not just Practice#).
Its main decoding API follows; the API itself is very well documented on the libFlac web site:
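using System;
using System.Runtime.InteropServices;
public class LibFLACSharp
{
    // Assumed native library file name; match it to the libFlac build you ship
    private const string LibFlac = "libFLAC.dll";
    ...
    [DllImport(LibFlac, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr FLAC__stream_decoder_new();
    [DllImport(LibFlac, CallingConvention = CallingConvention.Cdecl)]
    public static extern bool
        FLAC__stream_decoder_process_until_end_of_metadata(IntPtr context);
    [DllImport(LibFlac, CallingConvention = CallingConvention.Cdecl)]
    public static extern bool FLAC__stream_decoder_process_single(IntPtr context);
    [DllImport(LibFlac, CallingConvention = CallingConvention.Cdecl)]
    public static extern bool FLAC__stream_decoder_finish(IntPtr context);
    ...
}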
The second thing I did was to write an NAudio file reader adapter, somewhat similar to the Ogg Vorbis one but more complex. libFlac returns frames whose size is not equal to the NAudio buffer size. libFlac also works with callbacks, which are inherently less convenient than direct control. The callbacks are synchronous, as libFlac has no threads, but they are still somewhat cumbersome.
I found an elegant solution to these issues: each FLAC frame is read into an intermediate samples buffer. Then, when NAudio needs new samples to play, there is first an attempt to pull samples from the intermediate buffer (if any samples are available there).
After the buffered samples are used (if at all), if more samples are still needed to fill the NAudio playback buffer, a request is made to libFlac (through LibFlacSharp) to decode one more FLAC frame.
This design pattern is really not new, but in the context of Practice# it reminds me of playing a bagpipe, so I will call it the Bagpiper Design Pattern.
With a real bagpipe, the bagpiper blows air into the bag at will (FLAC frames), but the actual playing (NAudio playback) is continuous, using the air in the bag (the intermediate FLAC samples buffer).
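The Bagpiper pattern fits in a few lines. In this sketch, BagpipeReader is a hypothetical name and frameSource is a stand-in for "ask libFlac (through LibFlacSharp) to decode one more frame"; it returns null at the end of the stream. Frames of any size go into the bag, and the player drains exactly as many samples as it asks for.

```csharp
using System;

public sealed class BagpipeReader
{
    private readonly Func<short[]> m_frameSource; // hypothetical "decode next frame" callback
    private short[] m_bag = new short[0];         // intermediate samples buffer
    private int m_bagOffset;                      // first unread sample in the bag

    public BagpipeReader(Func<short[]> frameSource)
    {
        m_frameSource = frameSource;
    }

    // Fills destination with up to count samples; returns fewer only at end of stream.
    public int Read(short[] destination, int offset, int count)
    {
        int written = 0;
        while (written < count)
        {
            // 1. Use the samples left over in the bag first.
            int available = m_bag.Length - m_bagOffset;
            if (available > 0)
            {
                int n = Math.Min(available, count - written);
                Array.Copy(m_bag, m_bagOffset, destination, offset + written, n);
                m_bagOffset += n;
                written += n;
                continue;
            }

            // 2. The bag is empty: blow in one more decoded frame.
            short[] frame = m_frameSource();
            if (frame == null) break; // end of stream
            m_bag = frame;
            m_bagOffset = 0;
        }
        return written;
    }
}
```

For example, with frames of 3, 4 and 1 samples, a request for 5 samples consumes the first frame and part of the second; the next request gets the remaining 2 samples plus the last frame.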
UI Design
As mentioned above, I was aiming for a user interface that would be productive and intuitive by being close in spirit to the Boss ME-70. Some other elements that I liked, such as the Loop controls and Now buttons, were inspired by the design of the BestPractice utility. Menus are minimal (only Recent Files and the About form), and there are only three modal dialogs (Open File, Preset Description and About); all other operational aspects of the tool are laid out directly on the form. The 4 presets resemble the 4 Boss preset pedals: only one is active at a time. To write to a preset, the Write button (floppy disk icon) has to be clicked; at this point, the LEDs of all presets light up red, waiting for the user to select which preset to write into. Once a preset is clicked, the settings are written to it and saved to a file.
To cancel Preset Write Mode, simply click the Write button again, and all preset LEDs revert to green (regular mode). While in Preset Write Mode, clicking on another preset effectively acts as a 'Copy Preset' function, because the current preset's settings are also written into the selected preset.
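The Write button and preset buttons form a small state machine. The sketch below models it under stated assumptions: PresetBank and PresetSettings are hypothetical names, the fields are illustrative, and it assumes that writing to a preset also makes it the active one and leaves write mode. It is not Practice#'s actual code.

```csharp
using System;

// Hypothetical value type holding one preset's settings (fields are illustrative).
public struct PresetSettings
{
    public float Tempo;
    public float Volume;

    public PresetSettings(float tempo, float volume)
    {
        Tempo = tempo;
        Volume = volume;
    }
}

public sealed class PresetBank
{
    private readonly PresetSettings[] m_presets;

    public PresetBank(int presetCount, PresetSettings defaults)
    {
        m_presets = new PresetSettings[presetCount];
        for (int i = 0; i < presetCount; i++) m_presets[i] = defaults;
    }

    public int ActiveIndex { get; private set; }

    // true = Preset Write Mode (all preset LEDs red), false = regular mode (green).
    public bool WriteMode { get; private set; }

    // Write button: enters write mode, or cancels it when clicked again.
    public void ToggleWriteMode()
    {
        WriteMode = !WriteMode;
    }

    // Preset button. In regular mode it only activates the preset and returns its
    // settings. In write mode it stores currentSettings into the clicked preset
    // (a "copy preset" when it differs from the active one) and leaves write mode.
    public PresetSettings Click(int index, PresetSettings currentSettings)
    {
        if (WriteMode)
        {
            m_presets[index] = currentSettings;
            WriteMode = false;
        }
        ActiveIndex = index;
        return m_presets[index];
    }

    public PresetSettings Get(int index)
    {
        return m_presets[index];
    }
}
```

Using a struct for the settings gives each preset slot its own copy, so later edits to the live settings never leak into a stored preset.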
Each audio file automatically gets its own preset file; all these preset files are kept in the user folder %LOCALAPPDATA%\PracticeSharp (e.g. on my Win7 laptop, it is C:\Users\Yuval\AppData\Local\PracticeSharp). This is good for a few reasons: each file's presets persist without requiring menus, and when a file is re-opened, the correct presets are loaded automatically. If a preset needs to be reset to default values, the user has to click and hold the Eraser icon for a second or more; the current preset then blinks a few times with an orange LED, and all its settings revert to defaults. Once again: no menus or modal dialogs.
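One way to implement "one preset file per audio file" is sketched below. The folder follows the %LOCALAPPDATA%\PracticeSharp location described above, but the file naming scheme, the XML format and all type and member names are assumptions for illustration, not Practice#'s actual file format.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical per-preset settings (Tempo, Pitch, Volume, loop boundaries).
public class PresetData
{
    public float Tempo { get; set; } = 1.0f;
    public float Pitch { get; set; }
    public float Volume { get; set; } = 1.0f;
    public double LoopStartSeconds { get; set; }
    public double LoopEndSeconds { get; set; }
}

// Hypothetical content of one preset file: all presets of one audio file.
public class AudioFilePresets
{
    public string AudioFilePath { get; set; }
    public int ActivePreset { get; set; }
    public PresetData[] Presets { get; set; }
}

public static class PresetStore
{
    // %LOCALAPPDATA%\PracticeSharp on Windows.
    public static string DefaultFolder()
    {
        return Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "PracticeSharp");
    }

    // Assumed naming scheme: derived from the audio file name only, so two
    // same-named files in different folders would share a preset file.
    public static string PresetPathFor(string folder, string audioFilePath)
    {
        return Path.Combine(folder, Path.GetFileName(audioFilePath) + ".presets.xml");
    }

    public static void Save(string path, AudioFilePresets presets)
    {
        Directory.CreateDirectory(Path.GetDirectoryName(path));
        var serializer = new XmlSerializer(typeof(AudioFilePresets));
        using (var stream = File.Create(path)) serializer.Serialize(stream, presets);
    }

    // Returns null when the audio file has never been opened before.
    public static AudioFilePresets Load(string path)
    {
        if (!File.Exists(path)) return null;
        var serializer = new XmlSerializer(typeof(AudioFilePresets));
        using (var stream = File.OpenRead(path)) return (AudioFilePresets)serializer.Deserialize(stream);
    }
}
```

When a file is opened, the application would call Load() and fall back to default presets on null; every preset write would then call Save() with the full set.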
Implementation - Audio Processing
The heart of Practice# is the PracticeSharpLogic class. It contains the audio processing thread, which is implemented by ProcessAudio and does the following:
- Reads chunks of uncompressed samples from the input file
- Processes the samples using SoundTouch
- Receives the processed samples from SoundTouch
- Plays the processed samples with NAudio
- Handles the control logic required for changing values on the fly: Volume, Loop, Cue and Current Play Position
- Applies an Equalizer DSP effect to the samples
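Much of the on-the-fly control logic comes down to handing a value from the UI thread to the audio thread safely. The sketch below generalizes the lock-plus-flag pattern that the Read Samples region uses for seeking (m_newPlayTimeRequested / m_newPlayTime); the PendingRequest<T> name is hypothetical.

```csharp
using System;

public sealed class PendingRequest<T>
{
    private readonly object m_lock = new object();
    private bool m_requested;
    private T m_value;

    // Called from the UI thread, e.g. when the user drags the position bar.
    // If several requests arrive before the audio thread looks, the last one wins.
    public void Request(T value)
    {
        lock (m_lock)
        {
            m_value = value;
            m_requested = true;
        }
    }

    // Called from the audio thread once per processing cycle; a request is
    // consumed at most once, at a point where applying it is safe.
    public bool TryTake(out T value)
    {
        lock (m_lock)
        {
            value = m_value;
            if (!m_requested) return false;
            m_requested = false;
            return true;
        }
    }
}
```

Holding the lock only while copying the value keeps the UI responsive, and the audio thread never applies a half-updated request.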
The first important piece of code to notice is the reading of samples from the input file (shown in the image above as the first green rectangle from the top). This is achieved with NAudio's WaveChannel class. Note: I used a nice trick for converting a float array to a byte array without copying, i.e., without spending extra CPU time or memory. The class is ByteAndFloatsConverter and it is based on this discussion.
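#region Read samples from file
// Apply a pending seek request (posted by the UI thread) before reading
lock (m_currentPlayTimeLock)
{
    if (m_newPlayTimeRequested)
    {
        m_waveChannel.CurrentTime = m_newPlayTime;
        m_newPlayTimeRequested = false;
    }
}
// Read the next chunk of PCM data into the shared byte/float buffer
bytesRead = m_waveChannel.Read(convertInputBuffer.Bytes, 0,
    convertInputBuffer.Bytes.Length);
#endregion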
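ByteAndFloatsConverter itself is not shown. A common way to get this zero-copy effect in .NET is an explicit-layout "union" in which a byte[] field and a float[] field share one offset, so both names refer to the very same array object; NAudio's WaveBuffer class relies on the same idea. The sketch below assumes that technique; the class name is hypothetical and this is not Practice#'s actual code.

```csharp
using System;
using System.Runtime.InteropServices;

// Both reference fields live at offset 0, so Bytes and Floats are the same
// array object. This is unverifiable code that relies on CLR layout details.
[StructLayout(LayoutKind.Explicit)]
public sealed class ByteFloatUnionSketch
{
    [FieldOffset(0)] private byte[] m_bytes;
    [FieldOffset(0)] private float[] m_floats;

    // The array is allocated as byte[], so Bytes.Length is the true byte count
    // and can safely be passed to Stream/WaveStream Read(buffer, 0, Length) calls.
    public ByteFloatUnionSketch(int byteCount)
    {
        if (byteCount % sizeof(float) != 0)
            throw new ArgumentException("byteCount must be a multiple of 4", "byteCount");
        m_bytes = new byte[byteCount];
    }

    public byte[] Bytes { get { return m_bytes; } }

    // Same memory viewed as floats. Do NOT use Floats.Length: the runtime still
    // reports the byte count of the real array. Index only below FloatCount.
    public float[] Floats { get { return m_floats; } }

    public int FloatCount { get { return m_bytes.Length / sizeof(float); } }
}
```

On .NET Core 2.1 and later (or with the System.Memory package), MemoryMarshal.Cast<byte, float>(bytes.AsSpan()) provides a correctly sized Span<float> over the same memory without this layout trick.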
The second important piece of code puts the read samples into SoundTouch, via SoundTouchSharp, for DSP processing (shown in the image above as the second green rectangle from the top). The requested tempo and pitch are set before the call (in SetSoundSharpValues).
#region Put samples in SoundTouch
SetSoundSharpValues();
m_soundTouchSharp.PutSamples(convertInputBuffer.Floats, (uint)floatsRead);
#endregion
Finally, the third and perhaps most important piece of code receives the processed samples back from SoundTouch and puts them into the queued buffered player, where they will be played by NAudio (shown in the image above as the third green rectangle from the top).
#region Receive & Play Samples
do
{
    // Pull whatever time-stretched samples SoundTouch has ready
    samplesProcessed = m_soundTouchSharp.ReceiveSamples(
        convertOutputBuffer.Floats, outBufferSizeFloats);
    if (samplesProcessed > 0)
    {
        // Tag the buffer with the source position it was produced from
        TimeSpan currentBufferTime = m_waveChannel.CurrentTime;
        // SoundTouch counts samples per channel, so convert to a byte count
        m_inputProvider.AddSamples(convertOutputBuffer.Bytes, 0,
            (int)samplesProcessed * sizeof(float) * format.Channels,
            currentBufferTime);
        // Back-pressure: wait while enough buffers are already queued for playback
        while (!m_stopWorker &&
               m_inputProvider.GetQueueCount() > BusyQueuedBuffersThreshold)
        {
            Thread.Sleep(10);
        }
        bufferIndex++;
    }
} while (!m_stopWorker && samplesProcessed != 0);
#endregion
Usage Demonstration
Please see the YouTube video.
Source Code
Since CodeProject does not support Subversion, I decided to keep the Practice# source code on Google Code. To obtain the latest sources, please check out the latest Practice# sources. To obtain the latest binary, please download the latest Practice# binary.
History
1.6.4:
Released: 3/20/2013
- SoundTouch library updated to 1.7 (optimizations and fixes by library author)
- NAudio library updated to 1.6 (optimizations and fixes by library author)
- Change in logic: Cue now occurs in front of the loop instead of at the end
- Change: Pitch resolution is now in 1/4 semi-tones, instead of 1/2 semi-tones
- Bug fixes:
- Major issue - http://code.google.com/p/practicesharp/issues/detail?id=10
- Major issue - http://code.google.com/p/practicesharp/issues/detail?id=12
- http://code.google.com/p/practicesharp/source/detail?r=288
- http://code.google.com/p/practicesharp/source/detail?r=290
- http://code.google.com/p/practicesharp/source/detail?r=292
- http://code.google.com/p/practicesharp/issues/detail?id=8
- http://code.google.com/p/practicesharp/issues/detail?id=9
1.5.0:
Released: 3/2/2012
- Fixed a random crash/hang when loading recent files: DirectSound is not stable in NAudio, so playback moved to WaveOut (XP) or WASAPI (Vista, 7)
- Fixed speed and pitch track bar mouse behavior. Values are now rounded properly and the ticks are 'sticky'
- Added a 'Show technical log' (F12) use case. It is useful for viewing and sending the log if things don't work properly for some reason
- Fixed an issue when loading a file after an existing one was loaded: there was a short playback of the old song (SoundTouch buffers were not flushed properly and had some leftover samples). Not affecting stability, but annoying.
- New feature! Vocal Suppression (AKA Voice Removal or Karaoke). Note: works on stereo files only
1.4.1:
Released: 2/10/2012
- 8 Presets (instead of 4)
- Keyboard shortcuts
- Added a new Use-Case: Preset Quick-Write (Using Ctrl+W)
- Improved look and feel (glass buttons)
- Changed pitch to accurate semitone intervals
- Fixed a minor bug with Loop reset to start
1.3.0:
Released: 1/5/2012
- Recompiled with new versions of the NAudio and SoundTouch libraries
- Support for AIFF Playback
- Slight graphics changes
1.2.0:
Released: 2/9/2011
- Improved slow-down playback quality by fine-tuning the SoundTouch engine. There is a significant improvement in sound quality when playback is slowed down, in particular for singing/speech parts but also for music.
- Manual settings are now used instead of the default automatic settings provided by SoundTouch
- Added TimeStretchProfiles, to support custom tuning of the SoundTouch engine
- Fixed a minor bug with positionLabel handling: clicking on it did not work in Pause mode
- Position Reset (back to start) keyboard shortcut changed from Home (used by other TrackBars by default) to F5
1.0.1:
Released: 1/22/2011
- Note: I apologize, but I released a bad 1.0 version; 1.0.1 replaces 1.0.
- Added WiX/MSI setup, with a dotNetInstaller bootstrapper
- 'Initialized' status renamed to 'Ready'
- Fix: When the application was loaded, the previous file did not show the loop boundaries ("bar") until the file was played.
- Major bug fix: Stopping the audio was not done correctly and caused threading issues (lockups) and/or crashes (noticeable in particular on the slower machines I tested)