SFX brings audio and MIDI capabilities to your IoT devices. Play simple sounds like WAVs and tones, or get into the weeds with SFX's advanced features.
Introduction
Audio is not a subject you typically hear or see in the same paragraph as IoT. The fact is, IoT devices just aren't generally geared for that sort of application. That being said, there are plenty of uses for simple audio as part of a complete user experience on your IoT device. There is also something to be said for generating MIDI messages and a MIDI clock using an IoT device, so that you can make musical gadgets for actual artists.
SFX allows for these features, eschewing some of the "fun but useless" features of other audio libraries, like playing MP3s. At the same time, it provides advanced capabilities like mixing different audio streams or allowing you to create your own audio targets, such as sound sources and drivers.
Why no MP3 support? It's simple, really. These devices don't sound great. They are not MP3 players unless you start throwing hardware at them, and once you do, you quickly find that there are hardware MP3 modules out there that are far better suited to this application than decoding in firmware on, say, an ESP32 would be.
Why even make an audio library in the first place? From what I've seen, it's not well covered territory. There are a few major libraries, like ESP8266Audio, but they are GPL licensed, take a lot of firmware space, and increase compile and upload times, primarily because they support things like MP3 and AAC decoding. They also don't do MIDI very well, if at all.
After seeing what was out there, I wanted more in some areas and less in others than what was on offer. SFX aims to bring a more balanced feature set to the table, all behind a modular, consistent API.
Note: This library is still a work in progress, and there are probably bugs, features that need implementing, and interfaces that will change. It's usable now, but the API isn't set in stone yet.
We'll be covering the basics of audio in this article. MIDI deserves its own treatment.
Understanding this Mess
Audio Targets
Audio targets either produce or consume audio data. Currently, the only format supported is PCM, so all audio data is PCM encoded, like a wav file is. There are two kinds of audio targets: audio sources, and audio destinations. A source produces audio data, and a destination consumes it.
Every audio target reports its sample rate, bit depth, channel count, and format. The bit depth is currently limited to 8 or 16, and the channels are limited to 1 or 2. The preferred figures are 44100Hz, 16-bit, 2-channel. When you use two or more audio targets together, they must have the same bit depth, channels, and format. The sample rate can differ, but since no resampling is done, the playback speed will be affected.
Audio sources support a read() method which reads a number of samples from the source, and audio destinations support a write() method which writes a number of samples to the destination.
Currently, audio sources include a wav_file_source, a mixer_source<>, a waveform_source<>, and a silence_source<>. Each of these generates audio data.
There are currently no audio destinations included with SFX. However, codewitch-honey-crisis/htcw_i2s_audio implements audio destinations on the ESP32 for both external I2S modules and the internal I2S that is connected to the ESP32's internal DAC. Including the above in the lib_deps of your platformio.ini file will also include codewitch-honey-crisis/htcw_sfx, so including SFX explicitly is not necessary.
wav_file_source
This source takes an open stream to a WAV file and, after reading the header, keeps the stream open and fetches PCM data out of it on request, optionally wrapping the read operations so it loops continuously.
mixer_source<>
The mixer source takes up to the specified number of voices, each represented by an audio source, and mixes them together into a single audio source. You can set the voice() and level() of each voice individually. Voices can be null, in which case they will not be mixed.
waveform_source<>
This source generates a waveform at the specified frequency() and amplitude(), and of the specified shape().
silence_source<>
Silence! This source generates silence, which is useful to keep pumping the output driver with something so it doesn't stutter and glitch like it will if you simply stop feeding it.
The Transport
The transport class is responsible for moving audio data from a source() to a destination(). This "drives" the sound. Being cooperatively multitasked, it must be pumped repeatedly using update(), frequently enough to keep the destination's audio buffer from emptying. You use this class to effectively "connect" an audio source to an audio destination, and then pump it with update() as mentioned.
The Performer<>
Wiring up sources and destinations is nice when you need the flexibility, but when it comes to just playing sound as part of the user experience, it can be overkill. Writing all that code will give you RSI. Rather than refer you to a good pain specialist, I've provided a class that handles all of the gritty details and provides a simple-to-use interface to play sound. That's what the performer<> class is. You feed it the number of voices as a template parameter, and it manages allocating and freeing voices for you. You can simply play a wav(), a shape(), or some other source(), and it will give you an integer handle back. When you want to stop the sound, call stop() with the handle, or with no argument to stop all sound.
Coding this Mess
Currently, you either need PlatformIO to use this, or you need to copy SFX, the driver, and all dependent libraries into your Arduino library folder. I don't go out of my way to support the Arduino IDE because, frankly, you just can't do that much with it. This will run with it, but it's not easy to set up the first time.
I'm going to assume an ESP32 because I haven't written drivers for anything else yet. You either need an external I2S module like a MAX98357A, or some speaker wired to pin 25, and maybe one on 26. You can't remap those pins, not because of my library, but because of a limitation of the I2S system on the ESP32.
Add the following lines to your project's platformio.ini:
lib_ldf_mode = deep; always include this
lib_deps=
codewitch-honey-crisis/htcw_i2s_audio
That will include the I2S driver which also includes SFX.
In your main.cpp, you can add the following:
#include <i2s_internal.hpp> // for internal I2S
#include <i2s_external.hpp> // for an external module
#include <sfx.hpp>
using namespace arduino;
using namespace sfx;
Now you need to instantiate the driver. Typically, you'll simply need the channel configuration. This will depend on whether you're using the 2nd pin (26) for the internal driver or not, or whether you're using two external I2S modules to drive two speakers instead of just one. We'll create an internal one using the left channel below:
i2s_internal<i2s_channels::left> i2s_output;
Basics
The easiest way to get sound going is to create a performer<> object with how many voices you need, giving it the audio destination you want to send to.
performer<4> perform(i2s_output);
Note that this class is still a work in progress and it is rough around the edges but will be improving. It can take a significant amount of stack depending on the number of voices you want, so you should probably declare it as a global or otherwise on the heap. It's best to declare maybe one more voice than you think you'll actually use to give it some wiggle room. For some reason, it likes that better. This should change when I get the kinks worked out.
Finally, it's time to play some stuff.
There are a couple of steps to loading a wav file under Arduino with SFX.
File file;
file_stream file_stm(file);
You may have to declare these in the global scope, and then open the file in setup(). After you do that, call file_stm.set(file); to reestablish the connection with the now valid file object. So after the above, in setup() you would do this:
file = SPIFFS.open("/hello.wav","rb");
file_stm.set(file);
The reason you may have to declare these globally is that the performer needs access to the stream while it reads, so the stream must remain valid for the entire time the performer is playing it.
Now finally, we can play the wav:
int handle = perform.wav(file_stm,.3333,true);
If you want to stop it, you can call stop with the handle you got back.
perform.stop(handle);
If you want to play a waveform it's even easier, since there's no file to set up:
handle = perform.shape(440,.33333,waveform_shape::square);
And again, you can stop it with stop(handle).
You'll note that both of these will play simultaneously, and will continue to play until we call stop(). If we play too many voices at once, the older voices will be replaced with newer voices.
None of this will actually do anything unless we pump the performer<> with update(). You typically do that in loop(), but you'll also need to do it in your own loops to keep the sound playing:
perform.update();
More Advanced Stuff
The performer<> template class hides a lot of complexity behind a simple-to-use construct. However, it might not do exactly what you want, how you want. As covered at the beginning of the article, several classes make up the audio engine of SFX, like transport and the various audio sources. The performer orchestrates these in concert, if you'll forgive the expression, to provide a seamless and easy programming interface at the expense of flexibility. You can use these classes yourself.
transport
The transport class simply feeds an audio source into a destination. The simplest configuration that will produce audio in SFX is connecting an audio source and an audio destination to a transport. The source and destination must have the same format, bit depth, and channel count; no conversion or resampling is done. You must call the static transport::create() method to create a valid, initialized instance.
waveform_source<> wfrm;
wfrm.frequency(1000); wfrm.amplitude(.5); wfrm.shape(waveform_shape::triangle);
transport trans;
sfx_result errcode = transport::create(i2s_output,wfrm,&trans);
if(errcode!=sfx_result::success) {
Serial.println("Error creating transport");
while(true);
}
while(trans.update()==sfx_result::success);
You can set the source() or destination() at any time, and it will take effect the next time update() is called.
mixer_source
You may need to play multiple sources at one time. Doing so requires mixing the sound together, and the mixer_source<> handles that for you. You specify the maximum number of voices to mix as the first template argument. We can modify the above code to demonstrate multiple tones playing at once:
waveform_source<> wfrm1;
wfrm1.frequency(1000); wfrm1.amplitude(.5); wfrm1.shape(waveform_shape::triangle);
waveform_source<> wfrm2;
wfrm2.frequency(500); wfrm2.amplitude(.25); wfrm2.shape(waveform_shape::sine);
mixer_source<2> mixer;
sfx_result errcode = mixer_source<2>::create(&mixer);
if(errcode!=sfx_result::success) {
Serial.println("Error creating mixer");
while(true);
}
mixer.voice(0,&wfrm1);
mixer.voice(1,&wfrm2);
transport trans;
errcode = transport::create(i2s_output,mixer,&trans);
if(errcode!=sfx_result::success) {
Serial.println("Error creating transport");
while(true);
}
while(trans.update()==sfx_result::success);
wav_file_source
We've already mostly covered this. The notes above about initializing a stream and keeping it in the global scope also apply here: the stream must be valid for the entire time that the wav_file_source is being read from.
Other than that, creating and playing a wav_file_source is similar to creating other sources:
File file = SD.open("/demo.wav","rb");
file_stream fs(file);
wav_file_source wav;
sfx_result errcode = wav_file_source::open(fs,&wav);
if(errcode!=sfx_result::success) {
Serial.println("Error opening WAV");
while(true);
}
wav.loop(true);
transport trans;
errcode = transport::create(i2s_output,wav,&trans);
if(errcode!=sfx_result::success) {
Serial.println("Error creating transport");
while(true);
}
while(trans.update()==sfx_result::success);
waveform_source<>
This class was basically already covered above. It's so simple to use it really doesn't deserve further exploration.
silence_source<>
Like the above, this source is so simple it doesn't need much of an explanation. What might be confusing is when to use it. If you simply stop feeding a driver, it won't stop the sound. The sound will loop whatever was last in its internal "DMA buffer" sounding glitchy and terrible until you feed it something else. This source allows you to feed silence to a driver.
Still to be Done
I still need to do more testing. There's a known bug with the voice management in performer<> that I'm working on next: it drops voices when it shouldn't, and sometimes won't play a new voice. I need to add the ability to convert from stereo to mono and vice versa, the ability to convert between bit depths, and maybe the ability to resample, although I don't know if that last bit is realistic.
What the Heck Happened Here?
I spent a week running down how to talk to the ESP32's I2S hardware effectively. Everything else was easy by comparison. I did learn a little about audio, so there's that. The actual mixer I thought worked, for a day; it turns out it only worked in one narrow circumstance, so that's fixed now. The performer took an inordinate amount of time and had me pulling my hair out, because I thought it was nulling my data, but it was the mixer that was at fault. In the end, there was a lot of fighting with my code to get it to do what I meant rather than what I was telling it.
History
- 25th July, 2022 - Initial submission