Introduction
This tip shows how to set up an ASP.NET MVC website that generates text-to-speech audio as an MP3 and streams it to a browser client, which plays it with HTML5 audio controls.
Background
Finding a good text-to-speech implementation for ASP.NET that met my project's requirements was difficult. I did find enough forum posts and documentation to assemble a simple solution that delivers text-to-speech MP3 audio to a browser client. The voice comes from the .NET Microsoft Speech Synthesizer (System.Speech.Synthesis). The synthesizer produces a WAV audio stream, which is then passed through the NAudio.Lame library and converted to an MP3 stream. Why MP3 instead of the standard WAV format? MP3 files are much smaller, and MP3 is more consistently supported by the HTML5 audio element across modern browsers.
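To put the size difference into numbers, here is a small C# sketch of the arithmetic. AudioSizeEstimate is a hypothetical helper, and the 32 kbps MP3 rate is an assumed value for speech, not one taken from this tip. Uncompressed PCM costs (sample rate) times (bytes per sample) times (channels) bytes every second, while a constant-bit-rate MP3 costs its bit rate divided by 8.

```csharp
using System;

// Rough size arithmetic for the formats discussed in this tip.
// Assumption: the MP3 is encoded at a constant 32 kbps (a hypothetical choice
// for speech). File headers and MP3 frame overhead are ignored.
public static class AudioSizeEstimate
{
    // Uncompressed PCM bytes per second = sample rate * bytes per sample * channels.
    public static int PcmBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * (bitsPerSample / 8) * channels;
    }

    // Approximate PCM size of a clip of the given length, in bytes.
    public static long PcmBytes(int samplesPerSecond, int bitsPerSample, int channels, int seconds)
    {
        return (long)PcmBytesPerSecond(samplesPerSecond, bitsPerSample, channels) * seconds;
    }

    // Constant-bit-rate MP3: (kilobits per second * 1000 / 8) bytes per second.
    public static long Mp3Bytes(int kbps, int seconds)
    {
        return (long)kbps * 1000 / 8 * seconds;
    }
}
```

For a 10-second clip in the 8 kHz, 16-bit, stereo format used by the code in this tip, PcmBytes(8000, 16, 2, 10) gives 320,000 bytes (ignoring the WAV file header), while Mp3Bytes(32, 10) gives 40,000 bytes, one eighth of the size.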
Using the Code
I recommend downloading and running the project code attached to this tip to see a working example.
Prerequisites
- IIS 7.5 or newer (also tested on IIS Express 10)
- Application running in integrated pipeline mode
- Application pool identity set to Local System
- ASP.NET MVC 3 or newer
- Reference: System.Speech
- NuGet packages: NAudio, NAudio.Lame
Add the following using directives to the home controller.
using NAudio.Lame;
using NAudio.Wave;
using System;
using System.Globalization;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;
using System.Threading;
using System.Web;
using System.Web.Mvc;
Add the following method, TextToMp3, to the home controller. Its only input is the text to convert. The Microsoft speech synthesizer converts the text to a WAV stream, and NAudio.Lame then encodes that WAV stream as an MP3 stream. The result is returned to the browser client as bytes in a FileResult, where it can be played with HTML5 audio controls.
public FileResult TextToMp3(string text)
{
    var mp3Stream = new MemoryStream();
    // 8 kHz, 16-bit, stereo PCM output from the synthesizer.
    var speechAudioFormatConfig = new SpeechAudioFormatInfo(
        samplesPerSecond: 8000, bitsPerSample: AudioBitsPerSample.Sixteen,
        channel: AudioChannel.Stereo);
    // Matching NAudio format so LAME knows how to interpret the raw PCM bytes.
    var waveFormat = new WaveFormat(speechAudioFormatConfig.SamplesPerSecond,
        speechAudioFormatConfig.BitsPerSample, speechAudioFormatConfig.ChannelCount);
    try
    {
        // Build the prompt: US English voice, reduced emphasis, slow rate.
        var prompt = new PromptBuilder
            { Culture = CultureInfo.CreateSpecificCulture("en-US") };
        prompt.StartVoice(prompt.Culture);
        prompt.StartSentence();
        prompt.StartStyle(new PromptStyle
            { Emphasis = PromptEmphasis.Reduced, Rate = PromptRate.Slow });
        prompt.AppendText(text);
        prompt.EndStyle();
        prompt.EndSentence();
        prompt.EndVoice();

        using (var synthWavMs = new MemoryStream())
        using (var resetEvent = new ManualResetEvent(false))
        {
            Exception synthError = null;
            // Synthesize on a thread-pool thread; calling Speak directly on the
            // ASP.NET request thread is commonly reported to hang.
            ThreadPool.QueueUserWorkItem(arg =>
            {
                try
                {
                    using (var siteSpeechSynth = new SpeechSynthesizer())
                    {
                        siteSpeechSynth.SetOutputToAudioStream(
                            synthWavMs, speechAudioFormatConfig);
                        siteSpeechSynth.Speak(prompt);
                    }
                }
                catch (Exception ex)
                {
                    // Record the error; Response is only touched on the request thread.
                    synthError = ex;
                }
                finally
                {
                    resetEvent.Set();
                }
            });
            resetEvent.WaitOne();
            if (synthError != null)
            {
                Response.AddHeader("EXCEPTION", synthError.GetBaseException().ToString());
            }

            // LameMP3FileWriter takes the bit rate in kbps, not bits per second.
            var bitRate = speechAudioFormatConfig.AverageBytesPerSecond * 8 / 1000;
            synthWavMs.Position = 0;
            using (var mp3FileWriter = new LameMP3FileWriter(
                outStream: mp3Stream, format: waveFormat, bitRate: bitRate))
            {
                synthWavMs.CopyTo(mp3FileWriter);
            }
        }
    }
    catch (Exception ex)
    {
        Response.AddHeader("EXCEPTION", ex.GetBaseException().ToString());
    }
    finally
    {
        // Disable caching so every request returns freshly generated audio.
        Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1));
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        Response.Cache.SetNoStore();
        Response.AppendHeader("Accept-Ranges", "bytes");
        Response.AddHeader("Content-Length",
            mp3Stream.Length.ToString(CultureInfo.InvariantCulture));
    }
    // audio/mpeg is the registered MIME type for MP3.
    return File(mp3Stream.ToArray(), "audio/mpeg");
}
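The hand-off in TextToMp3 (queue the synthesis on the thread pool, block on a ManualResetEvent, then continue) can be factored into a reusable helper. The sketch below is hypothetical: ThreadPoolRunner and RunAndWait are not part of the tip's project. It captures any exception thrown by the worker and rethrows it on the calling thread, so the worker never needs to touch the Response object.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper: run work on a thread-pool thread, block the caller
// until it finishes, and surface any exception on the calling thread.
public static class ThreadPoolRunner
{
    public static void RunAndWait(Action work)
    {
        ExceptionDispatchInfo captured = null;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    work();
                }
                catch (Exception ex)
                {
                    // Keep the original stack trace for rethrowing later.
                    captured = ExceptionDispatchInfo.Capture(ex);
                }
                finally
                {
                    // Always release the waiting caller, even on failure.
                    done.Set();
                }
            });
            done.WaitOne();
        }
        // Rethrow on the caller's thread, preserving the worker's stack trace.
        captured?.Throw();
    }
}
```

Inside TextToMp3, the delegate passed to RunAndWait would create the SpeechSynthesizer, call SetOutputToAudioStream, and call Speak; the surrounding try/catch on the request thread would then handle any synthesis error.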
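NAudio.Lame calls native LAME DLLs that the package copies into the application's bin folder. Because ASP.NET normally loads shadow copies of the managed assemblies, Windows may not find those native DLLs unless the bin folder is on the process PATH. Add the following method to the Global.asax.cs file so the bin folder is added to the PATH on application start.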
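// Requires: using System; using System.Globalization; using System.IO; using System.Linq;
public static void CheckAddBinPath()
{
    // Folder that holds the native LAME DLLs copied by the NAudio.Lame package.
    var binPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin");
    var path = Environment.GetEnvironmentVariable("PATH") ?? "";
    // Only append bin if it is not already on the PATH (case-insensitive on Windows).
    if (!path.Split(Path.PathSeparator).Contains(binPath, StringComparer.OrdinalIgnoreCase))
    {
        path = string.Join(Path.PathSeparator.ToString(CultureInfo.InvariantCulture),
            new string[] { path, binPath });
        // Changes the PATH of the current (IIS worker) process only.
        Environment.SetEnvironmentVariable("PATH", path);
    }
}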
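The PATH check in CheckAddBinPath can be pulled out into a pure function, which makes it easy to test without modifying the real environment. This is a hypothetical refactoring (PathHelper.AppendDirectory is not in the original project); it also avoids the leading separator that the original would produce when PATH starts out empty.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical, side-effect-free version of the PATH logic in CheckAddBinPath.
public static class PathHelper
{
    // Returns pathValue with directory appended, unless it is already present.
    public static string AppendDirectory(string pathValue, string directory)
    {
        var current = pathValue ?? string.Empty;
        var entries = current.Split(new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries);

        // Windows paths are case-insensitive, so compare ordinally ignoring case.
        if (entries.Contains(directory, StringComparer.OrdinalIgnoreCase))
        {
            return current;
        }

        // Avoid producing a leading separator when PATH starts out empty.
        return current.Length == 0
            ? directory
            : current.TrimEnd(Path.PathSeparator) + Path.PathSeparator + directory;
    }
}
```

CheckAddBinPath could then reduce to Environment.SetEnvironmentVariable("PATH", PathHelper.AppendDirectory(Environment.GetEnvironmentVariable("PATH"), binPath)); setting an unchanged value again is harmless.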
In the same file, the Application_Start method should look like the following, with a call to CheckAddBinPath at the end of the method.
protected void Application_Start()
{
AreaRegistration.RegisterAllAreas();
FilterConfig.RegisterGlobalFilters(GlobalFilters.Filters);
RouteConfig.RegisterRoutes(RouteTable.Routes);
BundleConfig.RegisterBundles(BundleTable.Bundles);
CheckAddBinPath();
}
Example of Use
On the home view, add the following HTML, JavaScript, and jQuery code.
<label for="inputText">Type it!</label><br />
<textarea id="inputText" class="form-control" rows="5" style="width:100%;"></textarea><br />
<button id="playAudio" type="button" class="btn btn-primary btn-lg btn-block">Say it!</button>
<div id="divAudio_Player" class="hidden">
    <audio id="audio_player">
        <source id="audio_player_wav" src="@Url.Action("PlayTextArea", "Home", new { text = "type something in first" })" type="audio/mpeg" />
        <embed height="50" width="100" src="@Url.Action("PlayTextArea", "Home", new { text = "type something in first" })">
    </audio>
</div>
<script>
    // Requires jQuery to be loaded before this script runs.
    $(function () {
        $('#playAudio').click(function () {
            // .val() reads what the user typed into the textarea; the timestamp
            // makes each URL unique so the browser does not reuse cached audio.
            var newUrl = '@Url.Action("PlayTextArea", "Home")?text=' +
                encodeURIComponent($('#inputText').val()) +
                '&timestamp=' + new Date().getTime();
            var source = '<audio id="audio_player">';
            source += '<source id="audio_player_wav" src="' + newUrl + '" type="audio/mpeg" />';
            source += '</audio>';
            setTimeout(function () {
                $('#divAudio_Player').html(source);
                var aud = $('#audio_player').get(0);
                aud.play();
            }, 500);
        });
    });
</script>
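The request URL built by the click handler depends on percent-encoding the typed text; otherwise a character such as "&" would split the query string. The C# sketch below builds an equivalent URL to show that step. PlayUrl.Build is a hypothetical helper; the text parameter matches the PlayTextArea action, and the extra timestamp parameter (named timestamp here) only makes each URL unique so the browser does not reuse a cached response.

```csharp
using System;

// Hypothetical helper that builds a PlayTextArea URL like the one the
// client-side click handler requests.
public static class PlayUrl
{
    public static string Build(string actionUrl, string text, long timestamp)
    {
        // Uri.EscapeDataString percent-encodes reserved characters such as
        // '&', '=', and spaces, similar to encodeURIComponent in JavaScript.
        return actionUrl
            + "?text=" + Uri.EscapeDataString(text ?? string.Empty)
            + "&timestamp=" + timestamp;
    }
}
```

For example, Build("/Home/PlayTextArea", "Tom & Jerry", 123) produces "/Home/PlayTextArea?text=Tom%20%26%20Jerry&timestamp=123", and MVC model binding decodes the text parameter back to "Tom & Jerry".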
Add the following ActionResult to the home controller; it is the action this example calls:
public ActionResult PlayTextArea(string text)
{
if (String.IsNullOrEmpty(text)) {
text = "Type something in first";
}
return TextToMp3(text);
}
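Because every request runs a full synthesis and MP3 encode, it may be worth guarding the input before calling TextToMp3. The sketch below is a hypothetical variation, not part of the tip: SpeechInput.Normalize and its 500-character limit are assumptions. It also treats whitespace-only input as empty, which differs slightly from the String.IsNullOrEmpty check above.

```csharp
using System;

// Hypothetical input guard for PlayTextArea: substitute a default prompt for
// empty input and cap the length so one request cannot queue a very long synthesis.
// The 500-character default limit is an arbitrary assumption.
public static class SpeechInput
{
    public const string DefaultPrompt = "Type something in first";

    public static string Normalize(string text, int maxLength = 500)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return DefaultPrompt;
        }

        var trimmed = text.Trim();
        return trimmed.Length <= maxLength ? trimmed : trimmed.Substring(0, maxLength);
    }
}
```

With this helper, PlayTextArea would reduce to return TextToMp3(SpeechInput.Normalize(text));.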
Run the project, type something in, and click "Say it!".
Points of Interest
Making applications speak has always interested me, and speech output can make an application more useful.
Known Issues
- Converting the WAV memory stream to an MP3 stream causes high CPU usage.
- Running the application pool under its default application identity would be preferable, but the speech synthesizer needs a user profile, which is why the identity is set to Local System.
References