Speech recognition, speech to text, text to speech, and speech synthesis in C#

Thomas Daniels

8 Dec 2015 | CPOL | 9 min read

This article explains speech recognition, speech to text, text to speech and speech synthesis in C#.

Disclaimer
Introduction
Speech recognition in C#
- Speech recognition
- Speech rejected
Make sure that the computer speaks to you (text to speech)
Emulate speech recognition
SpeechRecognizer vs. SpeechRecognitionEngine
Other techniques on grammar building
Prompt building
Training your speech recognition engine
History

Disclaimer

If the code isn't working for you, some speech features may not be installed or enabled. If you don't have an English version of Windows, or your speech recognizer is not English, you can still use all the code from this article, but you need to change the words to the language of your speech recognizer.
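To check which recognizers (and languages) are installed, something like the following sketch may help. It assumes a console project that references System.Speech; the class name InstalledRecognizersSketch is only illustrative.

```csharp
using System;
using System.Speech.Recognition;

static class InstalledRecognizersSketch
{
    static void Main()
    {
        // InstalledRecognizers() returns information about every speech recognizer on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Culture.Name is the language tag (for example "en-US"), Name is the recognizer's name.
            Console.WriteLine(info.Culture.Name + ": " + info.Name);
        }
    }
}
```

If nothing is printed, no recognizer is installed. If the list only contains a non-English culture, use phrases in that language in your grammars.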

According to MSDN[^], the SpeechRecognitionEngine class is available in .NET 4.5, 4, 3.5, 3.0 and .NET 4 Client Profile, and the supported Windows versions are:

Windows 8
Windows Server 2012
Windows 7
Windows Vista SP2
Windows Server 2008 (Server Core Role not supported)
Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported).
Windows Vista SP1 or later
Windows Server 2008 (Server Core not supported)
Windows Server 2008 R2 (Server Core supported with SP1 or later)
Windows Server 2003 SP2
Windows XP SP2
Windows Server 2008 R2
Windows Server 2008
Windows Server 2003
Windows 98, Windows Server 2000 SP4
Windows CE
Windows Millennium Edition
Windows Mobile for Pocket PC
Windows Mobile for Smartphone
Windows XP Media Center Edition
Windows XP Professional x64 Edition
Windows XP SP2
Windows XP Starter Edition

Some of the platforms listed above are only shown on the MSDN page if you change the .NET Framework version on the page (using the "Other Framework" link at the top of the MSDN page). Please note: the SpeechRecognitionEngine class is not available in .NET for Windows Store apps.

Introduction

In this article, I explain how to program speech recognition, speech to text, text to speech and speech synthesis in C# using the System.Speech library.

Speech recognition in C#

Speech recognition

To create a program with speech recognition in C#, you need to add a reference to the System.Speech library. Then, add these using directives at the top of your code file:

using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

Then, create an instance of the SpeechRecognitionEngine:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();

Then, we need to load grammars into the SpeechRecognitionEngine. If you don't do that, the speech recognizer will not recognize any phrases. For example, add a grammar with the phrase "test" and give the grammar the name "testGrammar":

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar "test"

Or:

Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.LoadGrammar(gr);

If you don't want to give a name to the grammar, do this:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test"))); // load a "test" grammar

Adding a name is only necessary if you want to find the grammar again later, for example to unload it. To load grammars asynchronously, use the method LoadGrammarAsync. If you want to load a grammar while the recognizer is running, call the RequestRecognizerUpdate method[^] first, and load the grammar(s) in a RecognizerUpdateReached[^] event handler.
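Loading a grammar while the recognizer is running could look roughly like the sketch below. The class and method names (GrammarUpdateSketch, EnableDeferredLoading, LoadWhileRunning) are hypothetical helpers and not part of System.Speech. Passing the grammar as the user token is just one possible way to hand it to the handler.

```csharp
using System;
using System.Speech.Recognition;

static class GrammarUpdateSketch
{
    // Hypothetical helper: call once, before starting recognition.
    // The handler runs when the recognizer has paused at a safe point.
    public static void EnableDeferredLoading(SpeechRecognitionEngine recognizer)
    {
        recognizer.RecognizerUpdateReached += (sender, e) =>
        {
            // The grammar to load travels in the user token (see LoadWhileRunning).
            Grammar grammar = e.UserToken as Grammar;
            if (grammar != null)
            {
                recognizer.LoadGrammar(grammar);
            }
        };
    }

    // Hypothetical helper: call at any time while the recognizer is running.
    public static void LoadWhileRunning(SpeechRecognitionEngine recognizer, Grammar grammar)
    {
        // Ask the recognizer to pause; the grammar is passed along as the user token.
        recognizer.RequestRecognizerUpdate(grammar);
    }
}
```

With these helpers, LoadWhileRunning(_recognizer, new Grammar(new GrammarBuilder("hello"))) would queue the "hello" grammar without stopping recognition.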

Then, add this event handler:

_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;

If the speech is recognized, the method _recognizer_SpeechRecognized will be invoked, so we need to create that method. For example, when the program recognizes the phrase "test", you can write "The test was successful!" to the console. To do that, use this:

void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     } 
}

As you can see in the comment, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. But the speech recognizer hasn't been started yet. To do that, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line:

_recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronous

Now, if we put all of this together, we get this:

static void Main(string[] args)
{
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
}
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e) // static, because it is used from the static Main method
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
}

If you run that, it will not work: the program ends immediately. We must ensure that the program does not stop before the speech recognition is completed. To do that, create a ManualResetEvent (System.Threading.ManualResetEvent) with the name _completed. When the speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar: if the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until the speech recognition is completed. After that, we dispose the speech recognition engine (this can take 50 milliseconds at best and 3 seconds at worst):

static ManualResetEvent _completed = null;
static void Main(string[] args)
{
     _completed = new ManualResetEvent(false);
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" }); // load an "exit" grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
     _completed.WaitOne(); // wait until speech recognition is completed
     _recognizer.Dispose(); // dispose the speech recognition engine
}
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e) // static, because it is used from the static Main method
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
     else if (e.Result.Text == "exit")
     {
         _completed.Set(); // unblock the Main thread so the program can end
     }
}

If you're programming a Windows application, you don't need to create a ManualResetEvent, because the UI thread ends only if the user closes the form.

To unload a grammar, use the method UnloadGrammar of the speech recognition engine, and to unload all grammars use the method UnloadAllGrammars. If the recognizer is running, don't forget to invoke the method RequestRecognizerUpdate first and to unload the grammars in a RecognizerUpdateReached event handler.

For example, to unload the "test" grammar by its name:

foreach (Grammar gr in _recognizer.Grammars)
{
       if (gr.Name == "testGrammar")
       {
             _recognizer.UnloadGrammar(gr);
             break;
       }
}

Alternatively, keep a reference to the grammar when you create and load it:

Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.LoadGrammar(testGrammar);

Then, you can unload the grammar like this:

_recognizer.UnloadGrammar(testGrammar);

If you unload a grammar the second way, the testGrammar variable must be accessible from the code that unloads it, so you must ensure that all access modifiers are right (for example, by storing the grammar in a field). The first way is the easiest, because it only needs the name of the grammar and the access modifiers don't matter.

Speech rejected

If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler:

_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;

Then, create the _recognizer_SpeechRecognitionRejected function:

static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
   if (e.Result.Alternates.Count == 0)
   {
     Console.WriteLine("Speech rejected. No candidate phrases found.");
     return;
   }
   Console.WriteLine("Speech rejected. Did you mean:");
   foreach (RecognizedPhrase r in e.Result.Alternates)
   {
    Console.WriteLine("    " + r.Text);
   }
}

This function shows all candidate phrases found by the speech recognition engine if the speech recognition was rejected.

Make sure that the computer speaks to you (text to speech)

In the same library, there's a namespace System.Speech.Synthesis. In that namespace, you'll find the class SpeechSynthesizer, which has a Speak method. Add the namespace at the top of your code file, and then try this:

SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
_synthesizer.Speak("Now the computer is speaking to you.");

If you run the code, the computer says: "Now the computer is speaking to you." With that in place, you can reuse the speech recognition code, but instead of the test grammar use this grammar:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer"))); // load a grammar

And replace the _recognizer_SpeechRecognized method with this:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "hello computer") // e.Result.Text contains the recognized text
     {
         SpeechSynthesizer synthesizer = new SpeechSynthesizer();
         synthesizer.Speak("hello user");
         synthesizer.Dispose(); // dispose the SpeechSynthesizer
     }
     _completed.Set();
}

Use SpeechSynthesizer.Dispose to dispose the SpeechSynthesizer. Now, if you say "hello computer", the computer responds "hello user".
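As an alternative to calling Dispose by hand, a using statement disposes the synthesizer even if Speak throws an exception. The following is a minimal sketch. The class SpeakSketch and its Say method are illustrative names, not part of System.Speech.

```csharp
using System;
using System.Speech.Synthesis;

static class SpeakSketch
{
    // Hypothetical helper: speaks the given text and releases the synthesizer afterwards.
    public static void Say(string text)
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice(); // send the audio to the default output device
            synthesizer.Speak(text); // blocks until the text has been spoken
        } // Dispose is called here automatically
    }

    static void Main()
    {
        Say("hello user");
    }
}
```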

Emulate speech recognition

It's also possible to emulate speech recognition with the SpeechRecognitionEngine. You can do that with the EmulateRecognize method, and to do it asynchronously, use the EmulateRecognizeAsync method:

RecognitionResult result = _recognizer.EmulateRecognize("test"); // not asynchronous, this does NOT invoke the _recognizer_SpeechRecognized method, because EmulateRecognize returns a RecognitionResult

_recognizer.EmulateRecognizeAsync("test"); // asynchronous, invokes the _recognizer_SpeechRecognized method; the return type of EmulateRecognizeAsync is 'void'

But a warning: you can't emulate speech recognition while the speech recognition engine is recognizing speech. So, you need to invoke these methods before the method RecognizeAsync is invoked, or after the engine has finished recognizing speech.
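A small console sketch of the synchronous variant might look like this. It assumes that an installed recognizer understands the English word "test". The class name EmulateSketch is only illustrative.

```csharp
using System;
using System.Speech.Recognition;

static class EmulateSketch
{
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));

            // Emulate before any call to RecognizeAsync, as the warning above requires.
            RecognitionResult result = recognizer.EmulateRecognize("test");

            // EmulateRecognize returns null if the operation was not successful,
            // so check the result before using it.
            if (result != null)
            {
                Console.WriteLine("Recognized: " + result.Text);
            }
            else
            {
                Console.WriteLine("Not recognized.");
            }
        }
    }
}
```

Because the input is text instead of audio, this can be handy for checking a grammar without a microphone.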

SpeechRecognizer vs. SpeechRecognitionEngine

In this article, I used the SpeechRecognitionEngine class. There's also a SpeechRecognizer class. So, what's the difference between the SpeechRecognizer class and the SpeechRecognitionEngine class? If you use the SpeechRecognizer class, you'll see the Windows Speech Recognizer on your screen.

If you use the SpeechRecognitionEngine class, you won't see the Windows Speech Recognizer: the SpeechRecognitionEngine is the engine of a SpeechRecognizer. Also, the SpeechRecognizer class doesn't contain the methods SetInputToDefaultAudioDevice and RecognizeAsync.
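For comparison, a minimal sketch with the SpeechRecognizer class could look like this. Note that there is no call to SetInputToDefaultAudioDevice or RecognizeAsync, because the shared Windows recognizer handles the audio input. The class name SharedRecognizerSketch is only illustrative.

```csharp
using System;
using System.Speech.Recognition;

static class SharedRecognizerSketch
{
    static void Main()
    {
        // Creating a SpeechRecognizer uses the shared Windows Speech Recognizer.
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
            recognizer.SpeechRecognized += (sender, e) =>
                Console.WriteLine("Recognized: " + e.Result.Text);

            // Keep the program alive while the shared recognizer is listening.
            Console.WriteLine("Say \"test\", or press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```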

Other techniques on grammar building

Choices

If you want to recognize several phrases, you can load them in a single grammar using Choices (here we load the phrases "dog", "cat" and "snake"):

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("dog","cat","snake"))) { Name = "animalGrammar" });

Advantages:

The code is easier to read.
The UnloadAllGrammars function is faster.

Disadvantages:

If you unload a single grammar, you unload more than one phrase.

You can also combine both ways to load grammars. For example you can load phrases like "dog", "cat", "snake" in a single grammar using Choices, because these are animals. But if you want to unload a single phrase, build only grammars with a single phrase. Instead of passing all phrases as parameters, we can use the Add method:

Choices animalChoices = new Choices();
animalChoices.Add("dog");
animalChoices.Add("cat");
animalChoices.Add("snake");

Or:

Choices animalChoices = new Choices();
animalChoices.Add("dog", "cat", "snake");

Choices and GrammarBuilder.Append

It's possible that you want to load complete phrases like "I like dogs", "I dislike dogs", "I like cats", "I dislike cats", ... It's not a good idea to load all phrases separately. Using the GrammarBuilder.Append method, we can append Choices to the grammar builder:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Append("I"); // add "I"
grammarBuilder.Append(new Choices("like", "dislike")); // load "like" & "dislike"
grammarBuilder.Append(new Choices("dogs", "cats", "birds", "snakes", 
   "fishes", "tigers", "lions", "snails", "elephants")); // add animals
_recognizer.LoadGrammar(new Grammar(grammarBuilder)); // load grammar
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

If the user says "I like dogs", _recognizer_SpeechRecognized will be called. It will also be called if the user says "I like cats", "I like birds", "I dislike snails", ... Now, we can create the _recognizer_SpeechRecognized function. If the user says "I like cats", then "Do you really like cats?" is shown on the console, and if the user says "I dislike cats", then "Do you really dislike cats?" is shown on the console. e.Result.Words[0].Text is the first spoken word ("I"), so e.Result.Words[1].Text is "like" or "dislike", and e.Result.Words[2].Text is the animal:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     Console.WriteLine("Do you really " + e.Result.Words[1].Text + 
             " " + e.Result.Words[2].Text + "?");
     _completed.Set(); // the ManualResetEvent created earlier in this article
}

Dictation: to recognize ALL speech

If you use a DictationGrammar, your program will recognize all speech using the Windows Desktop Speech technology. You can add a DictationGrammar and an "exit" grammar:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
_recognizer.LoadGrammar(new DictationGrammar());
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

And the _recognizer_SpeechRecognized method:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "exit")
    {
        _completed.Set(); // the ManualResetEvent created earlier in this article
        return;
    }
    Console.WriteLine("You said: " + e.Result.Text);
}

new DictationGrammar() returns an instance of the standard dictation grammar provided by Windows Desktop Speech technology.

Prompt building

Using a System.Speech.Synthesis.PromptBuilder, you can build a prompt for the SpeechSynthesizer. You can add breaks, styles, sentences ... using the PromptBuilder.
Using the StartSentence and EndSentence methods, you can indicate the start and the end of a sentence:

PromptBuilder builder = new PromptBuilder();

builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the AppendBreak method, you can append a break:

PromptBuilder builder = new PromptBuilder();

builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();

builder.AppendBreak(new TimeSpan(0, 0, 1)); // a break of 1 second

builder.StartSentence();
builder.AppendText("This is another sentence.");
builder.EndSentence();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the StartStyle and EndStyle methods, you can indicate the style in the PromptBuilder (for example: loud, fast):

PromptBuilder builder = new PromptBuilder();

builder.StartStyle(new PromptStyle(PromptRate.Fast));
builder.AppendText("This text is spoken fast.");
builder.EndStyle();

builder.StartStyle(new PromptStyle(PromptVolume.ExtraSoft));
builder.AppendText("This text is spoken extra soft.");
builder.EndStyle();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the StartVoice and EndVoice methods, you can indicate the voice, if installed:

PromptBuilder builder = new PromptBuilder();

builder.StartVoice(VoiceGender.Male, VoiceAge.Child);
builder.AppendText("This is a male child voice, if installed.");
builder.EndVoice();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

On my computer, there's just one voice installed. So if I try another voice using the StartVoice method, then I don't get another voice.
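To find out which voices are installed on your own computer, a sketch like the following may help. It only lists the voices and does not speak. The class name InstalledVoicesSketch is only illustrative.

```csharp
using System;
using System.Speech.Synthesis;

static class InstalledVoicesSketch
{
    static void Main()
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            // GetInstalledVoices() returns every voice the synthesizer can use.
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine(info.Name + " (" + info.Gender + ", " + info.Age + ", "
                    + info.Culture.Name + ")" + (voice.Enabled ? "" : " [disabled]"));
            }
        }
    }
}
```

If the list contains only one voice, StartVoice has nothing else to select, which matches the behavior described above.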

Training your speech recognition engine

A question that is asked frequently in the comments is how to train your speech recognition engine. Unfortunately, that is not possible from code, but you can train it through Windows Speech Recognition:

Open Control Panel
Go to Ease of Access
Choose Speech Recognition
Then choose Train your computer to better understand you

Then the training form opens. Press Next and then the training begins. Speak the sentences aloud.

History

8 Dec 2015: Fixed a bug related to RequestRecognizerUpdate as pointed out by George I. Birbilis
26 Mar 2014: Fixed problem with no-exe zip.
24 Mar 2014: Updated info about RequestRecognizerUpdate().
1 Mar 2014: Added Training your speech recognition engine
12 Jun 2013: Emulate speech recognition updated
2 Apr 2013: Prompt building added
18 Jan 2013: Bug fixed, and VB.NET downloads added
16 Jan 2013: To recognize ALL speech added, Table of Contents added
5 Jan 2013: Disclaimer updated, additional information added in the Make sure that the computer speaks to you paragraph, and a bug in the download files fixed
1 Jan 2013: Disclaimer updated
27 Dec 2012: Another technique on grammar building renamed to Other techniques on grammar building, and Choices and GrammarBuilder.Append added to Other techniques on grammar building.
20 Dec 2012: Another technique on grammar building and Speech rejected paragraph added and additional information added in the Speech recognition in C# paragraph
13 Dec 2012: Disclaimer updated
18 Nov 2012: I updated the SpeechRecognizer vs. SpeechRecognitionEngine paragraph
16 Nov 2012: SpeechRecognizer vs. SpeechRecognitionEngine paragraph added
27 Oct 2012: This is my second version of the article. I added the download files (it was suggested by Sandeep Mewara). I solved a little bug, and I added additional information at the Emulate speech recognition paragraph
27 Oct 2012: First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)