If the code doesn't work for you, some speech features are probably not installed or not enabled. If you don't have an English version of Windows, or you use a non-English speech recognizer, you can still use all the code from this article, but you need to change the phrases to the language of your speech recognizer.
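To find out which recognizers (and which languages) are available on your machine, a minimal sketch like the following may help. The class and method names (RecognizerCheck, PrintInstalledRecognizers) are made up for this illustration; InstalledRecognizers is the static method of SpeechRecognitionEngine.

```csharp
using System;
using System.Speech.Recognition;

static class RecognizerCheck
{
    // Hypothetical helper: prints every installed speech recognizer with its
    // culture and returns how many were found.
    public static int PrintInstalledRecognizers()
    {
        var recognizers = SpeechRecognitionEngine.InstalledRecognizers();
        foreach (RecognizerInfo info in recognizers)
        {
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);
        }
        if (recognizers.Count == 0)
        {
            Console.WriteLine("No speech recognizer is installed.");
        }
        return recognizers.Count;
    }
}
```

If the list only contains a non-English recognizer, you can pass its culture to the constructor, for example new SpeechRecognitionEngine(new System.Globalization.CultureInfo("nl-NL")), and use phrases in that language.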
According to MSDN, the SpeechRecognitionEngine class is available in .NET 4.5, 4, 3.5, 3.0 and .NET 4 Client Profile, and the supported Windows versions are:
- Windows 8
- Windows Server 2012
- Windows 7
- Windows Vista SP2
- Windows Server 2008 (Server Core Role not supported)
- Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported).
- Windows Vista SP1 or later
- Windows Server 2008 (Server Core not supported)
- Windows Server 2008 R2 (Server Core supported with SP1 or later)
- Windows Server 2003 SP2
- Windows XP SP2
- Windows Server 2008 R2
- Windows Server 2008
- Windows Server 2003
- Windows 98, Windows Server 2000 SP4
- Windows CE
- Windows Millennium Edition
- Windows Mobile for Pocket PC
- Windows Mobile for Smartphone
- Windows XP Media Center Edition
- Windows XP Professional x64 Edition
- Windows XP SP2
- Windows XP Starter Edition
The italic platforms are only shown on the MSDN page if you change the .NET Framework version on the page (using the "Other Framework" link at the top of the MSDN page). Please note: the SpeechRecognitionEngine class is not available in .NET for Windows Store apps.
In this article, I show you how to program speech recognition (speech to text) and speech synthesis (text to speech) in C# using the System.Speech library.
Speech recognition
To create a program with speech recognition in C#, you need to add a reference to the System.Speech library. Then, add these using statements at the top of your code file:
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;
Then, create an instance of the SpeechRecognitionEngine:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
Then, we need to load grammars into the SpeechRecognitionEngine. If you don't do that, the speech recognizer will not recognize any phrases. For example, add a grammar with the phrase "test" and give the grammar the name "testGrammar":
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
Or:
Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.LoadGrammar(gr);
If you don't want to give a name to the grammar, do this:
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
Adding a name is only necessary if you want to find the grammar again later, for example to unload it. To load grammars asynchronously, use the method LoadGrammarAsync. If you want to load a grammar while the recognizer is running, call the RequestRecognizerUpdate method before loading the grammar, and load the grammar(s) in a RecognizerUpdateReached event handler.
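Loading a grammar into a running recognizer could look roughly like this. It is only a sketch: the helper name LoadWhileRunning and the way the grammar name is derived from the phrase are my own assumptions, and it expects a recognizer that has already been started with RecognizeAsync.

```csharp
using System;
using System.Speech.Recognition;

static class GrammarUpdateSketch
{
    // Hypothetical helper: asks the running recognizer to pause at a safe
    // point and loads a one-phrase grammar once that point is reached.
    public static void LoadWhileRunning(SpeechRecognitionEngine recognizer, string phrase)
    {
        EventHandler<RecognizerUpdateReachedEventArgs> handler = null;
        handler = (sender, e) =>
        {
            // Unsubscribe first, so this handler runs for this request only.
            recognizer.RecognizerUpdateReached -= handler;
            recognizer.LoadGrammar(
                new Grammar(new GrammarBuilder(phrase)) { Name = phrase + "Grammar" });
        };
        recognizer.RecognizerUpdateReached += handler;

        // The recognizer raises RecognizerUpdateReached when it is ready
        // to accept changes to its grammars.
        recognizer.RequestRecognizerUpdate();
    }
}
```

You would call it as GrammarUpdateSketch.LoadWhileRunning(_recognizer, "exit") at any time after recognition has started.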
Then, add this event handler:
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
When speech is recognized, the method _recognizer_SpeechRecognized is invoked, so we need to create that method. For example, when the program recognizes the phrase "test", it can write "The test was successful!". To do that, use this:
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test") // e.Result.Text contains the recognized text
    {
        Console.WriteLine("The test was successful!");
    }
}
Here, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. But the speech recognizer hasn't been started yet. To do that, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line:
_recognizer.SetInputToDefaultAudioDevice();
_recognizer.RecognizeAsync(RecognizeMode.Multiple);
Now, if we merge all methods, we get this:
static void Main(string[] args)
{
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test")
    {
        Console.WriteLine("The test was successful!");
    }
}
If you run that, it will not work: the program ends immediately. So we must ensure that the program does not stop before the speech recognition is completed. We create a ManualResetEvent (System.Threading.ManualResetEvent) with the name _completed, and when the speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar: if the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until the speech recognition is completed. After that, we dispose the speech recognition engine (this can take 3 seconds at worst, 50 milliseconds at best):
static ManualResetEvent _completed = null;
static void Main(string[] args)
{
    _completed = new ManualResetEvent(false);
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" });
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.RecognizeAsync(RecognizeMode.Multiple);
    _completed.WaitOne(); // block the Main thread until "exit" is recognized
    _recognizer.Dispose();
}
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test")
    {
        Console.WriteLine("The test was successful!");
    }
    else if (e.Result.Text == "exit")
    {
        _completed.Set(); // release the Main thread
    }
}
If you're programming a Windows application, you don't need to create a ManualResetEvent, because the UI thread ends only if the user closes the form.
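The waiting pattern itself does not depend on speech recognition. As a minimal sketch, the following console code uses a thread-pool work item as a stand-in for the recognition thread (WaitSketch and Demo are names I made up) and also shows the WaitOne overload with a timeout, which you might prefer if the program should not wait forever for someone to say "exit".

```csharp
using System;
using System.Threading;

static class WaitSketch
{
    // Starts background work that signals after workMs milliseconds and blocks
    // the calling thread for at most timeoutMs milliseconds.
    // Returns true if the signal arrived before the timeout.
    public static bool Demo(int workMs, int timeoutMs)
    {
        var completed = new ManualResetEvent(false);

        // Stand-in for the speech recognition thread calling _completed.Set().
        ThreadPool.QueueUserWorkItem(_ =>
        {
            Thread.Sleep(workMs);
            completed.Set();
        });

        // Like _completed.WaitOne() in the article, but gives up after timeoutMs.
        return completed.WaitOne(timeoutMs);
    }
}
```

In the recognizer program, the equivalent would be something like _completed.WaitOne(TimeSpan.FromMinutes(1)) in place of _completed.WaitOne().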
To unload a grammar, use the method UnloadGrammar of the speech recognition engine, and to unload all grammars use the method UnloadAllGrammars. If the recognizer is running, don't forget to invoke the method RequestRecognizerUpdate first and to load or unload the grammars in a RecognizerUpdateReached event handler.
There are two ways to unload a single grammar. The first way is to look it up by name, for example to unload the "test" grammar:
foreach (Grammar gr in _recognizer.Grammars)
{
if (gr.Name == "testGrammar")
{
_recognizer.UnloadGrammar(gr);
break;
}
}
The second way is to keep a reference to the grammar:
- Create a grammar and load the grammar like this:
Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.LoadGrammar(testGrammar);
- Then, you can unload the grammar like this:
_recognizer.UnloadGrammar(testGrammar);
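If you unload a grammar the second way, you must make sure that the Grammar variable is accessible where you call UnloadGrammar (for example, store it in a field with a suitable access modifier instead of a local variable). The first way is the easiest, because the grammar is looked up by name, so the access modifiers don't matter.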
If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler:
_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;
Then, create the _recognizer_SpeechRecognitionRejected function:
static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
if (e.Result.Alternates.Count == 0)
{
Console.WriteLine("Speech rejected. No candidate phrases found.");
return;
}
Console.WriteLine("Speech rejected. Did you mean:");
foreach (RecognizedPhrase r in e.Result.Alternates)
{
Console.WriteLine(" " + r.Text);
}
}
This function shows all candidate phrases found by the speech recognition engine if the speech recognition was rejected.
In the same library, there's a namespace System.Speech.Synthesis. In that namespace, you'll find a class SpeechSynthesizer, and in that class there's a Speak method. Add the namespace at the top of your code file, and then try this:
SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
_synthesizer.Speak("Now the computer is speaking to you.");
If you run the code, the computer says: "Now the computer is speaking to you." Now that you know that, you can reuse the speech recognition code, but instead of the test grammar use this grammar:
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer")));
And change the _recognizer_SpeechRecognized method to this:
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
if (e.Result.Text == "hello computer")
{
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak("hello user");
synthesizer.Dispose();
}
_completed.Set();
}
Use SpeechSynthesizer.Dispose to dispose the SpeechSynthesizer. Now, if you say "hello computer", the computer responds "hello user".
It's also possible to emulate speech recognition with the SpeechRecognitionEngine. You can do that with the EmulateRecognize method, and to do it asynchronously, use the EmulateRecognizeAsync method:
RecognitionResult result = _recognizer.EmulateRecognize("test");
_recognizer.EmulateRecognizeAsync("test");
But a warning: you can't emulate speech recognition while the speech recognition engine is recognizing speech. So you need to invoke these methods before RecognizeAsync is invoked, or after the engine has finished recognizing.
In this article, I used the SpeechRecognitionEngine class. There's also a SpeechRecognizer class. So, what's the difference between the SpeechRecognizer class and the SpeechRecognitionEngine class? If you use the SpeechRecognizer class, you'll see the Windows Speech Recognizer. If you use the SpeechRecognitionEngine class, you won't see the Windows Speech Recognizer; the SpeechRecognitionEngine is the engine of a SpeechRecognizer. Also, the SpeechRecognizer class doesn't contain the methods SetInputToDefaultAudioDevice and RecognizeAsync.
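For comparison, here is a rough sketch of the same "test" grammar with the SpeechRecognizer class. The wrapper names (SharedRecognizerSketch, Start) are hypothetical. Note that there is no call to SetInputToDefaultAudioDevice or RecognizeAsync: creating the SpeechRecognizer brings up Windows Speech Recognition, which then does the listening.

```csharp
using System;
using System.Speech.Recognition;

static class SharedRecognizerSketch
{
    // Hypothetical helper: creates a SpeechRecognizer with one grammar
    // and a handler that prints every recognized phrase.
    public static SpeechRecognizer Start()
    {
        var recognizer = new SpeechRecognizer();
        recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
        recognizer.SpeechRecognized += (sender, e) =>
            Console.WriteLine("Recognized: " + e.Result.Text);
        return recognizer; // the caller should Dispose it when finished
    }
}
```

As with the SpeechRecognitionEngine, a console program still has to keep its Main thread alive (for example with the ManualResetEvent) while waiting for speech.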
If you want to load more phrases, you can also put them in a single grammar using Choices (here we load the phrases "dog", "cat" and "snake"):
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("dog","cat","snake"))) { Name = "animalGrammar" });
Advantages:
- The code is easier to read.
- The UnloadAllGrammars function is faster.
Disadvantages:
- If you unload a single grammar, you unload more than one phrase.
You can also combine both ways to load grammars. For example, you can load phrases like "dog", "cat" and "snake" in a single grammar using Choices, because these are all animals. But if you want to be able to unload a single phrase, build grammars with a single phrase only. Instead of passing all phrases as parameters, we can use the Add method:
Choices animalChoices = new Choices();
animalChoices.Add("dog");
animalChoices.Add("cat");
animalChoices.Add("snake");
Or:
Choices animalChoices = new Choices();
animalChoices.Add("dog", "cat", "snake");
It's possible that you want to load complete phrases like "I like dogs", "I dislike dogs", "I like cats", "I dislike cats", ... It's not a good idea to load all phrases separately. Using the GrammarBuilder.Append method, we can append Choices to the grammar builder:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Append("I");
grammarBuilder.Append(new Choices("like", "dislike"));
grammarBuilder.Append(new Choices("dogs", "cats", "birds", "snakes",
"fishes", "tigers", "lions", "snails", "elephants"));
_recognizer.LoadGrammar(new Grammar(grammarBuilder));
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice();
_recognizer.RecognizeAsync(RecognizeMode.Multiple);
If the user says "I like dogs", _recognizer_SpeechRecognized will be called. It will also be called if the user says "I like cats", "I like birds", "I dislike snails", ... Now, we can create the _recognizer_SpeechRecognized function. If the user says "I like cats", then "Do you really like cats?" is shown on the console, and if the user says "I dislike cats", then "Do you really dislike cats?" is shown on the console. e.Result.Words[0].Text is the first spoken word ("I"), so Words[1] is "like" or "dislike" and Words[2] is the animal:
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Words[0] is "I", Words[1] is "like" or "dislike", Words[2] is the animal
    Console.WriteLine("Do you really " + e.Result.Words[1].Text +
        " " + e.Result.Words[2].Text + "?");
    _completed.Set();
}
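To get a feeling for why appending Choices beats loading every phrase separately: the grammar above covers 1 × 2 × 9 = 18 phrases. The following sketch has nothing to do with System.Speech itself; it is a hypothetical helper (PhraseSketch.Expand) that simply enumerates the combinations such a sequence of choices stands for.

```csharp
using System;
using System.Collections.Generic;

static class PhraseSketch
{
    // Each argument is one "slot" of alternatives, in spoken order.
    // Returns every phrase obtained by picking one word per slot.
    public static List<string> Expand(params string[][] slots)
    {
        var phrases = new List<string> { "" };
        foreach (string[] slot in slots)
        {
            var next = new List<string>();
            foreach (string prefix in phrases)
            {
                foreach (string word in slot)
                {
                    next.Add(prefix.Length == 0 ? word : prefix + " " + word);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

For example, PhraseSketch.Expand(new[] { "I" }, new[] { "like", "dislike" }, new[] { "dogs", "cats" }) yields "I like dogs", "I like cats", "I dislike dogs" and "I dislike cats".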
If you use a DictationGrammar, your program will recognize all speech using the Windows Desktop Speech technology. You can add a DictationGrammar and an "exit" grammar:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
_recognizer.LoadGrammar(new DictationGrammar());
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice();
_recognizer.RecognizeAsync(RecognizeMode.Multiple);
And the _recognizer_SpeechRecognized method:
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
if (e.Result.Text == "exit")
{
_completed.Set();
return;
}
Console.WriteLine("You said: " + e.Result.Text);
}
new DictationGrammar() returns an instance of the standard dictation grammar provided by Windows Desktop Speech technology.
Using a System.Speech.Synthesis.PromptBuilder, you can build a prompt for the SpeechSynthesizer. You can add breaks, styles, sentences, ... using the PromptBuilder.
Using the StartSentence and EndSentence methods, you can indicate the start and the end of a sentence:
PromptBuilder builder = new PromptBuilder();
builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
Using the AppendBreak method, you can append a break:
PromptBuilder builder = new PromptBuilder();
builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();
builder.AppendBreak(new TimeSpan(0, 0, 1));
builder.StartSentence();
builder.AppendText("This is another sentence.");
builder.EndSentence();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
Using the StartStyle and EndStyle methods, you can indicate the style in the PromptBuilder (for example: loud, fast):
PromptBuilder builder = new PromptBuilder();
builder.StartStyle(new PromptStyle(PromptRate.Fast));
builder.AppendText("This text is spoken fast.");
builder.EndStyle();
builder.StartStyle(new PromptStyle(PromptVolume.ExtraSoft));
builder.AppendText("This text is spoken extra soft.");
builder.EndStyle();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
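If you're curious what the PromptBuilder actually hands to the synthesizer, its ToXml method returns the generated markup (SSML). A small sketch, assuming you only want to inspect the markup and not speak it; PromptSketch and BuildSsml are names invented for this example:

```csharp
using System;
using System.Speech.Synthesis;

static class PromptSketch
{
    // Builds a prompt with a sentence, a one-second break and a fast style,
    // and returns the markup the PromptBuilder generated for it.
    public static string BuildSsml()
    {
        PromptBuilder builder = new PromptBuilder();
        builder.StartSentence();
        builder.AppendText("This is a sentence.");
        builder.EndSentence();
        builder.AppendBreak(new TimeSpan(0, 0, 1));
        builder.StartStyle(new PromptStyle(PromptRate.Fast));
        builder.AppendText("This text is spoken fast.");
        builder.EndStyle();
        return builder.ToXml();
    }
}
```

Printing the result with Console.WriteLine(PromptSketch.BuildSsml()) can help when a prompt doesn't sound the way you expected.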
Using the StartVoice and EndVoice methods, you can indicate the voice, if installed:
PromptBuilder builder = new PromptBuilder();
builder.StartVoice(VoiceGender.Male, VoiceAge.Child);
builder.AppendText("This is a male child voice, if installed.");
builder.EndVoice();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
On my computer, there's just one voice installed. So if I try another voice using the StartVoice method, I don't get another voice.
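To see which voices are installed on your own computer, you could list them with GetInstalledVoices. This is only a sketch; the class and method names (VoiceList, PrintInstalledVoices) are my own:

```csharp
using System;
using System.Speech.Synthesis;

static class VoiceList
{
    // Hypothetical helper: prints name, gender, age and culture of every
    // installed voice and returns how many voices were found.
    public static int PrintInstalledVoices()
    {
        using (var synthesizer = new SpeechSynthesizer())
        {
            var voices = synthesizer.GetInstalledVoices();
            foreach (InstalledVoice voice in voices)
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0}: {1}, {2}, {3}, enabled={4}",
                    info.Name, info.Gender, info.Age, info.Culture.Name, voice.Enabled);
            }
            return voices.Count;
        }
    }
}
```

If the gender and age you pass to StartVoice don't match any voice in this list, you can expect to keep hearing the voice you already have.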
This question is asked frequently in comments: how to train your speech recognition engine? From code, it is impossible, unfortunately. But you can train it through Windows Speech Recognition:
- Open Control Panel
- Go to Ease of Access
- Choose Speech Recognition
- Then choose Train your computer to better understand you
Then you'll see the speech training form. Press Next and then the training begins. Speak the sentences aloud.
- 8 Dec 2015: Fixed a bug related to RequestRecognizerUpdate as pointed out by George I. Birbilis
- 26 Mar 2014: Fixed problem with no-exe zip.
- 24 Mar 2014: Updated info about RequestRecognizerUpdate()
- 1 Mar 2014: Added Training your speech recognition engine
- 12 Jun 2013: Emulate speech recognition updated
- 2 Apr 2013: Prompt building added
- 18 Jan 2013: Bug fixed, and VB.NET downloads added
- 16 Jan 2013: To recognize ALL speech added, Table of Contents added
- 5 Jan 2013: Disclaimer updated, additional information added in the Make sure that the computer speaks to you paragraph, and a bug in the download files fixed
- 1 Jan 2013: Disclaimer updated
- 27 Dec 2012: Another technique on grammar building renamed to Other techniques on grammar building, and Choices and GrammarBuilder.Append added to Other techniques on grammar building.
- 20 Dec 2012: Another technique on grammar building and Speech rejected paragraph added and additional information added in the Speech recognition in C# paragraph
- 13 Dec 2012: Disclaimer updated
- 18 Nov 2012: I updated the SpeechRecognizer vs. SpeechRecognitionEngine paragraph
- 16 Nov 2012: SpeechRecognizer vs. SpeechRecognitionEngine paragraph added
- 27 Oct 2012: This is my second version of the article. I added the download files (it was suggested by Sandeep Mewara). I solved a little bug, and I added additional information at the Emulate speech recognition paragraph
- 27 Oct 2012: First version.