Introduction
C# and SAPI 5.1 make Speech Recognition (SR) and Text to Speech (TTS) development fun, easy and very rewarding. Within hours, you'll be able to produce exciting demo apps to impress your friends and colleagues: a command-and-control add-in for your existing app, say, or something that makes your computer speak with a HAL-like voice.
Getting Started with SAPI
SAPI (Microsoft Speech API) can be downloaded from this URL. The attached examples were written with SAPI 5.1 and Visual Studio 8 (Visual Studio 2005).
I will not cover how to install SAPI, as that is covered very well in the provided documentation, but I will highlight a couple of setup steps that took me more than 15 minutes to figure out. To make SAPI accessible in your project/code files, you need to add "using SpeechLib;" to your code file and add a reference to the COM component called "Microsoft Speech Object Library" from the Class View window.
Overview of the Functionality Covered in the Attached Code
The SpSharedRecoContextClass class is your interface to the speech recognition engine. You can create an instance of this class and then register for the events it raises. In the example, I only implemented the Recognition event, which fires when the engine has settled on its best guess at the phrase/utterance it heard. You shouldn't need to do anything special to hook this class up to your default microphone; however, if you want to process speech from a source other than the microphone, you should use a sister class called SpInProcRecoContextClass.
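To make the event wiring concrete, here is a minimal sketch of a WinForms app that creates a shared recognition context, subscribes only to the Recognition event and shows the recognized text in the window caption. It assumes the COM reference to "Microsoft Speech Object Library" described above; the class name RecoForm is hypothetical, and dictation is switched on only so the event has something to fire on. The sketch instantiates the interop interface name SpSharedRecoContext, which the interop assembly maps to the same COM class as SpSharedRecoContextClass.

```csharp
using System;
using System.Windows.Forms;
using SpeechLib; // COM reference: "Microsoft Speech Object Library"

// Hypothetical form: listens through the shared recognizer (default microphone).
public class RecoForm : Form
{
    // Kept as fields so the COM objects are not collected while listening.
    private SpSharedRecoContext recoContext;
    private ISpeechRecoGrammar grammar;

    public RecoForm()
    {
        recoContext = new SpSharedRecoContext();

        // Subscribe to the Recognition event only; other events are ignored.
        recoContext.Recognition +=
            new _ISpeechRecoContextEvents_RecognitionEventHandler(OnRecognition);

        // Free-form dictation, so any utterance can trigger Recognition.
        grammar = recoContext.CreateGrammar(0);
        grammar.DictationLoad("", SpeechLoadOption.SLOStatic);
        grammar.DictationSetState(SpeechRuleState.SGDSActive);
    }

    // Called with the engine's final result for each utterance.
    private void OnRecognition(int streamNumber, object streamPosition,
        SpeechRecognitionType recognitionType, ISpeechRecoResult result)
    {
        // GetText(0, -1, true): all elements, with display replacements applied.
        Text = result.PhraseInfo.GetText(0, -1, true);
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new RecoForm());
    }
}
```

Holding the context and grammar in fields, rather than locals, is deliberate: if the garbage collector releases them, recognition events stop arriving.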
The ISpRecoGrammar class provides some basic tools to control the type of recognition you want. The two basic types are Dictation and Context Free Grammar (CFG) driven recognition. Dictation will give you decent enough quality for a cool demo, but in my opinion it falls short of the minimum quality needed for commercial apps. Using CFGs, on the other hand, you can build fairly good quality command-and-control apps.
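For the CFG route, a grammar file can be loaded in place of dictation. The sketch below is a hypothetical helper (CommandGrammar.Load is not from the attached code) that loads a grammar file, such as the DirectoryService files listed below, into a new grammar object and activates it. It assumes, per my reading of the SAPI automation reference, that CmdLoadFromFile accepts either the XML source or the compiled .cfg, and that rule ID 0 passed to CmdSetRuleIdState applies the state to all top-level rules.

```csharp
using SpeechLib; // COM reference: "Microsoft Speech Object Library"

// Hypothetical helper: switch a recognition context to command-and-control.
public static class CommandGrammar
{
    // recoContext: an already-created shared context.
    // grammarPath: path to a SAPI XML grammar or its compiled .cfg file.
    public static ISpeechRecoGrammar Load(SpSharedRecoContext recoContext,
                                          string grammarPath)
    {
        // Grammar ID 1 is an arbitrary caller-chosen identifier.
        ISpeechRecoGrammar grammar = recoContext.CreateGrammar(1);

        // SLOStatic: the rules will not be modified at run time.
        grammar.CmdLoadFromFile(grammarPath, SpeechLoadOption.SLOStatic);

        // Rule ID 0: activate every top-level rule in the loaded grammar.
        grammar.CmdSetRuleIdState(0, SpeechRuleState.SGDSActive);
        return grammar;
    }
}
```

A typical call would be `CommandGrammar.Load(recoContext, "DirectoryService.xml")`; the caller should keep the returned grammar in a field so it stays alive. With only the command rules active, the engine matches against the phrases the grammar allows, which is what makes CFG results more reliable than free dictation.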
The SpVoice class gives you a very straightforward interface to TTS. I haven't played around with this class much yet, as the SpVoice.Speak() method has given me all the functionality I've needed so far.
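A minimal sketch of TTS with SpVoice, assuming the same COM reference; the class name SpeakDemo and the spoken strings are made up for illustration. It contrasts a synchronous Speak call with an asynchronous one followed by WaitUntilDone.

```csharp
using System;
using SpeechLib; // COM reference: "Microsoft Speech Object Library"

// Hypothetical console demo of SpVoice.Speak.
class SpeakDemo
{
    [STAThread]
    static void Main()
    {
        SpVoice voice = new SpVoice();

        // SVSFDefault is synchronous: Speak returns after the text is spoken.
        int streamNumber = voice.Speak("Good afternoon. Everything is running smoothly.",
                                       SpeechVoiceSpeakFlags.SVSFDefault);
        Console.WriteLine("Spoken on stream " + streamNumber);

        // SVSFlagsAsync queues the text and returns immediately,
        // so the app stays responsive while the voice talks.
        voice.Speak("This sentence was queued asynchronously.",
                    SpeechVoiceSpeakFlags.SVSFlagsAsync);

        // Block until the queue is drained (-1 means no timeout).
        voice.WaitUntilDone(-1);
    }
}
```

In a WinForms app the asynchronous flag is usually the better choice, since a synchronous Speak call freezes the UI until the sentence is finished.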
An ISpeechRecoResult object is passed to your Recognition event handler. It gives you access to the text understood by the SR engine and can be your portal to a lot of other cool under-the-hood info, such as the probabilities and alternatives the engine considered while evaluating the utterance.
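As a sketch of that under-the-hood info, the hypothetical helper below could be called from a Recognition handler. It prints the recognized text, the matched rule's confidence values and up to five alternates. It assumes the SAPI automation members PhraseInfo.Rule, Confidence, EngineConfidence and Alternates; alternates are mainly meaningful for dictation, so the call is wrapped in case the engine rejects it for a CFG result.

```csharp
using System;
using System.Runtime.InteropServices;
using SpeechLib; // COM reference: "Microsoft Speech Object Library"

// Hypothetical helper for inspecting a recognition result.
public static class ResultInspector
{
    public static void Dump(ISpeechRecoResult result)
    {
        ISpeechPhraseInfo info = result.PhraseInfo;
        Console.WriteLine("Text: " + info.GetText(0, -1, true));

        // Top-level rule that matched (its name may be empty for dictation).
        ISpeechPhraseRule rule = info.Rule;
        Console.WriteLine("Rule: " + rule.Name +
                          ", confidence: " + rule.Confidence +
                          ", engine confidence: " + rule.EngineConfidence);

        try
        {
            // Ask for up to 5 alternates covering all elements (0, -1).
            ISpeechPhraseAlternates alternates = result.Alternates(5, 0, -1);
            if (alternates == null) return;
            for (int i = 0; i < alternates.Count; i++)
            {
                ISpeechPhraseAlternate alt = alternates.Item(i);
                Console.WriteLine("  alt " + i + ": " +
                                  alt.PhraseInfo.GetText(0, -1, true));
            }
        }
        catch (COMException ex)
        {
            // Some engines/grammars do not support alternates.
            Console.WriteLine("No alternates: " + ex.Message);
        }
    }
}
```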
Overview of the Included Code
- Form1.cs -- The bulk of the code is here
- Form1.Designer.cs -- Class declaration and UI code
- DirectoryService.xml -- A simple Context Free Grammar
- DirectoryService.cfg -- Compiled version of the CFG; you should be able to produce it yourself by running gc.exe, included in the SAPI SDK
Summary
Overall, I found the time I spent playing around with the SAPI SDK to be quite rewarding. Perhaps the biggest frustration was the included documentation: it was quite cryptic and circular at times, and it is mainly geared towards VC++ developers. Being a novice C# developer, I found it challenging at times to figure out which data types I needed to pass to various methods. I also ran into a number of cases where methods and classes were either unique to the C# API or not supported at all.
Feel free to drop me a note with your SAPI experience.
Maksim Kozyarchuk (maksim_kozyarchuk@yahoo.com)