This article addresses a problem that speech recognition programmers face when a grammar grows to dozens of words or phrases: it loads two external text files, one containing the command list and one containing the actions to perform for each command.
Background
This article complements any introductory Speech Recognition article (such as the ones available on MSDN or on CodeProject itself).
Using the code
The code uses the System.Speech namespace, which has been part of the .NET Framework since version 3.0 and relies on the speech recognizer installed with Windows; the SpeechRecognitionEngine class recognizes spoken words according to a grammar built from an external file.
This article tackles the problem in VB.NET (the project was created with Visual Basic 2010 Express Edition).
The first step in the program is to import the namespaces we will need:
Imports System.Speech
Imports System.Speech.Recognition
Imports System.Runtime.InteropServices
Imports System.IO
Imports System.Net
Note that the System.Speech assembly must also be added to the project as a reference.
The main class contains almost all of the code in two event handlers: '_Load' and '_SpeechRecognized'.
Before writing the two handlers, we declare a few class-level variables and create the recognition engine:
Public Class Vera
Dim WithEvents reco As New Recognition.SpeechRecognitionEngine
Dim commandset() As String
Dim cmdList As New GrammarBuilder
In the Load event we read the command set, build the grammar, and start the recognition:
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    ' Read every command (one per line) from the file next to the executable.
    Dim npath As String = Path.Combine(Application.StartupPath, "commandlist.txt")
    commandset = File.ReadAllLines(npath)
    ' Show the commands in the ListBox for reference.
    ListBox1.Items.AddRange(commandset)
    ' Build a grammar that accepts any one of the commands and start listening.
    reco.SetInputToDefaultAudioDevice()
    cmdList.Append(New Choices(commandset))
    reco.LoadGrammar(New Recognition.Grammar(cmdList))
    reco.RecognizeAsync()
End Sub
As the code shows, the form must contain a ListBox into which all the commands are loaded for reference; the command list file is read once at startup and does not need to stay open.
The commandlist.txt file contains only the commands, one per line, with no leading or trailing spaces and no blank lines, e.g.:
hello computer
what is the timer?
What is your name?
Open chrome
Go full screen
[…]
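Because an empty line in the file would end up in the grammar as an empty phrase, it can be worth filtering the file while loading it. The following is only a sketch: the module and the LoadCommands function are hypothetical names that are not part of the original project, and it assumes .NET Framework 4 or later (for File.ReadLines).

```vbnet
Imports System.IO
Imports System.Linq

Module CommandListLoader
    ' Hypothetical helper (not part of the original project): reads the command
    ' file, trims each line and drops blank ones, so that the grammar never
    ' receives an empty phrase.
    Public Function LoadCommands(ByVal filePath As String) As String()
        Return File.ReadLines(filePath).
            Select(Function(line) line.Trim()).
            Where(Function(line) line.Length > 0).
            ToArray()
    End Function
End Module
```

In the Load handler the result could be assigned directly to the commandset array, for example commandset = LoadCommands(npath).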
The next step is to define what the computer must do when it recognizes a command contained in commandlist.txt; this is done in the SpeechRecognized event handler of the reco object.
In this example we actually rely on two events: RecognizeCompleted and SpeechRecognized.
The first simply tells the computer to start another recognition asynchronously:
Private Sub reco_RecognizeCompleted(ByVal sender As Object, ByVal e As System.Speech.Recognition.RecognizeCompletedEventArgs) Handles reco.RecognizeCompleted
reco.RecognizeAsync()
End Sub
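As an alternative to restarting the engine after every recognition, System.Speech also offers a continuous mode. The sketch below only illustrates that option; the module and the StartContinuous name are hypothetical, and it assumes the engine passed in already has an audio input and a grammar loaded. If this mode is used, the RecognizeCompleted handler above should no longer call RecognizeAsync.

```vbnet
Imports System.Speech.Recognition

Module ContinuousRecognitionSketch
    ' Hypothetical helper: assumes "engine" already has an audio input set
    ' and at least one grammar loaded.
    Public Sub StartContinuous(ByVal engine As SpeechRecognitionEngine)
        ' RecognizeMode.Multiple keeps the engine recognizing phrase after
        ' phrase until RecognizeAsyncStop or RecognizeAsyncCancel is called.
        engine.RecognizeAsync(RecognizeMode.Multiple)
    End Sub
End Module
```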
The second handler processes all the commands and actions through the SpeechRecognized event of the reco object. This long routine is the 'core' of the program.
We present, in this example, a simple but very helpful method based on an external file in which every line has the following structure:
command^action1^commandtype^action2
A random number is used to let the computer choose between action1 and action2; this is particularly helpful for social commands, where we might want to associate more than one answer with the same command.
Here is an example of the file (named commandactionlist.txt):
hello vera^Hello creator, how are you?^social^Good morning master, I am ready to operate
honey are you there?^of course I am... where else should i be?^social^Here and running
where are you vera?^come find me, creator!^social^That is a very stupid question! I am trapped in this goddamn computer
open my computer^explorer.exe^comando^noaction
open chrome^chrome.exe^comando^noaction
close chrome^chrome.exe^comando^noaction
open wordpad^wordpad.exe^comando^noaction
navigate to facebook^www.facebook.com^website^noaction
navigate to hotmail^www.hotmail.com^website^noaction
navigate to twitter^http:
show commands^noaction^internal^noaction
hide commands^noaction^internal^noaction
go full screen^noaction^internal^noaction
goodbye^It's been a pleasure,^internal^noaction
As the example shows, the third field of each line (commandtype) takes one of four values: social, comando, website, internal.
The 'social' command type is used simply to chat with the computer, which performs no real action; action1 and action2 are the two possible sentences with which the computer answers the spoken command.
The 'comando' command type is used to operate on programs installed in Windows, such as opening WordPad in the example; in this case only action1 is needed, so action2 is set to a neutral value (noaction).
The 'website' command type is formally identical to the 'comando' type, except that we pass a website address instead of an exe file.
The 'internal' command type is used to operate on the program itself; in the cases above we use it to show and hide the commands ListBox and to make the form full screen.
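The line format described above can be captured in a small parsing routine. This is a minimal sketch under stated assumptions: CommandEntry and ParseEntry are hypothetical names that do not appear in the original project, and rejecting lines that do not have exactly four fields is a design choice of this sketch.

```vbnet
' Hypothetical container for one line of commandactionlist.txt.
Public Class CommandEntry
    Public Command As String
    Public Action1 As String
    Public CommandType As String
    Public Action2 As String
End Class

Module CommandActionParser
    ' Splits "command^action1^commandtype^action2" into its four fields.
    ' Returns Nothing when the line does not contain exactly four fields.
    Public Function ParseEntry(ByVal line As String) As CommandEntry
        Dim fields() As String = line.Split("^"c)
        If fields.Length <> 4 Then Return Nothing
        Dim entry As New CommandEntry()
        entry.Command = fields(0).Trim().ToLower()
        entry.Action1 = fields(1).Trim()
        entry.CommandType = fields(2).Trim().ToLower()
        entry.Action2 = fields(3).Trim()
        Return entry
    End Function
End Module
```

For the line "open chrome^chrome.exe^comando^noaction" the entry would hold "open chrome", "chrome.exe", "comando" and "noaction" respectively, while a truncated line would simply be skipped by the caller when Nothing is returned.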
Let's first look at the random number generation:
Public Function getrandom(ByVal min As Integer, ByVal max As Integer) As Integer
Static generator As System.Random = New System.Random()
Return generator.Next(min, max)
End Function
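Keep in mind that Random.Next(min, max) never returns max itself, so getrandom(0, 6) yields a value from 0 to 5; with the test value <= 3 used in the handler, action1 is therefore selected in four cases out of six. If an even split between the two answers is preferred, a helper along these lines could be used; PickAnswer is a hypothetical name that is not part of the original project.

```vbnet
Module AnswerPicker
    Private ReadOnly generator As New System.Random()

    ' Hypothetical helper: returns action1 or action2 with equal probability.
    ' Random.Next(0, 2) yields 0 or 1 because the upper bound is exclusive.
    Public Function PickAnswer(ByVal action1 As String, ByVal action2 As String) As String
        If generator.Next(0, 2) = 0 Then
            Return action1
        End If
        Return action2
    End Function
End Module
```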
And now we can handle the four command types. Notice that since the program has social functions, it uses a speech synthesis engine to interact with the user.
Private Sub reco_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles reco.SpeechRecognized
    Dim response As String = ""
    ' Path of the command/action file, located next to the executable.
    Dim npath As String = Path.Combine(Application.StartupPath, "commandactionlist.txt")
    Dim ls As String
The declarations above prepare the path of the command/action file, which is read in the same way as the command list. Each line is then split into an array of fields using ^ as the separator, and the random number decides which of the two actions is performed (this applies only to the 'social' commands):
    Dim execute As String = e.Result.Text.ToLower()
    Dim answer As String = ""
    ' Late-bound SAPI voice used for every spoken reply (requires Option Strict Off).
    Dim robotvoice = CreateObject("sapi.spvoice")
    For Each ls In File.ReadLines(npath)
        Dim params() As String = ls.Split("^"c)
        ' Skip blank or malformed lines that do not have all four fields.
        If params.Length < 4 Then Continue For
        Dim Command As String = params(0).ToLower()
        Dim comtype As String = params(2)
        ' Only the line whose command was actually spoken is processed.
        If Not execute.Contains(Command) Then Continue For
        ' Social commands pick one of the two answers at random; all other types use action1.
        If comtype = "social" AndAlso getrandom(0, 6) > 3 Then
            response = params(3)
        Else
            response = params(1)
        End If
        Dim Action As String = response
        If comtype = "social" Then
            answer = Action
            robotvoice.Speak(answer)
        ElseIf comtype = "comando" Then
            If execute.Contains("open") Then
                answer = "Opening application"
                robotvoice.Speak(answer)
                Process.Start(Action)
            ElseIf execute.Contains("close") Then
                answer = "Closing application"
                robotvoice.Speak(answer)
                ' GetProcessesByName expects the process name without the .exe extension.
                For Each myprocess As Process In Process.GetProcessesByName(Path.GetFileNameWithoutExtension(Action))
                    myprocess.CloseMainWindow()
                Next
            End If
        ElseIf comtype = "website" Then
            ' This branch sits at the same level as the others so that website commands are reached.
            Try
                If My.Computer.Network.IsAvailable Then
                    answer = "Website is opening in a while"
                    robotvoice.Speak(answer)
                    Process.Start(Action)
                Else
                    answer = "It looks like there is no Internet connection available"
                    robotvoice.Speak(answer)
                End If
            Catch ex As Exception
                ' Errors while launching the browser are deliberately ignored.
            End Try
        ElseIf comtype = "internal" Then
            answer = "Ok Creator"
            robotvoice.Speak(answer)
            If Command = "show commands" Then
                ListBox1.Visible = True
            ElseIf Command = "hide commands" Then
                ListBox1.Visible = False
            ElseIf Command = "go full screen" Then
                Me.WindowState = FormWindowState.Maximized
            ElseIf execute.Contains("goodbye") Then
                answer = Action
                robotvoice.Speak(answer)
                reco.UnloadAllGrammars()
                reco.RecognizeAsyncStop()
                reco.Dispose()
                Me.Close()
                ' Stop processing: the engine has been disposed and the form is closing.
                Return
            End If
        End If
    Next
End Sub
The last step is of course to close the class.
End Class
Points of Interest
The most interesting part of writing this code came when I tried to make the program handle more complicated actions (like handling media files) using different forms. One behavior that bugged me is that while the program is speaking a long sentence (15 or more words) it stops every other activity. For example, my program also contains a clock that is refreshed every second, and an internal 'presentation' command that makes the PC say "My name is Vera I am a speech recognition software that can handle different kinds of commands and interact socially". When you ask the computer to introduce itself, the clock stops.
I will be studying the problem and hopefully find a solution.
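One direction that may be worth exploring is asynchronous speech output, so that the call returns before the sentence has been spoken. The sketch below uses SpeechSynthesizer.SpeakAsync from System.Speech.Synthesis; the module and the Say name are hypothetical, and whether this removes the clock freeze in the program described here is an assumption that has not been verified.

```vbnet
Imports System.Speech.Synthesis

Module AsyncSpeechSketch
    ' Single synthesizer instance, created on first use and then reused.
    Private synth As SpeechSynthesizer

    ' Hypothetical helper: queues the text and returns immediately, so the
    ' caller is not blocked while the sentence is being spoken.
    Public Sub Say(ByVal text As String)
        If synth Is Nothing Then
            synth = New SpeechSynthesizer()
            synth.SetOutputToDefaultAudioDevice()
        End If
        synth.SpeakAsync(text)
    End Sub
End Module
```

With such a helper, the robotvoice.Speak(answer) calls in the handler could be replaced by Say(answer), at the cost of answers possibly overlapping with whatever action follows them.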