Making a Jarvis that reads commands from external file

20 Aug 2015

This article solves a problem faced by speech recognition programmers who must handle a grammar of dozens of words or phrases: it loads two external text files, one for the command list and one for the actions to perform in response to each command.

Background

This article is meant to complement any speech recognition article (such as those found on MSDN or on CodeProject itself).

Using the code

The article uses the System.Speech functionality available since Windows 7 Enterprise (some versions) and improved in Windows 8.x; the SpeechRecognitionEngine is used to recognize spoken words according to a grammar built from an external file.

This article tackles the problem in VB.NET (the project was created with Visual Basic 2010 XE).

The first step in the program is to import the namespaces we will need:

Imports System.Speech
Imports System.Speech.Recognition
Imports System.Runtime.InteropServices
Imports System.IO
Imports System.Net

Note that System.Speech must also be added to the project as a reference.

The main class contains almost all of its code in two event handlers: '_Load' and '_SpeechRecognized'.

Before writing these two handlers, we need a few class-level variables and an instance of the recognition engine:

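Public Class Vera

    ' Recognition engine; WithEvents lets the handlers below use the Handles clause
    Dim WithEvents reco As New Recognition.SpeechRecognitionEngine

    ' Commands read from commandlist.txt and the grammar built from them
    Dim commandset() As String
    Dim cmdList As New GrammarBuilder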

In the Load event handler we read the command set, build the grammar from it, and start the recognition:

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

    ' Read every command (one per line) from the file next to the executable.
    ' File.ReadLines opens and closes the file itself, so no StreamReader is needed.
    Dim npath As String = Application.StartupPath & "\commandlist.txt"
    Dim i As Integer
    For Each ls As String In File.ReadLines(npath)
        ReDim Preserve commandset(i)
        commandset(i) = ls
        i += 1
        ListBox1.Items.Add(ls)
    Next

    reco.SetInputToDefaultAudioDevice()

    ' Build a grammar whose only choices are the loaded commands
    cmdList.Append(New Choices(commandset))
    reco.LoadGrammar(New Recognition.Grammar(cmdList))
    reco.RecognizeAsync()
End Sub

As the code shows, the form must contain a ListBox (ListBox1) into which all the commands are loaded for reference; this way the commands can be consulted on screen without keeping the command list file open.

The commandlist.txt file contains only the commands, one per line, with no leading or trailing spaces and no blank lines, e.g.:

hello computer
what is the timer?
What is your name?
Open chrome
Go full screen
[…]
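
Because the file is expected to contain no blank lines, a small guard can make the loading step more forgiving. The following is only a minimal sketch: the module and the function name CleanCommands are hypothetical and are not part of the original project.

```vbnet
Imports System.Collections.Generic

Module CommandListLoader
    ' Hypothetical helper (not in the original project): trims every line
    ' and drops the empty ones, so a stray blank line in commandlist.txt
    ' does not end up as an empty entry among the grammar choices.
    Public Function CleanCommands(ByVal lines As IEnumerable(Of String)) As String()
        Dim result As New List(Of String)
        For Each line As String In lines
            Dim trimmed As String = line.Trim()
            If trimmed.Length > 0 Then result.Add(trimmed)
        Next
        Return result.ToArray()
    End Function
End Module
```

Under this assumption, the Load handler could fill the array with commandset = CleanCommands(File.ReadLines(npath)) instead of growing it one element at a time.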

The next step is to define what the computer must do when it recognizes a command contained in the commandlist.txt file; this is done in the SpeechRecognized event handler of the reco object.

In this example we actually use an approach based on two events: RecognizeCompleted and SpeechRecognized.

The first simply tells the computer to start another recognition asynchronously:

Private Sub reco_RecognizeCompleted(ByVal sender As Object, ByVal e As System.Speech.Recognition.RecognizeCompletedEventArgs) Handles reco.RecognizeCompleted

        reco.RecognizeAsync()
End Sub
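
As an alternative to restarting the recognition by hand, System.Speech also offers a continuous mode through the RecognizeAsync(RecognizeMode) overload. The sketch below assumes that a grammar has already been loaded into the engine; the module and the StartListening name are hypothetical.

```vbnet
Imports System.Speech.Recognition

Module ContinuousRecognition
    ' Sketch: start the engine in continuous mode, so that it keeps
    ' listening after each recognized phrase until RecognizeAsyncStop
    ' or RecognizeAsyncCancel is called.
    ' Assumes a grammar has already been loaded into the engine.
    Public Sub StartListening(ByVal engine As SpeechRecognitionEngine)
        engine.SetInputToDefaultAudioDevice()
        engine.RecognizeAsync(RecognizeMode.Multiple)
    End Sub
End Module
```

With this mode the RecognizeCompleted event is raised only when the recognition is stopped, so a handler that restarts the engine, like the one above, would probably have to be removed or adapted.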

The second handles all the commands and actions, using the SpeechRecognized event of the reco object. This long piece of code is the 'core' of the program.

We present, in this example, a simple but very helpful method based on an external file constructed in the following way:

command^action1^commandtype^action2

A random number generated by a small routine lets the computer choose between action1 and action2; this is particularly helpful for social commands, where we might want to associate several answers with the same command.

Here is an example of the file (named: commandactionlist.txt ):

hello vera^Hello creator, how are you?^social^Good morning master, I am ready to operate
honey are you there?^of course I am... where else should i be?^social^Here and running
where are you vera?^come find me, creator!^social^That is a very stupid question! I am trapped in this goddamn computer
open my computer^explorer.exe^comando^noaction
open chrome^chrome.exe^comando^noaction
close chrome^chrome.exe^comando^noaction
open wordpad^wordpad.exe^comando^noaction
navigate to facebook^www.facebook.com^website^noaction
navigate to hotmail^www.hotmail.com^website^noaction
navigate to twitter^http://twitter.com^website^noaction
show commands^noaction^internal^noaction
hide commands^noaction^internal^noaction
go full screen^noaction^internal^noaction
goodbye^It's been a pleasure,^internal^noaction

As the file shows, the third field of each line (commandtype) takes one of four values: social, comando, website, internal.

The 'social' command type is used simply to chat with the computer, which performs no real action: action1 and action2 are the two possible sentences with which the computer answers the spoken command.

The 'comando' command type handles operations on programs installed in Windows, such as opening WordPad in the example; only action1 is needed here, so action2 is set to a neutral value (noaction).

The 'website' command type is formally the same as the 'comando' type, except that we pass a website address instead of an exe file.

The 'internal' command type operates on the program itself; in the examples above it is used to show and hide the commands ListBox, to set the form to full screen, and to close the program (goodbye).
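
To make the four-field layout concrete, here is a minimal parsing sketch. The CommandEntry class and the ParseEntry function are hypothetical names introduced only for illustration; the article's own handler works directly on the array returned by Split.

```vbnet
Module CommandActionParser
    ' Hypothetical container for one line of commandactionlist.txt
    Public Class CommandEntry
        Public Command As String
        Public Action1 As String
        Public CommandType As String
        Public Action2 As String
    End Class

    ' Splits "command^action1^commandtype^action2" into its four fields.
    ' Returns Nothing when the line does not have exactly four fields.
    Public Function ParseEntry(ByVal line As String) As CommandEntry
        If line Is Nothing Then Return Nothing
        Dim parts() As String = line.Split("^"c)
        If parts.Length <> 4 Then Return Nothing

        Dim entry As New CommandEntry()
        entry.Command = parts(0).Trim().ToLower()
        entry.Action1 = parts(1)
        entry.CommandType = parts(2).Trim()
        entry.Action2 = parts(3)
        Return entry
    End Function
End Module
```

For example, ParseEntry("open chrome^chrome.exe^comando^noaction") yields an entry whose CommandType is "comando" and whose Action1 is "chrome.exe", while a blank line yields Nothing and can simply be skipped.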

Let's look first at the random number generation:

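Public Function getrandom(ByVal min As Integer, ByVal max As Integer) As Integer
    ' Static keeps a single generator alive between calls;
    ' Next returns a value from min up to, but not including, max
    Static generator As System.Random = New System.Random()
    Return generator.Next(min, max)
End Function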
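
Since the upper bound of Random.Next is exclusive, a call such as getrandom(0, 6) returns a value from 0 to 5. The handler shown later picks action1 when the value is 3 or less, which means four values out of six, so action1 comes up about twice as often as action2. If an even split is preferred, a helper along these lines could be used; PickResponse is a hypothetical name and this is only a sketch:

```vbnet
Module ResponsePicker
    Private ReadOnly generator As New System.Random()

    ' Hypothetical helper: returns action1 or action2 with equal probability.
    ' Next(0, 2) yields 0 or 1 because the upper bound is exclusive.
    Public Function PickResponse(ByVal action1 As String, ByVal action2 As String) As String
        If generator.Next(0, 2) = 0 Then
            Return action1
        End If
        Return action2
    End Function
End Module
```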

And now we can handle the four command types. Notice that since the program has social functions, it uses a speech synthesis engine to interact with the user.

Private Sub reco_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles reco.SpeechRecognized
        Dim response As String = ""
        ' File.ReadLines is used below, so no StreamReader has to be opened here
        Dim npath As String = Application.StartupPath & "\commandactionlist.txt"
        Dim ls As String

The part above prepares the path of the command/actions file, which is read in the same way as the command list. Each line is then split into its components, stored in an array, using ^ as the separator, and the random number decides which action is performed (this applies only to the 'social' commands):

    Dim execute As String = e.Result.Text.ToLower()
    ' Late-bound SAPI voice, created once per recognition (requires Option Strict Off)
    Dim robotvoice As Object = CreateObject("sapi.spvoice")

    For Each ls In File.ReadLines(npath)

        ' Skip blank or malformed lines: four ^-separated fields are expected
        Dim params() As String = ls.Split("^"c)
        If params.Length < 4 Then Continue For

        Dim Command As String = params(0).ToLower()
        Dim comtype As String = params(2)

        ' Only act on the line whose command was actually spoken
        If Not execute.Contains(Command) Then Continue For

        ' Social commands pick one of the two answers at random;
        ' every other type only uses action1
        If comtype = "social" AndAlso getrandom(0, 6) > 3 Then
            response = params(3)
        Else
            response = params(1)
        End If
        Dim Action As String = response

        If comtype = "social" Then
            robotvoice.Speak(Action)

        ElseIf comtype = "comando" Then

            If execute.Contains("open") Then
                robotvoice.Speak("Opening application")
                Process.Start(Action)
            ElseIf execute.Contains("close") Then
                robotvoice.Speak("Closing application")
                ' GetProcessesByName expects the name without the .exe extension
                Dim procName As String = System.IO.Path.GetFileNameWithoutExtension(Action)
                For Each myprocess As Process In Process.GetProcessesByName(procName)
                    myprocess.CloseMainWindow()
                Next
            End If

        ' The website branch is a sibling of the other command types; when nested
        ' inside the 'comando' branch it could never be reached
        ElseIf comtype = "website" Then

            Try
                If My.Computer.Network.IsAvailable Then
                    robotvoice.Speak("Website is opening in a while")
                    Process.Start(Action)
                Else
                    robotvoice.Speak("It looks like there is no Internet connection available")
                End If
            Catch ex As Exception
                ' The site could not be opened; the failure is silently ignored
            End Try

        ElseIf comtype = "internal" Then
            robotvoice.Speak("Ok Creator")

            If Command = "show commands" Then
                ListBox1.Visible = True
            ElseIf Command = "hide commands" Then
                ListBox1.Visible = False
            ElseIf Command = "go full screen" Then
                Me.WindowState = FormWindowState.Maximized
            ElseIf Command = "goodbye" Then
                robotvoice.Speak(Action)
                reco.UnloadAllGrammars()
                reco.RecognizeAsyncStop()
                reco.Dispose()
                Me.Close()
                ' Nothing is left to do once the form is closing
                Return
            End If
        End If

    Next

End Sub

The last step is of course to close the class.

End Class
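
The SpeechRecognized handler reads commandactionlist.txt again every time a phrase is recognized. For a larger file it may be worth loading the lines once at startup and looking them up by command. The following is a minimal sketch under that assumption; the module and the BuildIndex function are hypothetical names that do not appear in the original project.

```vbnet
Imports System.Collections.Generic

Module CommandActionCache
    ' Hypothetical helper: indexes the four-field lines by lower-cased command.
    ' Lines without exactly four ^-separated fields are skipped.
    Public Function BuildIndex(ByVal lines As IEnumerable(Of String)) As Dictionary(Of String, String())
        Dim index As New Dictionary(Of String, String())
        For Each line As String In lines
            Dim fields() As String = line.Split("^"c)
            If fields.Length = 4 Then
                index(fields(0).Trim().ToLower()) = fields
            End If
        Next
        Return index
    End Function
End Module
```

The index could be built once in the Load handler with BuildIndex(File.ReadLines(path)) and queried in the recognition handler with TryGetValue. Note that this is an exact lookup, whereas the handler above uses Contains, so the two behave the same only when the recognized text matches a command exactly.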

Points of Interest

The main point of interest in writing this code came when I tried to make the program handle more complicated actions (such as handling media files) using different forms. One behavior that bugged me is that while the program is speaking a long sentence (15 or more words), it stops every other action. For example, my program also contains a clock that is refreshed every second, and a 'presentation' internal command that makes the PC say "My name is Vera I am a speech recognition software that can handle different kinds of commands and interact socially". When you ask the computer to introduce itself, the clock stops.

I will be studying the problem and hopefully find a solution.
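
One likely cause is that Speak, as called in the handler, is synchronous: it returns only when the sentence has been spoken, so the thread that called it cannot process timer ticks in the meantime. A possible direction, shown here only as a sketch and not as a tested fix for this program, is the SpeakAsync method of System.Speech.Synthesis.SpeechSynthesizer; the module and the Say name are hypothetical.

```vbnet
Imports System.Speech.Synthesis

Module NonBlockingSpeech
    ' One shared synthesizer, kept at module level so that it stays alive
    ' while a sentence is still being spoken.
    Private ReadOnly synth As New SpeechSynthesizer()

    ' Queues the sentence and returns immediately, so that timers and
    ' other UI events can keep running while the voice is speaking.
    Public Sub Say(ByVal sentence As String)
        synth.SpeakAsync(sentence)
    End Sub
End Module
```

With such a helper, a call like robotvoice.Speak(answer) would become Say(answer). Sentences queued this way are spoken one after another, so actions that must wait for the end of the speech (for example closing the form on 'goodbye') would need extra care.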

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
