Text to Speech with the Microsoft Speech Library and SDK version 5.1

salysle

31 Aug 2006

Download source project - 249 Kb

Introduction

This article describes an application used to exercise some of the Text To Speech features available to .NET developers through the Microsoft Speech 5.1 SDK. This article does not address the newer speech server related libraries, nor does it address web based deployments of speech related technologies.

The application performs several functions, all of which work in basically the same manner. It is intended as an introduction to working with the TTS library: it illustrates how to gain access to and manipulate voices, and how to play text out as synthesized speech. The application provides examples of generating speech as you type, passing canned phrases to TTS, and passing entire text files to TTS.

Getting Started

To get started, unzip the included project and open the solution in the Visual Studio 2005 environment. You will note that the project contains a file cleverly named “Form1.vb”. This form contains all of the code necessary to get started with programming TTS.

To begin, you may not have the necessary references on your machine as the application requires the installation of Microsoft’s Speech 5.1 SDK and the Microsoft sample TTS engine library. These may be downloaded with the SDK at no cost, from this URL:

Speech 5.1 SDK

You may also obtain a couple of additional voices (the SDK includes Microsoft Mary, Microsoft Mike, and Microsoft Sam) by downloading the Microsoft Reader and additional TTS components found here (not required, but you will gain two additional voices if you do add these to your system).

You do not need to activate the reader for this to work; however, you can’t install the additional voices unless you have the reader installed.

If you have any other voices on your system, they may also be exposed to the application. For example, my Toshiba laptop has an additional voice called “TOSHIBA male adult (U.S.)” and this voice also appears as available to this application at runtime.

If you need to update the project references, do so prior to attempting to run the application. Once you have installed the Speech SDK, go back to the project and run a build. If the build reports the speech references as missing or broken, remove these (highlighted) references: (Figure 1)

Figure 1: Speech Related Project References

After removing the old references, right click on the project and select “Add Reference”. Once the dialog opens, select the COM tab (then go get a cup of coffee while it takes forever to load, and when you get back), look for and add these two references (figures 2 and 3) (Note: you really don’t need the second reference to the sample TTS engine):

Figure 2: Add the Microsoft Speech Object Library Reference

Figure 3: Add the Sample TTS Engine Type Library Reference

Having added these references, go ahead and do a build to see if anything else is missing. If anything turns up, add the missing project references in the same manner and build again. Once you have a good build, go ahead and run the application. On application start, you will see this form appear:

Figure 4: The Main Form of the TTS Reader Application

Looking at the form, note that it has five control groups: “Configuration”, “Speak As You Type”, “Speak Specific Phrases”, “Speak On Enter”, and “Load a Text File and Read It”.

Configuration

This control group contains two controls: the speaker combo box and the speech rate track bar control. The speaker combo box is populated with the names of each of the TTS speaker voices; you may change the current speaker by selecting a different option from this combo box.

The rate track bar control speeds up or slows down the cadence of the synthesized speech. It is set to contain five positions, and whenever its value is changed, subsequent speech is rendered at the newly set rate.

Speak As You Type

This control group contains a single text box configured so that, whenever the user presses the space bar, the speaker reads the contents of the text box and then clears it. The intent was to see whether you could speak through TTS as you type. It seemed like a nice idea, and such a function could be worthwhile for someone lacking the capacity for speech. In reality, the action is a little choppy, and the rendered speech is not terrific. With the application running, you may key in a word and listen to the results for yourself. If you type slowly enough, it is adequate, but it is not quick enough to use as a form of conversation.

Speak Specific Phrases

This control group contains a single combo box; whenever a new value is selected from the box, it will immediately be read by the speaker.

Speak On Enter

This appears to be a far more viable way to conduct a conversation using TTS as a voice medium. This control works in a manner very similar to the “Speak As You Type” option; however, it reads and clears the text box only after the user hits the “Enter” key. You may try typing in a sentence and then hitting the Enter key to get a feel for how that works.

Load a Text File and Read It

This control group contains a single multi-line text box control, and three buttons: “Open File”, “Stop”, and “Read File”. Click on the “Open File” button and use the open file dialog box to navigate to any text file. The text file will load into the text box, and with a file loaded, you may click on the “Read File” button to have the speaker read the contents of the text box end to end. TTS does a fair job of this; however, I will point out that punctuation and abbreviations do not work out too well for the 5.1 SDK.

You may also key text into the text box and invoke the “Read File” function to read the contents of the text box.

The Code

The code is pretty straightforward and easy to follow. The class definition begins as follows:

Imports SpeechLib
Imports System.Environment
Imports System.DateTime

Public Class Form1

#Region "Declarations"
    Public WithEvents vox As New SpVoice
    Public RateOfSpeech As Integer = 3

#End Region


    Private Sub Form1_Load(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles MyBase.Load

        ' Load the voices combo box
        Dim Token As ISpeechObjectToken
        For Each Token In vox.GetVoices
            cboVoxOptions.Items.Add(Token.GetDescription())
        Next
        cboVoxOptions.SelectedIndex = 0

        Dim str As String = Environment.UserName.ToString()
        SayGreeting(str)

    End Sub

As you can see, the imports section includes the speech library. A Declaration region was next defined, and two variables were declared within that region. The first creates an instance of an SpVoice, and note that the declaration is made with events. The other variable, RateOfSpeech, is used to keep track of the current rate of speech selected using the Rate of Speech track bar control.
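The sample project declares the voice WithEvents but does not go on to handle any of its events. As a minimal sketch of what that declaration makes possible, the hypothetical form below (SpeechEventsForm is not part of the download) logs a message when the voice finishes a stream. The EndStream handler signature is an assumption about what the COM interop assembly generates; check it in the Object Browser, and note that the voice's EventInterests setting may also affect which events are raised.

```vbnet
Imports System.Diagnostics
Imports System.Windows.Forms
Imports SpeechLib

' Hypothetical form used only for illustration; not part of the sample project.
Public Class SpeechEventsForm
    Inherits Form

    ' WithEvents lets the Handles clause below subscribe to the voice's events
    Private WithEvents vox As New SpVoice

    ' Assumed interop signature for the EndStream event
    Private Sub vox_EndStream(ByVal StreamNumber As Integer, _
            ByVal StreamPosition As Object) Handles vox.EndStream
        Debug.WriteLine("Finished stream " & StreamNumber.ToString())
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        ' Speak asynchronously so the form stays responsive while the text is read
        vox.Speak("Hello from Text To Speech", SpeechVoiceSpeakFlags.SVSFlagsAsync)
    End Sub

End Class
```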

In form load, we begin by collecting all of the current voices and adding them to the combo box used to select a speaker. The current index is set to zero such that, when the form loads, a current speaker will be defined.

The last two lines of the form load subroutine capture the user’s name (however it may be defined on the target machine) and pass it to the SayGreeting subroutine, which presents a welcome message to the user through TTS. The SayGreeting subroutine is written as follows:

Public Sub SayGreeting(ByVal strUser As String)

    ' Now say something
    vox.Voice = vox.GetVoices().Item(cboVoxOptions.SelectedIndex)

    Dim dt As DateTime
    dt = Now

    ' clear your throat
    vox.Rate = RateOfSpeech
    vox.Speak("".ToString, SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)

    Try
        vox.Speak("Greetings " & strUser & " from Text To Speech", _
        SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
        vox.Speak("Today's Date is " & dt.ToShortDateString, _
        SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
        vox.Speak("The time is " & dt.ToShortTimeString, _
        SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
    Catch ex As Exception
        MsgBox(ex.ToString, MsgBoxStyle.Exclamation, "I'm Speechless")
    End Try

End Sub

As you can see, the subroutine formats a message containing the passed-in username as well as the date and time, and then reads that message aloud using the current speaker voice. Note the use of the SVSFPurgeBeforeSpeak flag: it discards any speech still pending or in progress before the new text is spoken. Because the SVSFlagsAsync flag is not set, each call to Speak is synchronous and returns only after its text has been read, which is what keeps the three statements in order.
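For comparison, speak flags can be combined with Or. The sketch below is not the article's code: it assumes a console project (the module name AsyncSpeechSketch is hypothetical) with a reference to the Microsoft Speech Object Library, and it queues two phrases asynchronously before waiting for the queue to drain. The 30 second timeout is an arbitrary choice.

```vbnet
Imports System
Imports SpeechLib

' Hypothetical console module used only for illustration.
Module AsyncSpeechSketch

    Sub Main()
        Dim vox As New SpVoice()

        ' Purge anything still pending, then queue the first phrase without blocking
        vox.Speak("Greetings from Text To Speech", _
            SpeechVoiceSpeakFlags.SVSFlagsAsync Or _
            SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)

        ' Queue a second phrase; it is spoken after the first one finishes
        vox.Speak("Today's date is " & DateTime.Now.ToShortDateString(), _
            SpeechVoiceSpeakFlags.SVSFlagsAsync)

        ' Wait up to 30 seconds for the queued speech to complete
        Dim finished As Boolean = vox.WaitUntilDone(30000)
        Console.WriteLine("Speech finished: " & finished.ToString())
    End Sub

End Module
```

Purging only on the first call matters here: adding SVSFPurgeBeforeSpeak to the second asynchronous call would cancel the first phrase before it had been read.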

Next up is the track bar control’s handler; it is written as follows:

Private Sub tbarRateOfSpeech_Scroll(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles tbarRateOfSpeech.Scroll

    Me.RateOfSpeech = tbarRateOfSpeech.Value

End Sub

This function merely sets the rate of speech variable to contain the current track bar value. The variable is used to set the rate property for the speaker whenever the speaker is passed text to read.
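The Rate property of SpVoice accepts values from -10 (slowest) to 10 (fastest). The article's track bar stays well inside that range, but if the control were reconfigured it might be worth guarding the value before assigning it. A minimal sketch, in which the module and the ToSapiRate helper are hypothetical names rather than part of the sample:

```vbnet
Imports System

' Hypothetical helper module used only for illustration.
Module RateSketch

    ' Clamp an arbitrary control value into the -10..10 range used by SpVoice.Rate
    Public Function ToSapiRate(ByVal trackBarValue As Integer) As Integer
        Return Math.Max(-10, Math.Min(10, trackBarValue))
    End Function

    Sub Main()
        Console.WriteLine(ToSapiRate(3))    ' prints 3
        Console.WriteLine(ToSapiRate(15))   ' prints 10
        Console.WriteLine(ToSapiRate(-42))  ' prints -10
    End Sub

End Module
```

In the form, the handler could then assign Me.RateOfSpeech = ToSapiRate(tbarRateOfSpeech.Value).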

Following the track bar control handler, you will see the following code:

Private Sub TextBox1_KeyPress(ByVal sender As Object, _
        ByVal e As System.Windows.Forms.KeyPressEventArgs) _
        Handles TextBox1.KeyPress

    vox.Rate = RateOfSpeech

    ' this will try to speak each word as you type, it does not keep up
    ' all that well
    If e.KeyChar = Microsoft.VisualBasic.ChrW(Keys.Space) Or _
       e.KeyChar = Microsoft.VisualBasic.ChrW(Keys.Enter) Then
        vox.Speak(TextBox1.Text, SpeechVoiceSpeakFlags.SVSFDefault)
        TextBox1.Text = ""
    End If

End Sub

This bit of code drives the Speak As You Type function. Here the speaker's rate is set to the current value of the RateOfSpeech variable, and the handler watches for a space (or Enter) key press; whenever one arrives, the code passes the contents of the text box to the speaker, the speaker reads the text, and then the text box is cleared and made ready for the next word. Because Speak is called with SVSFDefault, the call is synchronous and the form waits until the word has been read, which likely contributes to the choppiness noted earlier.

The next bit of code drives the Speak On Enter function; it is nearly identical to the code used in the Speak As You Type function, but rather than reading out the contents of the text box on space, it reads them out only when the user hits the Enter key. That code looks like this:

Private Sub TextBox2_KeyPress(ByVal sender As Object, _
        ByVal e As System.Windows.Forms.KeyPressEventArgs) _
        Handles TextBox2.KeyPress

    vox.Rate = RateOfSpeech

    ' this will try to speak the contents of the textbox on Enter
    If e.KeyChar = Microsoft.VisualBasic.ChrW(Keys.Enter) Then
        vox.Speak(TextBox2.Text, SpeechVoiceSpeakFlags.SVSFDefault)
        TextBox2.Text = ""
    End If

End Sub

The last pieces of code to look at manage the function used to read from a text file. The first item is used to open a file open dialog and read a text file into the control group’s text box. That code looks like this:

Private Sub btnOpenFile_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnOpenFile.Click

    vox.Rate = RateOfSpeech

    If OpenFileDialog1.ShowDialog() = Windows.Forms.DialogResult.OK Then

        ' Using closes the reader even if the read fails
        Using sr As New System.IO.StreamReader(OpenFileDialog1.FileName)
            Me.txtReadFile.Text = sr.ReadToEnd()
        End Using

    End If

End Sub

The next bit is used to read the file; it looks like this:

Private Sub btnReadFile_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnReadFile.Click

    vox.Rate = RateOfSpeech

    vox.Speak(txtReadFile.Text.ToString(), _
    SpeechVoiceSpeakFlags.SVSFlagsAsync)

End Sub

You will note that the function is basically the same as that used to read from one of the other form text boxes (note that the speak flag is set to the asynchronous mode). The next item to look at is used to stop the speaker from continuing to read from the text; that code looks like this:

Private Sub btnStop_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnStop.Click

    vox.Speak("", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)

End Sub

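This subroutine passes an empty string to the speaker with the SVSFPurgeBeforeSpeak flag set; the flag purges whatever the speaker is currently reading or has queued, and since the new text is empty, the speaker falls silent.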
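Stopping this way discards the remaining text. If the goal were to halt temporarily and pick up again later, SpVoice also exposes Pause and Resume methods. The sample application does not use them; the sketch below assumes a console project (hypothetical module name PauseResumeSketch) with a reference to the Microsoft Speech Object Library, and the sleep durations are arbitrary.

```vbnet
Imports System.Threading
Imports SpeechLib

' Hypothetical console module used only for illustration.
Module PauseResumeSketch

    Sub Main()
        Dim vox As New SpVoice()

        ' Start reading asynchronously so this thread can keep running
        vox.Speak("This passage is long enough to be interrupted " & _
            "part of the way through and then continued.", _
            SpeechVoiceSpeakFlags.SVSFlagsAsync)

        Thread.Sleep(1000)   ' let the voice get started
        vox.Pause()          ' halt output; the remaining text is kept
        Thread.Sleep(2000)   ' stay silent for a moment
        vox.Resume()         ' continue from where the voice stopped

        ' Give the voice up to 30 seconds to finish before exiting
        vox.WaitUntilDone(30000)
    End Sub

End Module
```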

The last bit of code in the application changes the speaker’s voice to the one selected from the speaker combo box; that code looks like this:

Private Sub cboVoxOptions_SelectedIndexChanged(ByVal sender _
        As System.Object, ByVal e As System.EventArgs) _
        Handles cboVoxOptions.SelectedIndexChanged

    vox.Voice = vox.GetVoices().Item(cboVoxOptions.SelectedIndex)

End Sub
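The handler above selects a voice by its position in the list. GetVoices also accepts a required-attributes string, so a voice can be requested by name instead. The sketch below is an illustration rather than part of the sample: it assumes a console project (hypothetical module name VoiceByNameSketch) with a reference to the Microsoft Speech Object Library, and it assumes a voice named "Microsoft Mary" may or may not be installed.

```vbnet
Imports System
Imports SpeechLib

' Hypothetical console module used only for illustration.
Module VoiceByNameSketch

    Sub Main()
        Dim vox As New SpVoice()

        ' Ask only for voices whose Name attribute matches
        Dim matches As ISpeechObjectTokens = _
            vox.GetVoices("Name=Microsoft Mary", "")

        If matches.Count > 0 Then
            vox.Voice = matches.Item(0)
            vox.Speak("This is " & vox.Voice.GetDescription(), _
                SpeechVoiceSpeakFlags.SVSFDefault)
        Else
            ' Keep the default voice if the requested one is not installed
            Console.WriteLine("Requested voice not found; using the default voice.")
        End If
    End Sub

End Module
```

Checking Count before calling Item avoids an error on machines where the requested voice is not installed.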

Summary

This article and code sample were intended to provide a very easy introduction to TTS-based speech synthesis; there are a great many more things that you can do with the Speech SDK than have been addressed in this document. A review of the contents of the Speech SDK will provide greater detail on the use of the speech libraries.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
