Introduction
In the previous post, we quickly went through the process of building a
spoken English assessment application using the AISpeech API cloud service
and its ASSDK (ActionScript 3.0).
In this article, we are going to build a more complete version. We will
explore almost every main method the ASSDK provides, and look at the
ASSDK's return values, events, and responses.
The source code of this tutorial can be found on GitHub.
Target
We are going to build a spoken English assessment application as a Flex
application. The UI looks like Figure 1.
Figure 1 Application wireframe
Score display
This time, we colour each word of the reference text to indicate its pronunciation score: green (#00FF00) means very good, yellow (#FFD800) means good, orange (#FF6A00) means bad, and red (#FF0000) means very bad. We wrap each word in a <span> HTML tag and assign its color attribute, then show the HTML content using a Flex (Spark) TextArea component.
We start with a Spark TextArea component:
<s:TextArea id="txtScores" fontSize="30" editable="false"/>
We then add the following scripts:
import spark.utils.TextFlowUtil;
private var test:String =
"<span color='#00ff00'>I </span> " +
"<span color='#ffd800'>like </span> " +
"<span color='#ffd800'>to </span> " +
"<span color='#ff6a00'>play </span> " +
"<span color='#ff0000'>piano </span> ";
private function init():void
{
txtScores.textFlow = TextFlowUtil.importFromString(test);
}
We need to assign the init() method to the application's creationComplete callback. Also note the space appended to each word. The output shows as:
Figure 2 Colourful score output
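In case it helps, the creationComplete wiring mentioned above is a single attribute on the application root tag (the namespaces shown are the standard Flex 4 ones):

```xml
<s:Application xmlns:fx="http://ns.adobe.com/mxml/2009"
               xmlns:s="library://ns.adobe.com/flex/spark"
               creationComplete="init()">
    <!-- fx:Script block and components go here -->
</s:Application>
```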
The other controls
As the wireframe in Figure 1 shows, we need several other controls. The markup is:
<s:VGroup>
<s:HGroup>
<s:Label text="Reference Text"/>
<s:TextInput id="txtRefText" text="I like to play piano" fontSize="30" width="627"/>
</s:HGroup>
<s:HGroup>
<s:Button label="Start Record" fontSize="20" />
<s:Button label="Stop Record" fontSize="20" />
<s:CheckBox label="Play 'ding'" id="chboxPlayDing" selected="true" />
<s:Button label="Replay the latest" fontSize="20" />
</s:HGroup>
<s:HGroup>
<s:Label text="Scores" />
<s:TextArea id="txtScores" width="666" height="50" editable="false" fontSize="30" />
</s:HGroup>
<s:HGroup>
<s:Label text="RecorderLib returns and events"/>
<s:TextArea id="txtReturns" width="531" height="100"/>
</s:HGroup>
<s:Label text="RecordLib components states"/>
<s:Label text="Microphone: " id="txtMicrophoneState" fontSize="20"/>
<s:Label text="Connection: " id="txtConnectionState" fontSize="20"/>
<s:Label text="Core requester: " id="txtCoreRequesterState" fontSize="20"/>
<s:Label text="Recorder: " id="txtRecorderState" fontSize="20"/>
</s:VGroup>
The completed layout is shown in the following figure.
Figure 3 The completed layout
Record the user’s speech using the AISpeech API ASSDK
As illustrated in the previous post, the AISpeech API ASSDK comes as AISpeechLib.swc. Link the Flex application to this library. The following code prepares and initialises a RecorderLib instance.
import com.aispeech.RecorderLib;
private static const RECORDERLIB_PARAMS:Object = {
appKey:"your application ID",
secretKey:"your secret Key",
serverList:["rtmp://demo.aispeech.com:80/v2.0/aistream","rtmpt://demo.aispeech.com:433/v2.0/aistream"]
};
private var _coreRequesterParams:Object = {
refText:"past",
rank:4,
coreType:"en.sent.score",
userId:"xxxxxx",
applicationId:"your application ID"
};
private var _recorder:RecorderLib;
private function init():void
{
_recorder = new RecorderLib();
_recorder.init(RECORDERLIB_PARAMS);
}
Several things to note, to make this tutorial self-contained (though the following was covered in the previous tutorial). RECORDERLIB_PARAMS is used to initialise the RecorderLib instance. appKey and secretKey are obtained from AISpeech on request. serverList contains a list of AISpeech API cloud access points. _coreRequesterParams is used to make requests to the speech core; each recording triggers one core request, and refText may be updated for each request/recording.
Run the application. As shown in the following figure, you should see a dialogue asking for the user's permission to use the microphone. In the previous post we said "this prompt is a nice indicator that the RecorderLib has been loaded successfully", which is, however, not quite true. We will find out why later.
Figure 4 Dialogue asking for user’s permission
We then add startRecord() and stopRecord() methods as follows:
private function startRecord():void
{
_coreRequesterParams.refText = txtRefText.text;
var recordLength:int = 2500 + txtRefText.text.split(" ").length * 450;
var recorderParams:Object =
{
serverParam:_coreRequesterParams,
recordLength:recordLength
};
_recorder.startRecord(recorderParams);
}
private function stopRecord():void
{
_recorder.stopRecord();
}
Several things to note:
- We update _coreRequesterParams.refText with the user's input (txtRefText.text) before passing it to _recorder's startRecord() method.
- We calculate the record length (in ms) with respect to the number of words in the reference text.
- We assign the startRecord() method to the Start Record button's click callback, and the stopRecord() method to the Stop Record button's click callback.
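In MXML, assigning these callbacks means adding a click attribute to the two buttons declared in the layout earlier:

```xml
<s:Button label="Start Record" fontSize="20" click="startRecord()"/>
<s:Button label="Stop Record" fontSize="20" click="stopRecord()"/>
```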
Run the project. Click the Start Record button. If you hear a “ding”, it means recording now works.
Show scores
We will go through three steps in this section: obtaining API responses, composing the HTML string, and showing the scores.
The following script gives an event handler that parses the score results:
private function coreRequesterEventHandler(event:CoreRequesterEvent):void
{
if (event.type == CoreRequesterEvent.RESULT)
{
var details:Array = event.data.result.details;
trace("length of details: " + details.length);
}
}
The following script (inside the init() method) lets the RecorderLib instance listen for the CoreRequesterEvent.RESULT event. Note that the addEventListener() call is placed right after the RecorderLib instance is created, before it is initialised.
_recorder = new RecorderLib();
_recorder.addEventListener(CoreRequesterEvent.RESULT, coreRequesterEventHandler);
_recorder.init(RECORDERLIB_PARAMS);
We trace out the length of details in the coreRequesterEventHandler() method. Debug the project, check the console, and see that the length of details is 5, given the reference text “I like to play piano”, which is correct.
Set a breakpoint at the line where we call trace(). Debug the project again. Speak into the microphone and the program stops at the breakpoint. As shown in the following figure, the score property of an element of the details array is “4”. Since we set the rank property of the _coreRequesterParams object to 4, the speech core returns scores on a 4-level scale: 4 is the highest score and 1 is the lowest.
Figure 5 Results
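To inspect the per-word scores directly, the handler body can be extended with a simple loop over details (illustrative only; it uses just the details array and score property described above):

```actionscript
// Trace each word's score; with rank = 4, scores range from 1 (worst) to 4 (best).
for (var i:int = 0; i < details.length; i++)
{
    trace("word " + i + " score: " + details[i].score);
}
```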
The following code gives a helper function that composes the HTML string, colouring each word according to its score. Remember the rule: red means very bad and green means very good, with orange and yellow in between.
private function colourfulScoreHelper(refText:String, details:Array):String
{
var colourArray:Array = ["#FF0000", "#FF6A00", "#FFD800", "#00FF00"];
var refTextArray:Array = refText.split(" ");
var htmlString:String = "";
var temp:String = "";
for (var i:int = 0; i < refTextArray.length; i ++)
{
temp = "<span color='" +
colourArray[details[i].score - 1] +
"'>" +
refTextArray[i] +
" </span>";
htmlString += temp;
}
return htmlString;
}
The idea: we use the word's score as the index to select a colour value from colourArray. The rest is simple string concatenation. We may further make colourArray a constant member variable.
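The constant-member refactoring suggested above is a small change:

```actionscript
// Index 0 maps to score 1 (very bad) ... index 3 maps to score 4 (very good).
private static const COLOUR_ARRAY:Array = ["#FF0000", "#FF6A00", "#FFD800", "#00FF00"];
```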
Go back to the coreRequesterEventHandler() method. The following code calls the HTML string helper and updates the txtScores TextArea component.
private function coreRequesterEventHandler(event:CoreRequesterEvent):void
{
if (event.type == CoreRequesterEvent.RESULT)
{
var details:Array = event.data.result.details;
var htmlString:String = colourfulScoreHelper(_coreRequesterParams.refText, details);
txtScores.textFlow = TextFlowUtil.importFromString(htmlString);
}
}
The project should now work and show colourful, meaningful speech assessment results.
So far we have mostly repeated what we did in the previous tutorial, but this time we implemented a new score view and tried 4-level scores. In the next section, we will look into RecorderLib a bit further.
RecorderLib components
Although RecorderLib exposes only a handful of methods and properties, its inside is quite complex. To put it simply, RecorderLib contains four major components: microphone, recorder, connection, and core requester. We won't dig into their implementation. (Actually, AI Speech Ltd and its engineers plan to open-source the ASSDK, perhaps after the next major update.) In the demo, we will see what other information, beyond the core results we have already handled, RecorderLib returns.
We start with method returns.
According to the ASSDK documents, methods such as init() and startRecord() all return a StatusCode. We now show these return values in txtReturns, the TextArea instance we defined in the layout.
We first give a function that appends a string to txtReturns:
private function appendReturns(message:String):void
{
txtReturns.text += (message + "\n");
}
We then wrap the calls to RecorderLib's init(), startRecord() and stopRecord() methods (the only three methods we have used so far) with appendReturns(). E.g.:
appendReturns(_recorder.init(RECORDERLIB_PARAMS));
Run the application and play with it (e.g. click the Start Record and Stop Record buttons). This time, the "RecorderLib returns and events" area shows some logs. Most of these logs are StatusCodes, i.e. numbers. According to the ASSDK documents, several StatusCodes and their meanings are:
- 50004: Successful
- 50005: Parameters error
- 50003: Microphone device not available
- …
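To make those logs friendlier, the codes listed above could be translated before being appended. The helper below covers only the codes from the list; consult the ASSDK documents for the full table:

```actionscript
// Map the StatusCodes listed above to human-readable text.
private function describeStatusCode(code:String):String
{
    switch (code)
    {
        case "50004": return "Successful";
        case "50005": return "Parameters error";
        case "50003": return "Microphone device not available";
        default: return "Unknown status code: " + code;
    }
}
```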
We now know that RecorderLib methods return a StatusCode. Let's look at what events RecorderLib dispatches.
We already know CoreRequesterEvent.RESULT. This event is vital for catching the API response and parsing the results (scores). RecorderLib dispatches many other events. The following is a full list:
- FactoryEvent
- NetEvent
- CoreRequesterEvent
- RESULT
- EXCEPTION_TIMEOUT
- EXCEPTION_PARAMETERS_ERROR
- EXCEPTION_RESPONSE_ERROR
- MicrophoneDeviceEvent
- MIC_ALLOWED
- EXCEPTION_MIC_NOT_ALLOWED
- EXCEPTION_MIC_NOT_FOUND
- RecorderEvent
- RECORD_STARTED
- RECORD_STOPPED
- REPLAY_STARTED
- REPLAY_STOPPED
- RECORDID_GOT
- EXCEPTION_NO_RECORD
Note that RecorderLib has a factory component, which initialises the RecorderLib instance, connects to the server, and automatically wires up the aforementioned four major components. Before the factory finishes its job, the RecorderLib instance should not make recordings or send requests to the speech core. Hence the FactoryEvent.READY event. That is why we pointed out earlier that seeing the dialogue asking for the user's permission does not necessarily mean the RecorderLib instance has been successfully initialised. Events won't lie.
In the following script, we add event listeners for all the defined events. Note that all the events listed above extend an AIEvent class. Except for CoreRequesterEvent.RESULT, we use a single event handler for all the other events: it just prints them out. This is for demo purposes only. In a real application, each event should be handled carefully. E.g., RecorderEvent.RECORD_STARTED and RecorderEvent.RECORD_STOPPED can be combined to control a recording progress bar. For another instance, RecorderEvent.RECORDID_GOT can be used to retrieve the recordId of locally cached records.
_recorder = new RecorderLib();
_recorder.addEventListener(CoreRequesterEvent.RESULT, coreRequesterEventHandler);
_recorder.addEventListener(FactoryEvent.READY, eventHandler);
_recorder.addEventListener(FactoryEvent.EXCEPTION_TIMEOUT, eventHandler);
_recorder.addEventListener(NetEvent.EXCEPTION_CLOSED, eventHandler);
_recorder.addEventListener(CoreRequesterEvent.EXCEPTION_PARAMETERS_ERROR, eventHandler);
_recorder.addEventListener(CoreRequesterEvent.EXCEPTION_RESPONSE_ERROR, eventHandler);
_recorder.addEventListener(CoreRequesterEvent.EXCEPTION_TIMEOUT, eventHandler);
_recorder.addEventListener(MicrophoneDeviceEvent.MIC_ALLOWED, eventHandler);
_recorder.addEventListener(MicrophoneDeviceEvent.EXCEPTION_MIC_NOT_ALLOWED, eventHandler);
_recorder.addEventListener(MicrophoneDeviceEvent.EXCEPTION_MIC_NOT_FOUND, eventHandler);
_recorder.addEventListener(RecorderEvent.EXCEPTION_NO_RECORD, eventHandler);
_recorder.addEventListener(RecorderEvent.RECORD_STARTED, eventHandler);
_recorder.addEventListener(RecorderEvent.RECORD_STOPPED, eventHandler);
_recorder.addEventListener(RecorderEvent.RECORDID_GOT, eventHandler);
_recorder.addEventListener(RecorderEvent.REPLAY_STARTED, eventHandler);
_recorder.addEventListener(RecorderEvent.REPLAY_STOPPED, eventHandler);
appendReturns(_recorder.init(RECORDERLIB_PARAMS));
Please note that we add the event listeners before initialising the RecorderLib instance.
The eventHandler() method simply appends event.type to the txtReturns TextArea:
private function eventHandler(event:AIEvent):void
{
appendReturns(event.type);
}
Now we can see many more log outputs, e.g.:
microphone.device.exception.mic.not.allowed
[{"message":"","statusCode":"50004"}]
factory.ready
microphone.device.mic.allowed
recorder.recordID.got
recorder.record.started
[{"message":"start recording","statusCode":"50004"}]
recorder.record.stopped
[{"message":"stop record","statusCode":"50004"}]
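As an example of handling individual events rather than just logging them, RecorderEvent.RECORD_STARTED and RecorderEvent.RECORD_STOPPED could drive a simple recording indicator. This is only a sketch; recordingIndicator is a hypothetical Label, and the listener wiring is the same addEventListener() pattern shown above:

```actionscript
// Toggle a (hypothetical) recordingIndicator Label on record start/stop.
private function recorderStateHandler(event:RecorderEvent):void
{
    if (event.type == RecorderEvent.RECORD_STARTED)
    {
        recordingIndicator.text = "Recording...";
    }
    else if (event.type == RecorderEvent.RECORD_STOPPED)
    {
        recordingIndicator.text = "Idle";
    }
}
```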
Besides method returns and events, RecorderLib also allows the application to peek at its components' states.
In the following code, we set up a timer that automatically checks the RecorderLib instance's states and prints them to the screen. First is a checkStates() method:
private function checkStates(event:TimerEvent):void
{
txtMicrophoneState.text = "Microphone: " + _recorder.microphoneDeviceState;
txtRecorderState.text = "Recorder: " + _recorder.recorderState;
txtConnectionState.text = "Connection: " + _recorder.connectionState;
txtCoreRequesterState.text = "Core requester: " + _recorder.coreRequesterState;
}
And then in the init() method, we set up the timer:
_timer = new Timer(2000);
_timer.addEventListener(TimerEvent.TIMER, checkStates);
_timer.start();
Note that _timer is a Timer instance, defined as a private member variable.
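For reference, the timer needs the following member declaration and imports:

```actionscript
import flash.events.TimerEvent;
import flash.utils.Timer;

private var _timer:Timer;
```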
Now the application checks the RecorderLib instance's component states every 2 seconds and shows them (as String values) as in the following figure:
Figure 6 Component states
Two more RecorderLib functions
We want to listen to the latest recording. Assign the following code to the “Replay the latest” button’s click callback:
appendReturns(_recorder.startReplay({}));
Several notes:
- We call the startReplay() method to replay. We pass it an empty object ({}) in order to play the latest locally cached record.
- We wrap the startReplay() call with appendReturns(), so that we can see its return value on the screen.
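Wrapped in a method and wired up in MXML, the same call looks like:

```actionscript
private function replayLatest():void
{
    // An empty object tells RecorderLib to replay the latest locally cached record.
    appendReturns(_recorder.startReplay({}));
}
```

Then set click="replayLatest()" on the “Replay the latest” button.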
Recall Figure 1. In that wireframe, we designed a checkbox labelled “Play ding”. Notice that RecorderLib plays a “ding” to prompt the user to start speaking. This is good voice-user-interface design: the system should respond to the user’s action (clicking the Start Record button). It also helps ensure the user speaks only after recording has really started. Nevertheless, if you do not want to play the “ding”, feel free to turn it off. Since it remains best practice to respond to the user’s action, a recording progress bar would be a good complement.
Recall that in the startRecord() method we set two properties (serverParam and recordLength) on the recorderParams variable. The RecorderLib.startRecord() method accepts another property: playDing. See the following code:
private function startRecord():void
{
    _coreRequesterParams.refText = txtRefText.text;
    var recordLength:int = 2500 + txtRefText.text.split(" ").length * 450;
    // Honour the "Play 'ding'" checkbox.
    var playDing:Boolean = chboxPlayDing.selected;
    var recorderParams:Object =
    {
        serverParam:_coreRequesterParams,
        recordLength:recordLength,
        playDing:playDing
    };
    appendReturns(_recorder.startRecord(recorderParams));
}
Conclusion
All right, it has been a long article, but we are all done now.
In this tutorial, before the section “RecorderLib components”, there are two new things compared to the previous tutorial:
- We use a Flex TextArea to render an HTML string, showing the scores in colour.
- We ask the speech core to return 4-level scores.
From the section “RecorderLib components” onwards, we capture RecorderLib’s
method returns, events, and component states. We simply print them on the
screen, but this information is important for building an error-proof and
user-friendly application.
As promised, I have asked AI Speech Ltd for an English version of the ASSDK
documents. As far as I know, at the time of writing, they are adding ASDoc
comments to the sources. I may upload a PDF version somewhere.
Also, the source code of this tutorial can be found on GitHub.
Thanks for reading.
History
This is the first edition of this article.