Introduction
In the previous post, we quickly went through the process of building a
spoken English assessment application using the AISpeech API cloud service
and its ASSDK (ActionScript 3.0).
In this article, we are going to build a more complete version. We will
explore almost every main method the ASSDK provides, and look at the
ASSDK's return values, events, and responses.
The source code of this tutorial can be found on GitHub.
Target
We are going to build a spoken English assessment application as a Flex
application. The UI looks like Figure 1.
Figure 1 Application wireframe
Score display
This time, we colour each word of the reference text to indicate its pronunciation score: green (#00FF00) means very good, yellow (#FFD800) means good, orange (#FF6A00) means bad, and red (#FF0000) means very bad. We wrap each word in a <span> HTML tag and assign its color attribute, then show the HTML content using a Flex (Spark) TextArea component.
We start with a Spark TextArea component:
<s:TextArea id="txtScores" fontSize="30" editable="false"/>
We then add the following scripts:
import spark.utils.TextFlowUtil;
private var test:String =
"<span color='#00ff00'>I </span> " +
"<span color='#ffd800'>like </span> " +
"<span color='#ffd800'>to </span> " +
"<span color='#ff6a00'>play </span> " +
"<span color='#ff0000'>piano </span> ";
private function init():void
{
txtScores.textFlow = TextFlowUtil.importFromString(test);
}
We need to assign the init() method to the application's creationComplete callback. Also note the space appended to each word. The output shows as:
Figure 2 Colourful score output
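In case it helps, the creationComplete wiring mentioned above is a single attribute on the application root tag (the namespaces shown are the standard Flex 4 ones):

```xml
<s:Application xmlns:fx="http://ns.adobe.com/mxml/2009"
               xmlns:s="library://ns.adobe.com/flex/spark"
               creationComplete="init()">
    <!-- fx:Script block and components go here -->
</s:Application>
```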
The other controls
As the wireframe in Figure 1 shows, we need several other controls. The markup is:
<s:VGroup>
<s:HGroup>
<s:Label text="Reference Text"/>
<s:TextInput id="txtRefText" text="I like to play piano" fontSize="30" width="627"/>
</s:HGroup>
<s:HGroup>
<s:Button label="Start Record" fontSize="20" />
<s:Button label="Stop Record" fontSize="20" />
<s:CheckBox label="Play 'ding'" id="chboxPlayDing" selected="true" />
<s:Button label="Replay the latest" fontSize="20" />
</s:HGroup>
<s:HGroup>
<s:Label text="Scores" />
<s:TextArea id="txtScores" width="666" height="50" editable="false" fontSize="30" />
</s:HGroup>
<s:HGroup>
<s:Label text="RecorderLib returns and events"/>
<s:TextArea id="txtReturns" width="531" height="100"/>
</s:HGroup>
<s:Label text="RecordLib components states"/>
<s:Label text="Microphone: " id="txtMicrophoneState" fontSize="20"/>
<s:Label text="Connection: " id="txtConnectionState" fontSize="20"/>
<s:Label text="Core requester: " id="txtCoreRequesterState" fontSize="20"/>
<s:Label text="Recorder: " id="txtRecorderState" fontSize="20"/>
</s:VGroup>
The completed layout is shown in the following figure.
Figure 3 The completed layout
Record the user’s speech using the AISpeech API ASSDK
As illustrated in the previous post, the AISpeech API ASSDK comes as AISpeechLib.swc. Link the Flex application to this library. The following code prepares and initialises a RecorderLib instance.
import com.aispeech.RecorderLib;
private static const RECORDERLIB_PARAMS:Object = {
appKey:"your application ID",
secretKey:"your secret Key",
serverList:["rtmp://demo.aispeech.com:80/v2.0/aistream","rtmpt://demo.aispeech.com:433/v2.0/aistream"]
};
private var _coreRequesterParams:Object = {
refText:"past",
rank:4,
coreType:"en.sent.score",
userId:"xxxxxx",
applicationId:"your application ID"
};
private var _recorder:RecorderLib;
private function init():void
{
_recorder = new RecorderLib();
_recorder.init(RECORDERLIB_PARAMS);
}
Several things to note, to make this tutorial self-contained (though the following was covered in the previous tutorial). RECORDERLIB_PARAMS is used to initialise the RecorderLib instance. appKey and secretKey are obtained from AISpeech on request. serverList contains a list of AISpeech API cloud access points. _coreRequesterParams is used to make requests to the speech core; each recording triggers one core request, and refText may be updated for each request/recording.
Run the application. As shown in the following figure, you should see a dialogue asking for the user's permission to use the microphone. In the previous post we said "this prompt is a nice indicator that the RecorderLib has been loaded successfully", which is, however, not quite true. We will find out why later.
Figure 4 Dialogue asking for user’s permission
We then add startRecord() and stopRecord() methods as follows:
private function startRecord():void
{
_coreRequesterParams.refText = txtRefText.text;
var recordLength:int = 2500 + txtRefText.text.split(" ").length * 450;
var recorderParams:Object =
{
serverParam:_coreRequesterParams,
recordLength:recordLength
};
_recorder.startRecord(recorderParams);
}
private function stopRecord():void
{
_recorder.stopRecord();
}
Several things to note:
- We update _coreRequesterParams.refText with the user's input (txtRefText.text) before passing it to _recorder's startRecord() method.
- We calculate the record length (in ms) with respect to the number of words in the reference text.
- We assign the startRecord() method to the Start Record button's click callback, and the stopRecord() method to the Stop Record button's click callback.
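In MXML, assigning these callbacks means adding a click attribute to the two buttons declared in the layout earlier:

```xml
<s:Button label="Start Record" fontSize="20" click="startRecord()"/>
<s:Button label="Stop Record" fontSize="20" click="stopRecord()"/>
```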
Run the project. Click the Start Record button. If you hear a “ding”, it means recording now works.
Show scores
We will go through three steps in this section: obtaining API responses, composing the HTML string, and showing the scores.
The following script gives an event handler that parses the score results:
private function coreRequesterEventHandler(event:CoreRequesterEvent):void
{
if (event.type == CoreRequesterEvent.RESULT)
{
var details:Array = event.data.result.details;
trace("length of details: " + details.length);
}
}
The following script (inside the init() method) lets the RecorderLib instance listen for the CoreRequesterEvent.RESULT event. Note that the addEventListener() call is placed right after the RecorderLib instance is created, before it is initialised.
_recorder = new RecorderLib();
_recorder.addEventListener(CoreRequesterEvent.RESULT, coreRequesterEventHandler);
_recorder.init(RECORDERLIB_PARAMS);
We trace out the length of details in the coreRequesterEventHandler() method. Debug the project, check the console, and see that the length of details is 5, given the reference text “I like to play piano”, which is correct.
Set a breakpoint at the line where we call trace(). Debug the project again. Speak into the microphone and the program stops at the breakpoint. As shown in the following figure, the score property of an element of the details array is “4”. Since we set the rank property of the _coreRequesterParams object to 4, the speech core returns scores on a 4-level scale: 4 is the highest score and 1 is the lowest.
Figure 5 Results
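To inspect the per-word scores directly, the handler body can be extended with a simple loop over details (illustrative only; it uses just the details array and score property described above):

```actionscript
// Trace each word's score; with rank = 4, scores range from 1 (worst) to 4 (best).
for (var i:int = 0; i < details.length; i++)
{
    trace("word " + i + " score: " + details[i].score);
}
```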
The following code gives a helper function that composes the HTML string, colouring each word according to its score. Remember the rule: red means very bad and green means very good, with orange and yellow in between.
private function colourfulScoreHelper(refText:String, details:Array):String
{
var colourArray:Array = ["#FF0000", "#FF6A00", "#FFD800", "#00FF00"];
var refTextArray:Array = refText.split(" ");
var htmlString:String = "";
var temp:String = "";
for (var i:int = 0; i < refTextArray.length; i ++)
{
temp = "<span color='" +
colourArray[details[i].score - 1] +
"'>" +
refTextArray[i] +
" </span>";
htmlString += temp;
}
return htmlString;
}
The idea: we use the word's score as the index to select a colour value from colourArray. The rest is simple string concatenation. We may further make colourArray a constant member variable.
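The constant-member refactoring suggested above is a small change:

```actionscript
// Index 0 maps to score 1 (very bad) ... index 3 maps to score 4 (very good).
private static const COLOUR_ARRAY:Array = ["#FF0000", "#FF6A00", "#FFD800", "#00FF00"];
```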
Go back to the coreRequesterEventHandler() method. The following code calls the HTML string helper and updates the txtScores TextArea component.
private function coreRequesterEventHandler(event:CoreRequesterEvent):void
{
if (event.type == CoreRequesterEvent.RESULT)
{
var details:Array = event.data.result.details;
var htmlString:String = colourfulScoreHelper(_coreRequesterParams.refText, details);
txtScores.textFlow = TextFlowUtil.importFromString(htmlString);
}
}
The project should now work and show colourful, meaningful speech assessment results.
So far we have mostly repeated what we did in the previous tutorial, but this time we implemented a new score view and tried 4-level scores. In the next section, we will look into RecorderLib a bit further.
RecorderLib components
Although RecorderLib exposes only a handful of methods and properties, its inside is quite complex. To put it simply, RecorderLib contains four major components: microphone, recorder, connection, and core requester. We won't dig into their implementation. (Actually, AI Speech Ltd and its engineers plan to open-source the ASSDK, perhaps after the next major update.) In the demo, we will see what other information, beyond the core results we have already handled, RecorderLib returns.
We start with method returns.
According to the ASSDK documents, methods such as init() and startRecord() all return a StatusCode. We now show these return values in txtReturns, the TextArea instance we defined in the layout.
We first give a function that appends a string to txtReturns:
private function appendReturns(message:String):void
{
txtReturns.text += (message + "\n");
}
We then wrap the calls to RecorderLib's init(), startRecord() and stopRecord() methods (the only three methods we have used so far) with appendReturns(). E.g.:
appendReturns(_recorder.init(RECORDERLIB_PARAMS));
Run the application and play with it (e.g. click the Start Record and Stop Record buttons). This time, the "RecorderLib returns and events" area shows some logs. Most of these logs are StatusCodes, i.e. numbers. According to the ASSDK documents, several StatusCodes and their meanings are:
- 50004: Successful
- 50005: Parameters error
- 50003: Microphone device not available
- …
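To make those logs friendlier, the codes listed above could be translated before being appended. The helper below covers only the codes from the list; consult the ASSDK documents for the full table:

```actionscript
// Map the StatusCodes listed above to human-readable text.
private function describeStatusCode(code:String):String
{
    switch (code)
    {
        case "50004": return "Successful";
        case "50005": return "Parameters error";
        case "50003": return "Microphone device not available";
        default: return "Unknown status code: " + code;
    }
}
```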
We now know that RecorderLib methods return a StatusCode. Let's look at what events RecorderLib dispatches.
We already know CoreRequesterEvent.RESULT. This event is vital for catching the API response and parsing the results (scores). RecorderLib dispatches many other events. The following is a full list:
- FactoryEvent
- NetEvent
- CoreRequesterEvent
- RESULT
- EXCEPTION_TIMEOUT
- EXCEPTION_PARAMETERS_ERROR
- EXCEPTION_RESPONSE_ERROR
- MicrophoneDeviceEvent
- MIC_ALLOWED
- EXCEPTION_MIC_NOT_ALLOWED
- EXCEPTION_MIC_NOT_FOUND
- RecorderEvent
- RECORD_STARTED
- RECORD_STOPPED
- REPLAY_STARTED
- REPLAY_STOPPED
- RECORDID_GOT
- EXCEPTION_NO_RECORD
Note that RecorderLib has a factory component, which initialises the RecorderLib instance, connects to the server, and automatically wires up the aforementioned four major components. Before the factory finishes its job, the RecorderLib instance should not make recordings or send requests to the speech core. Hence the FactoryEvent.READY event. That is why we pointed out earlier that seeing the dialogue asking for the user's permission does not necessarily mean the RecorderLib instance has been successfully initialised. Events won't lie.
In the following script, we add event listeners for all the defined events. Note that all the events listed above extend an AIEvent class. Except for CoreRequesterEvent.RESULT, we use a single event handler for all the other events: it just prints them out. This is for demo purposes only. In a real application, each event should be handled carefully. E.g., RecorderEvent.RECORD_STARTED and RecorderEvent.RECORD_STOPPED can be combined to control a recording progress bar. For another instance, RecorderEvent.RECORDID_GOT can be used to retrieve the recordId of locally cached records.
_recorder = new RecorderLib();
_recorder.addEventListener(CoreRequesterEvent.RESULT, coreRequesterEventHandler);
_recorder.addEventListener(FactoryEvent.READY, eventHandler);
_recorder.addEventListener(FactoryEvent.EXCEPTION_TIMEOUT, eventHandler);
_recorder.addEventListener(NetEvent.EXCEPTION_CLOSED, eventHandler);
_recorder.addEventListener(CoreRequesterEvent.EXCEPTION_PARAMETERS_ERROR, eventHandler);
_recorder.addEventListener(CoreRequesterEvent.EXCEPTION_RESPONSE_ERROR, eventHandler);
_recorder.addEventListener(CoreRequesterEvent.EXCEPTION_TIMEOUT, eventHandler);
_recorder.addEventListener(MicrophoneDeviceEvent.MIC_ALLOWED, eventHandler);
_recorder.addEventListener(MicrophoneDeviceEvent.EXCEPTION_MIC_NOT_ALLOWED, eventHandler);
_recorder.addEventListener(MicrophoneDeviceEvent.EXCEPTION_MIC_NOT_FOUND, eventHandler);
_recorder.addEventListener(RecorderEvent.EXCEPTION_NO_RECORD, eventHandler);
_recorder.addEventListener(RecorderEvent.RECORD_STARTED, eventHandler);
_recorder.addEventListener(RecorderEvent.RECORD_STOPPED, eventHandler);
_recorder.addEventListener(RecorderEvent.RECORDID_GOT, eventHandler);
_recorder.addEventListener(RecorderEvent.REPLAY_STARTED, eventHandler);
_recorder.addEventListener(RecorderEvent.REPLAY_STOPPED, eventHandler);
appendReturns(_recorder.init(RECORDERLIB_PARAMS));
Please note that we add the event listeners before initialising the RecorderLib instance.
The eventHandler() method simply appends event.type to the txtReturns TextArea:
private function eventHandler(event:AIEvent):void
{
appendReturns(event.type);
}
Now we can see many more log outputs, e.g.:
microphone.device.exception.mic.not.allowed
[{"message":"","statusCode":"50004"}]
factory.ready
microphone.device.mic.allowed
recorder.recordID.got
recorder.record.started
[{"message":"start recording","statusCode":"50004"}]
recorder.record.stopped
[{"message":"stop record","statusCode":"50004"}]
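As an example of handling individual events rather than just logging them, RecorderEvent.RECORD_STARTED and RecorderEvent.RECORD_STOPPED could drive a simple recording indicator. This is only a sketch; recordingIndicator is a hypothetical Label, and the listener wiring is the same addEventListener() pattern shown above:

```actionscript
// Toggle a (hypothetical) recordingIndicator Label on record start/stop.
private function recorderStateHandler(event:RecorderEvent):void
{
    if (event.type == RecorderEvent.RECORD_STARTED)
    {
        recordingIndicator.text = "Recording...";
    }
    else if (event.type == RecorderEvent.RECORD_STOPPED)
    {
        recordingIndicator.text = "Idle";
    }
}
```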
Besides method returns and events, RecorderLib also allows the application to peek at its components' states.
In the following code, we set up a timer that automatically checks the RecorderLib instance's states and prints them to the screen. First is a checkStates() method:
private function checkStates(event:TimerEvent):void
{
txtMicrophoneState.text = "Microphone: " + _recorder.microphoneDeviceState;
txtRecorderState.text = "Recorder: " + _recorder.recorderState;
txtConnectionState.text = "Connection: " + _recorder.connectionState;
txtCoreRequesterState.text = "Core requester: " + _recorder.coreRequesterState;
}
And then in the init() method, we set up the timer:
_timer = new Timer(2000);
_timer.addEventListener(TimerEvent.TIMER, checkStates);
_timer.start();
Note that _timer is a Timer instance, defined as a private member variable.
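For reference, the timer needs the following member declaration and imports:

```actionscript
import flash.events.TimerEvent;
import flash.utils.Timer;

private var _timer:Timer;
```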
Now the application checks the RecorderLib instance's component states every 2 seconds and shows them (as String values) as in the following figure:
Figure 6 Component states
Two more RecorderLib functions
We want to listen to the latest recording. Assign the following code to the “Replay the latest” button’s click callback:
appendReturns(_recorder.startReplay({}));
Several notes:
- We call the startReplay() method to replay. We pass it an empty object ({}) in order to play the latest locally cached record.
- We wrap the startReplay() call with appendReturns(), so that we can see its return value on the screen.
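Wrapped in a method and wired up in MXML, the same call looks like:

```actionscript
private function replayLatest():void
{
    // An empty object tells RecorderLib to replay the latest locally cached record.
    appendReturns(_recorder.startReplay({}));
}
```

Then set click="replayLatest()" on the “Replay the latest” button.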
Recall Figure 1. In that wireframe, we designed a checkbox labelled “Play ding”. Notice that RecorderLib plays a “ding” to prompt the user to start speaking. This is good voice-user-interface design: the system should respond to the user’s action (clicking the Start Record button). It also helps ensure the user speaks only after recording has really started. Nevertheless, if you do not want to play the “ding”, feel free to turn it off. Since it remains best practice to respond to the user’s action, a recording progress bar would be a good complement.
Recall that in the startRecord() method we set two properties (serverParam and recordLength) on the recorderParams variable. The RecorderLib.startRecord() method accepts another property: playDing. See the following code:
private function startRecord():void
{
    _coreRequesterParams.refText = txtRefText.text;
    var recordLength:int = 2500 + txtRefText.text.split(" ").length * 450;
    // Honour the "Play 'ding'" checkbox.
    var playDing:Boolean = chboxPlayDing.selected;
    var recorderParams:Object =
    {
        serverParam:_coreRequesterParams,
        recordLength:recordLength,
        playDing:playDing
    };
    appendReturns(_recorder.startRecord(recorderParams));
}
Conclusion
All right, it has been a long article, but we are all done now.
In this tutorial, before the section “RecorderLib components”, there are two new things compared to the previous tutorial:
- We use a Flex TextArea to render an HTML string, showing the scores in colour.
- We ask the speech core to return 4-level scores.
From the section “RecorderLib components” onwards, we capture RecorderLib’s
method returns, events, and component states. We simply print them on the
screen, but this information is important for building an error-proof and
user-friendly application.
As promised, I have asked AI Speech Ltd for an English version of the ASSDK
documents. As far as I know, at the time of writing, they are adding ASDoc
comments to the sources. I may upload a PDF version somewhere.
Also, the source code of this tutorial can be found on GitHub.
Thanks for reading.
History
This is the first edition of this article.