How to Write Your Own Siri Application (Mobile Assistant Application)

Yildirim Kocdag

29 Nov 2013, CPOL license, 10 min read

This article helps you to understand how you can write your own Siri application.

Download SiriUI.zip - 1.1 MB

Introduction

This article explains how you can write your own Siri-like application. Last year I was responsible for developing a Siri-like assistant application for Android. It is complete and now available in the Google Play store. I will try to share what I learned while building it.

What is a Mobile Assistant Application?

A mobile assistant application should have the following properties:

It is a mobile application (Android, iOS, Windows Phone, etc.),
You can ask questions by typing or by voice,
You can get a written, spoken, or graphical response to your questions, or have an action performed on the device,
It uses the capabilities of the mobile device, such as the microphone, screen, GPS, internet connection, and speaker, as well as your information stored on the device.

What a Mobile Assistant Application can Do

A mobile assistant application can offer many features. The first version of the mobile assistant that I developed could understand and respond to only 15 commands. Now it can understand and respond to more than 50. The basic command types are getting the news, getting the weather, setting an alarm, and calling a contact. When I surveyed the mobile assistants in the mobile markets, I found that these commands are common to most of them. In addition, you can add the commands below to your mobile assistant as an advanced command list.

Set alarm,
Get information about news, weather, match scores, and wiki entries,
Run an application,
Open a media file (video, music),
Share something on Facebook or Twitter, etc.,
Read/Write SMS or Email,
Read shared feeds on social media,
Find the nearest market, pharmacy, hospital, restaurant, etc.,
Call someone,
Solve basic mathematical problems,
Check your bank balance,
Make a money transfer to someone,
Check the latest currency or stock exchange rates,
Read/Set Calendar,
Buy a concert or travel ticket,
Etc.

Some command types can be implemented only through integrations with third-party companies. For instance, you could integrate with Amazon or Best Buy so that users can order an item through your mobile assistant.

Mobile Assistants in the Market

There are more than 60 known mobile assistants in the markets. The most popular ones are Siri and Google Voice Search.

Here is a list of mobile assistants and mobile assistant development environments in the markets:

Siri,
Google Voice Search,
Nuance Nina,
Dragon Mobile Assistant,
Angel Lexee,
AIVC,
Iris,
Skyvi,
EverFriends,
EasyLuncher,
Speaktoit,
Evi,
Turkcell Mobil Asistan (Turkish).

Since Siri and Google Voice Search are already well known, I will share some information and video links about Nina, Lexee, Dragon Mobile Assistant, and Turkcell Mobil Asistan.

Nuance Nina: Nuance offers large enterprise organizations an SDK for developing their own mobile assistant application, which can be used as a customer service application. The SDK can be integrated into iOS and Android applications. You can get more information on their website, Meet Nina.

I like the video on YouTube that introduces Nuance Nina.

Lexee: Lexee is the mobile assistant of Angel Labs. Lexee also offers a web environment for creating your own mobile assistant. You can add, update, and delete your scenarios without coding via this web interface. Another strength of Lexee is its analysis tools, an area in which Angel Labs is good. The Lexee environment offers professionals a variety of reports and data about usage.

You can get more information and watch the video via this link.

Dragon Mobile Assistant: Dragon Mobile Assistant is also a Nuance product. It lets users speak naturally to access a wide range of content and do everyday tasks on their phone easily. You can get more information via this link.

You can download the application and watch my favorite mobile assistant video by clicking here.

Turkcell Mobil Asistan: Turkcell Mobil Asistan is the only Turkish mobile assistant in Google Play. Turkcell is one of the biggest GSM companies in Europe. With this application you can get customer care services such as your phone bill details and tariff information. In addition, you can ask for information about news, weather, currency rates, and traffic in Istanbul.

To get more information and download Turkcell Mobil Asistan, click here.

Technologies in Mobile Assistants

I hope the information above helps you understand the basic concepts of mobile assistants. Let's look at some technical points about these applications. A mobile assistant application should include the following technologies:

Speech to Text (STT) Engine,
Text to Speech (TTS) Engine,
Tagging (Intelligence),
Noise Reduction Engine,
Voice Biometrics,
Speech Compression Engine,
UI for Call Outs.

STT: The speech-to-text engine takes the voice of the user and converts it to text. The voice can come from a voice file or a stream.
TTS: The text-to-speech engine converts text to voice. This matters when the user needs to listen to the response, for example while driving.
Tagging: The text produced by STT is not always simple. The tagging technology should label the text with what the user wants from that speech. For example, if the user asks "What should I wear tomorrow?", the tagging engine can tag the request with a weather or calendar info tag.
Noise Reduction Engine: The user's speech is not always clean; there can be noise around (for example, air-conditioner noise). The noise reduction engine should remove this background noise from the voice.
Voice Biometrics: Mobile assistants can give account-based information such as a monthly credit card report. Therefore authentication is important, and voice biometrics is one of the authentication methods. With voice biometrics technology, the mobile assistant can authenticate you to the system.
Speech Compression Engine: If your assistant works slowly, users can quickly give up on the application and choose to search the web by typing instead. Fast internet communication is really important, and so is the packet size of each transaction. Small packets transfer fast, so the result arrives fast. That is why a good mobile assistant application should have a speech compression engine, so that the client can send the compressed voice to the server quickly. This compression differs from general-purpose compression, because voice files do not contain much repeating data. G.711 can be chosen as the compression algorithm; one reason for this choice is that, although G.711 is strictly speaking a lossy codec, it keeps speech quality high enough for recognition.
UI for Call Outs: After the server sends the result, you should play the audio and also show some information on the device screen inside callouts. My advice: using native components can limit your application, whereas a web-based UI inside the native application is more convenient for callouts.
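
The tagging step described above can be illustrated with a very small sketch. It assumes a plain keyword lookup, which is far simpler than what a production tagging engine would do; the class name KeywordTagger, the keyword table, and every tag name except weather_info are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical keyword-based tagger; real tagging engines are usually far more sophisticated. */
class KeywordTagger {
    // Keyword -> tag table. The keywords and tag names are illustrative assumptions.
    private final Map<String, String> keywordToTag = new LinkedHashMap<>();

    KeywordTagger() {
        keywordToTag.put("wear", "weather_info");
        keywordToTag.put("weather", "weather_info");
        keywordToTag.put("alarm", "set_alarm");
        keywordToTag.put("call", "call_contact");
        keywordToTag.put("news", "news_info");
    }

    /** Returns the tag of the first known keyword in the text, or "unknown" if none matches. */
    String tag(String recognizedText) {
        String normalized = recognizedText.toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or digit, so punctuation is ignored.
        for (String word : normalized.split("[^a-z0-9]+")) {
            String tag = keywordToTag.get(word);
            if (tag != null) {
                return tag;
            }
        }
        return "unknown";
    }
}
```

With this table, the question "What should I wear tomorrow?" maps to weather_info because of the keyword "wear". A keyword table like this breaks down quickly on real speech, which is why tagging is treated as a separate intelligence component.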

Architecture of Mobile Assistants

The mobile device and the main server should communicate by streaming, because users don't like waiting for voice data to transfer or for slow communication. Being fast is really important for this application, because a fast response feels more natural: the user can feel that he or she is speaking with a real agent or assistant.
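
As a rough sketch of the streaming idea, the helper below forwards audio in small chunks as soon as they are read, instead of waiting for the whole recording. The class name AudioStreamer and the chunk size are assumptions; in a real client the source would be the microphone and the sink would be a network connection to the main server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper that forwards audio chunk by chunk. */
class AudioStreamer {
    /**
     * Copies audio from source to sink in chunks of at most chunkSize bytes,
     * flushing after every chunk, and returns the number of bytes sent.
     */
    static long stream(InputStream source, OutputStream sink, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.read(buffer)) != -1) {
            sink.write(buffer, 0, read);
            sink.flush(); // push the chunk out now rather than after the full recording
            total += read;
        }
        return total;
    }
}
```

A small chunk size keeps latency low at the cost of more network writes, so the value is something to tune for the target network.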

When the user asks a question by clicking a button on the client, the client starts streaming the question byte by byte to the main server. The main server sends the data to the STT server, which finds the text of the speech and sends it back to the main server. The main server then sends the text to the tagging server to find out what the user wants. The tagging server creates a tag for the request, such as “weather_info”, and sends it to the main server, which forwards the tag to the information server. If the tag requires authentication, the security server checks the authentication before the request is sent to the information server. Finally, the response comes back to the main server, which creates the response text, response graphic, and response speech (by communicating with the TTS server) and sends the response class to the mobile device.

The information server can communicate with 3rd party servers for information that it does not store itself. The security server can combine more than one authentication technology, such as voice biometrics, IMSI-IP RADIUS lookup, account-password authentication, etc.
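
The request flow described above could be sketched as follows. This is only an outline under stated assumptions: all interface and class names (SttServer, TaggingServer, SecurityServer, InformationServer, TtsServer, MainServer, AssistantResponse) are hypothetical, each server is reduced to a single method, the response graphic is left out, and the refusal message is invented.

```java
import java.util.Set;

// Hypothetical single-method views of the servers in the architecture.
interface SttServer { String recognize(byte[] audio); }
interface TaggingServer { String tag(String text); }
interface SecurityServer { boolean isAuthenticated(String userId); }
interface InformationServer { String lookup(String tag); }
interface TtsServer { byte[] synthesize(String text); }

/** Response bundle sent back to the mobile device (graphic omitted in this sketch). */
class AssistantResponse {
    final String text;
    final byte[] speech;

    AssistantResponse(String text, byte[] speech) {
        this.text = text;
        this.speech = speech;
    }
}

/** Orchestrates one request: STT, tagging, optional security check, information, TTS. */
class MainServer {
    private final SttServer stt;
    private final TaggingServer tagging;
    private final SecurityServer security;
    private final InformationServer information;
    private final TtsServer tts;
    private final Set<String> protectedTags; // tags that need authentication (assumed configuration)

    MainServer(SttServer stt, TaggingServer tagging, SecurityServer security,
               InformationServer information, TtsServer tts, Set<String> protectedTags) {
        this.stt = stt;
        this.tagging = tagging;
        this.security = security;
        this.information = information;
        this.tts = tts;
        this.protectedTags = protectedTags;
    }

    AssistantResponse handle(String userId, byte[] audio) {
        String text = stt.recognize(audio);   // speech -> text
        String tag = tagging.tag(text);       // text -> intent tag, e.g. "weather_info"
        String answer;
        if (protectedTags.contains(tag) && !security.isAuthenticated(userId)) {
            answer = "Please authenticate first."; // placeholder wording
        } else {
            answer = information.lookup(tag);      // tag -> answer text
        }
        return new AssistantResponse(answer, tts.synthesize(answer));
    }
}
```

Keeping each server behind its own interface means any of them can be swapped for a different engine or a remote call without touching the orchestration logic.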

Callout UI

If you try to develop your own native components for callouts, it will be difficult to handle all the formats on the client, scroll all the items, and so on. My advice is to create a custom web view, to which you can easily add your formatted callouts.

The picture on the left shows how your SiriWebView will appear on screen. The user can scroll the web view; in addition, when a new callout arrives, the web view scrolls automatically.

In this section I will briefly describe how to write your own SiriWebView. The article also includes a sample project for the web view. Apologies to users of other platforms: all my examples are for the Android platform.

First of all, create a new class and name it SiriWebView. It should extend the standard Android WebView. The class should contain a constructor and an overridden onDraw function. In addition, we should add two new functions to this class: one to initialize it and one to add a new callout. The code snippet below shows how the add-new-callout function works.

Java

public void AddNewCallOut(String message, Boolean ismsgResponse) {
		// Every callout is written into its own placeholder element "div<elementId>".
		elementId = elementId + 1;
		StringBuilder messageBuilder = new StringBuilder();

		if (!message.isEmpty()) {
			// Escape backslashes and double quotes so the message cannot
			// terminate the JavaScript string literal built below.
			String safeMessage = message.replace("\\", "\\\\").replace("\"", "\\\"");
			// Gray bubble (head/mid/foot) when ismsgResponse is false,
			// blue bubble (bhead/bmid/bfoot) when it is true.
			String bubble = ismsgResponse ? "bubble-blue" : "bubble-gray";
			String prefix = ismsgResponse ? "b" : "";

			messageBuilder.append("<table class='").append(bubble)
					.append("' cellspacing='0' cellpadding='0'><tr><td class='")
					.append(prefix).append("head'></td></tr>");
			messageBuilder.append("<tr><td class='").append(prefix)
					.append("mid'><div class='txt shadow'>").append(safeMessage)
					.append("</div></td></tr>");
			messageBuilder.append("<tr><td class='").append(prefix)
					.append("foot'></td></tr></table>");

			loadUrl("javascript:document.getElementById(\"div" + elementId
					+ "\").innerHTML=\"" + messageBuilder.toString() + "\";");
		}

		// Scroll down to the previous callout, but only for non-response
		// messages and only when a previous callout exists.
		if (!ismsgResponse && elementId != 1) {
			StringBuilder jvscr = new StringBuilder();
			// Compute the absolute position (x, y) of the previous callout.
			jvscr.append("var elem = document.getElementById('div"
					+ (elementId - 1)
					+ "');     var x = 0;     var y = 0;     while (elem != null) {         x += elem.offsetLeft;         y += elem.offsetTop;         elem = elem.offsetParent;     } ");
			// Scroll one pixel at a time; the inner loop is a busy-wait that slows the movement.
			jvscr.append("var endj=500; var i=window.scrollY; for(i=window.scrollY;i<y;i++){ var j=0; var a=0; for(j=0;j<endj;j++) {a=a+1; }  window.scrollTo(x, i); } ");
			loadUrl("javascript:" + jvscr.toString());
		}
	}

The function takes two parameters, message and ismsgResponse. To add a new callout, call the function with your message as a string and set the ismsgResponse parameter. The ismsgResponse parameter indicates whether the message is a response from the assistant or not. That parameter changes the color of the callout and controls the automatic scrolling. In the first lines of the function you can see the elementId field. elementId is important for scrolling, because each callout is written into the page element whose id is "div" followed by elementId.

After you create your own component, you can add it to your main_activity.xml as shown below.
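
Because the function fills elements named div1, div2, and so on, the page loaded into the web view needs those placeholders before any callout is added. The sketch below shows one way the initial page could be generated. It is an assumption about how the initialize function might work, not the code from the sample project; the class name CalloutPage and the maxCallouts parameter are hypothetical, and the stylesheet that defines the bubble classes is not shown.

```java
/** Hypothetical helper that builds the initial page with empty callout placeholders. */
class CalloutPage {
    /** Returns an HTML page containing empty elements div1..divN, one per future callout. */
    static String build(int maxCallouts) {
        StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 1; i <= maxCallouts; i++) {
            html.append("<div id='div").append(i).append("'></div>");
        }
        return html.append("</body></html>").toString();
    }
}
```

On Android, a page like this could be loaded once with WebView.loadDataWithBaseURL, after JavaScript has been enabled through getSettings().setJavaScriptEnabled(true), since the callouts are injected with javascript: URLs.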

XML

<com.example.siriui.SiriWebView
    android:id="@+id/webview"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    android:keepScreenOn="true"
    android:layout_marginTop="0dp"
    android:layout_gravity="fill"
    android:layout_marginBottom="0dp"
    android:layout_marginLeft="0dp"
    android:layout_marginRight="0dp"
    android:scrollbars="horizontal"
    />

You can find a working example of this component in the sample project attached to this article.

Audio Compression

Audio compression reduces the size of audio data. The compressed audio data can be transferred more quickly over the GSM network. Compression can be lossy or lossless.

Lossy: This method discards some of the data during the coding process. However, the retained data is still acceptable for recognition. The advantage of the lossy method is that the data can be smaller.

Lossless: With this method, the audio is compressed without losing its original quality. This is important if the recognition or recording tools do not have any noise reduction process.

Some data reduction does not directly affect the quality of the speech data. Simply put, if the recorded audio will be used for speech recognition, the data that is not useful for speech recognition can be removed. Human hearing is sensitive to the audible frequency range of roughly 20 Hz to 20 kHz. Anything outside this range can be removed.

G.711: You can use the G.711 standard for audio compression. Strictly speaking it is a lossy companding codec: it stores each sample in 8 bits, which halves the size of 16-bit PCM audio (a 50 percent reduction) while keeping speech clearly intelligible. You can download the Java source code of G711.java via this link ( https://code.google.com/p/sipdroid/source/browse/trunk/src/org/sipdroid/media/G711.java?r=386 ).

Other methods that can be used are MPEG-1 Layer III (MP3), MPEG-1 Layer II Multichannel, MPEG-1 Layer I, AAC, HE-AAC, MPEG Surround, MPEG-4 ALS, MPEG-4 SLS, MPEG-4 DST, MPEG-4 HVXC, MPEG-4 CELP, USAC, G.718, G.719, G.722, G.722.1, G.722.2, G.723, G.723.1, G.726, G.728, G.729, G.729.1, Speex, Vorbis, WMA, and Codec2.
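
To make the idea concrete, here is a minimal sketch of the mu-law variant of G.711, written from the commonly used reference algorithm rather than taken from the linked G711.java. The class name G711Sketch and its method names are hypothetical. Each 16-bit sample becomes one byte, which is where the 50 percent size reduction comes from.

```java
/** Hypothetical helper illustrating G.711 mu-law companding (not the linked G711.java). */
class G711Sketch {
    private static final int BIAS = 0x84;  // 132, added before the segment search
    private static final int CLIP = 32635; // largest magnitude that still fits after adding BIAS

    /** Encodes one 16-bit linear PCM sample into one 8-bit mu-law byte. */
    static byte linearToMuLaw(short sample) {
        int sign = (sample >> 8) & 0x80;               // 0x80 for negative samples
        int magnitude = sign != 0 ? -sample : sample;  // promoted to int, so -32768 is safe
        if (magnitude > CLIP) {
            magnitude = CLIP;
        }
        magnitude += BIAS;
        // Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1) {
            exponent--;
        }
        int mantissa = (magnitude >> (exponent + 3)) & 0x0F;
        return (byte) ~(sign | (exponent << 4) | mantissa); // mu-law bytes are stored inverted
    }

    /** Decodes one mu-law byte back to an approximate 16-bit linear PCM sample. */
    static short muLawToLinear(byte muLaw) {
        int u = ~muLaw & 0xFF;
        int sign = u & 0x80;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) (sign != 0 ? -magnitude : magnitude);
    }

    /** Encodes a whole buffer: n samples (2n bytes of PCM) become n bytes. */
    static byte[] encode(short[] pcm) {
        byte[] out = new byte[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            out[i] = linearToMuLaw(pcm[i]);
        }
        return out;
    }
}
```

Decoding does not return the exact input: a sample of 1000, for example, comes back as 988. The quantization step grows with amplitude, so quiet speech keeps more relative precision than loud peaks.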

Revision History

I will add example code snippets about compression, streaming, buffer playback, callout UI, tagging, TTS, and STT, which can help programmers handle some difficult points.

18/04/13: Callout UI has been added to the article.

30/11/13: Audio Compression has been added to the article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)