Scotty: "Computer... Computer..."
(McCoy hands Scotty the mouse)
Scotty: "Aye. Hello computer."
Nichols: "Just use the keyboard."
Scotty: "Keyboard. How quaint."
-- Star Trek IV: The Voyage Home
With the advent of "Siri", Apple’s voice-recognition system
for controlling (more or less) the iPhone, the idea of commanding a device or
phone by voice leapfrogged into the stratosphere of "coolness". Not that voice
interaction is a new idea, mind you: what developer hasn’t imagined having
conversations with her computer the same way Tony Stark does with JARVIS in the
movies and comic books, or, if you’re more of an old-school science fiction fan,
the way your favorite Captain converses with his Enterprise?
Fortunately, the Android OS provides some baked-in speech
recognition/speech-to-text capability, and accessing it is pretty
straightforward. Unfortunately, figuring out how and when to use that
functionality is still something of an art—for example, knowing when to start
listening to the user’s voice and processing the associated commands can be
something of a user-interface nightmare. If you force the user to push a button
on the screen first, you lose the "hands free" capability that makes
voice-driven commands so attractive; if you simply start listening to the audio
input from the moment your application begins execution, not only do you run
into problems with the Android activity lifecycle (do you keep reading the
input channel if the user switches away from your app?), but it can be
nigh-impossible to distinguish between commands intended for the application
and those intended for the other people in the room (or in the other cars on
the highway).
In truth, there’s a pretty simple signal to know when the
user wants to start using the application: when they pick up their phone, or
for a lot of users, when they pick up their headset and put it on. In fact,
that latter event is an even stronger signal than the former—many users pick up
their phone for a variety of tasks that have nothing to do with voice input,
while the headset really has only one use, and all of it has to do with audio
input or output. So, if we could somehow get the signal that the user was
starting to use the headset, the voice-controlled application could begin its
audio stream analysis. Or, should the user already be wearing the headset, the
user could push a button on the headset to "begin".
While it is certainly likely to be only the first of many such devices,
the Voyager Legend UC® headset from Plantronics provides exactly this
kind of information. Plantronics sent me a unit to explore.
Bluetooth, Android, and You
As we talk about programming the headset on an Android
device, I’m going to assume that readers are familiar with the basics of
Android programming, meaning that Activities, Handlers and Intents shouldn’t be
foreign concepts. If you’re not comfortable yet with Android, it’s a fairly
easy OS for a developer to pick up—tutorials on the Web abound, and only a
rudimentary knowledge of the Java language is required.
The short version of the story is that once paired with an
Android device (and bear in mind, this can be either a phone or a tablet, which
opens up some interesting possibilities for non-phone-related applications),
the Voyager Legend sends several bits of information over Bluetooth to the paired
Android device. These events range from "don" and "doff" events (meaning the
user has "donned" the device—putting it on—or "doffed" the device, taking it
off), to button-press events, to different sensor responses that the device
picks up.
Thus, the first step in any sort of headset-aware
application is to recognize the paired headset in your application, and ask
Android to begin sending you the events it receives from the headset over
Bluetooth. In Android, receiving events from other parts of the device is done
using a BroadcastReceiver; this is essentially a base class whose onReceive()
method receives all the messages destined for that BroadcastReceiver (filtered
through an IntentFilter). Once that’s established, it’s a trivial matter to
start watching the event stream, looking for particular events—notifications
coming over Bluetooth from the device, usually device-specific unless we’re
talking about generic "paired" and "not-paired" events—and reacting according
to the application’s needs.
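To make the pattern concrete, here is a minimal sketch of such a
BroadcastReceiver (the class name and body here are mine, not the Plantronics
example code), whose onReceive() fires for each matching Intent:
// Illustrative only: a BroadcastReceiver subclass sees every Intent
// that matches the IntentFilter it was registered with
public class HeadsetEventReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context context, Intent intent) {
        String action = intent.getAction();
        if (BluetoothDevice.ACTION_ACL_CONNECTED.equals(action)) {
            // A Bluetooth device (perhaps our headset) just connected
        }
    }
}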
The Voyager Legend headset includes two buttons, one called the
"Voice" button and the other the "Call" button—when used normally, they’re used
to talk to the headset (press the "Voice" button, and it will tell you how much
talk time is left on the headset, which is a nifty little feature, if you ask
me) and to answer an incoming phone call, respectively. However, from the point
of view of the developer, they’re each just a button, and we can repurpose them
as we need to.
The exact nature of the data sent by the device to Android,
and from Android to the application, is described in the Plantronics
documentation. It’s a set of codes and strings that are particular to the
Plantronics device, and sorting through all of them to figure out exactly
what’s being sent can be tricky. Fortunately, Cary Bran, a Plantronics
evangelist, has posted some example code on the Plantronics developer forum: an
"event-streamer" application that demonstrates the different events that are
sent, and provides two classes, the PlantronicsReceiver (the
BroadcastReceiver-inheriting class) and a simple wrapper for the messages it
fires, PlantronicsXEventMessage. See
http://developer.plantronics.com/blogs/Cary/2012/11/26/plugging-into-plantronics-headset-sensor-events-via-android
for details.
Events, please
Receiving events from the Android OS, such as Bluetooth
events, involves the use of a BroadcastReceiver-derived class, which must be
registered with the Android OS so the OS knows to send events (Intent objects)
to the BroadcastReceiver. This registration can come in two forms: one, where
the BroadcastReceiver lives on outside of your Android application process,
requires the BroadcastReceiver to be registered in the AndroidManifest.xml
file; the other, where the BroadcastReceiver is passed to a registerReceiver()
method call, means the BroadcastReceiver only receives events as long as your
Android process is running, and requires no manifest entry. Thus, the first
thing the application will need to do is create one of these custom
BroadcastReceivers (the PlantronicsReceiver, below) and register it for
incoming Bluetooth events:
private void initBluetooth() {
    // The Handler that will ultimately consume the headset events
    handler = new BluetoothHandler();
    receiver = new PlantronicsReceiver(handler);

    // Restrict the receiver to Plantronics vendor-specific events plus
    // the generic Bluetooth connect/disconnect/audio-state actions
    intentFilter = new IntentFilter();
    intentFilter.addCategory(
        BluetoothHeadset.VENDOR_SPECIFIC_HEADSET_EVENT_COMPANY_ID_CATEGORY +
        "." +
        BluetoothAssignedNumbers.PLANTRONICS);
    intentFilter.addAction(BluetoothDevice.ACTION_ACL_CONNECTED);
    intentFilter.addAction(BluetoothDevice.ACTION_ACL_DISCONNECT_REQUESTED);
    intentFilter.addAction(BluetoothDevice.ACTION_ACL_DISCONNECTED);
    intentFilter.addAction(
        BluetoothHeadset.ACTION_VENDOR_SPECIFIC_HEADSET_EVENT);
    intentFilter.addAction(BluetoothHeadset.ACTION_AUDIO_STATE_CHANGED);
    intentFilter.addAction(BluetoothHeadset.ACTION_CONNECTION_STATE_CHANGED);

    registerReceiver(receiver, intentFilter);
}
Aside from the Bluetooth-specific parts of the IntentFilter,
this is a pretty normal BroadcastReceiver implementation. The actions in the
IntentFilter specify which actions the BroadcastReceiver will be open to
receiving (we don’t want to be flooded with every message sent to anywhere on
the device), and then it’s passed to registerReceiver(), opening the
PlantronicsReceiver for business.
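One lifecycle note: because the receiver is registered in code rather than in
the manifest, it should also be unregistered when the Activity goes away, or
Android will complain about a leaked receiver. Something like:
@Override
protected void onDestroy() {
    super.onDestroy();
    // Code-registered receivers must be explicitly unregistered
    unregisterReceiver(receiver);
}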
(The details of how the PlantronicsReceiver unpacks the
packet of information sent across Bluetooth are really beyond the scope of this
article, but they’re not too difficult to reverse-engineer from the code that
Cary posted. Said simply, it’s all packaged up in the Intent: the extra data
comes as "event extras" in that Intent, and the PlantronicsReceiver unpacks
those "event extras" to discover the additional information, such as which
button was pressed and how, and sets those as properties on the
PlantronicsXEventMessage instance that it creates.)
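In rough outline (a sketch of the general shape, with an assumed constructor,
not Cary’s actual code), the receiver’s onReceive() does something like this:
// Inside onReceive(): wrap the Intent's extras in a message object
// and forward it to the application-supplied Handler. (The
// PlantronicsXEventMessage constructor shown here is an assumption;
// see Cary's code for the real unpacking logic.)
Bundle extras = intent.getExtras();
PlantronicsXEventMessage message = new PlantronicsXEventMessage(extras);
Message msg = handler.obtainMessage(PlantronicsReceiver.HEADSET_EVENT);
msg.obj = message;
handler.sendMessage(msg);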
Just Handle it
Notice that the PlantronicsReceiver takes a Handler as its
constructor argument—this is a Handler-extending class that handles the
messages sent from the BroadcastReceiver. This Handler is the ultimate
recipient of the message, and will be supplied by the application. It’s in here
that the application receives the PlantronicsXEventMessage, determines the type
of event (DON, DOFF, BUTTON, or whatever else), and extracts any additional
information that comes with the event. For example, BUTTON events will come
with three additional properties on the message: "buttonId", describing which
button was pressed; "buttonName", the name of said button; and "pressType",
indicating the kind of press (short or long) registered. BATTERY events, on the
other hand, will come with properties like "level", describing the charge level
of the headset; "charging", a "true"/"false" value indicating whether the
headset is plugged in and charging; and "minutesOfTalkTime", which is pretty
self-explanatory.
The PlantronicsReceiver class is the final arbiter on what
data is sent in the PlantronicsXEventMessage, so check that code for details.
A Handler, then, will receive these message objects in its
handleMessage() method, and "unpack" them like so:
@Override
public void handleMessage(Message msg) {
    switch (msg.what) {
        case PlantronicsReceiver.HEADSET_EVENT:
            PlantronicsXEventMessage message =
                (PlantronicsXEventMessage) msg.obj;
            String type = message.getEventType();
            if (type.equals(PlantronicsXEventMessage.BUTTON_EVENT)) {
                // react to a button press here
            }
            if (type.equals(PlantronicsXEventMessage.BATTERY_EVENT)) {
                // react to a battery-status report here
            }
            break;
        default:
            break;
    }
}
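For example, the empty BATTERY branch above might be filled in like this,
bearing in mind that getProperty() is my guess at an accessor; check the
PlantronicsXEventMessage code for the actual method names:
if (type.equals(PlantronicsXEventMessage.BATTERY_EVENT)) {
    // getProperty() is a hypothetical accessor; see Cary's code for
    // how the event properties are actually exposed
    Object level = message.getProperty("level");
    Object talkTime = message.getProperty("minutesOfTalkTime");
    Log.d("Headset", "Battery at " + level +
        ", talk time left: " + talkTime + " minutes");
}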
Note that the "static import" facility (introduced back in
Java 5) can be used to reduce the verbosity of those typechecks.
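For example (with an illustrative package name; adjust to wherever
PlantronicsXEventMessage actually lives in your project), the typechecks
shrink to:
import static com.example.headset.PlantronicsXEventMessage.BATTERY_EVENT;
import static com.example.headset.PlantronicsXEventMessage.BUTTON_EVENT;

// ... then, inside handleMessage():
if (type.equals(BUTTON_EVENT)) { /* handle the button press */ }
if (type.equals(BATTERY_EVENT)) { /* handle the battery report */ }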
Dictaphone
Say, for example, we want to create a Dictaphone kind of
application. (Readers under the age of 40 may not know this, but back in
ancient days of yore, back when dinosaurs roamed the Earth, and devices had to
be tethered via wires in order to operate, a "dictaphone" was a device
designed to record the human voice onto some kind of storage medium—early ones
actually used wax cylinders.) The application flow would be something like
this: if the application is running and if a Voyager Legend is paired to the device, then
we wait for a "button" event.
Once the button has been pressed, we can immediately kick
the device into speech-to-text mode, listening to the incoming audio stream:
// Defined as an inner class of the main Activity, so Activity members
// like getApplicationContext(), startActivityForResult(), RESULT_SPEECH,
// and textPane are all in scope
public class BluetoothHandler extends Handler {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case PlantronicsReceiver.HEADSET_EVENT:
                PlantronicsXEventMessage message =
                    (PlantronicsXEventMessage) msg.obj;
                String type = message.getEventType();
                if (type.equals(PlantronicsXEventMessage.BUTTON_EVENT)) {
                    Toast.makeText(getApplicationContext(),
                        "Listening....",
                        Toast.LENGTH_SHORT).show();
                    // Fire up Android's built-in speech recognizer
                    Intent intent = new Intent(
                        RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
                    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
                    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
                    try {
                        startActivityForResult(intent, RESULT_SPEECH);
                        textPane.setText("");
                    } catch (ActivityNotFoundException a) {
                        Toast.makeText(getApplicationContext(),
                            "Oops! Your device doesn't support speech-to-text",
                            Toast.LENGTH_SHORT).show();
                    }
                }
                break;
            default:
                break;
        }
    }
}
When that Activity stops, Android will have listened to the
audio stream and made its best-guess interpretation of the spoken words,
returning an ArrayList<String> of possibilities (the most likely one
being the first) that we can then extract and append to the text display in the
middle of the main Activity. Doing so, however, isn’t just a matter of
capturing the return value from a method call. In Android, when one Activity
wants the result from another, the requesting Activity launches the second
using startActivityForResult(), passing in the Intent used to launch the second
Activity (which in this case is the Intent designed to launch the built-in
speech-recognition Activity, as obtained from RecognizerIntent, above) and a
"result code" that will be used to identify this particular exchange, which in
this case is a simple constant called RESULT_SPEECH (with a value of "1"). When
the speech-recognition Activity finishes, it will fire an activity result back
at this Activity, which will trigger the onActivityResult() method, and will
include a request code (RESULT_SPEECH), a result code (defined by Android, and
usually RESULT_OK, unless something went pear-shaped), and an Intent containing
the Activity’s results (in this case, our collection of Strings representing
the user’s speech):
@Override
protected void onActivityResult(int requestCode,
                                int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    switch (requestCode) {
        case RESULT_SPEECH: {
            if (resultCode == RESULT_OK && null != data) {
                // The recognizer's best guesses, most likely first
                ArrayList<String> text =
                    data.getStringArrayListExtra(
                        RecognizerIntent.EXTRA_RESULTS);
                textPane.setText(text.get(0));
            }
            break;
        }
    }
}
Once the data has been retrieved from the Intent, it’s an
easy matter to set its contents on the TextView (textPane
) in the application.
From here, it would be fairly easy to imagine how this
application could save the note, load other notes, and so on, including voice
commands to do all of the above, but this covers the basics.
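For instance, a hypothetical saveNote() helper on the Activity (a sketch, with
names of my own invention) could stash the dictated text into the
application’s internal storage:
// Hypothetical helper: persist a dictated note to internal storage
private void saveNote(String name, String contents) {
    FileOutputStream out = null;
    try {
        out = openFileOutput(name + ".txt", MODE_PRIVATE);
        out.write(contents.getBytes("UTF-8"));
    } catch (IOException e) {
        Log.e("Dictaphone", "Failed to save note", e);
    } finally {
        if (out != null) {
            try { out.close(); } catch (IOException ignored) { }
        }
    }
}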
Summary
Readers may be a little surprised at how little code there
actually is here—between the PlantronicsReceiver interpreting the Bluetooth
data and handing us pretty easy-to-consume message objects, and the Android OS
doing the "heavy lifting" of translating speech into text, we have a pretty
functional application in just three classes and (not counting the code Cary
wrote) about 150 lines of code. That’s a pretty hefty amount of functionality
for fairly small engineering effort, and it makes me, at least, smile in
anticipation of the ways this can be used in modern mobile applications. Enjoy!