Introduction
I was excited to discover open web services like the ones Google offers, and I was amazed when I heard about Google speech recognition.
In this article, I share some tips on using the Google speech recognition API in a Windows application that records voice directly from an audio input device. As a bit of extra spice, I also wrap the simple speech recognition program into a utility for quickly adding issues to a Redmine project.
Background
The basic idea is this: you push the button, a timer starts running and the wave-in device is opened at the same time; the main loop starts and writes the PCM data from the buffers holding your voice to a file; then the timer stops and the audio file is posted to Google for recognition.
The first task was to work out FLAC encoding in real time. You might say: 'On *nix, I can type a couple of commands in a terminal and do everything: record, encode, post the FLAC file, and receive the answer from the server. So why not run an encoder program on the wave file after recording?' Because that is boring. Just imagine: your program writes a ready-made FLAC audio file directly!
Since the time I wrote an application for batch-converting MP3 files to Ogg Vorbis, I have kept a library that can encode PCM to Vorbis in real time; it also includes a ring buffer for that purpose.
From there, a suitable handler for FLAC was not long in coming. As you might know, Google accepts FLAC at 16 kHz and 16 bits per sample with 1 (mono) channel. Following the example in libFLAC, I added three functions: InitialiseEncoder, ProcessEncoder, and CloseEncoder, which respectively open the file and prepare the encoder, feed 16-bit PCM samples to the encoder, and close the file and destroy the encoder. One thing I still don't understand is why I could not add metadata to the FLAC file. Maybe it is a charset problem?
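To illustrate the step of feeding 16-bit PCM samples to the encoder: the libFLAC encoder API takes each sample as a 32-bit integer, so the raw bytes from the wave-in buffer have to be widened first. Below is a minimal sketch of that conversion. The class and method names (PcmConvert, ToInt32Samples) are hypothetical and are not taken from the article's source; the sketch assumes little-endian, mono, 16-bit input.

```csharp
using System;

public static class PcmConvert
{
    // Hypothetical helper: converts little-endian 16-bit PCM bytes
    // into one 32-bit integer per sample (the width libFLAC expects).
    public static int[] ToInt32Samples(byte[] pcm, int byteCount)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));
        if (byteCount < 0 || byteCount > pcm.Length)
            throw new ArgumentOutOfRangeException(nameof(byteCount));

        int sampleCount = byteCount / 2; // a trailing odd byte is ignored
        var samples = new int[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Combine low and high byte, then sign-extend through short.
            int raw = pcm[2 * i] | (pcm[2 * i + 1] << 8);
            samples[i] = unchecked((short)raw);
        }
        return samples;
    }
}
```

A function like ProcessEncoder could call such a helper on every buffer before handing the samples to the encoder.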
The wave-in API implementation comes from the wonderful WaveLib article. It is used by the Recorder class, which starts the WaveInRecorder and, in parallel, runs a thread that passes the PCM data to the encoder.
File Uploading
Basic usage of the upload function is shown below; change the lang parameter as needed:
string result = WebUpload.UploadFileEx(flacpath,
    "http://www.google.com/speech-api/v1/recognize?lang=ru&client=chromium",
    "file", "audio/x-flac; rate=16000", parameters, null);
The server returns its response in JSON format.
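As a rough sketch of reading that JSON, the helper below pulls the first recognized phrase out of the response. It rests on two assumptions: that the response has the shape {"status":0,"hypotheses":[{"utterance":"...","confidence":0.9}]}, which the article does not show, and that System.Text.Json is available, which is much newer than this article. The names SpeechResponse and FirstUtterance are hypothetical.

```csharp
using System;
using System.Text.Json;

public static class SpeechResponse
{
    // Returns the first "utterance" string found under "hypotheses",
    // or null when the response contains no hypothesis.
    // The field names are an assumption about the response format.
    public static string FirstUtterance(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return null;

            if (!root.TryGetProperty("hypotheses", out JsonElement hyps)
                || hyps.ValueKind != JsonValueKind.Array)
                return null;

            foreach (JsonElement h in hyps.EnumerateArray())
            {
                if (h.ValueKind == JsonValueKind.Object
                    && h.TryGetProperty("utterance", out JsonElement u)
                    && u.ValueKind == JsonValueKind.String)
                    return u.GetString();
            }
            return null;
        }
    }
}
```

The string returned by UploadFileEx could be passed straight to such a helper, and a null result treated as "nothing recognized".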
Creating Issues
When might you use speech recognition? Maybe for creating issues? It may not be practical, but it is certainly fun.
The Redmine web application includes a REST web service. Through it, we can create as many issues as we need; just specify the project and tracker. By the way, I could only get the list of trackers from the newer 1.3.* versions.
// Connect to the Redmine REST service with the configured credentials.
RedmineManager manager = new RedmineManager(Configuration.RedmineHost,
    Configuration.RedmineUser, Configuration.RedminePassword);

// Build the issue from the recognized text, project, and tracker.
var newIssue = new Issue
{
    Subject = Title,
    Description = Description,
    Project = new IdentifiableName() { Id = ProjectId },
    Tracker = new IdentifiableName() { Id = TrackerId }
};

// Find the current user so the issue can be assigned to them.
User thisuser = (from u in manager.GetObjectList<User>(
                     new System.Collections.Specialized.NameValueCollection())
                 where u.Login == Configuration.RedmineUser
                 select u).FirstOrDefault();
if (thisuser != null)
    newIssue.AssignedTo = new IdentifiableName() { Id = thisuser.Id };

manager.CreateObject(newIssue);
Points of Interest
When it was done, I noticed the recording timeout: it gives you 4 seconds for your speech, which may not be enough for every phrase. Maybe the form needs a stop button?
The ring buffer saves you from data loss when recording directly to FLAC like this. When data comes from the wave-in device, it goes into the ring buffer.
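To show the idea, here is a minimal sketch of a byte ring buffer shared between the wave-in callback (the writer) and the encoder thread (the reader). It is not the ring buffer from the article's library; the class name ByteRingBuffer and its members are hypothetical, and it uses a simple lock rather than anything more elaborate.

```csharp
using System;

public sealed class ByteRingBuffer
{
    private readonly byte[] _data;
    private readonly object _sync = new object();
    private int _head;   // index of the next byte to read
    private int _count;  // number of bytes currently stored

    public ByteRingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _data = new byte[capacity];
    }

    public int Count { get { lock (_sync) return _count; } }

    // Copies up to 'length' bytes into the buffer; returns how many fit.
    public int Write(byte[] source, int offset, int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        lock (_sync)
        {
            int n = Math.Min(length, _data.Length - _count);
            int tail = (_head + _count) % _data.Length;
            int first = Math.Min(n, _data.Length - tail);
            Buffer.BlockCopy(source, offset, _data, tail, first);
            Buffer.BlockCopy(source, offset + first, _data, 0, n - first); // wrap around
            _count += n;
            return n;
        }
    }

    // Copies up to 'length' stored bytes out; returns how many were read.
    public int Read(byte[] target, int offset, int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        lock (_sync)
        {
            int n = Math.Min(length, _count);
            int first = Math.Min(n, _data.Length - _head);
            Buffer.BlockCopy(_data, _head, target, offset, first);
            Buffer.BlockCopy(_data, 0, target, offset + first, n - first); // wrap around
            _head = (_head + n) % _data.Length;
            _count -= n;
            return n;
        }
    }
}
```

In this sketch, Write returns fewer bytes than requested when the buffer is full, so the capacity should be large enough to cover the time the encoder thread may lag behind the recorder.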
History
- February 28, 2012: First version