
Fun with Google Speech Recognition Service

18 Oct 2024 · CPOL · 2 min read
Create text issues in Redmine by using Google speech recognition service

Introduction

I was excited to discover that Google offers open web services, and I was amazed when I first heard about Google speech recognition.

In this article, I share some tips on using the Google speech recognition API in a Windows application that records voice directly from audio input devices. As an extra bit of spice, I wrap the simple speech recognition program into a utility for quickly adding issues to a Redmine project.

Background

The basic idea was this: you push a button, a timer starts and the wave-in device opens; the main loop then writes the PCM data from the buffers holding your voice to a file; when the timer stops, the audio file is posted to Google for recognition.

The first task was to work out FLAC encoding in real time. You might say: 'On *nix, I can type a couple of commands in a terminal and do everything: record, encode, post the FLAC file and receive the answer from the server. So why not run an encoder program on the wave file after recording?' Because that is boring. Just imagine: your program writes a ready-made FLAC audio file directly!

From the time when I wrote an application for batch converting MP3 files to OGG/Vorbis, I still had a library that can encode PCM to Vorbis in real time; it also included a ring buffer.

A matching handler for FLAC was not long in coming. You might know that Google accepts FLAC at 16 kHz and 16 bits per sample with 1 (mono) channel. Using the example in libFLAC, I added three functions: InitialiseEncoder, ProcessEncoder and CloseEncoder, which respectively open the file and prepare the encoder, feed 16-bit PCM samples to the encoder, and close the file and destroy the encoder. One thing I still don't understand: why can't metadata be added to the FLAC file? Maybe a charset problem?

The wonderful WaveLib article includes an implementation of the wave-in API, which the Recorder class uses: it starts the WaveInRecorder and, in parallel, runs a thread that passes the PCM data to the encoder.
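The wave-in buffers arrive as raw bytes, while libFLAC's encoder takes one 32-bit integer per sample, so a conversion step is needed somewhere before the encoder is fed. The signature of ProcessEncoder is not shown here, so the following is only a minimal sketch of that conversion, assuming little-endian 16-bit mono PCM; the `PcmConverter` class and its `ToSamples` method are hypothetical names, not part of the article's code.

```csharp
using System;

public static class PcmConverter
{
    // Hypothetical helper: converts little-endian 16-bit PCM bytes
    // into one signed sample per int. byteCount is the number of
    // valid bytes in buffer; a trailing odd byte is ignored.
    public static int[] ToSamples(byte[] buffer, int byteCount)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (byteCount < 0 || byteCount > buffer.Length)
            throw new ArgumentOutOfRangeException("byteCount");

        int[] samples = new int[byteCount / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Low byte first, then high byte; the cast to short restores the sign.
            samples[i] = unchecked((short)(buffer[2 * i] | (buffer[2 * i + 1] << 8)));
        }
        return samples;
    }
}
```

At 16 kHz, 16 bits and one channel, each second of speech is 32,000 bytes, which this helper turns into 16,000 samples.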

File Uploading

Basic usage of the upload function is shown below; change the lang parameter as needed:

C#
string result = WebUpload.UploadFileEx(flacpath, 
	"http://www.google.com/speech-api/v1/recognize?lang=ru&client=chromium",
	"file", "audio/x-flac; rate=16000", parameters, null);
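Since the language is the only part of the URL that usually changes, it can be convenient to build the address from a language code instead of editing the string by hand. This is a small sketch under that assumption; `SpeechRequest` and `BuildUrl` are hypothetical names, and the endpoint and client value are simply copied from the call above.

```csharp
using System;

public static class SpeechRequest
{
    // Endpoint and client value copied from the upload call in the article.
    private const string Endpoint = "http://www.google.com/speech-api/v1/recognize";

    // Hypothetical helper: returns the request URL for a language code such as "ru" or "en-US".
    public static string BuildUrl(string lang)
    {
        if (string.IsNullOrEmpty(lang))
            throw new ArgumentException("A language code is required.", "lang");

        return Endpoint + "?lang=" + Uri.EscapeDataString(lang) + "&client=chromium";
    }
}
```

The result of `SpeechRequest.BuildUrl("ru")` can be passed as the second argument of `WebUpload.UploadFileEx`.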

The server returns its response in JSON format.
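The article does not show the response body, so the sketch below assumes a shape along the lines of `{"status":0,"id":"...","hypotheses":[{"utterance":"...","confidence":0.9}]}`; treat those field names as an assumption to check against a real response. It also uses System.Text.Json, which is not available in .NET 4.0 out of the box, so on the original target framework a different JSON parser would be needed. `SpeechResponse` and `FirstUtterance` are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class SpeechResponse
{
    // Hypothetical helper: returns the text of the first hypothesis,
    // or null when the response contains none.
    // Assumed shape: { "hypotheses": [ { "utterance": "..." } ] }
    public static string FirstUtterance(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            JsonElement hypotheses;
            if (root.ValueKind != JsonValueKind.Object ||
                !root.TryGetProperty("hypotheses", out hypotheses) ||
                hypotheses.ValueKind != JsonValueKind.Array ||
                hypotheses.GetArrayLength() == 0)
            {
                return null;
            }

            JsonElement first = hypotheses[0];
            JsonElement utterance;
            if (first.ValueKind == JsonValueKind.Object &&
                first.TryGetProperty("utterance", out utterance) &&
                utterance.ValueKind == JsonValueKind.String)
            {
                return utterance.GetString();
            }
            return null;
        }
    }
}
```

The returned string is a natural candidate for the issue title or description; a null result means nothing was recognized and the user should probably be asked to repeat.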

Issue Creating

Where could you use speech recognition? Maybe for creating issues? It may not be practical, but it is certainly fun.

The Redmine web application includes a REST web service. Through it, we can create as many issues as we need; just specify the project and tracker. By the way, I could only get the list of trackers from version 1.3* and newer.

C#
RedmineManager manager = new RedmineManager(Configuration.RedmineHost,
    Configuration.RedmineUser, Configuration.RedminePassword);
    
// New ISSUE
var newIssue = new Issue
{
    Subject = Title,
    Description = Description,
    Project = new IdentifiableName() { Id = ProjectId },
    Tracker = new IdentifiableName() { Id = TrackerId }
};
// GET ID OF CURRENT USER
User thisuser = (from u in manager.GetObjectList<User>(
                     new System.Collections.Specialized.NameValueCollection())
                 where u.Login == Configuration.RedmineUser
                 select u).FirstOrDefault();
if (thisuser != null)
    newIssue.AssignedTo = new IdentifiableName() { Id = thisuser.Id };
    
manager.CreateObject(newIssue);

Points of Interest

When it was finished, I noticed the recording timeout: it gives you 4 seconds for your speech, which may not be enough for every phrase. Maybe the form needs a stop button?

The ring buffer saves you from data loss when recording directly to FLAC like this. When data comes in from the wave-in device, it goes into the ring buffer.
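The ring buffer from the article's library is not shown, so here is a generic sketch of the idea rather than the actual implementation: the wave-in callback writes bytes in, the encoder thread reads them out, and a lock keeps the two sides consistent. The `ByteRingBuffer` class and its members are hypothetical names.

```csharp
using System;

// Fixed-capacity byte ring buffer shared by a producer and a consumer.
public sealed class ByteRingBuffer
{
    private readonly byte[] _data;
    private readonly object _gate = new object();
    private int _head;   // index of the next byte to read
    private int _count;  // number of bytes currently stored

    public ByteRingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        _data = new byte[capacity];
    }

    public int Count { get { lock (_gate) { return _count; } } }

    // Copies as many bytes as fit; returns how many were accepted.
    public int Write(byte[] source, int offset, int length)
    {
        lock (_gate)
        {
            int toWrite = Math.Min(length, _data.Length - _count);
            int tail = (_head + _count) % _data.Length;
            int firstPart = Math.Min(toWrite, _data.Length - tail);
            Buffer.BlockCopy(source, offset, _data, tail, firstPart);
            // Wrap around to the start of the array for the remainder.
            Buffer.BlockCopy(source, offset + firstPart, _data, 0, toWrite - firstPart);
            _count += toWrite;
            return toWrite;
        }
    }

    // Copies up to length stored bytes into target; returns how many were read.
    public int Read(byte[] target, int offset, int length)
    {
        lock (_gate)
        {
            int toRead = Math.Min(length, _count);
            int firstPart = Math.Min(toRead, _data.Length - _head);
            Buffer.BlockCopy(_data, _head, target, offset, firstPart);
            Buffer.BlockCopy(_data, 0, target, offset + firstPart, toRead - firstPart);
            _head = (_head + toRead) % _data.Length;
            _count -= toRead;
            return toRead;
        }
    }
}
```

In this sketch, `Write` returns fewer bytes than requested when the buffer is full instead of overwriting old data, so the capacity should comfortably exceed what the encoder thread may fall behind by; at 16 kHz, 16-bit mono, the 4-second recording is 128,000 bytes in total.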

History

  • February 28, 2012: First version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)