Learning to count in Japanese can be a bit unintuitive, especially once you get to the larger numbers (above 10,000). So I thought it’d be cool to put together a little webapp that lets the user enter a number and uses text-to-speech to read it aloud, showing the student how to pronounce it.
It turns out that there are actually a few options when it comes to text-to-speech.
1. (Undocumented) Google Translate API
This appears to be an undocumented little gem that’s mentioned on a few blogs. It’s free, anonymous and incredibly easy to set up.
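// Build the request URL; the text to speak must be URL-encoded.
const text = encodeURIComponent('konnichiwa');
const source = `https://translate.google.com/translate_tts?tl=ja-JP&q=${text}&client=tw-ob`;
// Load the returned audio and play it in the browser.
const audio = new Audio(source);
audio.play();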
This undocumented API endpoint can be accessed via JavaScript, or an HTML5 audio tag. There are a couple of requirements to get it to work, though:
- You need to have either a <meta name="referrer" content="no-referrer"> in your document head, or a rel="noreferrer" in your Audio tag.
- You need to specify a client in the request. The value tw-ob gets thrown around on Stack Overflow and some other blogs, but it looks like any value will do the trick.
- You can specify the language to use using the tl parameter. This needs to be a supported language code, which you can find from the Google Cloud Doco.
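To make the tl, q and client parameters a bit more concrete, here is a minimal sketch of a helper that assembles the request URL. The names buildTtsUrl and speak are my own, hypothetical ones, and since the endpoint is undocumented, the parameter behaviour is an assumption based on the notes above. The page still needs the no-referrer setup described in the list.

```javascript
// Hypothetical helper names; the endpoint itself is undocumented and may change.
const TTS_ENDPOINT = 'https://translate.google.com/translate_tts';

// Assemble the URL from the three parameters discussed above.
// URLSearchParams takes care of encoding the text.
function buildTtsUrl(text, lang = 'ja-JP', client = 'tw-ob') {
  const params = new URLSearchParams({ tl: lang, q: text, client });
  return `${TTS_ENDPOINT}?${params.toString()}`;
}

// Browser-only: Audio is not available outside the browser.
function speak(text, lang) {
  const audio = new Audio(buildTtsUrl(text, lang));
  // play() returns a Promise that rejects if the browser blocks playback.
  return audio.play();
}
```

In the browser you would call something like speak('konnichiwa') from a click handler, since many browsers block audio that isn't triggered by a user action.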
…which brings us to our second option.
2. Google Cloud API
It turns out that Google does in fact have an official API for text-to-speech, which is much more comprehensive than the undocumented one. The service is free for the first 4 million characters per month for standard-quality voices, after which it’s charged at $4 USD per million characters. Four million characters sounds like quite a lot, but if a troublesome user tries to process a PhD thesis using your service, this could quickly get very expensive.
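To put some numbers on that risk, here is a rough sketch of a cost estimate plus a simple input cap. It assumes that characters beyond the free tier are billed per million at the $4 USD figure mentioned above. The 200-character limit and both function names are hypothetical choices of mine, and current pricing should be checked against Google's pricing page.

```javascript
const FREE_CHARS_PER_MONTH = 4000000; // free tier for standard voices, as quoted above
const USD_PER_MILLION_CHARS = 4;      // assumption: billed per million characters
const MAX_CHARS_PER_REQUEST = 200;    // hypothetical cap; a spoken number is short

// Estimate the monthly bill for a given number of synthesized characters.
function estimateMonthlyCostUsd(charsThisMonth) {
  const billable = Math.max(0, charsThisMonth - FREE_CHARS_PER_MONTH);
  return (billable / 1000000) * USD_PER_MILLION_CHARS;
}

// Reject oversized input before it ever reaches the paid API.
function checkInput(text) {
  const length = [...text].length; // count code points, not UTF-16 units
  if (length > MAX_CHARS_PER_REQUEST) {
    throw new RangeError(`Input is ${length} characters; the limit is ${MAX_CHARS_PER_REQUEST}`);
  }
  return text;
}
```

A cap like this only helps if it is enforced somewhere the user can't bypass, so for a real deployment it would belong on a server rather than in the page.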
The client libraries for this service don’t seem to include a browser option, but you could, of course, call the REST API directly. It’s documented at https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize.
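As a sketch of what calling the REST API directly might look like, the snippet below builds a text:synthesize request and plays the base64-encoded audio it returns. The function names are hypothetical, and it assumes you authenticate with an API key passed in the key query parameter. Keep in mind that a key embedded in browser code is visible to anyone who views the page, so in practice you might prefer to proxy the call through your own server.

```javascript
const SYNTHESIZE_URL = 'https://texttospeech.googleapis.com/v1/text:synthesize';

// Build the URL and fetch options for a synthesis request (pure function, easy to test).
function buildSynthesizeRequest(text, apiKey, languageCode = 'ja-JP') {
  return {
    url: `${SYNTHESIZE_URL}?key=${encodeURIComponent(apiKey)}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode },
        audioConfig: { audioEncoding: 'MP3' },
      }),
    },
  };
}

// Browser-only: send the request and play the returned audio.
async function synthesizeAndPlay(text, apiKey) {
  const { url, options } = buildSynthesizeRequest(text, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Synthesis failed with HTTP ${response.status}`);
  }
  const { audioContent } = await response.json(); // base64-encoded MP3
  const audio = new Audio(`data:audio/mpeg;base64,${audioContent}`);
  return audio.play();
}
```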
3. ResponsiveVoice
There are a whole bunch of third-party TTS tools out there, but one interesting option that I found was ResponsiveVoice. They offer a super-simple JavaScript library that appears to be designed for the browser specifically, and they offer a free licence for non-commercial use. For hobby / NFP / educational purposes, this sounds like it could be a good choice.
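For a feel of what that might look like, here is a minimal sketch. It assumes the ResponsiveVoice script has already been loaded on the page (which exposes a global responsiveVoice object) and that a voice named 'Japanese Female' is available. The wrapper name speakJapanese is hypothetical, and the voice name and rate option are worth checking against their documentation before relying on them.

```javascript
// Hypothetical wrapper; rv defaults to the global that the ResponsiveVoice script defines.
function speakJapanese(text, rv = globalThis.responsiveVoice) {
  if (!rv) {
    throw new Error('ResponsiveVoice script has not been loaded');
  }
  // Assumed voice name; a slightly slower rate may help learners follow along.
  rv.speak(text, 'Japanese Female', { rate: 0.9 });
}
```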
Which Option Did I Opt For?
Since my application of this service is mostly for fun, albeit with the desire to help out our students, I opted for the simplest solution, which was the undocumented Google Translate implementation. Since this is an undocumented feature, there is a risk that the service may stop working in the future. If that happens, I’ll probably move to ResponsiveVoice - though in actuality, I’m pretty keen to try out their service anyway, and maybe migrate the solution over to it if I qualify for the non-commercial free licence. If I do migrate, I’ll write up my experience in another article!
Anyway, that’s all from me for now. Catch ya!