Jason Sultana

Follow this space for writings (and ramblings) about interesting things related to software development.

Text to Speech with Google Translate

09 Mar 2021 » other

G’day guys!

Most people who know me in person know that my wife and I run private Japanese lessons, online and from home, under the name Westside Japanese Lessons. We had some new students start with us recently, and one of the things that we cover early on is how to count in Japanese. It can be a bit unintuitive, especially when you get to the larger numbers (above 10,000), so I thought it’d be cool if I could put together a little webapp that would let the user enter a number and use text-to-speech to read it aloud, showing the student how to pronounce it. If you’re interested, you can check it out at https://westsidejapanese.com.au/practice/numbers.html.

It turns out that there are actually a few options when it comes to text-to-speech.

1. (Undocumented) Google Translate API

This appears to be an undocumented little gem that’s mentioned on a few blogs. It’s free, anonymous and incredibly easy to set up.

// tl = language code, q = URL-encoded text to speak, client = required client name
const source = `https://translate.google.com/translate_tts?tl=ja-JP&q=${encodeURIComponent('konnichiwa')}&client=tw-ob`;
const audio = new Audio(source);
audio.play();
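If you want to reuse this for different phrases or languages, the same idea can be wrapped in a small helper. This is just a sketch: the names buildTtsUrl and speak are made up for illustration, and since the endpoint is undocumented, the parameter names are only what the snippet above uses, not anything official.

```javascript
// Hypothetical helper (not part of any library): builds the URL for the
// undocumented Google Translate TTS endpoint used in the snippet above.
function buildTtsUrl(text, lang = 'ja-JP', client = 'tw-ob') {
  const query = [
    `tl=${encodeURIComponent(lang)}`,       // language code
    `q=${encodeURIComponent(text)}`,        // text to read aloud
    `client=${encodeURIComponent(client)}`, // client name the endpoint expects
  ].join('&');
  return `https://translate.google.com/translate_tts?${query}`;
}

// Browser-only: the Audio constructor does not exist outside a browser.
function speak(text, lang) {
  const audio = new Audio(buildTtsUrl(text, lang));
  // play() returns a Promise that rejects if the browser blocks playback
  return audio.play();
}
```

In a page, something like speak('konnichiwa') from a button click handler should do the trick. Browsers tend to block audio that isn't triggered by a user action.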

This undocumented API endpoint can be accessed via JavaScript, or an HTML5 audio tag. There are a few things you need to know to get it to work, though:

  1. You need to have either a <meta name="referrer" content="no-referrer"> in your document head, or a rel="noreferrer" in your Audio tag.
  2. You need to specify a client in the request. The value tw-ob gets thrown around on Stack Overflow and some other blogs, but it looks like any value will do the trick.
  3. You can specify the language to use using the tl parameter. This needs to be a supported language code, which you can find in the Google Cloud Doco.

…which brings us to our second option.

2. Google Cloud API

It turns out that Google does in fact have an official API for text-to-speech, which is much more comprehensive than the undocumented one. For standard-quality voices, the service is free for the first 4 million characters per month, after which it’s charged at $4 USD per 1 million characters. That free allowance sounds like quite a lot, but if you get a troublesome user that tries to process a PhD thesis using your service, this could quickly get very expensive.
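To get a feel for the numbers, here's a back-of-the-envelope sketch. It assumes standard voices are billed at $4 USD per million characters once the free 4 million are used up. Both figures are hard-coded assumptions, so check the current pricing page before relying on them. The function name is made up for illustration.

```javascript
// Assumed pricing for standard voices (verify against the current pricing page).
const FREE_CHARS_PER_MONTH = 4000000;
const USD_PER_MILLION_CHARS = 4;

// Hypothetical helper: estimates the monthly bill for a given character count.
function estimateMonthlyCostUsd(charsThisMonth) {
  // Only characters beyond the free allowance are billable
  const billableChars = Math.max(0, charsThisMonth - FREE_CHARS_PER_MONTH);
  return (billableChars / 1000000) * USD_PER_MILLION_CHARS;
}
```

Under those assumptions, 3 million characters in a month costs nothing, while 10 million works out to 6 million billable characters, or $24 USD.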

The client libraries for this service don’t seem to include a browser option, but you could of course use the REST API directly, which is documented at https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize.
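For the curious, a direct REST call might look roughly like the sketch below. I haven't wired this into my app, so treat it as an illustration: the function names are made up, passing an API key in the query string is just one way to authenticate, and a key shipped to the browser is visible to anyone. In practice you'd probably want to proxy this through your own server.

```javascript
const SYNTHESIZE_URL = 'https://texttospeech.googleapis.com/v1/text:synthesize';

// Hypothetical helper: builds the JSON body for the text:synthesize method.
function buildSynthesizeRequest(text, languageCode = 'ja-JP') {
  return {
    input: { text },
    voice: { languageCode },
    audioConfig: { audioEncoding: 'MP3' },
  };
}

// Hypothetical helper: sends the request and returns a data URI that can be
// handed to new Audio(...). apiKey is assumed to be a Google Cloud API key.
async function synthesize(text, apiKey) {
  const response = await fetch(`${SYNTHESIZE_URL}?key=${encodeURIComponent(apiKey)}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSynthesizeRequest(text)),
  });
  if (!response.ok) {
    throw new Error(`Synthesis failed with HTTP ${response.status}`);
  }
  // The response carries the audio as a base64-encoded string
  const { audioContent } = await response.json();
  return `data:audio/mpeg;base64,${audioContent}`;
}
```

Capping the length of text before calling synthesize would be a cheap way to guard against the PhD thesis scenario.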

3. ResponsiveVoice

There are a whole bunch of third-party TTS tools out there, but one interesting option that I found was ResponsiveVoice. They offer a super-simple JavaScript library that appears to be designed for the browser specifically, and they offer a free licence for non-commercial use. For hobby / NFP / educational purposes, this sounds like it could be a good choice.
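I haven't used it in anger yet, but from what I recall of their docs, the library exposes a global responsiveVoice object with a speak(text, voice, options) method once their script is loaded on the page. The sketch below assumes exactly that. The voice name 'Japanese Female' and the rate option are my assumptions to double-check against their current docs, and speakJapanese is a made-up wrapper.

```javascript
// Hypothetical wrapper around the ResponsiveVoice global. The second argument
// exists only so the library object can be swapped out (e.g. in tests).
function speakJapanese(text, rv = globalThis.responsiveVoice) {
  if (!rv) {
    // The ResponsiveVoice script has not been loaded on this page
    console.warn('responsiveVoice is not available');
    return false;
  }
  // Assumed voice name and options; verify against the ResponsiveVoice docs
  rv.speak(text, 'Japanese Female', { rate: 0.9 });
  return true;
}
```

The function returns false rather than throwing when the library is missing, so a page could fall back to another option.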

Which option did I opt for?

Since my application of this service is mostly for fun, albeit with the desire to help out our students, I opted for the simplest solution, which was the undocumented Google Translate implementation. Since this is an undocumented feature, there is the risk that the service may stop working in the future. If that happens, I’ll probably move to ResponsiveVoice - though in actuality, I’m pretty keen to try out their service anyway, and maybe migrate the solution over to it if I qualify for the non-commercial free licence. If I do make the switch, I’ll write up my experience in another article!

Anyway, that’s all from me for now. Catch ya!