Jan-Lukas Else

Thoughts of an IT expert

Using the Google Text-to-Speech API to create audio versions for blog posts

Published in 👨‍💻 Dev
Short link: https://b.jlel.se/s/2ee
⚠️ This entry is already over one year old. It may no longer be up to date. Opinions may have changed. When I wrote this post, I was only 20 years old!

Ashwíṅ Víshnú asked how I created the audio version for my latest post. Here’s how you can use the Google Text-to-Speech API to create MP3s from text. A simple way to try the API without a Google Cloud account is to follow these steps:

  1. Open this page in a new (private) Firefox tab.
  2. Open the Web Developer tools and go to the Network tab.
  3. On the page enter the text to synthesize into the text area and choose your config.
  4. Click Read it.
  5. Find the request to https://cxl-services.appspot.com/proxy?url=https://texttospeech.googleapis.com/v1beta1/text:synthesize&token=... and copy it as a cURL command, then paste it into a shell script.
  6. Put a shebang line above the curl command and append the jq and base64 pipeline to it, so that the script looks like this:

     #!/bin/sh
     curl ... | jq --raw-output '.audioContent' | base64 --decode > audio.mp3

  7. Make the script executable and run it. It should create an audio.mp3 file.
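To see what the last part of that pipeline does without sending a request, here is a small sketch that runs the same jq and base64 steps on a made-up response. The sample JSON and the helper name extract_audio are assumptions for illustration only; a real response carries base64-encoded MP3 data in its audioContent field.

```shell
#!/bin/sh
# extract_audio: hypothetical helper that reads a JSON response on stdin
# and writes the decoded bytes of its "audioContent" field to stdout.
extract_audio() {
  jq --raw-output '.audioContent' | base64 --decode
}

# Made-up response for illustration: "aGVsbG8=" is base64 for "hello".
sample='{"audioContent":"aGVsbG8="}'
printf '%s' "$sample" | extract_audio
```

With a real response you would redirect the output of extract_audio to a file such as audio.mp3, exactly as the script in step 6 does.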

A warning, though: that’s probably not the intended way to use this API. If you want to do more than just try it out, activate the API in the Google Cloud Console and then use it as documented here. It has a monthly free limit that should be enough for a few blog posts.
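For the documented route, a script could look roughly like the sketch below. It is only a starting point under a few assumptions that are not from this post: that you authenticate with an API key stored in a GOOGLE_API_KEY environment variable, that you call the v1 endpoint, and that you want the en-US-Wavenet-D voice. The function names build_request and synthesize are made up for this example.

```shell
#!/bin/sh
# Sketch only: the voice name, language code and the GOOGLE_API_KEY variable
# are assumptions for illustration, not values from the original post.

# build_request TEXT: print the JSON body for a text:synthesize call.
# jq takes care of escaping quotes and newlines in TEXT.
build_request() {
  jq --null-input --arg text "$1" '{
    input: {text: $text},
    voice: {languageCode: "en-US", name: "en-US-Wavenet-D"},
    audioConfig: {audioEncoding: "MP3"}
  }'
}

# synthesize TEXT OUTFILE: send the request and decode the MP3 into OUTFILE.
synthesize() {
  build_request "$1" |
    curl --silent --fail \
      --header "Content-Type: application/json" \
      --header "X-Goog-Api-Key: $GOOGLE_API_KEY" \
      --data @- \
      "https://texttospeech.googleapis.com/v1/text:synthesize" |
    jq --raw-output '.audioContent' | base64 --decode > "$2"
}

# Only call the API when a key is present in the environment.
if [ -n "${GOOGLE_API_KEY:-}" ]; then
  synthesize "Hello from my blog." audio.mp3
fi
```

Building the body with jq instead of hand-written JSON means the text of a whole blog post can be passed in without worrying about quotes or line breaks. Check the official documentation for the authentication method and voices that apply to your project.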

The WaveNet voices sound amazingly real and offer a great way to have articles read aloud automatically instead of recording the audio yourself.

