Why I created a Text-to-Speech Application within 3 hours using the GCP Cloud TTS API: A Step-by-Step Guide! 😄 🌎

Someshwaran M
6 min read · Dec 25, 2022
Text-To-Speech Application with GCP + Python

Have you ever found yourself in a situation where you needed to convert a large amount of text into audio files, but didn't have the resources or budget to use a professional service? If so, you'll be glad to know that it's possible to create your own text-to-speech (TTS) application using the Google Cloud Text-to-Speech API.

Please feel free to check out the complete codebase on my GitHub. You can thank me later for this! 🙏

Let me tell you my motivation behind it. A friend of mine supports visually challenged people preparing for their upcoming exams. They needed someone to read their chapters and lessons aloud so that they could listen to the audio files and then appear for the exams. My friend had been doing this for years, recording every chapter with a phone recorder and sending them the audio files. Recently, I met a few visually challenged people and got to know this pain point. Before meeting them, I had assumed that they would need someone to dictate and record all the chapters in their native accent. Later, I found out that they are very skilled at understanding different accents, and that many volunteers ⛑ across the globe had already helped them. That's when I pulled myself together and created this application to process large textbooks in various languages, chapter by chapter.

NOTE: It does require some manual intervention and some basic coding knowledge, but it is comparatively faster and more efficient. Consider the application to be at an early stage for now.

Be Careful of the Frustration! 💢

When I looked for Text-to-Speech applications online, there were many, but they were expensive and had extra costs on top of each plan. When I found a FREE plan in an application like murf.ai, I was pushed to upgrade if I wanted to download the audio files. That's when I got exhausted.

The funny part is that I did not realize this early enough. The moment I saw a FREE plan, I started uploading all the files and tried to process them. Later, after working on it for an hour, I was shocked 😩 to see that I wasn't allowed to download the audio files.

This is when I realized I could use the GCP TTS API to create my own application and modify it however I wanted. I could upload larger files and download the audio without those restrictions.

That said, in this article, I'll walk you through the process of creating a TTS application using the GCP Cloud TTS API, and explain why this can be a valuable tool for anyone looking to convert text into audio files quickly and efficiently.

--------------------
Directory Structure
-------------------

GCP-TTS-API-and-Python-Script
├── assets
│   ├── images
│   └── images
├── LICENSE.md
├── Readme.md
├── autoexecute.sh
├── example_input_file.ssml
├── input_file.ssml
├── input_file_text.txt
└── synthesize_file.py

My GitHub repository contains an autoexecute.sh script and manual instructions for using the Google Cloud Platform (GCP) Text-to-Speech API to convert text to audio files with the Python client. The Text-to-Speech API lets users generate natural-sounding speech from text in a variety of languages and voices.

The Python script in the repository can be used to automate the process of converting text to speech, allowing users to easily generate a large number of audio files from a text or SSML file. The script is simple to use and requires only a few lines of code to set up. It also includes options for customizing the output audio, such as changing the language and voice. Below is the code snippet from the synthesize_file.py file where you can modify the voice and language. For the supported voices and languages, you can refer to this GCP documentation.

# synthesize_file.py
...
# Note: the voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
...
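
For context, here is a minimal sketch of how a voice selection like this might fit into one complete request with the Python client library (the google-cloud-texttospeech package). It is not the repository's actual script: the helper names input_kind and synthesize_file and the default output name output.mp3 are assumptions made for illustration, and it assumes the package is installed and credentials are already configured.

```python
from pathlib import Path


def input_kind(path):
    """Return "ssml" for .ssml files and "text" for anything else."""
    return "ssml" if Path(path).suffix.lower() == ".ssml" else "text"


def synthesize_file(path, language_code="en-US", voice_name=None,
                    out_path="output.mp3"):
    """Send one input file to the API and write the returned audio as MP3."""
    # Imported here so input_kind() works even without the package installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    content = Path(path).read_text(encoding="utf-8")

    # SynthesisInput takes either text=... or ssml=..., not both.
    if input_kind(path) == "ssml":
        synthesis_input = texttospeech.SynthesisInput(ssml=content)
    else:
        synthesis_input = texttospeech.SynthesisInput(text=content)

    if voice_name:
        # Pin an exact voice, e.g. a name returned by client.list_voices().
        voice = texttospeech.VoiceSelectionParams(
            language_code=language_code, name=voice_name
        )
    else:
        # Otherwise let the service pick a voice for the language and gender.
        voice = texttospeech.VoiceSelectionParams(
            language_code=language_code,
            ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
        )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    Path(out_path).write_bytes(response.audio_content)
    return out_path
```

With credentials in place, calling synthesize_file("input_file_text.txt") would be expected to write output.mp3 next to the script. Picking the input type from the file extension is just one convenient convention for handling the .txt and .ssml inputs side by side.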

Now, let's take a look at some of the benefits of using the GCP Cloud TTS API.

  • Easy to use: The GCP Cloud TTS API is designed to be easy to use, even for those with little programming experience. All you need is a Google Cloud account, a little bit of code to get started, and a credit card to sign up for the free trial, which runs for 90 days and includes USD 300 in credits to play around with the features. Don't worry 😦, your card will not be charged automatically unless you initiate billing yourself. You can remove your card if required.
  • Fast: The GCP Cloud TTS API can process large amounts of text quickly, so you won't have to wait long for your audio files to be ready. However, there is a general limit of 5,000 bytes (about 5 KB) of input per request. To send a larger input_file in one go, you would need to upgrade and reach out to Google Support.
  • Accurate: The GCP Cloud TTS API uses advanced machine learning techniques to produce high-quality audio files that sound natural and lifelike.
  • Flexible: The GCP Cloud TTS API allows you to customize the voice, language, and other parameters of your audio files to suit your needs.
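
One way to live with the 5,000-byte limit mentioned above is to split a long chapter into smaller pieces and synthesize each piece separately. The sketch below is only an illustration: the function name split_text is hypothetical, it works on plain text only (SSML would need tag-aware splitting), and it collapses line breaks into single spaces.

```python
def split_text(text, max_bytes=5000):
    """Split plain text into chunks whose UTF-8 size is at most max_bytes.

    Chunks break on whitespace so words are never cut in half.
    """
    chunks, current = [], ""
    for word in text.split():
        if len(word.encode("utf-8")) > max_bytes:
            raise ValueError("a single word is larger than max_bytes")
        candidate = f"{current} {word}" if current else word
        # Measure bytes, not characters: non-ASCII text uses several bytes
        # per character.
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, producing one audio file per chunk that can be played back in order.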

To get started, I first set up a GCP account and enabled the Cloud TTS API. Then, I used the API documentation and code samples to familiarize myself with the API and learn how to make requests.
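
A common stumbling block at this stage is authentication. The article does not say how credentials were set up, so the following is an assumption: one widely used approach is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at a service account key file. A small pre-flight check (the function name credentials_problem is hypothetical) might look like this:

```python
import os
from pathlib import Path


def credentials_problem(env=None):
    """Return a short message describing what is missing, or None if OK."""
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(key_path).is_file():
        return f"key file not found: {key_path}"
    return None
```

Note that an unset variable is not always an error: credentials can also come from other sources, such as a gcloud login on the same machine, so treat the message as a hint rather than a hard failure.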

The process of creating the TTS application was straightforward and took only about 3 hours from start to finish. I used Python as my programming language, with the Python client making the requests to the API. There is no specific reason why I chose Python: I was familiar enough with both Node.js and Python, and wanted to try one of the two.

One of the challenges I encountered was finding the right balance between speed and quality. The API allows you to adjust the speaking rate, pitch, and volume of the generated audio, but making too many changes can affect the naturalness of the voice. I experimented with different settings until I found a combination that worked well for my needs.
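
To make that kind of experimentation a little safer, the settings can be kept close to their defaults. The sketch below clamps each value into a narrow band; the bounds in GENTLE_LIMITS and the function name gentle_audio_settings are illustrative choices for this example, not the API's documented limits and not the values used in the repository.

```python
# Illustrative bounds chosen for this sketch, NOT the API's documented limits.
GENTLE_LIMITS = {
    "speaking_rate": (0.8, 1.25),   # 1.0 is the normal speed
    "pitch": (-4.0, 4.0),           # semitones, 0.0 is the default
    "volume_gain_db": (-6.0, 6.0),  # decibels, 0.0 is the default
}


def gentle_audio_settings(speaking_rate=1.0, pitch=0.0, volume_gain_db=0.0):
    """Clamp each setting into a narrow band around its default value."""
    requested = {
        "speaking_rate": speaking_rate,
        "pitch": pitch,
        "volume_gain_db": volume_gain_db,
    }
    return {
        name: min(max(value, GENTLE_LIMITS[name][0]), GENTLE_LIMITS[name][1])
        for name, value in requested.items()
    }
```

The returned dictionary uses the same field names as the client library's AudioConfig, so it could be unpacked into it, for example texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3, **gentle_audio_settings(speaking_rate=0.9)).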

Overall, I am very happy with the TTS application that I created using the GCP Cloud TTS API. It has saved me a lot of time and effort, and I have been able to use it for a good cause. If you are in need of a TTS solution, I highly recommend giving the GCP Cloud TTS API a try, starting from my repository. It is easy to use, customizable, and produces high-quality audio.

I hope it will be helpful to a greater audience. Please feel free to share your comments on whether you liked it.

For now, thanks for reading!! If you enjoyed this article, please follow and subscribe for the latest updates.


Someshwaran M

I am an Open-Source Enthusiast. I learned a lot from the Open-Source community and I love how collaboration and knowledge sharing happen through Open-Source!