Why I created a Text-to-Speech Application within 3 hours using GCP Cloud TTS API? — Step-by-Step Guide!😄 🌎
Have you ever found yourself in a situation where you needed to convert a large amount of text into audio files, but didn’t have the resources or budget to use a professional service? If so, you’ll be glad to know that it’s possible to create your own text-to-speech (TTS) application using the Google Cloud Text-to-Speech API.
Please feel free to check out the complete codebase on my GitHub. You can thank me later for this! 🙏
Let me tell you my motivation behind it. I have a friend who supports visually challenged people with their upcoming exams. They need someone to read their chapters and lessons aloud so that they can listen to them, or to the audio files, and appear for the exams. My friend had been doing this for years, recording every chapter with a phone recorder and sending them the recorded audio files. Recently, I met a few visually challenged people and got to know the pain point. Before meeting them, I had assumed that they would need someone to dictate and record all the chapters in their native accent or voice. Later, I found out that they are very skillful at understanding different accents, and that many volunteers ⛑ across the globe had already helped them. That's when I pulled myself together and created this application to process large textbooks of various
NOTE: It does require some manual intervention and some basic coding knowledge, but it is comparatively faster and more efficient. Consider the application an early-stage version for now.
Be Careful of the Frustration!💢
When I looked for Text-to-Speech applications online, there were many, but they were expensive and had extra costs on top of each plan. When I found a FREE plan in an application like murf.ai, I was pushed to upgrade if I wanted to download the audio files. That's when I got exhausted.
The funny part is that I did not realize this early enough. The moment I saw a FREE plan, I started to upload all the files and tried to process them. Later, after working on it for an hour, I was shocked 😩 to see that I wasn't allowed to download the audio files.
This is when I realized I could use the GCP TTS API to create my own application and modify it as I wanted. I could upload larger files without plan-based restrictions.
That said, in this article, I’ll walk you through the process of creating a TTS application using the GCP Cloud TTS API, and explain why this can be a valuable tool for anyone looking to convert text into audio files quickly and efficiently.
│ ├── images
│ └── images
My GitHub repository contains an autoexecute.sh script and manual instructions for using the Google Cloud Platform (GCP) Text-to-Speech API to convert text to audio files with the Python client. The Text-to-Speech API is part of the GCP Cloud Text-to-Speech API package, which allows users to generate natural-sounding speech from text in a variety of voices and languages.
The Python script in the repository can be used to automate the process of converting text to speech, allowing users to easily generate a large number of audio files from a text file or SSML file. The script is simple to use and requires only a few lines of code to set up. It also includes options for customizing the output audio, such as changing the voice. Below is the code snippet from the synthesize_file.py file where you can modify the language. To find the values of the supported voices and languages, you can refer to the GCP documentation.
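# Note: the voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",  # example value (assumed); change to your language
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,  # example value (assumed)
)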
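To show where the voice selection fits, here is a minimal sketch of a full file conversion with the Python client. It assumes the google-cloud-texttospeech package is installed and that credentials are already configured. The function names, the en-US language code, the NEUTRAL gender, and the MP3 output are my own illustrative choices, not necessarily what synthesize_file.py in the repository does.

```python
from pathlib import Path


def output_path_for(input_path: str) -> Path:
    """Return the audio path for an input file, e.g. chapter1.txt -> chapter1.mp3."""
    return Path(input_path).with_suffix(".mp3")


def is_ssml(input_path: str) -> bool:
    """Treat files ending in .ssml as SSML and everything else as plain text."""
    return Path(input_path).suffix.lower() == ".ssml"


def list_voice_names(language_code: str = "en-US") -> list[str]:
    """Return the voice names the API offers for one language."""
    # Imported lazily so the pure helpers above work without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    return [voice.name for voice in response.voices]


def synthesize_file(input_path: str, language_code: str = "en-US") -> Path:
    """Convert one text or SSML file into an MP3 stored next to it."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    content = Path(input_path).read_text(encoding="utf-8")

    # SSML and plain text go into different fields of SynthesisInput.
    if is_ssml(input_path):
        synthesis_input = texttospeech.SynthesisInput(ssml=content)
    else:
        synthesis_input = texttospeech.SynthesisInput(text=content)

    # Assumed example voice; a specific voice can be set with name="...".
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code,
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    out_path = output_path_for(input_path)
    out_path.write_bytes(response.audio_content)
    return out_path
```

Calling synthesize_file("chapter1.txt") would then write chapter1.mp3 in the same folder, provided the file is small enough for a single request.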
Now, let’s take a look at some of the benefits of using the GCP Cloud TTS API.
- Easy to use: The GCP Cloud TTS API is designed to be easy to use, even for those with little programming experience. All you need is a Google Cloud account, a little bit of code to get started, and a credit card to sign up for a free trial period of 92 days along with USD 300 in credits to play around with the features. Don't worry 😦, your card will not be charged automatically until you initiate the billing. You can remove your card if required.
- Fast: The GCP Cloud TTS API can process large amounts of text quickly, so you won’t have to wait long for your audio files to be ready. However, to process a larger input_file, you would need to upgrade and reach out to Google Support. For now, there is a general limitation of
- Accurate: The GCP Cloud TTS API uses advanced machine learning techniques to produce high-quality audio files that sound natural and lifelike.
- Flexible: The GCP Cloud TTS API allows you to customize the voice, language, and other parameters of your audio files to suit your needs.
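Because each request has an input size limit, a long chapter has to be split into smaller pieces before it is sent to the API. The sketch below shows one way to do that by UTF-8 byte size. The function name is hypothetical, and the default of 5000 bytes is my assumption about the per-request limit, so please check the current quota documentation before relying on it.

```python
def split_text_by_bytes(text: str, max_bytes: int = 5000) -> list[str]:
    """Split text into chunks whose UTF-8 size stays within max_bytes.

    Splits on whitespace so words are never cut in half. Runs of
    whitespace (including newlines) collapse into single spaces.
    A single word longer than max_bytes raises ValueError.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        if len(word.encode("utf-8")) > max_bytes:
            raise ValueError(f"word exceeds max_bytes: {word[:20]}")
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            # The word still fits into the chunk being built.
            current = candidate
        else:
            # Close the current chunk and start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio files played in order. Measuring bytes instead of characters matters for non-English text, where one character can take several bytes.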
To get started, I first set up a GCP account and enabled the Cloud TTS API. Then, I used the API documentation and code samples to familiarize myself with the API and learn how to make requests.
The process of creating the TTS application was straightforward and took only about 3 hours from start to finish. I used Python as my client language and the requests library to make HTTP requests to the API. There is no specific reason why I chose Python as my client; I was familiar enough with it and wanted to try it out.
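For readers curious what a plain HTTP call with the requests library could look like, here is a rough sketch against the REST text:synthesize method. It assumes authentication with an API key passed as the key query parameter; the function names, the en-US language code, and the 30-second timeout are illustrative assumptions rather than the code from my repository.

```python
import base64

TTS_ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request_body(text: str, language_code: str = "en-US") -> dict:
    """Build the JSON payload for the text:synthesize REST method."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def decode_audio(response_json: dict) -> bytes:
    """The REST response carries the audio as base64 in 'audioContent'."""
    return base64.b64decode(response_json["audioContent"])


def synthesize_via_rest(text: str, api_key: str) -> bytes:
    """Send one synthesis request and return the raw MP3 bytes."""
    # Imported lazily so the helpers above have no third-party dependency.
    import requests

    response = requests.post(
        TTS_ENDPOINT,
        params={"key": api_key},  # assumed auth method: API key
        json=build_request_body(text),
        timeout=30,
    )
    response.raise_for_status()
    return decode_audio(response.json())
```

The returned bytes can be written straight to a file opened in binary mode, for example open("hello.mp3", "wb").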
One of the challenges I encountered was finding the right balance for audio quality. The API allows you to adjust the speaking rate and volume of the generated audio, but making too many changes can affect the naturalness of the voice. I experimented with different settings until I found a combination that worked well for my needs.
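That kind of experimenting can be made a bit more systematic by rendering the same sentence with a small grid of settings and listening to the results. The sketch below assumes the google-cloud-texttospeech package; the candidate values are arbitrary picks close to the defaults (speaking rate 1.0, volume gain 0.0 dB), and all function names are hypothetical.

```python
from itertools import product


def candidate_settings(rates=(0.9, 1.0, 1.1), gains_db=(0.0, 2.0)) -> list[dict]:
    """Return every combination of speaking rate and volume gain to audition."""
    return [
        {"speaking_rate": rate, "volume_gain_db": gain}
        for rate, gain in product(rates, gains_db)
    ]


def sample_filename(settings: dict) -> str:
    """Encode the settings in the file name so samples are easy to compare."""
    return (
        f"sample_rate{settings['speaking_rate']}"
        f"_gain{settings['volume_gain_db']}.mp3"
    )


def render_samples(text: str, language_code: str = "en-US") -> None:
    """Synthesize the same text once per candidate setting."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(language_code=language_code)
    for settings in candidate_settings():
        audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3, **settings
        )
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config,
        )
        with open(sample_filename(settings), "wb") as out:
            out.write(response.audio_content)
```

Keeping the changes small and comparing samples side by side makes it easier to notice when a setting starts to hurt the naturalness of the voice.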
Overall, I am very happy with the TTS application that I created using the GCP Cloud TTS API. It has saved me a lot of time and effort, and I have been able to use it for a good cause. If you are in need of a TTS solution, I highly recommend giving the GCP Cloud TTS API a try from my repository. It is easy to use, customizable, and produces high-quality audio files.
I hope it will be helpful to a greater audience. Please feel free to share your comments on whether you liked it.
For now, thanks for reading!!