Introduction: Speech Recognition Using Google Speech API and Python
Speech Recognition
Speech recognition is a part of Natural Language Processing, which is a subfield of Artificial Intelligence. Put simply, speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. It is used in applications such as voice assistants, home automation, voice-based chatbots, voice-interactive robots, and other AI applications.
There are several APIs (Application Programming Interfaces) for recognizing speech. Some are free and others are paid. They include:
- CMU Sphinx
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech To Text
- Snowboy Hotword Detection
We will be using Google Speech Recognition here, as it does not require you to obtain an API key. This tutorial aims to provide an introduction to using Google Speech Recognition from Python through the SpeechRecognition library, with the help of an external microphone such as the ReSpeaker USB 4-Mic Array from Seeed Studio. An external microphone is not mandatory; the built-in microphone of a laptop can be used as well.
Step 1: ReSpeaker USB 4-Mic Array
The ReSpeaker USB Mic is a quad-microphone device designed for AI and voice applications, which was developed by Seeed Studio. It has 4 high performance, built-in omnidirectional microphones designed to pick up your voice from anywhere in the room and 12 programmable RGB LED indicators. The ReSpeaker USB mic supports Linux, macOS, and Windows operating systems. Details can be found here.
The ReSpeaker USB Mic comes in a nice package containing the following items:
- A user guide
- ReSpeaker USB Mic Array
- Micro USB to USB Cable
So we're ready to get started.
Step 2: Install Required Libraries
For this tutorial, I’ll assume you are using Python 3.x.
Let's install the libraries:
pip3 install SpeechRecognition
For macOS, first you will need to install PortAudio with Homebrew, and then install PyAudio with pip3:
brew install portaudio
Then run the command below to install PyAudio:
pip3 install pyaudio
For Linux, you can install PyAudio with apt:
sudo apt-get install python3-pyaudio
For Windows, you can install PyAudio with pip:
pip install pyaudio
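Before moving on, it can help to confirm that the installs above actually succeeded. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of any package. It checks for the import names speech_recognition and pyaudio, which are the module names the SpeechRecognition and PyAudio packages are imported under.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names for the SpeechRecognition and PyAudio packages
required = ["speech_recognition", "pyaudio"]
missing = missing_packages(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If a module is reported as missing, re-run the matching install command for your operating system.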
Next, we need the index of the microphone we want to use. Create a new Python file:
nano get_index.py
Paste the following code snippet into get_index.py:
import pyaudio

p = pyaudio.PyAudio()
# Query the first host API for the number of audio devices it exposes
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(0, numdevices):
    # Only list devices that can record (at least one input channel)
    if p.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels') > 0:
        print("Input Device id ", i, " - ", p.get_device_info_by_host_api_device_index(0, i).get('name'))
Run the following command:
python3 get_index.py
In my case, the command prints the following output:
Input Device id 1 - ReSpeaker 4 Mic Array (UAC1.0)
Input Device id 2 - MacBook Air Microphone
In the code snippet below, set device_index to the index of the microphone you want to use.
import speech_recognition as sr

r = sr.Recognizer()
speech = sr.Microphone(device_index=1)  # index reported by get_index.py
with speech as source:
    print("say something!…")
    # Calibrate the energy threshold to the background noise (returns nothing)
    r.adjust_for_ambient_noise(source)
    # Record until a pause in speech is detected
    audio = r.listen(source)
try:
    recog = r.recognize_google(audio, language='en-US')
    print("You said: " + recog)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
The device index is set to 1 because the ReSpeaker 4 Mic Array is the main audio source.
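A hard-coded device_index can change when devices are plugged in or removed, so looking the index up by name may be more robust. The sketch below assumes the list of names comes from sr.Microphone.list_microphone_names(), a static method of the SpeechRecognition library whose list positions correspond to device_index values. The helper find_device_index and the sample names are hypothetical, used here so the lookup can be tried without any hardware.

```python
def find_device_index(names, fragment):
    """Return the index of the first device whose name contains `fragment`
    (case-insensitive), or None if no device matches."""
    fragment = fragment.lower()
    for index, name in enumerate(names):
        # Skip entries without a usable name
        if name and fragment in name.lower():
            return index
    return None


# Hypothetical sample list, shaped like the result of
# sr.Microphone.list_microphone_names() (assumption: your list will differ)
sample_names = [
    "MacBook Air Speakers",
    "ReSpeaker 4 Mic Array (UAC1.0)",
    "MacBook Air Microphone",
]
print(find_device_index(sample_names, "respeaker"))  # 1
print(find_device_index(sample_names, "webcam"))     # None
```

With the real library, the result could be passed on as sr.Microphone(device_index=find_device_index(sr.Microphone.list_microphone_names(), "ReSpeaker")), falling back to the default microphone when the lookup returns None.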
Step 3: Text-to-speech in Python With Pyttsx3 Library
There are several libraries available for converting text to speech in Python. One of them is pyttsx3, which is the best available text-to-speech package in my opinion. It works on Windows, macOS, and Linux. See the official documentation for details.
Install the package
Use pip to install the package.
pip install pyttsx3
If you are on Windows, you may need an additional package, pypiwin32, which pyttsx3 uses to access the native Windows speech API.
pip install pypiwin32
Convert text to speech python script
Below is the code snippet for text-to-speech using pyttsx3:
import pyttsx3
engine = pyttsx3.init()
engine.setProperty('rate', 150) # Speech rate in words per minute
engine.setProperty('volume', 0.9) # Volume 0-1
engine.say("Hello, world!")
engine.runAndWait()
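Besides rate and volume, pyttsx3 exposes a 'voices' property (the installed voices, each with id and name attributes) and a 'voice' property that selects one of them by id. The following is a minimal sketch of picking a voice by name. The helper names choose_voice_id and init_engine_with_voice are hypothetical, the sample voices are stand-ins, and the voices available on your machine depend on the operating system.

```python
from types import SimpleNamespace


def choose_voice_id(voices, name_fragment):
    """Return the id of the first voice whose name contains `name_fragment`
    (case-insensitive), or None if no voice matches."""
    fragment = name_fragment.lower()
    for voice in voices:
        name = getattr(voice, "name", None) or ""
        if fragment in name.lower():
            return voice.id
    return None


def init_engine_with_voice(name_fragment):
    """Create a pyttsx3 engine and switch to a matching voice if one exists."""
    import pyttsx3  # imported here so the helper above works without it

    engine = pyttsx3.init()
    voice_id = choose_voice_id(engine.getProperty("voices"), name_fragment)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    return engine


# Stand-in voice objects (assumption: real ones come from
# engine.getProperty("voices") and differ per system)
sample_voices = [
    SimpleNamespace(id="voice-a", name="Alex"),
    SimpleNamespace(id="voice-b", name="Samantha"),
]
print(choose_voice_id(sample_voices, "sam"))  # voice-b
```

If no voice matches, the engine simply keeps its default voice.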
Step 4: Putting It All Together: Building Speech Recognition With Python Using Google Speech Recognition API and Pyttsx3 Library
The code below recognizes human speech using Google Speech Recognition and then reads the recognized text back aloud using the pyttsx3 library.
import speech_recognition as sr
import pyttsx3

# Text-to-speech engine
engine = pyttsx3.init()
engine.setProperty('rate', 200)
engine.setProperty('volume', 0.9)

r = sr.Recognizer()
speech = sr.Microphone(device_index=1)
with speech as source:
    # Calibrate to the background noise, then record one phrase
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
try:
    recog = r.recognize_google(audio, language='en-US')
    print("You said: " + recog)
    engine.say("You said: " + recog)
    engine.runAndWait()
except sr.UnknownValueError:
    engine.say("Google Speech Recognition could not understand audio")
    engine.runAndWait()
except sr.RequestError as e:
    engine.say("Could not request results from Google Speech Recognition service; {0}".format(e))
    engine.runAndWait()
The recognized text is printed to the terminal and also spoken aloud:
You said: London is the capital of Great Britain
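The script above handles a single phrase and then exits. One possible way to keep it listening is a small loop that echoes each phrase until a stop word is heard. This is only a sketch under stated assumptions: echo_loop, its parameters, and the stop word are hypothetical names, and the microphone and speaker are replaced by stand-in callables so the control flow can be tried without hardware.

```python
def echo_loop(listen_once, speak, stop_word="stop", max_turns=10):
    """Echo recognized phrases until the stop word is heard.

    listen_once: callable returning recognized text, or None when
                 nothing was understood.
    speak:       callable that takes the string to say aloud.
    Returns the list of phrases that were echoed.
    """
    echoed = []
    for _ in range(max_turns):
        text = listen_once()
        if text is None:
            speak("Sorry, I did not catch that")
            continue
        if text.strip().lower() == stop_word:
            speak("Goodbye")
            break
        speak("You said: " + text)
        echoed.append(text)
    return echoed


# Stand-ins for the microphone and the speech engine
fake_inputs = iter(["hello", None, "London is the capital of Great Britain",
                    "stop", "never reached"])
spoken = []
result = echo_loop(lambda: next(fake_inputs), spoken.append)
print(result)
print(spoken)
```

With the objects from this tutorial, listen_once could wrap r.listen and r.recognize_google and return None on sr.UnknownValueError, while speak could call engine.say followed by engine.runAndWait().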
I hope you now have a better understanding of how speech recognition works in general and, most importantly, how to implement it using the Google Speech Recognition API with Python.
If you have any questions or feedback, leave a comment below. Stay tuned!