How to make the Raspberry Pi read text with eSpeak.
In computing, the user interface is probably one of the most complicated problems, along with naming things and invalidating a cache…
The easiest way for a computer to present information to a human is usually through sight. But that requires a screen, which is bulky and expensive, and therefore not well suited to embedded projects.
In this tutorial, we will see how to use your user's hearing rather than their sight, by having your Raspberry Pi read text aloud with the eSpeak software. This technique is called Text To Speech.
The hardware to make the Raspberry Pi talk
To be able to make your Raspberry Pi speak, we will need the following equipment:
Install eSpeak on the Raspberry Pi
To make your Raspberry Pi speak, we will use eSpeak, an open-source text-to-speech program.
The principle of eSpeak is as follows: you give it some text (a string of characters, a file, etc.), it splits that text into phonemes (the smallest units of sound that make up a spoken language), then uses a whole set of techniques to turn these phonemes into actual audio.
Installing eSpeak on the Raspberry Pi is simple, since it is already available in the repositories. We just need to update the package lists and install eSpeak:
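Once eSpeak is installed, you can see this first step (text to phonemes) for yourself. The small sketch below asks eSpeak to print the phonemes it derives from a text instead of speaking it, using espeak's -q (no audio) and -x (print phoneme mnemonics) options. The show_phonemes name is only an illustration, and the exact phoneme notation depends on your eSpeak version.

```shell
#!/usr/bin/env bash
# Show the phonemes eSpeak derives from a text, without producing any sound.
#   -q : quiet, do not play audio
#   -x : write the phoneme mnemonics to standard output
# Usage: show_phonemes VOICE TEXT   (show_phonemes is a hypothetical helper)
show_phonemes() {
  local voice="$1" text="$2"
  if ! command -v espeak >/dev/null 2>&1; then
    echo "espeak is not installed" >&2
    return 127
  fi
  espeak -q -x -v "$voice" "$text"
}

# Example (French voice): show_phonemes fr "Les framboises"
```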
sudo apt update
sudo apt install espeak -y
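If you put this installation in a script, you may prefer to install packages only when they are missing. The hypothetical missing_cmds function below is a minimal sketch that prints, one per line, the commands from its arguments that cannot be found on the PATH.

```shell
#!/usr/bin/env bash
# Print, one per line, the commands from the argument list that are not on the PATH.
# missing_cmds is a hypothetical helper, not part of eSpeak.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example: install eSpeak only if the espeak command is missing.
# if [ -n "$(missing_cmds espeak)" ]; then sudo apt update && sudo apt install espeak -y; fi
```

The aplay program used later in this tutorial comes from the alsa-utils package, so you could check it the same way with missing_cmds espeak aplay.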
Read a sentence with eSpeak
Now that eSpeak is installed, we can have it read a first sentence. But first, we will make sure the sound is sent to the Raspberry Pi's jack port, where our speakers are plugged in.
To do this, run the sudo raspi-config command, then go to "Advanced Options", "Audio", "Force 3.5mm ('headphone') jack" and finally "Finish".
We will create an espeak folder in the user's home directory, download an audio file into it and play it to check that everything works correctly:
mkdir -p ~/espeak
cd ~/espeak
wget https://raspberry-pi.fr/download/espeak/test.mp3 -O test.mp3
ffplay -nodisp -autoexit test.mp3
You should hear a C major scale playing.
Now that we know the speakers work, we will ask eSpeak to say the sentence “Les framboises sont perchées sur le tabouret de mon grand-père.” (“The raspberries are perched on my grandfather's stool.”). To do this, we use the command below:
espeak -a 200 -v fr+f3 "Les framboises sont perchées sur le tabouret de mon grand-père." --stdout | aplay
Which gives us something like this …
Let's break the command down to understand it:
- espeak: launches the espeak program.
- -a 200: sets the volume (amplitude) of the sound. It ranges from 0 to 200 and defaults to 100.
- -v fr+f3: sets the voice to use, that is, the language and its variant.
  - fr selects the French language.
  - +f3 selects the third female voice variant provided by eSpeak. Remember to adapt the language code to your text.
- The sentence in quotes is the text eSpeak will speak. We could just as well have asked it to read a text file instead.
- --stdout: tells eSpeak to send the generated audio data to standard output instead of playing it directly.
- | aplay: pipes the output of eSpeak into aplay, a program that plays audio files in WAV format, which is the format eSpeak generates. Note that we could instead use > mon_fichier.wav to save the audio to a file.
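To avoid retyping the whole pipeline, you could wrap it in a small shell function. The say function below is a hypothetical sketch: it checks that the volume is a whole number within the 0 to 200 range accepted by -a, then either plays the audio through aplay or saves it to a WAV file, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the "espeak ... --stdout | aplay" command.
# Usage: say VOLUME VOICE TEXT [OUTPUT.wav]
say() {
  local volume="$1" voice="$2" text="$3" out="${4:-}"

  # -a accepts an amplitude from 0 to 200 (100 by default)
  case "$volume" in
    ''|*[!0-9]*) echo "volume must be an integer between 0 and 200" >&2; return 2 ;;
  esac
  if [ "$volume" -gt 200 ]; then
    echo "volume must be an integer between 0 and 200" >&2
    return 2
  fi

  if [ -n "$out" ]; then
    # Save the WAV data to a file instead of playing it
    espeak -a "$volume" -v "$voice" "$text" --stdout > "$out"
  else
    # Let aplay play the WAV data, as in the original command
    espeak -a "$volume" -v "$voice" "$text" --stdout | aplay
  fi
}

# Example: say 200 fr+f3 "Les framboises sont perchées sur le tabouret de mon grand-père."
```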
As you can see, it is aplay, not eSpeak, that plays the sound. The reason is simple: eSpeak's own audio playback has been buggy on the Raspberry Pi for several versions.
If you try to make eSpeak play the sound directly, you will get errors related to ALSA, the Linux sound system used by the Raspberry Pi. The simplest solution is therefore to send the data to aplay, which works perfectly. In the end it works, and that's all that matters.
Improve eSpeak's voice by installing MBROLA on the Raspberry Pi
As you can hear, the voices generated by eSpeak are pretty awful. Don't panic, there is a way to improve them.
As I mentioned at the beginning of the article, eSpeak is able to generate phonemes, the pieces of sound that make up a spoken language. And it turns out that other programs can read and pronounce these phonemes, in a much more convincing way than eSpeak!
In our case, we will use the MBROLA software, which comes from a worldwide collaborative project initiated by the Polytechnic Faculty of Mons, Belgium, aiming to build a huge database for speech synthesis.
Strangely enough, the MBROLA program itself is not available in the Raspbian repositories, while its voice data is. As a result, those voice packages cannot be installed on their own.
Don't worry, there is a solution! Someone was kind enough to build an MBROLA package for the Raspberry Pi themselves, and we decided to host a mirror of it on the site.
So we will download and install this package with the commands below:
cd ~/espeak
wget https://raspberry-pi.fr/download/espeak/mbrola3.0.1h_armhf.deb -O mbrola.deb
sudo dpkg -i mbrola.deb
Now that MBROLA is installed, we can download the voice files we need. In my case, that is mbrola-fr1, the first French voice. Adapt the command to your language.
sudo apt install mbrola-fr1 -y
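Before switching voices, it can help to check that both the MBROLA program and the voice package are really installed. The mbrola_ready function below is a hypothetical sketch that looks for the mbrola command and uses dpkg-query to read the package status. The espeak --voices=mb call in the example should list the MBROLA voices eSpeak knows about.

```shell
#!/usr/bin/env bash
# Check that MBROLA and a given voice package are installed.
# Usage: mbrola_ready VOICE   (e.g. fr1 for the mbrola-fr1 package)
# mbrola_ready is a hypothetical helper.
mbrola_ready() {
  local voice="$1"
  if ! command -v mbrola >/dev/null 2>&1; then
    echo "the mbrola program is not installed" >&2
    return 1
  fi
  # A fully installed package reports the status "install ok installed"
  if ! dpkg-query -W -f='${Status}' "mbrola-$voice" 2>/dev/null | grep -q 'install ok installed'; then
    echo "the mbrola-$voice package is not installed" >&2
    return 1
  fi
}

# Example: mbrola_ready fr1 && espeak --voices=mb
```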
All we have to do now is adapt our previous eSpeak command so that it generates phonemes and has them read by MBROLA, which gives us the command below:
espeak -a 200 -v mb-fr1 -s 150 "Les framboises sont perchées sur le tabouret de mon grand-père." --stdout | aplay
Which gives us the audio below:
You will find some of the same settings as before, but with two changes:
- -v mb-fr1: tells eSpeak to use MBROLA (mb) to generate the audio, with French voice number 1 (fr1).
- -s 150: sets the speed to 150 words per minute. The default is 165, but I find it a bit fast, at least for French.
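If the same script has to run on a Raspberry Pi where MBROLA is not installed, one option is to fall back to the plain eSpeak voice used earlier. The sketch below assumes that the presence of the mbrola program is a good enough signal (it does not check for the mbrola-fr1 package itself); pick_fr_voice and say_fr are illustrative names.

```shell
#!/usr/bin/env bash
# Pick the MBROLA French voice when the mbrola program is available,
# otherwise fall back to the plain eSpeak female voice fr+f3.
pick_fr_voice() {
  if command -v mbrola >/dev/null 2>&1; then
    echo "mb-fr1"
  else
    echo "fr+f3"
  fi
}

# Usage: say_fr SPEED TEXT   (SPEED in words per minute, e.g. 150)
say_fr() {
  local speed="$1" text="$2"
  case "$speed" in
    ''|*[!0-9]*) echo "speed must be a whole number of words per minute" >&2; return 2 ;;
  esac
  espeak -a 200 -v "$(pick_fr_voice)" -s "$speed" "$text" --stdout | aplay
}

# Example: say_fr 150 "Les framboises sont perchées sur le tabouret de mon grand-père."
```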
As before, you can of course modify the command to create a file, read the text from a txt file, etc.
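For example, eSpeak's -f option reads the text to speak from a file, and -w writes the speech to a WAV file instead of playing it. The file_to_wav function below is a hypothetical sketch combining them with the MBROLA settings above; its default voice and speed are assumptions you can override.

```shell
#!/usr/bin/env bash
# Read a text file and save the spoken result as a WAV file.
#   -f FILE : read the text to speak from FILE
#   -w FILE : write the speech to a WAV file instead of playing it
# Usage: file_to_wav INPUT.txt OUTPUT.wav [VOICE] [SPEED]   (hypothetical helper)
file_to_wav() {
  local input="$1" output="$2" voice="${3:-mb-fr1}" speed="${4:-150}"
  if [ ! -r "$input" ]; then
    echo "cannot read $input" >&2
    return 1
  fi
  espeak -a 200 -v "$voice" -s "$speed" -f "$input" -w "$output"
}

# Example: file_to_wav texte.txt texte.wav && aplay texte.wav
```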
By combining these commands with things like reading RFID tags, you can easily build embedded systems with relatively complete interfaces.
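As an illustration, the hypothetical speak_lines function below speaks every line it receives on standard input, so any program that prints messages (for instance a script reading RFID tags, which is not covered here) could be piped into it. It is a minimal sketch: each line is synthesized and played in turn, and blank lines are skipped.

```shell
#!/usr/bin/env bash
# Speak every non-empty line received on standard input, one after the other.
# Usage: some_program | speak_lines [VOICE]   (speak_lines is a hypothetical helper)
speak_lines() {
  local voice="${1:-mb-fr1}" line
  # The "|| [ -n "$line" ]" part keeps a last line that has no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    espeak -a 200 -v "$voice" -s 150 "$line" --stdout | aplay -q
  done
}

# Example: printf 'Badge reconnu\nPorte ouverte\n' | speak_lines mb-fr1
```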
Of course, we are still far from a human voice, and bear in mind that there are better Text to Speech engines, such as Mozilla's TTS, developed as part of the Common Voice project. Nevertheless, MBROLA and eSpeak offer a good compromise between ease of use, speed of execution and efficiency.