Portable Sound Analyzer on ESP32
Intro: Portable Sound Analyzer on ESP32
Sounds are everywhere and always around us. Whether it is your name spoken by your beloved, a barking dog, music played on television or the horn of a car...
Our brain makes sense of sounds by processing the waves picked up by our ears. But wouldn't it be fun to actually see these waves and try to understand them too?
That's what I tried to do with this project. A microphone captures the waves and transforms them into electrical signals, which a microcontroller processes and displays on a screen. Of course, it has to be fun, fast, informative and portable.
Let's see what we can do...
STEP 1: What You Need
The ESP32 is a powerful microcontroller from Espressif, which I've been using for several projects over the past few years. For this project I chose the TTGO T-Display, which is equipped with an LCD display and 2 buttons. This way, I can easily plot any image or graph and interact with it. It's a very convenient device.
The microphone is a MAX9814 module equipped with an amplifier with automatic gain control. That means it can be used even when the sound level is very low. I tested it with a friend whispering at the other end of the room, and it caught the sound quite well! A really impressive device.
And that's all for the circuit. To make it portable, I added a LiPo battery, with the proper connector: a 2 pin JST SH 1.0 connector with 1.25mm square holes.
Of course, you also need a breadboard and some wires for the connections. The whole thing costs under $10.
STEP 2: Assemble and Upload the Code
Building the system is really simple: the ESP32 module powers the microphone, which sends its signal back to the ESP32. So we just need to connect the GND and 3V3 pins of both devices together, and the output of the mic to an analog input of the microcontroller.
I chose pin 32 for this.
The MAX9814 board has another pin for gain selection:
- unconnected for maximum gain (60dB)
- to GND for medium gain (50dB)
- to VCC for lower gain (40dB).
I chose this last option to limit the gain and the associated noise, but you can try the others if you need to monitor very quiet sounds.
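For reference, a voltage gain of G dB corresponds to an amplification factor of 10^(G/20), so the three settings amplify the microphone signal roughly 100, 316 and 1000 times. The small helper below only does that conversion; it is an illustrative sketch, not part of the project code, and the names are made up for the example.

```cpp
#include <cmath>

// Convert a voltage gain in decibels to a linear amplification factor.
// Decibels are defined on power, which scales with the square of the
// voltage, hence 20 (not 10) dB per factor of ten in voltage.
double voltageGainFromDb(double gainDb) {
    return std::pow(10.0, gainDb / 20.0);
}

// The three MAX9814 settings selected by the Gain pin.
enum class GainPin { ToVcc, ToGnd, Unconnected };

double gainDbFor(GainPin pin) {
    switch (pin) {
        case GainPin::ToVcc:       return 40.0;  // lowest gain
        case GainPin::ToGnd:       return 50.0;  // medium gain
        case GainPin::Unconnected: return 60.0;  // maximum gain
    }
    return 0.0;  // unreachable for valid enum values
}
```

Going from 40 dB to 60 dB therefore makes the signal ten times larger, which is why the unconnected setting is the one to try for very quiet sounds.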
The code was written using the Arduino IDE with the ESP32 boards extension. Select 'ESP32 Dev Module' in the board list, connect the ESP32 module to your computer with a USB Type-C cable, and select the port. Then open the attached ino file and upload it to the ESP32.
The code is organized into 3 files:
- the main file is SoundAnalyzer_ESP32_TTGO.ino
- additional files functions.h and params.h, which contain the functions and the parameters of the system.
All these files should be placed in a folder named 'SoundAnalyzer_ESP32_TTGO' inside your Arduino folder. When you open the ino file in the IDE, the other files appear in 2 additional tabs, so you can view and modify them if you want.
The TTGO T-Display uses the TFT_eSPI library for the display. To install and configure it, please refer to this website.
As the sound analyzer needs to... well, analyze the sound, I needed a Fourier transform fast enough to keep real-time performance: you don't want any latency between the incoming sound and the spectrum display. I first tested the Fast Hartley Transform, which is supposed to be faster than the regular Fourier transform, but I wasn't fully satisfied with the results or with the way it had to be used, so I stuck with the Fourier transform.
In a previous project (have a look, it's fun!), I had already adapted an FFT library to the ESP32, so I just had to reuse it in this project. Reuse is the mother of programming...
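The FFT library itself is not reproduced here. To show what such a transform computes, here is a minimal iterative radix-2 Cooley-Tukey FFT written with std::complex. It is an illustrative sketch, not the adapted library used in the project, and like any radix-2 FFT it assumes the number of samples is a power of two (256 in this project, see Step 5).

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cpx = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT.
// Precondition: data.size() is a power of two.
void fft(std::vector<cpx>& data) {
    const std::size_t n = data.size();
    // Bit-reversal permutation so the butterflies can work in place.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }
    const double pi = std::acos(-1.0);
    // Merge transforms of length len/2 into transforms of length len.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = -2.0 * pi / static_cast<double>(len);
        const cpx wlen(std::cos(angle), std::sin(angle));
        for (std::size_t start = 0; start < n; start += len) {
            cpx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cpx even = data[start + k];
                const cpx odd = data[start + k + len / 2] * w;
                data[start + k] = even + odd;
                data[start + k + len / 2] = even - odd;
                w *= wlen;
            }
        }
    }
}

// Magnitudes of the first n/2 bins, the useful half for a real input signal.
std::vector<double> magnitudeSpectrum(const std::vector<double>& samples) {
    std::vector<cpx> data(samples.begin(), samples.end());
    fft(data);
    std::vector<double> mags(data.size() / 2);
    for (std::size_t k = 0; k < mags.size(); ++k) mags[k] = std::abs(data[k]);
    return mags;
}
```

Bin k of the result corresponds to the frequency k × sampling rate / number of samples.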
STEP 3: How It Works
Remember? The system needs to be fun, fast, informative and portable.
So I programmed 5 different ways of displaying the sound: 2 for the spectrum, 1 for the sound waves and 2 for the sound's envelope. Just push either one of the white buttons near the USB connector to switch displays.
Let's have a look at them... But first, one additional feature: as these displays change quite fast, if you want to freeze them, just gently touch GPIO 15 of the TTGO module.
I couldn't use one of the buttons for that, because the only sound you could then see was... the click of the button!!! Touching a GPIO is far more discreet and doesn't disturb the microphone.
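On the ESP32 Arduino core, a touch-capable pin such as GPIO 15 is read with touchRead(), and on the original ESP32 the value drops when the pad is touched. The decision logic can be kept separate from the hardware, as in the sketch below. The threshold, the "freeze while touched" behavior and the class name are assumptions for illustration; the project code may handle this differently.

```cpp
#include <cstdint>

// Hypothetical helper deciding whether the display should be frozen,
// based on successive touch readings. A reading below the threshold counts
// as "touched"; several consecutive low readings are required so that an
// isolated noisy sample does not freeze the screen.
class TouchFreeze {
public:
    TouchFreeze(std::uint16_t threshold, int confirmCount)
        : threshold_(threshold), confirmCount_(confirmCount) {}

    // Feed one reading; returns true while the display should stay frozen.
    bool update(std::uint16_t reading) {
        if (reading < threshold_) {
            if (lowCount_ < confirmCount_) ++lowCount_;
        } else {
            lowCount_ = 0;  // released: resume immediately
        }
        return lowCount_ >= confirmCount_;
    }

private:
    std::uint16_t threshold_;
    int confirmCount_;
    int lowCount_ = 0;
};
```

On the board, each loop iteration would pass touchRead(15) to update() and skip redrawing while it returns true. The threshold has to be tuned for your own module, since untouched readings vary from board to board.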
SPECTRUM ANALYSIS
The first display shows the real-time spectrum of the sound from 0 to 8 kHz. Human hearing covers 20 Hz (vibrations per second) to 20 kHz, but sounds above 10 kHz are difficult to pick up: just try this and see if you can hear it.
The screen displays the spectrum as bars and indicates the frequency with the highest amplitude. An animation shows the bars decreasing over time (falling red dashes).
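Two ingredients of this display can be sketched in a few lines: converting the index of the strongest FFT bin into a frequency (each bin is sampling rate / number of samples wide, which for the 256 samples at 20 kHz given in Step 5 is 78.125 Hz), and peak markers that jump up with the bars but then fall by a fixed amount per frame. The fall rate and the function names are assumptions, not taken from the project code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Frequency (Hz) of FFT bin `bin`, for `n` samples taken at `sampleRateHz`.
double binToHz(std::size_t bin, double sampleRateHz, std::size_t n) {
    return static_cast<double>(bin) * sampleRateHz / static_cast<double>(n);
}

// Index of the strongest bin, skipping bin 0, which only holds the DC
// offset of the microphone output. Returns 0 if there is nothing to search.
std::size_t dominantBin(const std::vector<double>& mags) {
    if (mags.size() < 2) return 0;
    std::size_t best = 1;
    for (std::size_t k = 2; k < mags.size(); ++k)
        if (mags[k] > mags[best]) best = k;
    return best;
}

// Falling peak markers: a marker jumps up to a higher bar immediately,
// otherwise it drops by `fallPerFrame` until it meets the bar again.
void updatePeaks(const std::vector<double>& bars, std::vector<double>& peaks,
                 double fallPerFrame) {
    for (std::size_t i = 0; i < bars.size() && i < peaks.size(); ++i)
        peaks[i] = std::max(bars[i], peaks[i] - fallPerFrame);
}
```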
In general, women's voices are higher in frequency, with higher pitches, while men's voices are lower, dominated by low-pitched sounds. The human voice covers a range from about 60 Hz for the deepest voices (although some well-trained Tibetan monks can go even lower) to about 1,200 Hz for the highest sopranos. The average is around 200 Hz. Women usually have higher-pitched voices because their vocal cords are shorter (12.5 to 17.5 mm), like those of children, than those of men (17 to 25 mm).
This is why I added another display for this medium frequency range. This one needs no animation: the bars are colored from green (lower amplitudes) to dark red (higher amplitudes).
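TFT_eSPI works with 16-bit RGB565 colors (the library also offers a color565() method for the conversion). One way to color the bars is a linear blend from green to dark red, packed into that format as below; the exact end colors and the linear blend are assumptions, not necessarily the palette used in the project.

```cpp
#include <algorithm>
#include <cstdint>

// Pack 8-bit-per-channel RGB into 16-bit RGB565:
// 5 bits of red, 6 bits of green, 5 bits of blue.
std::uint16_t rgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Map a normalized amplitude (0 = quiet, 1 = loud) to a color that fades
// linearly from pure green (0, 255, 0) to dark red (128, 0, 0).
// Values outside [0, 1] are clamped.
std::uint16_t amplitudeColor(double level) {
    level = std::min(1.0, std::max(0.0, level));
    const auto r = static_cast<std::uint8_t>(128.0 * level + 0.5);
    const auto g = static_cast<std::uint8_t>(255.0 * (1.0 - level) + 0.5);
    return rgb565(r, g, 0);
}
```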
SOUND WAVES DISPLAY
This is the classic display, the one we are used to seeing in the media. Here is a cute one below.
Of course, I didn't try to reproduce it, but the result is similar: it shows how the amplitude of the sound waves varies over time. You can play with it to see how the waves differ when you speak different words. Try 'yes', 'no', your name or the name of your pet, etc...
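To draw the wave, each ADC reading has to become a pixel row. A simple approach, sketched here with assumed values (12-bit readings from 0 to 4095, the ESP32's default ADC resolution, and the 135-pixel height of the T-Display screen in landscape mode), is to subtract the average, since the microphone output sits on a DC offset, and scale what remains around the middle of the screen.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Convert raw ADC samples into screen rows for a waveform plot.
// The mean is subtracted so the trace is centered, then the signal is
// scaled so that `fullScale` ADC counts span half the screen height.
std::vector<int> samplesToRows(const std::vector<int>& samples, int screenHeight,
                               int fullScale) {
    std::vector<int> rows;
    if (samples.empty() || screenHeight <= 0 || fullScale <= 0) return rows;
    double mean = 0.0;
    for (int s : samples) mean += s;
    mean /= static_cast<double>(samples.size());
    const double mid = (screenHeight - 1) / 2.0;
    const double scale = mid / fullScale;
    rows.reserve(samples.size());
    for (int s : samples) {
        // Screen y grows downwards, so positive amplitudes go up (smaller y).
        const long y = std::lround(mid - (s - mean) * scale);
        // Clip to the visible area.
        rows.push_back(static_cast<int>(std::min<long>(std::max<long>(y, 0), screenHeight - 1)));
    }
    return rows;
}
```

The fullScale parameter sets how many ADC counts reach the top or bottom of the screen; louder signals are simply clipped at the edges.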
SOUND ENVELOPE
These displays show the envelope of the sound amplitude over time as a curve. The first one draws from left to right over a couple of seconds; the other one shifts the curve to the left to provide a scrolling display. This display is interesting if you want to keep a memory of how the sound evolves over time (for example, to visualize words), or if you want to analyze very faint sounds.
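An envelope can be obtained by reducing each short block of samples to one amplitude value, here the largest deviation from the block's average, and storing these values in a history as wide as the screen. For the scrolling version, the oldest point drops off on the left each time a new one arrives on the right. Block-based peak detection and the names below are assumptions; the project may compute its envelope differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

// Envelope value of one block: the largest absolute deviation from the mean.
double blockEnvelope(const std::vector<int>& block) {
    if (block.empty()) return 0.0;
    double mean = 0.0;
    for (int s : block) mean += s;
    mean /= static_cast<double>(block.size());
    double peak = 0.0;
    for (int s : block) peak = std::max(peak, std::fabs(s - mean));
    return peak;
}

// Fixed-width history for the scrolling display: new points enter on the
// right and, once the width is reached, the oldest point leaves on the left.
class ScrollingEnvelope {
public:
    explicit ScrollingEnvelope(std::size_t width) : width_(width) {}
    void push(double value) {
        points_.push_back(value);
        if (points_.size() > width_) points_.pop_front();
    }
    const std::deque<double>& points() const { return points_; }

private:
    std::size_t width_;
    std::deque<double> points_;
};
```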
Anyway, the best is to try them all and use the ones you prefer... Don't forget that a mere touch of GPIO 15 lets you stop time and quietly watch an image of the sound you just captured.
STEP 4: Have Fun Watching Sounds!
Here are a few examples of sounds that you can see with this tool.
- Beethoven: the famous Fifth Symphony
- Maria Callas: "Ave Maria" (Schubert)
- Queen: "Bohemian Rhapsody"
- Pink Floyd: "Marooned"
- Pink Floyd: "Another brick in the wall"
- Churchill's speech "We shall never surrender", one of the most famous voices of the last century
- "The Flight of the Bumblebee" (Rimsky-Korsakov), played by the famous French trumpet player Maurice André.
The last one lets you imagine how fast Maurice André's fingers move on his trumpet, and shows the processing speed that lets you see every single note he plays.
The device is portable, so don't hesitate to use it anywhere you'd like to watch interesting sounds... Have fun with sounds, and stay healthy!
STEP 5: New Version, Including Spectrogram Display
I added another type of display: the spectrogram. A spectrogram shows how the frequency spectrum evolves over time.
The usual frequency spectrum, as seen on the first 2 screens (see Step 3), applies the Fourier transform over the whole recording window. In my case, that window is set by the sampling period, calculated here:
sampling_period_us = round(1000ul * (1.0 / MAX_FREQ)); // MAX_FREQ in kHz, so 1000 / 20 = 50 µs per sample
The number of samples and the max frequency are set in the params.h file: 256 samples and 20 kHz, giving a recording time of 12.8 ms. Displaying the spectrum takes roughly 10 ms, so a new frequency spectrum is displayed every 23 ms (more than 40 times per second!). During such a short time, the sound frequency is not expected to change much.
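These figures can be checked with a little arithmetic. The sketch below assumes, as the formula implies, that MAX_FREQ is expressed in kHz (20 for 20 kHz), so that 1000 / MAX_FREQ is the sampling period in microseconds; the 10 ms display time is the measured value quoted above, passed in as a parameter. Note that sampling at 20 kHz lets the FFT represent frequencies only up to half that rate, 10 kHz (the Nyquist limit).

```cpp
#include <cmath>

// Timing budget of one spectrum frame.
struct FrameTiming {
    long samplingPeriodUs;   // time between two ADC samples
    double recordMs;         // time to acquire all samples
    double frameMs;          // acquisition + drawing
    double framesPerSecond;  // resulting refresh rate
};

// maxFreqKHz: sampling rate in kHz (the role MAX_FREQ plays in the formula)
// samples:    number of samples per FFT
// displayMs:  measured time needed to draw one spectrum
FrameTiming frameTiming(double maxFreqKHz, int samples, double displayMs) {
    FrameTiming t{};
    // Same expression as in the sketch: 1000 / 20 kHz -> 50 microseconds.
    t.samplingPeriodUs = std::lround(1000.0 * (1.0 / maxFreqKHz));
    t.recordMs = samples * t.samplingPeriodUs / 1000.0;
    t.frameMs = t.recordMs + displayMs;
    t.framesPerSecond = 1000.0 / t.frameMs;
    return t;
}
```

With frameTiming(20, 256, 10), the period is 50 µs, one recording takes 12.8 ms and one frame 22.8 ms, which gives about 44 spectra per second.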
But over a longer time frame, the frequency can change. This actually happens every time you say a word... The spectrogram is a way to see this change: it displays a 2D color map of the sound amplitude in a time-frequency space.
The sound is recorded over 700 ms, which is roughly the length of a word (not too long a word, of course). This time frame is divided into slices, and a Fourier transform is computed for each slice. The spectrum amplitudes are then converted into colors and displayed together with the color scale: black and blue zones show low amplitude, orange and red mean high amplitude. The image above shows the spectrogram when I say 'Hello'.
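The slicing can be sketched as follows: the recording is cut into consecutive slices, each slice is transformed, and its magnitudes become one column of the time-frequency image, which is then colored. To stay short and self-contained, this sketch uses a direct DFT (O(N²) per slice) rather than an FFT, and the slice length is a free parameter: with 256-sample slices at 20 kHz, 700 ms would give 54 full slices, but the project's actual slicing may differ.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude spectrum (first n/2 bins) of one slice, using a direct DFT.
std::vector<double> dftMagnitudes(const double* x, std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            sum += x[i] * std::polar(1.0, -2.0 * pi * k * i / n);
        mags[k] = std::abs(sum);
    }
    return mags;
}

// Spectrogram: one column of magnitudes per full slice of `sliceLen` samples.
// image[t][k] is the amplitude of frequency bin k in time slice t.
// Leftover samples that do not fill a whole slice are ignored.
std::vector<std::vector<double>> spectrogram(const std::vector<double>& signal,
                                             std::size_t sliceLen) {
    std::vector<std::vector<double>> image;
    for (std::size_t start = 0; start + sliceLen <= signal.size(); start += sliceLen)
        image.push_back(dftMagnitudes(signal.data() + start, sliceLen));
    return image;
}
```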
Note that the program needs to catch the word you say, so it waits until a sound louder than the background noise arrives. This is indicated by a 'WAITING FOR SOUND' message.
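One common way to build such a trigger, shown here as an assumption rather than the project's actual rule, is to measure the background noise while the room is quiet, then start recording at the first sample whose deviation from the resting level exceeds a chosen multiple of that noise.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Background level estimated from a quiet stretch of samples:
// `center` is the resting (DC) value, `noise` the mean absolute deviation.
struct Background {
    double center;
    double noise;
};

Background measureBackground(const std::vector<int>& quiet) {
    Background bg{0.0, 0.0};
    if (quiet.empty()) return bg;
    for (int s : quiet) bg.center += s;
    bg.center /= static_cast<double>(quiet.size());
    for (int s : quiet) bg.noise += std::fabs(s - bg.center);
    bg.noise /= static_cast<double>(quiet.size());
    return bg;
}

// Index of the first sample whose deviation exceeds `factor` times the noise,
// or -1 if the stream stays quiet (keep showing "WAITING FOR SOUND").
long findTrigger(const std::vector<int>& stream, const Background& bg,
                 double factor) {
    const double threshold = factor * bg.noise;
    for (std::size_t i = 0; i < stream.size(); ++i)
        if (std::fabs(stream[i] - bg.center) > threshold)
            return static_cast<long>(i);
    return -1;
}
```

The factor is a tuning knob: too low and background noise starts the recording, too high and soft words are missed.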
This version also fixes a few mistakes I found: please use it instead of the one above. And enjoy watching sounds even more!!!
STEP 6: Another New Version
This new version adds another presentation for the spectrogram: it displays iso-amplitude lines in the time-frequency domain. Red is the lowest amplitude, then cyan, blue and green.
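Iso-amplitude lines can be approximated by splitting the amplitudes into a few bands and marking every cell whose band differs from its right or lower neighbor: those cells trace the boundaries between bands, and each can be drawn in its band's color. This is a simplified stand-in for a proper contouring algorithm such as marching squares; the number of bands and the names are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Quantize an amplitude into one of `levels` bands between 0 and maxValue.
int bandOf(double value, double maxValue, int levels) {
    if (maxValue <= 0.0) return 0;
    const int band = static_cast<int>(value / maxValue * levels);
    return std::min(std::max(band, 0), levels - 1);
}

// For each cell, the band index if the cell lies on a boundary between two
// bands (i.e. on an iso-amplitude line), or -1 otherwise.
std::vector<std::vector<int>> isoLines(const Grid& amp, int levels) {
    double maxValue = 0.0;
    for (const auto& row : amp)
        for (double v : row) maxValue = std::max(maxValue, v);
    const std::size_t rows = amp.size();
    const std::size_t cols = rows ? amp[0].size() : 0;
    std::vector<std::vector<int>> out(rows, std::vector<int>(cols, -1));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c) {
            const int b = bandOf(amp[r][c], maxValue, levels);
            const bool edgeRight = c + 1 < cols && bandOf(amp[r][c + 1], maxValue, levels) != b;
            const bool edgeDown = r + 1 < rows && bandOf(amp[r + 1][c], maxValue, levels) != b;
            if (edgeRight || edgeDown) out[r][c] = b;
        }
    return out;
}
```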
Also, both buttons are now active, so you can move up or down through the displays.
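With one button stepping forward and the other backward, the display index just has to wrap around at both ends. A sketch, with the number of display modes as a parameter since the list has grown across versions; the class name is hypothetical.

```cpp
// Hypothetical display selector: one button steps forward, the other
// backward, and the index wraps around at both ends.
class DisplaySelector {
public:
    explicit DisplaySelector(int modeCount) : count_(modeCount) {}

    int next() {
        current_ = (current_ + 1) % count_;
        return current_;
    }

    // Adding count_ before the modulo keeps the result non-negative.
    int previous() {
        current_ = (current_ - 1 + count_) % count_;
        return current_;
    }

    int current() const { return current_; }

private:
    int count_;
    int current_ = 0;
};
```

For example, with 7 modes, stepping back from the first display lands on the last one.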
STEP 7: Yet Another New Version
Some of you may have run into compilation problems, caused by the complex-number calculations and a change in the ESP32 Arduino core made some time after I published this code. Please find below a new version, using a fix provided by one of you in the comments, which compiles correctly today (2024-01-10) with Arduino IDE version 2.2.1.
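For reference, here is what the change looks like. In C++ mode, newer GCC toolchains most likely no longer define the C99 complex macro when complex.h is included, so float complex stops naming a type and produces the "expected initializer before 'data'" error quoted in the comments; _Complex is the underlying keyword, which GCC accepts in C++ as an extension. The sketch also shows std::complex<float>, the standard C++ type, as an alternative. It is not what the project uses, and switching to it would require adapting the complex-number code in functions.h.

```cpp
#include <complex>
#include <cstddef>

// Stand-in for the constant defined in params.h (256 in this project).
constexpr std::size_t SAMPLES = 256;

// Original declaration, rejected by newer toolchains in C++ code:
//   float complex data[SAMPLES];
// Fix posted in the comments (GCC keyword, accepted by g++ as an extension):
//   float _Complex data[SAMPLES];
// Standard C++ alternative:
std::complex<float> data[SAMPLES];

// Example use with the standard type: magnitude of one frequency bin.
float binMagnitude(std::size_t k) { return std::abs(data[k]); }
```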
Enjoy!
32 Comments
jmmenec1 3 months ago
Thank you very much for sharing!
I tried to build the program, but there are lots of error messages during compilation!
The functions.h file seems to cause some problems!
I'm working with Arduino IDE 2.2.1
Board: Lilygo TTGO T-Display v1.1
Attached is a partial screenshot of the errors I got!
Thanks in advance for your help!
Best regards
Jean-Marc
FabriceA6 3 months ago
float complex - changed to: float _Complex
double complex - changed to: double _Complex
After that, it compiles.
jmmenec1 3 months ago
Thanks for the clarification!
Indeed, once the changes were made, I no longer get the error on the complex-number calculations!
However, other errors remain!
Like the one in the attached image (there are 7 like it).
Thanks in advance for your help!
Best regards
Jean-Marc
FabriceA6 3 months ago
For example, the one that says 'LOG2SAMPLE was not declared': LOG2SAMPLE is declared on line 51 of the ino file. You may not have the right version of the code. I'm posting it again in a new step at the end of the Instructable.
jmmenec1 3 months ago
Thank you for your answer. Indeed, I tested the code on another PC and, with the changes recommended by Hambo79, it compiles correctly!
So the problem is not with the code but with my PC!
I must have a more general problem with the compiler on my PC...
So I'll have to look into it...
Thanks again for taking the time to answer me!
If I manage to fix the problem, I'd like to use the code for a project animating musical instruments (a marching band) with LEDs...
Best regards
Jean-Marc
FabriceA6 3 months ago
change float complex to: float _Complex
change double complex to: double _Complex
One occurrence in the ino file and 8 in functions.h
kosmostrater 1 year ago
SoundAnalyzer_ESP32_TTGO\SoundAnalyzer_ESP32_TTGO.ino:45:15: error: expected initializer before 'data'
float complex data[SAMPLES];
DidLef 1 year ago
I built this nice device and everything works correctly. I'd like to adapt it to quickly tune a bagpipe. (In my pipe band, we spend almost 30 minutes tuning our bagpipes to match each other before we can play... temperature and humidity change with the weather!) That means one drone at 240 hertz, the other two at 480 hertz, and checking the chanter, which ranges roughly from 400 to 1000 hertz. I'm not an Arduino expert; would you be kind enough to suggest a bit of code to achieve my goal?
Thanks, best regards
Didier
Hambo79 1 year ago
-- float complex - changed to: float _Complex
-- double complex - changed to: double _Complex
Now works in Arduino IDE and in platformio
langoni 1 year ago
I have a problem with your sketch.
I only get some scale markings and numbers, as shown in the photo.
Could you help with this?
Thank you,
Cesar
langoni 1 year ago
I have already tried those basic checks.
I am new to the TTGO and I have two problems: 1) on Mac, I have problems with the port; 2) on PC, I have problems installing the library.
Could you help me with some information:
What system do you use, MacOS or PC?
What version of arduino IDE do you use?
Thank you for your help
Cesar
FabriceA6 1 year ago
Try removing and reinstalling the TFT_eSPI library. Did you successfully try the examples from the TTGO GitHub?
langoni 1 year ago
However, I am not sure how to install a library on a PC. It is different from the Mac.
On the Mac, when I install a library, everything goes to the right place. When I try to install a library on the PC (Windows), it shows "Invalid Library". Then I try to install it by hand, and I am not sure I am doing it correctly. Sometimes the .h files disappear.
I have a Windows 10 PC with Arduino IDE 1.8.18.
The changes in TFT_eSPI are OK; that is not the problem.
However, I am not sure that all parts of the library are set up correctly.
Thank you anyway,
Cesar
FabriceA6 1 year ago
The line that says you use the TTGO T-Display is this one: https://github.com/Xinyuan-LilyGO/TTGO-T-Display/b...
Hope it helps
langoni 1 year ago
However, I was wondering about the power supply: does the set work only on USB power, or does it need an external battery?