Introduction: Portable Sound Analyzer on ESP32
Sounds are all around us: your name spoken by your beloved, a barking dog, music on the television, the horn of a car...
Our brain processes sounds by transforming the sound waves picked up by our ears. But wouldn't it be fun to actually see these waves and try to understand them as well?
That's what I tried to do with this project. A microphone captures the waves, transforms them into electrical signals that can be processed by a microcontroller and displayed on a screen. Of course, it has to be fun, fast, informative and portable.
Let's see what we can do...
Step 1: What You Need
The ESP32 is a powerful microcontroller from Espressif that I've been using for several projects over the past few years. For this project I chose the TTGO T-Display, which is equipped with an LCD screen and 2 buttons. This way, I can easily plot any image or graph and interact with it. It's a very convenient device.
The microphone is a MAX9814 module, which includes an amplifier with automatic gain control. This means it can be used even when the sound level is very low. I actually tested it with a friend whispering at the other end of the room, and it picked up the sound quite well! A really impressive device.
And that's all for the circuit. To make it portable, I added a LiPo battery with the proper connector: a 2-pin JST SH 1.0 connector with 1.25mm square holes.
Of course, I also need a breadboard and some wires for the connections. All this costs under $10.
Step 2: Assemble and Upload the Code
Building the system is really simple: the ESP32 module powers the microphone, which sends its signal back to the ESP32. So we just need to connect the GND and 3V3 pins of both devices, and the output of the mic to an analog input of the microcontroller.
I chose GPIO 32 for this.
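Here is what happens to the signal once it reaches GPIO 32. This is a minimal sketch, not the project's code, and the function names are hypothetical.

The MAX9814 output sits on a DC bias of roughly 1.25 V, so samples are usually centred around zero before any analysis. The sketch assumes 12-bit ADC readings (0 to 4095, as returned by `analogRead(32)` on the ESP32) and a nominal 3.3 V full scale. The ESP32 ADC is not perfectly linear, so the volts are only approximate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert raw 12-bit ADC counts (0..4095) to volts,
// assuming a nominal 3.3 V full scale (the real ESP32 ADC is non-linear).
double countsToVolts(uint16_t counts) {
  return counts * 3.3 / 4095.0;
}

// Remove the DC bias of the microphone output by subtracting the mean,
// so the resulting samples oscillate around zero.
std::vector<double> removeDcOffset(const std::vector<uint16_t>& raw) {
  assert(!raw.empty());
  double mean = 0.0;
  for (uint16_t r : raw) mean += r;
  mean /= static_cast<double>(raw.size());

  std::vector<double> centred;
  centred.reserve(raw.size());
  for (uint16_t r : raw) centred.push_back(r - mean);
  return centred;
}
```

Subtracting the mean of each block is the simplest choice. A running average would also work if the blocks are short.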
The MAX9814 board has another pin for gain selection:
- unconnected for maximum gain (60dB)
- to GND for medium gain (50dB)
- to VCC for lower gain (40dB).
I chose this last option to limit the gain and the associated noise, but you can also try the others if you need to monitor very quiet sounds.
The code was written in the Arduino IDE with the ESP32 board extension. Select 'ESP32 Dev Module' in the board list. Connect the ESP32 module to your computer with a USB Type-C cable and select the port. Then open the attached .ino file and upload it to the ESP32.
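Here is what the three gain settings mean in practice. A voltage gain of G dB is a factor of 10^(G/20):

- 40 dB amplifies the microphone signal about 100 times.
- 50 dB amplifies it about 316 times.
- 60 dB amplifies it about 1000 times.

A one-line helper makes the arithmetic explicit:

```cpp
#include <cassert>
#include <cmath>

// Convert a voltage gain in decibels to a linear factor: 10^(dB / 20).
double dbToVoltageFactor(double gainDb) {
  assert(std::isfinite(gainDb));
  return std::pow(10.0, gainDb / 20.0);
}
```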
The code is organized into 3 files:
- the main file is SoundAnalyzer_ESP32_TTGO.ino
- functions.h and params.h, which contain the functions and the parameters of the system.
All these files should be placed in a folder named 'SoundAnalyzer_ESP32_TTGO' inside your Arduino folder. When you open the .ino file in the IDE, the other files appear in 2 additional tabs, so you can view and modify them if you want.
The TTGO T-Display uses the TFT_eSPI library for the display. To install and configure it, please refer to this website.
As the sound analyzer needs to... well, analyze the sound, I needed a Fourier transform that computes very quickly to keep real-time performance: you don't want any latency between the incoming sound and the spectrum display. I first tested the Fast Hartley Transform, which is supposed to be faster than the standard Fast Fourier Transform (FFT), but I wasn't totally satisfied with the results or with how it is used, so I stuck with the FFT.
In a previous project (have a look, it's fun!), I had already adapted an FFT library to the ESP32, so I just had to reuse it here. Reuse is the mother of programming...
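The FFT library adapted in that earlier project is not reproduced here. For readers curious about what such a transform does, here is a minimal iterative radix-2 Cooley-Tukey FFT using `std::complex`. It is a textbook sketch, not the library used in the sketch.

Radix-2 FFTs need a power-of-two length, such as the 256 samples mentioned in Step 5.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT.
// The length of `a` must be a power of two.
void fft(std::vector<cplx>& a) {
  const std::size_t n = a.size();
  assert(n > 0 && (n & (n - 1)) == 0);

  // Bit-reversal permutation of the input order.
  for (std::size_t i = 1, j = 0; i < n; ++i) {
    std::size_t bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) std::swap(a[i], a[j]);
  }

  // Butterfly stages of length 2, 4, 8, ...
  const double pi = std::acos(-1.0);
  for (std::size_t len = 2; len <= n; len <<= 1) {
    const double angle = -2.0 * pi / static_cast<double>(len);
    const cplx wLen(std::cos(angle), std::sin(angle));
    for (std::size_t start = 0; start < n; start += len) {
      cplx w(1.0, 0.0);
      for (std::size_t k = 0; k < len / 2; ++k) {
        const cplx u = a[start + k];
        const cplx v = a[start + k + len / 2] * w;
        a[start + k] = u + v;
        a[start + k + len / 2] = u - v;
        w *= wLen;
      }
    }
  }
}

// Magnitude of bins 0 .. n/2 (everything up to the Nyquist frequency).
std::vector<double> magnitudes(const std::vector<double>& samples) {
  std::vector<cplx> a(samples.begin(), samples.end());
  fft(a);
  std::vector<double> mag(a.size() / 2 + 1);
  for (std::size_t k = 0; k < mag.size(); ++k) mag[k] = std::abs(a[k]);
  return mag;
}
```

Bin k of the result corresponds to k × fs / N Hz, where fs is the sampling rate and N the number of samples.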
Step 3: How It Works
Remember? The system needs to be fun, fast, informative and portable.
So I programmed 5 different ways of displaying the sound: 2 for the spectrum, 1 for the sound waves and 2 for the sound envelope. Just press either of the white buttons near the USB connector to switch displays.
Let's have a look at them... But first, one additional feature: as these displays change quite fast, you can freeze them by gently touching GPIO 15 of the TTGO module.
I couldn't use one of the buttons for that, because the only sound that could then be seen was... the click of the button!!! Touching a GPIO is far more discreet and doesn't disturb the microphone.
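This kind of freeze relies on the ESP32's capacitive touch sensing. GPIO 15 is touch channel T3, and in the Arduino core `touchRead(15)` returns a value that drops when something adds capacitance to the pin. That can be a finger, or a long breadboard row, as a commenter found further down.

The class below is a hypothetical sketch of a freeze toggle with hysteresis, not the project's code. The two thresholds are assumptions to tune for your board; lowering the touch threshold makes the pin less sensitive. It takes readings as arguments so it can be tested away from the board.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical freeze toggle driven by ESP32 capacitive touch readings.
// On the board, each reading would come from touchRead(15) (GPIO 15 = T3);
// lower readings mean "touched". Thresholds are assumptions to tune.
class TouchFreeze {
 public:
  TouchFreeze(uint16_t touchBelow, uint16_t releaseAbove)
      : touchBelow_(touchBelow), releaseAbove_(releaseAbove) {
    assert(touchBelow < releaseAbove);  // leaves a hysteresis band
  }

  // Feed one reading; returns true while the display should stay frozen.
  // Each new touch (a press after a release) toggles the frozen state.
  bool update(uint16_t reading) {
    if (!pressed_ && reading < touchBelow_) {
      pressed_ = true;
      frozen_ = !frozen_;
    } else if (pressed_ && reading > releaseAbove_) {
      pressed_ = false;
    }
    return frozen_;
  }

 private:
  uint16_t touchBelow_;
  uint16_t releaseAbove_;
  bool pressed_ = false;
  bool frozen_ = false;
};
```

In `loop()`, something like `if (freeze.update(touchRead(15))) return;` would skip acquisition and drawing while the display is frozen.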
SPECTRUM ANALYSIS
The first display shows the real-time spectrum of the sound from 0 to 8 kHz. The human ear's bandwidth is between 20 Hz (vibrations per second) and 20 kHz, but sounds above 10 kHz are difficult to pick up: just try this and see if you can hear it.
The screen displays the spectrum as bars and indicates the frequency with the highest amplitude. An animation shows the decay of the bars over time (falling red dashes).
In general, the pitch of the voice is higher for women and lower for men, whose voices are dominated by low-pitched sounds. The range covered by the human voice goes from about 60 Hz for the deepest voices (although some well-trained Tibetan monks can go even lower) to about 1,200 Hz for the highest sopranos. The average is around 200 Hz. Women most often have higher-pitched voices because their vocal cords are shorter (12.5 to 17.5 mm), like those of children, than those of men (17 to 25 mm).
This is why I added another display for this mid-frequency range. This one needs no animation: the bars are colored from green (lower amplitudes) to dark red (higher amplitudes).
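The two features above can be sketched as follows, assuming the spectrum is already available as one magnitude per FFT bin:

- **Frequency of highest amplitude.** With a sampling rate fs and N samples, bin k corresponds to k × fs / N Hz.
- **Falling dashes.** Each dash behaves like a peak-hold marker: it jumps up to its bar when the bar is higher, and otherwise drops by a fixed number of pixels per frame.

The function names and the fall step are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Frequency (Hz) of the bin with the largest magnitude, skipping bin 0 (DC).
double dominantFrequency(const std::vector<double>& mag, double sampleRateHz,
                         std::size_t fftSize) {
  assert(mag.size() > 1 && fftSize > 0);
  std::size_t best = 1;
  for (std::size_t k = 2; k < mag.size(); ++k)
    if (mag[k] > mag[best]) best = k;
  return best * sampleRateHz / static_cast<double>(fftSize);
}

// Peak-hold markers ("falling dashes"): each marker jumps to the bar height
// when the bar is higher, otherwise it falls by `fallPerFrame` pixels.
void updatePeakMarkers(const std::vector<int>& barHeights,
                       std::vector<int>& markers, int fallPerFrame) {
  assert(barHeights.size() == markers.size());
  for (std::size_t i = 0; i < markers.size(); ++i)
    markers[i] = std::max(barHeights[i], markers[i] - fallPerFrame);
}
```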
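A simple way to colour bars from green to dark red is to interpolate linearly between two RGB colours. The result is then packed into the 16-bit RGB565 format (5 bits red, 6 green, 5 blue) that TFT_eSPI drawing functions use. The endpoint colours below are illustrative assumptions, not the project's palette.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit R, G, B into 16-bit RGB565 (5 red bits, 6 green, 5 blue).
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Map a normalised amplitude t in [0, 1] to a colour between green (t = 0)
// and dark red (t = 1) by linear interpolation of each channel.
uint16_t amplitudeColour(double t) {
  if (t < 0.0) t = 0.0;
  if (t > 1.0) t = 1.0;
  const double r0 = 0, g0 = 255, b0 = 0;  // green
  const double r1 = 139, g1 = 0, b1 = 0;  // dark red (illustrative)
  const auto mix = [t](double from, double to) {
    return static_cast<uint8_t>(from + (to - from) * t + 0.5);  // rounded
  };
  assert(t >= 0.0 && t <= 1.0);
  return rgb565(mix(r0, r1), mix(g0, g1), mix(b0, b1));
}
```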
SOUND WAVES DISPLAY
This is the classic display, the one we are used to seeing in the media. Here is a cute one below.
Of course, I didn't try to reproduce this, but the result is similar: it shows how the amplitude of the sound wave varies over time. You can play with it to see how the waveform changes when you say different words. Try 'yes', 'no', your name or the name of your pet, etc...
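Drawing the waveform boils down to mapping each sample to a pixel row. The helper below is a hypothetical sketch, assuming samples already centred around zero and a screen height of 135 pixels (the T-Display panel is 135 × 240, used here in landscape). The vertical scale is a free parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Map centred samples to y pixel coordinates on a screen of height `h`.
// Row h/2 is the zero line; `countsPerPixel` sets the vertical scale.
// Values are clamped to the visible rows [0, h-1].
std::vector<int> waveformToRows(const std::vector<double>& centred, int h,
                                double countsPerPixel) {
  assert(h > 0 && countsPerPixel > 0.0);
  std::vector<int> rows;
  rows.reserve(centred.size());
  for (double s : centred) {
    const int y = h / 2 - static_cast<int>(s / countsPerPixel);  // up = positive
    rows.push_back(std::min(std::max(y, 0), h - 1));
  }
  return rows;
}
```

Consecutive points can then be joined with TFT_eSPI's `drawLine(x0, y0, x1, y1, color)`, one sample per column.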
SOUND ENVELOPE
These displays show the envelope of the sound amplitude over time as a curve. The first one draws from left to right over a couple of seconds; the other one shifts the curve to the left to provide a scrolling display. These displays are useful if you want to keep a trace of how the sound evolves over time (such as visualizing words), or if you want to analyze very faint sounds.
Anyway, the best is to try them all and use the ones you prefer... Don't forget that a mere touch of GPIO 15 lets you stop time and quietly watch an image of the sound you just captured.
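Both envelope views can be approximated in two steps:

- Reduce each short block of samples to a single number, here the peak absolute value of the centred samples.
- For the scrolling view, drop the oldest value whenever a new one arrives.

This is a sketch under those assumptions. The project may compute its envelope differently; an RMS value per block would be another common choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

// Envelope value of one block: the peak absolute value of centred samples.
double blockEnvelope(const std::vector<double>& centred) {
  double peak = 0.0;
  for (double s : centred) peak = std::fmax(peak, std::fabs(s));
  return peak;
}

// Scrolling history of envelope values, one per screen column.
// Pushing a new value drops the oldest one, so the curve shifts left.
class ScrollingEnvelope {
 public:
  explicit ScrollingEnvelope(std::size_t columns) : values_(columns, 0.0) {
    assert(columns > 0);
  }
  void push(double v) {
    values_.pop_front();
    values_.push_back(v);
  }
  const std::deque<double>& values() const { return values_; }

 private:
  std::deque<double> values_;
};
```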
Step 4: Have Fun Watching Sounds!
Here are a few examples of sounds that you can see with this tool.
- Beethoven: the famous Fifth symphony
- Maria Callas: "Ave Maria" (Schubert)
- Queen: "Bohemian Rhapsody"
- Pink Floyd: "Marooned"
- Pink Floyd: "Another brick in the wall"
- Churchill's speech "We shall never surrender", one of the most famous voices of the last century
- "The Flight of the Bumblebee" (Rimsky-Korsakov), by the famous French trumpet player Maurice André.
The last one lets you imagine how fast Maurice André's fingers move on his trumpet, and shows the processing speed that makes it possible to see every single note he plays.
The device is portable, so don't hesitate to use it wherever you'd like to watch some interesting sounds... Have fun with sounds, and stay healthy!
Step 5: New Version, Including Spectrogram Display
I added another type of display: the spectrogram. A spectrogram shows how the frequency spectrum evolves over time.
The usual frequency spectrum, as seen on the first 2 screens (see Step 3), applies the Fourier transform over the entire recording time. In my case, that recording time follows from the sampling period, which is calculated here:
sampling_period_us = round(1000ul * (1.0 / MAX_FREQ)); // MAX_FREQ in kHz: 20 kHz gives 50 µs between samples
The number of samples and the sampling frequency (MAX_FREQ) are set in the params.h file: 256 samples at 20 kHz, i.e. one sample every 50 µs, giving a recording time of 256 × 50 µs = 12.8 ms. Displaying the spectrum takes roughly 10 ms, so a new frequency spectrum is displayed about every 23 ms (more than 40 times per second!). During such a short time, the sound frequency is not expected to change much.
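The figures above can be recomputed in a few lines. The sketch assumes, as the numbers imply, that MAX_FREQ is the sampling rate expressed in kHz. It takes the roughly 10 ms display time as a measured input, and uses the same formula as the code line above without the rounding.

It also derives two quantities the text does not mention:

- The highest frequency an FFT can represent is half the sampling rate, so 10 kHz here.
- Each bin is 20 000 / 256 = 78.125 Hz wide.

```cpp
#include <cassert>

// Hypothetical recomputation of the timing figures from params.h values.
// Assumption: maxFreqKHz is the sampling rate in kHz (20 -> 20 000 samples/s).
struct FrameTiming {
  double samplingPeriodUs;  // time between two samples
  double recordMs;          // time to record one block of samples
  double frameMs;           // record time + display time
  double framesPerSecond;   // spectra displayed per second
  double nyquistHz;         // highest frequency the FFT can represent
  double binWidthHz;        // frequency resolution of one FFT bin
};

FrameTiming computeTiming(int samples, double maxFreqKHz, double displayMs) {
  assert(samples > 0 && maxFreqKHz > 0.0 && displayMs >= 0.0);
  FrameTiming t{};
  t.samplingPeriodUs = 1000.0 * (1.0 / maxFreqKHz);  // same formula, unrounded
  t.recordMs = samples * t.samplingPeriodUs / 1000.0;
  t.frameMs = t.recordMs + displayMs;
  t.framesPerSecond = 1000.0 / t.frameMs;
  t.nyquistHz = maxFreqKHz * 1000.0 / 2.0;
  t.binWidthHz = maxFreqKHz * 1000.0 / samples;
  return t;
}
```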
Over a longer time frame, however, the frequency may change. This actually happens every time you say a word... The spectrogram shows this change: it displays a 2D color map of the sound amplitude in the time-frequency plane.
The sound is recorded over 700 ms, roughly the length of a word (not too long a word, of course). This time frame is divided into slices, and a Fourier transform is computed for each slice. The amplitude of each spectrum is then converted into colours and displayed, together with the colour scale. Black and blue areas show low amplitude; orange and red mean high amplitude. The image above shows the spectrogram when I say 'Hello'.
Note that the program needs to catch the word you say, so it waits until a sound rises above the background noise level. Meanwhile, a 'WAITING FOR SOUND' message is displayed.
This version also fixes a few bugs I found, so please use it instead of the previous one. And enjoy watching sounds even more!!!
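The slicing described above can be sketched in three steps:

1. Split the recorded buffer into consecutive slices.
2. Compute a magnitude spectrum for each slice.
3. Keep the results as a time × frequency grid, ready to be coloured.

To stay self-contained, the sketch uses a plain DFT instead of the project's FFT library; it is much slower than an FFT. Non-overlapping, unwindowed slices are also assumptions. At 20 kHz, 700 ms is 14 000 samples, so 256-sample slices would give 54 columns.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum (bins 0 .. n/2) of one slice, via a direct DFT.
std::vector<double> sliceSpectrum(const double* x, std::size_t n) {
  const double pi = std::acos(-1.0);
  std::vector<double> mag(n / 2 + 1);
  for (std::size_t k = 0; k < mag.size(); ++k) {
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      const double phase = 2.0 * pi * k * i / n;
      re += x[i] * std::cos(phase);
      im -= x[i] * std::sin(phase);
    }
    mag[k] = std::sqrt(re * re + im * im);
  }
  return mag;
}

// Spectrogram: one magnitude spectrum per non-overlapping slice,
// indexed as spec[slice][bin]. Trailing samples that do not fill a
// whole slice are ignored.
std::vector<std::vector<double>> spectrogram(const std::vector<double>& signal,
                                             std::size_t sliceLen) {
  assert(sliceLen > 1);
  std::vector<std::vector<double>> spec;
  for (std::size_t start = 0; start + sliceLen <= signal.size(); start += sliceLen)
    spec.push_back(sliceSpectrum(signal.data() + start, sliceLen));
  return spec;
}
```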
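Here is one possible way to implement the wait, sketched under stated assumptions:

1. Estimate the background level from a block recorded in silence, using its mean and standard deviation.
2. Start recording at the first sample that differs from that mean by more than k standard deviations.

The criterion, the value of k and the names are hypothetical; the project may use a different test.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct NoiseLevel {
  double mean;
  double stddev;
};

// Estimate the background level from samples recorded in (relative) silence.
NoiseLevel measureNoise(const std::vector<double>& quiet) {
  assert(!quiet.empty());
  double sum = 0.0;
  for (double s : quiet) sum += s;
  const double mean = sum / quiet.size();
  double var = 0.0;
  for (double s : quiet) var += (s - mean) * (s - mean);
  return {mean, std::sqrt(var / quiet.size())};
}

// Index of the first sample that stands out from the background by more
// than k standard deviations, or -1 if none does (keep waiting).
long firstLoudSample(const std::vector<double>& block, const NoiseLevel& noise,
                     double k) {
  for (std::size_t i = 0; i < block.size(); ++i)
    if (std::fabs(block[i] - noise.mean) > k * noise.stddev)
      return static_cast<long>(i);
  return -1;
}
```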
Step 6: Another New Version
This newer version adds another way of presenting the spectrogram: it displays iso-amplitude lines in the time-frequency domain. Red is the lowest amplitude, then cyan, blue and green.
Also, both buttons are now active, so you can move up or down through the displays.
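Iso-amplitude lines can be obtained in two steps:

- Quantise every cell of the time-frequency grid into amplitude levels.
- Mark the cells where the level changes between neighbours, i.e. the boundaries between levels.

The sketch below is one simple way to do this, not necessarily the author's. The thresholds are hypothetical, and each returned line index would then be drawn in one of the colours listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // grid[time][frequency]

// Level of an amplitude: number of (ascending) thresholds it reaches.
int levelOf(double a, const std::vector<double>& thresholds) {
  int level = 0;
  for (double t : thresholds)
    if (a >= t) ++level;
  return level;
}

// Mark cells lying on a boundary between two amplitude levels.
// 0 means "no line"; otherwise the value is the 1-based index of the
// lowest threshold crossed between the cell and its neighbour.
std::vector<std::vector<int>> isoLines(const Grid& g,
                                       const std::vector<double>& thresholds) {
  assert(!g.empty() && !g[0].empty());
  const std::size_t nt = g.size(), nf = g[0].size();
  std::vector<std::vector<int>> out(nt, std::vector<int>(nf, 0));
  for (std::size_t t = 0; t < nt; ++t) {
    for (std::size_t f = 0; f < nf; ++f) {
      const int here = levelOf(g[t][f], thresholds);
      // Compare with the next time step and with the next frequency bin.
      if (t + 1 < nt) {
        const int next = levelOf(g[t + 1][f], thresholds);
        if (next != here) out[t][f] = std::min(here, next) + 1;
      }
      if (f + 1 < nf) {
        const int up = levelOf(g[t][f + 1], thresholds);
        if (up != here) out[t][f] = std::min(here, up) + 1;
      }
    }
  }
  return out;
}
```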
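Moving up or down through the displays with two buttons is essentially a counter that wraps around at both ends. Below is a hedged sketch with a simple time-based debounce.

The number of modes and the debounce delay are assumptions. On the board, `press()` would be called once per detected button press, with the time taken from `millis()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical display selector: one button moves up, the other down,
// wrapping around at both ends. Presses closer together than debounceMs
// are ignored to filter contact bounce.
class DisplaySelector {
 public:
  DisplaySelector(int modeCount, uint32_t debounceMs)
      : count_(modeCount), debounceMs_(debounceMs) {
    assert(modeCount > 0);
  }

  // Call once per detected press; `nowMs` would come from millis().
  void press(bool up, uint32_t nowMs) {
    if (hasPressed_ && nowMs - lastPressMs_ < debounceMs_) return;
    hasPressed_ = true;
    lastPressMs_ = nowMs;
    mode_ = up ? (mode_ + 1) % count_ : (mode_ + count_ - 1) % count_;
  }

  int mode() const { return mode_; }

 private:
  int count_;
  uint32_t debounceMs_;
  uint32_t lastPressMs_ = 0;
  bool hasPressed_ = false;
  int mode_ = 0;
};
```

Using unsigned arithmetic for `nowMs - lastPressMs_` keeps the debounce correct when `millis()` wraps around after about 49 days.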

Participated in the 1000th Contest
23 Comments
Question 5 weeks ago on Introduction
Hello,
I built this nice project and everything works correctly. I would like to adapt it to tune a bagpipe quickly. (In my pipe band, we spend almost 30 minutes tuning our bagpipes to match each other before we can play... temperature and humidity change with the weather!) That is: one drone at 240 hertz, the other two at 480 hertz, and checking the chanter, which spans roughly 400 to 1000 hertz. I am not an Arduino pro; would you be kind enough to suggest a piece of code to reach my goal?
Thanks, regards,
Didier
Answer 5 weeks ago
Hello Didier, with pleasure, but what kind?
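For the bagpipe question, resolution is the key issue. With 256 samples at 20 kHz, each FFT bin is about 78 Hz wide. That is far too coarse for tuning, since one cent at 240 Hz is only about 0.14 Hz.

A common approach, sketched here as a hypothetical starting point rather than a tested tuner:

1. Analyse a longer block of samples.
2. Find the strongest bin inside the expected band.
3. Refine it with parabolic interpolation between neighbouring bins.
4. Report the deviation in cents, 1200 × log2(f / f_target).

A Hann-windowed DFT restricted to the band keeps the example self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Deviation of f from the target frequency, in cents (100 cents = 1 semitone).
double centsOff(double f, double target) {
  assert(f > 0.0 && target > 0.0);
  return 1200.0 * std::log2(f / target);
}

// Estimate the dominant frequency between loHz and hiHz: Hann window,
// DFT evaluated only over that band, then parabolic peak interpolation.
double estimatePitch(const std::vector<double>& x, double fs, double loHz,
                     double hiHz) {
  const std::size_t n = x.size();
  assert(n > 0 && fs > 0.0 && loHz > 0.0 && hiHz > loHz);
  const double pi = std::acos(-1.0);
  const std::size_t kLo = static_cast<std::size_t>(loHz * n / fs);
  const std::size_t kHi = static_cast<std::size_t>(hiHz * n / fs) + 1;
  assert(kLo >= 1 && kHi >= kLo + 2 && kHi < n / 2);

  // Apply a Hann window to reduce spectral leakage.
  std::vector<double> xw(n);
  for (std::size_t i = 0; i < n; ++i)
    xw[i] = x[i] * 0.5 * (1.0 - std::cos(2.0 * pi * i / n));

  // DFT magnitude for bins kLo..kHi only.
  std::vector<double> mag(kHi - kLo + 1);
  for (std::size_t k = kLo; k <= kHi; ++k) {
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      const double phase = 2.0 * pi * k * i / n;
      re += xw[i] * std::cos(phase);
      im -= xw[i] * std::sin(phase);
    }
    mag[k - kLo] = std::sqrt(re * re + im * im);
  }

  // Strongest bin, excluding the band edges so both neighbours exist.
  std::size_t best = 1;
  for (std::size_t j = 2; j + 1 < mag.size(); ++j)
    if (mag[j] > mag[best]) best = j;

  // Parabola through the peak and its neighbours gives a fractional bin.
  const double a = mag[best - 1], b = mag[best], c = mag[best + 1];
  const double denom = a - 2.0 * b + c;
  const double delta = denom != 0.0 ? 0.5 * (a - c) / denom : 0.0;
  return (kLo + best + delta) * fs / n;
}
```

For example, at an assumed 8 kHz sampling rate, 8192 samples (about one second of sound) give bins of about 0.98 Hz before interpolation. `estimatePitch(block, 8000.0, 200.0, 300.0)` would then target the 240 Hz drone, and `centsOff(f, 240.0)` tells how far off it is. Longer blocks give finer resolution but slower updates.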
6 months ago
Failed to compile; issue with:
-- float complex - changed to: float _Complex
-- double complex - changed to: double _Complex
Now it works in the Arduino IDE and in PlatformIO.
7 months ago
Dear Fabrice,
I have a problem with your sketch.
I only get some scale graduations and numbers, as shown in the photo.
Could you help with this?
Thank you,
Cesar
Reply 7 months ago
Hi Cesar, sorry to hear that you have a problem. Did you check all the wiring? Did you change the sketch? Have you tried other sketches, such as those on the TTGO GitHub?
Reply 7 months ago
Hi Fabrice,
I have tried those basic checks.
I am new to the TTGO and I have two problems: 1) on the Mac, I have problems with the port; 2) on the PC, I have problems installing libraries.
Could you help me with some information:
What system do you use, MacOS or PC?
What version of arduino IDE do you use?
Thank you for your help
Cesar
Reply 7 months ago
I use PC (Windows 10) and Arduino IDE version 1.8.19.
Try removing and reinstalling the TFT_eSPI library. Did you successfully run the examples from the TTGO GitHub?
Reply 7 months ago
OK, I have done this several times.
However, I am not sure how to install a library on a PC. It is different from the Mac.
On the Mac, when I install a library, everything ends up in the right place. When I try to install a library on the PC (Windows), it shows "Invalid Library". Then I try to install it by hand, and I am not sure I am doing it correctly. Sometimes the .h files disappear.
I have a PC with Windows 10 and Arduino 1.8.18.
The changes in TFT_eSPI are OK; that is not the problem.
However, I am not sure that all parts of the library are set up correctly.
Thank you anyway,
Cesar
Reply 7 months ago
It should be straightforward on a PC too. Follow the instructions on GitHub: https://github.com/Xinyuan-LilyGO/TTGO-T-Display
The line that says you use the TTGO T-Display is this one: https://github.com/Xinyuan-LilyGO/TTGO-T-Display/b...
Hope it helps
Reply 7 months ago
Instructions for installing correctly a library can be found here : https://learn.adafruit.com/adafruit-all-about-arduino-libraries-install-use/how-to-install-a-library
Reply 7 months ago
Thanks for the advice, but it still doesn't work.
However, I was wondering about the power supply. Does the setup work with USB power only, or does it need an external battery?
Reply 7 months ago
I used a small LiPo battery.
Reply 7 months ago
Hi Fabrice,
I found the problem.
The problem is GPIO 15. When I lift the TTGO out of the breadboard so that pin 15 is left hanging free, the system works perfectly.
What happens in the system when we touch pin 15?
How can I decrease this sensitivity?
Thank you for your patience...
Cesar
Reply 7 months ago
Hi Fabrice,
It is OK now. The sensitivity can be adjusted, or the touch on pin 15 removed completely, in the last 3 lines of the sketch.
Your code is fantastic. Works very well.
Thank you for your help and advices.
Thanks a lot.
Cesar
Reply 7 months ago
Bravo ! I'm glad you made it. Have fun!
8 months ago
Just to let you know that my project is now almost finished. It works well. My thanks for providing the code that allowed me to do this.
Reply 8 months ago
Good to know, thanks for the update! Have fun !
Question 1 year ago on Step 6
In functions.h:
void ffti_evaluate...
Line 152: double complex Wm = re + I * im;
Where is I defined?
Line 483: for (int j = 0....I * 0.0f; again, "I" is not defined?
I have managed to get spectrums 1 & 2 to work on a 1.8" TFT display, although I don't quite understand what I am doing. As I am 80 years old and not a programmer, it is difficult to work out what I can remove to get a clean "functions.h" file for my purposes.
All I am trying to do is get the first spectrum to cover 50 or 63 Hz up to 16 kHz. I don't know whether this can be done with your code. I have seen "SAMPLES = 8192 and MAX-FREQ = 40kHz" in other ESP32 implementations, but my limited understanding does not allow me to port them to the 1.8" TFT display.
I am making 13 of these to show the spectrum after the crossovers in my home theater setup.
I have a bi-amped 5.2 system: L&R Subs, L&R Lower Woofer, L&R Mid-Range, L&R Ribbon, L&R Surround and Center Channel. See attached PDF.
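Two constraints matter for covering roughly 63 Hz to 16 kHz:

- The sampling rate must exceed twice the highest frequency, so more than 32 kHz for 16 kHz.
- The bin width fs / N must be small compared with the lowest band.

The quoted combination of 8192 samples at 40 kHz gives bins of 40 000 / 8192 ≈ 4.9 Hz up to 20 kHz. The cost is about 205 ms of recording per frame and much more RAM than 256 samples.

The sketch below groups FFT bins into the nine octave bands from 62.5 Hz (nominally 63 Hz) to 16 kHz, using exact base-2 centres. The band layout is an assumption for illustration, not something the project implements.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Band {
  double centreHz;
  std::size_t firstBin;  // inclusive
  std::size_t lastBin;   // inclusive
};

// Octave bands with centres 1000 * 2^k Hz for k = kMin..kMax.
// Each band spans centre / sqrt(2) .. centre * sqrt(2); bins are clamped
// to 1 .. fftSize / 2 (the Nyquist bin).
std::vector<Band> octaveBands(int kMin, int kMax, double fs, std::size_t fftSize) {
  assert(kMin <= kMax && fs > 0.0 && fftSize >= 2);
  const double binHz = fs / fftSize;
  const std::size_t nyquistBin = fftSize / 2;
  std::vector<Band> bands;
  for (int k = kMin; k <= kMax; ++k) {
    const double centre = 1000.0 * std::pow(2.0, k);
    const double lo = centre / std::sqrt(2.0);
    const double hi = centre * std::sqrt(2.0);
    std::size_t first = static_cast<std::size_t>(std::ceil(lo / binHz));
    std::size_t last = static_cast<std::size_t>(std::floor(hi / binHz));
    first = std::max<std::size_t>(first, 1);
    last = std::min(last, nyquistBin);
    bands.push_back({centre, first, last});
  }
  return bands;
}
```

Each band's bar could then show, for example, the largest magnitude among its bins.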
Answer 1 year ago
I found this example, which somewhat clarifies the imaginary unit I of "complex.h".
It says this is a "C program" thing, whereas C++ uses something different.
I had compiled and uploaded under the Arduino IDE, but when I tried PlatformIO for the first time,
it told me that "I" was not defined, and the program would not compile.
Obviously, I was (and still am) confused.
#include <stdio.h>
#include <complex.h>

int main(void)
{
    // Creates a complex number z with 3.2 and 4.1 as
    // real and imaginary parts (I is the imaginary unit from complex.h)
    double complex z = 3.2 + 4.1 * I;
    printf("z = %.1f%+.1fi\n", creal(z), cimag(z));
    return 0;
}
Reply 1 year ago
As far as I know, 'I' is defined in the ESP32 core framework that the Arduino IDE loads (somewhere here, I guess: https://github.com/espressif/arduino-esp32). I don't know how that works with PlatformIO.
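A portable way around the missing `I` is C++'s own `std::complex`. It does not depend on the C99 `complex.h` macros, so it should build the same way under the Arduino IDE and PlatformIO.

The sketch below rewrites the twiddle-factor line quoted in the question in that style. It assumes `re` and `im` are the cosine and sine of the usual -2π/m FFT angle, and it does not attempt to convert the rest of functions.h.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// C99 form from the question (needs complex.h and its I macro):
//   double complex Wm = re + I * im;
// Equivalent C++ form, independent of the I macro.
std::complex<double> twiddle(int m) {
  assert(m > 0);
  const double pi = std::acos(-1.0);
  const double re = std::cos(-2.0 * pi / m);  // assumed meaning of re
  const double im = std::sin(-2.0 * pi / m);  // assumed meaning of im
  return std::complex<double>(re, im);        // same value as re + I * im
}
```

In C++14 and later, `re + im * 1i` (after `using namespace std::complex_literals;`) is an equivalent spelling.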