Introduction: Portable Sound Analyzer on ESP32

Sounds are everywhere and always around us. Whether it is your name spoken by your beloved, a barking dog, music played on television or the horn of a car...

Our brain processes sounds through its ability to transform the sound waves picked up by our ears. But wouldn't it be fun to really be able to visualize these waves and try to understand them as well?

That's what I tried to do with this project. A microphone captures the waves, transforms them into electrical signals that can be processed by a microcontroller and displayed on a screen. Of course, it has to be fun, fast, informative and portable.

Let's see what we can do...

Step 1: What You Need

The ESP32 is a powerful microcontroller from Espressif that I've been using for a few years in several projects. For this project I chose the TTGO T-Display model, which is equipped with an LCD display and 2 buttons. This way, I can easily plot any image or graph, and interact with it. It's a very convenient device.

The microphone is a MAX9814 module that is equipped with an amplifier with automatic gain control. That means it can be used even when the sound level is very low. Actually, I tested it with a friend whispering at the other end of the room, and it was able to catch the sound quite well! Really impressive device.

And that's all for the circuit. To make it portable, I added a LiPo battery, with the proper connector: a 2 pin JST SH 1.0 connector with 1.25mm square holes.

Of course, I need a breadboard and some wires for the connections. All this remains under $10.

Step 2: Assemble and Upload the Code

Making the system is really simple: the ESP32 module powers the microphone, which sends the signal back to the ESP32. So we just need to connect the GND and 3V3 pins of both devices, and the output of the microphone to an analog input of the microcontroller.

I chose pin 32 for this.

The MAX9814 board has another pin for gain selection:

  • unplugged for maximum gain (60dB)
  • to GND for medium gain (50dB)
  • to VCC for lower gain (40dB).

I chose this last option, to limit the gain and the associated noise, but you can also try the others if you need to monitor very low sounds.
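To get a feel for what these settings mean, a gain in decibels can be converted to a plain voltage ratio. The short sketch below is not part of the project code, and the function name is only illustrative. It shows that 40dB multiplies the microphone signal by 100, while 50dB multiplies it by about 316.

```cpp
#include <cmath>

// Convert an amplifier gain in decibels to a linear voltage ratio.
// For voltage gains, dB = 20 * log10(Vout / Vin).
double dbToVoltageGain(double gainDb) {
    return std::pow(10.0, gainDb / 20.0);
}
```

Each extra 10dB therefore multiplies the signal, and the noise that comes with it, by roughly 3.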

The code was written using the Arduino IDE, with the extension for ESP32. In the list of boards, choose the one called 'ESP32 Dev Module'. Connect the ESP32 module to your computer using a USB Type-C cable, and select the port. Then open the attached ino file and upload it to the ESP32.

The code is organized into 3 files:

  • the main file is SoundAnalyzer_ESP32_TTGO.ino
  • additional files functions.h and params.h, which contain the functions and the parameters of the system.

All these files should be placed in a folder named 'SoundAnalyzer_ESP32_TTGO' inside your Arduino folder. When opening the ino file in the IDE, you will notice that the other files are in 2 other tabs, so you can see and modify them if you want.

This TTGO T-Display relies on the use of the TFT_eSPI library for the display. To install and configure it, please refer to this website.

As the sound analyzer needs to... well, analyze the sound, I needed a Fourier transform that can be computed very quickly to keep real-time performance. You don't want any latency between the incoming sound and the spectrum display. I first tested the Fast Hartley transform, which is supposed to be faster than the regular Fourier transform, but I wasn't totally satisfied with the result and the way to use it, so I stuck to the Fourier transform.

In a previous project (have a look, it's fun!), I already adapted an FFT library to ESP32, so I just had to use it in this project. Re-use is the mother of programming...
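For readers who have never looked inside an FFT, here is a minimal sketch of the classic recursive radix-2 algorithm in plain C++. It is not the library used in this project, which is tuned for the ESP32. It only illustrates the principle, and the names `fft` and `magnitudeSpectrum` are chosen for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// In-place recursive radix-2 FFT. The length of `a` must be a power of two.
void fft(std::vector<Complex>& a) {
    const std::size_t n = a.size();
    if (n < 2) return;

    // Split into even- and odd-indexed samples and transform each half.
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i]  = a[2 * i + 1];
    }
    fft(even);
    fft(odd);

    // Combine the two half-size transforms using the twiddle factors.
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const Complex t = std::polar(1.0, angle) * odd[k];
        a[k]         = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Magnitude spectrum of a real signal: one value per bin, bins 0 .. N/2 - 1.
std::vector<double> magnitudeSpectrum(const std::vector<double>& samples) {
    std::vector<Complex> a(samples.begin(), samples.end());
    fft(a);
    std::vector<double> mags(samples.size() / 2);
    for (std::size_t k = 0; k < mags.size(); ++k) mags[k] = std::abs(a[k]);
    return mags;
}
```

Only the first half of the bins is kept because, for a real input signal, the second half mirrors the first.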

Step 3: How It Works

Remember? The system needs to be fun, fast, informative and portable.

So I programmed 5 different ways of displaying the sound: 2 for the spectrum, 1 for the sound waves and 2 for the sound's envelope. Just push either one of the white buttons near the USB connector to switch displays.

Let's have a look at them... But before that, just one additional feature: as these displays are changing quite fast, if you want to stop them, you just need to gently touch the GPIO 15 of the TTGO module.

I couldn't use one of the buttons for that, because the only sound that could be seen was... the click of the button!!! Touching a GPIO is far more discreet, and doesn't disturb the microphone.
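The touch test itself can be very small. The sketch below is only a guess at how such a check might look, not the code from the project. It assumes the reading comes from the Arduino-ESP32 `touchRead()` function, which on the original ESP32 returns a lower value when a finger is on the pad. The threshold of 30 and the helper `drawNextFrame()` mentioned in the comment are hypothetical.

```cpp
#include <cstdint>

// Assumed threshold: readings below it count as "touched".
// The right value depends on the board and wiring, so measure it first.
constexpr std::uint16_t kTouchThreshold = 30;

// Returns true when the reading indicates that the pad is being touched.
bool isTouched(std::uint16_t touchReading, std::uint16_t threshold = kTouchThreshold) {
    return touchReading < threshold;
}

// On the ESP32 this could be used in loop() roughly as:
//   if (!isTouched(touchRead(15))) { drawNextFrame(); }  // drawNextFrame() is hypothetical
```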

SPECTRUM ANALYSIS

The first display shows the real-time spectrum of the sound from 0 to 8 kHz. Our ears can hear from about 20 Hz (vibrations per second) to 20 kHz, but sounds above 10 kHz are difficult to pick up: just try this and see if you can hear it.

The screen displays the spectrum with bars, and indicates the frequency of highest amplitude. There is an animation showing the decrease of the bars in time (falling red dashes).

In general, the frequency of the voice is higher for women, who speak in higher pitches, while it is lower for men, where low-pitched sounds dominate. The sound spectrum covered by the human voice therefore ranges from 60 Hz for the lowest and deepest voices - although some well-trained Tibetan monks could go even lower - to about 1,200 Hz for the highest sopranos. The average is around 200 Hz. The reason why women most often have a higher-pitched voice is that their vocal cords, like those of children, are shorter (12.5 to 17.5 mm) than those of men (17 to 25 mm).

This is why I added another display for this medium frequency range. For this one, there is no need for an animation: the bars are colored from green (lower amplitudes) to dark red (higher amplitudes).
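A green-to-red colouring like this can be obtained by blending two colours according to the amplitude. The sketch below assumes a 16-bit RGB565 colour format, which is what small TFT screens commonly use, and assumes end points of pure green and a dark red of (139, 0, 0). The actual colours in the project may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Pack 8-bit R, G, B into a 16-bit RGB565 value (5 bits red, 6 green, 5 blue).
std::uint16_t rgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Map a normalised amplitude (0.0 .. 1.0) to a colour between green and dark red.
// Values outside the range are clamped.
std::uint16_t amplitudeToColor(double amplitude) {
    const double t = std::min(1.0, std::max(0.0, amplitude));
    // Assumed end points: green (0, 255, 0) at t = 0, dark red (139, 0, 0) at t = 1.
    const std::uint8_t r = static_cast<std::uint8_t>(139.0 * t);
    const std::uint8_t g = static_cast<std::uint8_t>(255.0 * (1.0 - t));
    return rgb565(r, g, 0);
}
```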

SOUND WAVES DISPLAY

This is the classical display, the one we are used to seeing in the media. Here is a cute one below.

Of course, I didn't try to achieve this, but the result is similar: it shows the variation of the amplitude of the sound waves in time. You can play with it to see the difference of the sound waves when you speak different words. Try 'yes', 'no', your name or the name of your pet, etc...

SOUND ENVELOPE

These displays show the envelope of the sound amplitude in time as a curve. The first one goes from left to right in a couple of seconds; the other one shifts the curve to the left to provide a moving sound display. This display is interesting if you want to have some memory of the evolution of the sound in time (such as visualizing words), or if you want to analyze very faint sounds.
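One simple way to obtain an envelope is to keep the peak amplitude of each short block of samples, and one simple way to scroll is to drop the oldest point of the curve each time a new one arrives. The sketch below illustrates both ideas. It is an assumption about how such displays can be built, not a copy of the project code, and the function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Peak envelope: one value per block of `blockSize` samples, equal to the
// largest deviation from the signal's mean level within that block.
std::vector<double> envelope(const std::vector<double>& samples, std::size_t blockSize) {
    std::vector<double> env;
    if (samples.empty() || blockSize == 0) return env;

    // Remove the DC offset so that a silent input gives an envelope of zero.
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= static_cast<double>(samples.size());

    for (std::size_t start = 0; start < samples.size(); start += blockSize) {
        const std::size_t end = std::min(start + blockSize, samples.size());
        double peak = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            peak = std::max(peak, std::fabs(samples[i] - mean));
        }
        env.push_back(peak);
    }
    return env;
}

// Scrolling curve: drop the oldest point and append the newest one.
void scrollLeft(std::vector<double>& curve, double newest) {
    if (curve.empty()) return;
    curve.erase(curve.begin());
    curve.push_back(newest);
}
```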

Anyway, the best is to try them all, and use the ones you prefer... Don't forget that a mere touch of GPIO 15 lets you stop time and quietly watch an image of the sound you just captured.

Step 4: Have Fun Watching Sounds!

Here are a few examples of sounds that you can see with this tool.

  • Beethoven: the famous Fifth symphony
  • Maria Callas: "Ave Maria" (Schubert)
  • Queen: "Bohemian Rhapsody"
  • Pink Floyd: "Marooned"
  • Pink Floyd: "Another brick in the wall"
  • Churchill speech "We shall never surrender", one of the most famous voices of the last century
  • "The flight of the bumblebee" (Rimsky-Korsakov), by the famous French trumpet player Maurice André.

The last one lets you imagine the speed of Maurice André's fingers on his trumpet, and the processing speed which lets you see every single note he plays.

The device is portable, so do not hesitate to use it anywhere you'd like to watch some interesting sounds... Have fun with sounds, and stay healthy!

Step 5: New Version, Including Spectrogram Display

I added another type of display: the spectrogram. A spectrogram is a way to see the evolution of the frequency spectrum in time.

The usual frequency spectrum, as can be seen on the first 2 screens (see Step 3), uses the Fourier transform over the entire time range. In my case, this time range follows from the sampling period, which is calculated here:

sampling_period_us = round(1000ul * (1.0 / MAX_FREQ));

The number of samples and the max frequency are in the params.h file: 256 samples and 20 kHz. MAX_FREQ acts here as the sampling rate, so each sample takes 50 µs and recording all 256 of them takes 12.8 ms. The time to display the spectrum is roughly equal to 10 ms. This means that a new frequency spectrum is displayed about every 23 ms (more than 40 times per second!). During this time, the sound frequency is not supposed to change a lot.

But it may happen that over a longer time frame, the frequency changes. This actually happens every time you say a word... The spectrogram is a way to see this frequency change. It displays a 2D color graph of the amplitude of the sound, in a time-frequency space.
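The arithmetic behind these figures can be checked with a few lines of C++. This sketch reuses the formula above with the values quoted from params.h, and assumes that the frequency is expressed in kHz, which is what makes the numbers consistent. The constant and function names are chosen for this example.

```cpp
#include <cmath>

// Values quoted in the text (from params.h): 256 samples, 20 kHz.
constexpr int    kSamples    = 256;
constexpr double kMaxFreqKHz = 20.0;  // assumed to be expressed in kHz

// Time between two samples, in microseconds (same formula as above).
unsigned long samplingPeriodUs() {
    return static_cast<unsigned long>(std::round(1000.0 * (1.0 / kMaxFreqKHz)));
}

// Time needed to record one full buffer, in milliseconds.
double recordingTimeMs() {
    return kSamples * samplingPeriodUs() / 1000.0;
}

// Number of spectra displayed per second, given the drawing time.
double frameRateHz(double displayTimeMs) {
    return 1000.0 / (recordingTimeMs() + displayTimeMs);
}

// Width of one frequency bin of the spectrum, in Hz.
double binWidthHz() {
    return kMaxFreqKHz * 1000.0 / kSamples;
}
```

With these values the sampling period is 50 µs, the recording time is 12.8 ms, and each bin of the spectrum is about 78 Hz wide.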

The sound is recorded over a 700 ms duration, which is roughly the length of a word (not too long a word, of course). This time frame is divided into slices, for which Fourier transforms are computed. Then the amplitude of the spectrum is changed into colours and displayed, together with the colour scale. Black and blue zones show low amplitude, orange and red mean high amplitude. The image above displays the spectrogram when I say 'Hello'.
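The slicing idea can be sketched as follows. To keep the example short, each slice is transformed with a direct (slow) Fourier sum instead of an FFT, the slices do not overlap, and no window function is applied. These are simplifying assumptions, not a description of the project code, and the function names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude of frequency bin k for one slice of x, computed directly
// from the definition of the discrete Fourier transform.
double dftMagnitude(const std::vector<double>& x, std::size_t start,
                    std::size_t length, std::size_t k) {
    const double pi = std::acos(-1.0);
    double re = 0.0, im = 0.0;
    for (std::size_t n = 0; n < length; ++n) {
        const double angle = 2.0 * pi * static_cast<double>(k * n) / static_cast<double>(length);
        re += x[start + n] * std::cos(angle);
        im -= x[start + n] * std::sin(angle);
    }
    return std::sqrt(re * re + im * im);
}

// Spectrogram: result[slice][bin] is the magnitude of that frequency bin
// in that time slice. Leftover samples at the end are ignored.
std::vector<std::vector<double>> spectrogram(const std::vector<double>& x,
                                             std::size_t sliceLength) {
    std::vector<std::vector<double>> result;
    if (sliceLength == 0) return result;
    for (std::size_t start = 0; start + sliceLength <= x.size(); start += sliceLength) {
        std::vector<double> column(sliceLength / 2);
        for (std::size_t k = 0; k < column.size(); ++k) {
            column[k] = dftMagnitude(x, start, sliceLength, k);
        }
        result.push_back(column);
    }
    return result;
}
```

Each column of the result corresponds to one vertical strip of the image, and each magnitude is then turned into a colour.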

Note that the program needs to catch the word you say, so it waits until a sound comes that is above the background noise level. This is indicated by a 'WAITING FOR SOUND' message.
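A trigger of this kind can be as simple as comparing each sample with the resting level of the microphone. The sketch below is one possible way to do it, under the assumption that the resting level and the noise level have been measured beforehand. The function name and parameters are illustrative and do not come from the project files.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first sample whose deviation from `baseline`
// exceeds `noiseLevel`, or samples.size() if the signal never gets that loud.
std::size_t findSoundStart(const std::vector<double>& samples,
                           double baseline, double noiseLevel) {
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (std::fabs(samples[i] - baseline) > noiseLevel) return i;
    }
    return samples.size();
}
```

On the device, the same test would run on live readings in a loop, and the recording would start as soon as it succeeds.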

This version also corrects a few mistakes that I have found: please use it instead of the one above. And enjoy watching sounds even more!!!

Step 6: Another New Version

This new new version adds another presentation for the spectrogram: it displays iso-amplitude lines in the time-frequency domain. Red is the lowest amplitude, then cyan, blue and green.

Also, both buttons are now active, letting you move up or down through the displays.

Step 7: Yet Another New Version

Some of you may have encountered compilation problems, due to the complex calculations and some change in the ESP32 kernel at some time after I published this code. Please find below a new version, using a fix provided by one of you in the comments, which compiles correctly today (2024-01-10) on Arduino IDE version 2.2.1.

Enjoy!

Participated in the 1000th Contest