Portable Sound Analyzer on ESP32

Introduction: Portable Sound Analyzer on ESP32

Sounds are everywhere and always around us. Whether it is your name spoken by your beloved, a barking dog, music played on television or the horn of a car...

Our brain processes sounds through its ability to transform the sound waves picked up by our ears. But wouldn't it be fun to really be able to visualize these waves and try to understand them as well?

That's what I tried to do with this project. A microphone captures the waves and transforms them into electrical signals, which are processed by a microcontroller and displayed on a screen. Of course, it has to be fun, fast, informative and portable.

Let's see what we can do...

Step 1: What You Need

The ESP32 is a powerful microcontroller from Espressif that I have been using for a few years in several projects. For this project I chose the TTGO T-Display model, which is equipped with an LCD display and 2 buttons. This way, I can easily plot any image or graph, and interact with it. It's a very convenient device.

The microphone is a MAX9814 module that is equipped with an amplifier with automatic gain control. That means it can be used even when the sound level is very low. Actually, I tested it with a friend whispering at the other end of the room, and it was able to catch the sound quite well! Really impressive device.

And that's all for the circuit. To make it portable, I added a LiPo battery, with the proper connector: a 2 pin JST SH 1.0 connector with 1.25mm square holes.

Of course, I need a breadboard and some wires for the connections. All this remains under $10.

Step 2: Assemble and Upload the Code

Building the system is really simple: the ESP32 module powers the microphone, which sends its signal back to the ESP32. So we just need to connect the GND and 3V3 pins of both devices, and the output of the microphone to an analog input of the microcontroller.

For this I chose pin number 32.
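To give an idea of what happens to the values read on that pin (in an Arduino sketch they would typically come from analogRead(32)), here is a small sketch in plain C++. It removes the resting level of the microphone by subtracting the mean of the buffer. The function name centreSamples and the 12-bit range (0..4095) are my assumptions for this illustration, not something taken from the project files.

```cpp
#include <cassert>
#include <vector>

// Convert raw ADC readings to samples centred on zero by subtracting their
// mean, so the microphone's resting voltage does not show up as a large
// component at 0 Hz. The 12-bit range (0..4095) is an assumption.
std::vector<double> centreSamples(const std::vector<int>& raw) {
    assert(!raw.empty());
    double sum = 0.0;
    for (int r : raw) {
        assert(r >= 0 && r <= 4095);
        sum += r;
    }
    const double mean = sum / raw.size();

    std::vector<double> centred;
    centred.reserve(raw.size());
    for (int r : raw) centred.push_back(r - mean);
    return centred;
}
```

For example, the readings 1000, 2000 and 3000 become -1000, 0 and 1000.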

The MAX9814 board has another pin for gain selection:

  • left unconnected for maximum gain (60dB)
  • to GND for medium gain (50dB)
  • to VCC for lower gain (40dB).

I chose this last option, to limit the gain and the associated noise, but you can also try the others if you need to monitor very low sounds.
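To get a feel for what these decibel figures mean, a gain in dB can be converted to a plain voltage ratio with ratio = 10^(dB/20). The helper below is only an illustration; gainDbToRatio is a name I made up, not a function of the sketch.

```cpp
#include <cassert>
#include <cmath>

// Convert a voltage gain in decibels to a linear ratio: ratio = 10^(dB/20).
// gainDbToRatio is a hypothetical helper, not part of the project code.
double gainDbToRatio(double gainDb) {
    assert(std::isfinite(gainDb));
    return std::pow(10.0, gainDb / 20.0);
}
```

With this formula, 40 dB corresponds to a factor of 100 and 50 dB to a factor of about 316, which shows how much more the background noise is amplified at the higher settings.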

The code was written using the Arduino IDE, with the extension for ESP32. In the board list, select 'ESP32 Dev Module'. Connect the ESP32 module to your computer using a USB Type-C cable, and select the port. Then open the attached ino file and upload it to the ESP32.

The code is organized into 3 files:

  • the main file is SoundAnalyzer_ESP32_TTGO.ino
  • additional files functions.h and params.h, which contain the functions and the parameters of the system.

All these files should be placed in a folder named 'SoundAnalyzer_ESP32_TTGO' inside your Arduino folder. When opening the ino file in the IDE, you will notice that the other files are in 2 other tabs, so you can see and modify them if you want.

This TTGO T-Display relies on the use of the TFT_eSPI library for the display. To install and configure it, please refer to this website.

As the sound analyzer needs to... well, analyze the sound, I needed a Fourier transform that computes very quickly to keep real-time performance. You don't want any latency between the incoming sound and the spectrum display. I first tested the Fast Hartley transform, which is supposed to be faster than the regular Fourier transform, but I wasn't totally satisfied with the result and the way to use it, so I stuck with the Fourier transform.

In a previous project (have a look, it's fun!), I already adapted an FFT library to ESP32, so I just had to use it in this project. Re-use is the mother of programming...
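For readers curious about what such a transform does, here is a minimal radix-2 FFT sketch in standard C++. It is not the library used in this project; it only illustrates the technique, and fftInPlace is a name chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT.
// The input size must be a power of two (for example 256 samples).
void fftInPlace(std::vector<Complex>& data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);

    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }

    // Butterfly stages of length 2, 4, 8, ...
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = -2.0 * pi / static_cast<double>(len);
        const Complex wLen(std::cos(angle), std::sin(angle));
        for (std::size_t start = 0; start < n; start += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex even = data[start + k];
                const Complex odd = data[start + k + len / 2] * w;
                data[start + k] = even + odd;
                data[start + k + len / 2] = even - odd;
                w *= wLen;
            }
        }
    }
}
```

After the call, the magnitude std::abs(data[k]) of each of the first n/2 entries gives the height of one bar of the spectrum.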

Step 3: How It Works

Remember? The system needs to be fun, fast, informative and portable.

So I programmed 5 different ways of displaying the sound: 2 for the spectrum, 1 for the sound waves and 2 for the sound's envelope. Just push either one of the white buttons near the USB connector to switch displays.

Let's have a look at them... But before that, just one additional feature: as these displays are changing quite fast, if you want to stop them, you just need to gently touch the GPIO 15 of the TTGO module.

I couldn't use one of the buttons for that, because the only sound that could be seen was... the click of the button!!! Touching a GPIO is far more discreet, and doesn't disturb the microphone.
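On the ESP32, the Arduino core provides touchRead(), whose reading drops when a finger gets close to the pin. A minimal sketch of how such readings could be turned into a "freeze" signal is given below. The class name TouchPause, the threshold and the number of consecutive readings are my assumptions for the example; the values used in the actual sketch may differ. The readings are passed in as plain integers so the logic can be tried on any computer.

```cpp
#include <cassert>

// Debounced touch detector. On a real board the readings would come from
// touchRead(15); here they are passed in so the logic can be tested anywhere.
// The threshold and the count are illustrative values, not the project's.
class TouchPause {
public:
    TouchPause(int threshold, int requiredCount)
        : threshold_(threshold), requiredCount_(requiredCount) {
        assert(requiredCount > 0);
    }

    // Feed one reading; returns true while the display should stay frozen.
    bool update(int reading) {
        if (reading < threshold_) {
            if (lowCount_ < requiredCount_) ++lowCount_;
        } else {
            lowCount_ = 0;  // finger removed: resume immediately
        }
        return lowCount_ >= requiredCount_;
    }

private:
    int threshold_;
    int requiredCount_;
    int lowCount_ = 0;
};
```

Requiring several low readings in a row makes the detection less sensitive to stray touches; a lower threshold has a similar effect.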

SPECTRUM ANALYSIS

The first display shows the real-time spectrum of the sound from 0 to 8 kHz. The bandwidth of our ear is between 20 Hz (or vibrations per second) and 20 kHz, but sounds above 10 kHz are difficult to pick up: just try this and see if you can hear it.

The screen displays the spectrum with bars, and indicates the frequency of highest amplitude. There is an animation showing the decrease of the bars in time (falling red dashes).
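The two ideas in this display, finding the frequency of highest amplitude and letting the peak markers fall over time, can be sketched as follows. This is an illustration in plain C++, not the code of functions.h; the function names and the fall step are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frequency (Hz) at the centre of FFT bin `bin`, for `sampleCount` samples
// taken at `sampleRateHz`.
double binToFrequency(std::size_t bin, double sampleRateHz, std::size_t sampleCount) {
    assert(sampleCount > 0);
    return bin * sampleRateHz / sampleCount;
}

// Index of the strongest bin, skipping bin 0 (the constant component).
std::size_t dominantBin(const std::vector<double>& magnitudes) {
    assert(magnitudes.size() > 1);
    std::size_t best = 1;
    for (std::size_t k = 2; k < magnitudes.size(); ++k) {
        if (magnitudes[k] > magnitudes[best]) best = k;
    }
    return best;
}

// Falling peak markers: a marker jumps up with its bar, otherwise it sinks
// by `fallStep` per frame and never goes below the bar.
void updatePeaks(const std::vector<double>& bars, std::vector<double>& peaks, double fallStep) {
    assert(bars.size() == peaks.size());
    for (std::size_t i = 0; i < bars.size(); ++i) {
        const double fallen = peaks[i] - fallStep;
        peaks[i] = bars[i] > fallen ? bars[i] : fallen;
    }
}
```

With the 256 samples and 20 kHz given in Step 5, bin 5 corresponds to 5 x 20000 / 256 = 390.625 Hz.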

In general, voice frequency is higher for women, who speak at higher pitches, and lower for men, where low-pitched sounds dominate. The sound spectrum covered by the human voice therefore ranges from about 60 Hz for the lowest, deepest voices (although some well-trained Tibetan monks can go even lower) to about 1,200 Hz for the highest sopranos. The average is around 200 Hz. Women most often have a higher-pitched voice because their vocal cords are shorter (12.5 to 17.5 mm), like those of children, than those of men (17 to 25 mm).

This is why I added another display for this medium frequency range. This one needs no animation: the bars are colored from green (lower amplitudes) to dark red (higher amplitudes).
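A color scale of this kind can be sketched as a blend between two colors, packed into the 16-bit RGB565 format that small TFT screens commonly use. The linear blend and the names rgb565 and levelToColor are assumptions made for this example; the palette in the actual sketch may be built differently.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit red, green, blue into a 16-bit RGB565 value, the colour format
// commonly used by small TFT displays.
std::uint16_t rgb565(int r, int g, int b) {
    return static_cast<std::uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Map a level in [0, 1] to a colour from green (low) to dark red (high).
// The linear blend is an assumption; the project may use another palette.
std::uint16_t levelToColor(double level) {
    assert(level >= 0.0 && level <= 1.0);
    const int r = static_cast<int>(128.0 * level);          // 0 -> 128
    const int g = static_cast<int>(255.0 * (1.0 - level));  // 255 -> 0
    return rgb565(r, g, 0);
}
```

A level of 0 gives 0x07E0 (pure green) and a level of 1 gives 0x8000 (dark red).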

SOUND WAVES DISPLAY

This is the classical display, the one we are used to seeing in the media. Here is a cute one below.

Of course, I didn't try to achieve this, but the result is similar: it shows the variation of the amplitude of the sound waves in time. You can play with it to see the difference of the sound waves when you speak different words. Try 'yes', 'no', your name or the name of your pet, etc...

SOUND ENVELOPE

These displays show the envelope of the sound amplitude in time as a curve. The first one goes from left to right in a couple of seconds; the other one shifts the curve to the left to provide a moving sound display. This display is interesting if you want to keep some memory of the evolution of the sound in time (such as visualizing words), or if you want to analyze very faint sounds.
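As an illustration of the two envelope displays, here is a sketch that reduces a block of samples to one envelope point and scrolls a curve to the left. Taking the largest absolute value of the block is one common way to get an envelope and is my assumption here; the function names are also made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One envelope point: the largest absolute sample value in a block
// (samples are expected to be centred on zero).
double blockEnvelope(const std::vector<double>& samples) {
    double peak = 0.0;
    for (double s : samples) peak = std::fmax(peak, std::fabs(s));
    return peak;
}

// Scrolling display: drop the oldest point on the left, append the newest
// on the right, keeping the curve length equal to the screen width.
void scrollLeft(std::vector<double>& curve, double newest) {
    assert(!curve.empty());
    for (std::size_t i = 1; i < curve.size(); ++i) curve[i - 1] = curve[i];
    curve.back() = newest;
}
```

The left-to-right display would instead write each new point at the next column and start again at the left edge when the screen is full.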

Anyway, the best is to try them all, and use the ones you prefer... Don't forget that a mere touch of GPIO 15 lets you stop time and quietly watch an image of the sound you just captured.

Step 4: Have Fun Watching Sounds!

Here are a few examples of sounds that you can see with this tool.

  • Beethoven: the famous Fifth symphony
  • Maria Callas: "Ave Maria" (Schubert)
  • Queen: "Bohemian Rhapsody"
  • Pink Floyd: "Marooned"
  • Pink Floyd: "Another brick in the wall"
  • Churchill's speech "We shall never surrender", one of the most famous voices of the last century
  • "The Flight of the Bumblebee" (Rimsky-Korsakov), by the famous French trumpet player Maurice André.

The last one lets you imagine the speed of Maurice André's fingers on his trumpet, and the processing speed that lets you see every single note he plays.

The device is portable, so do not hesitate to use it anywhere you'd like to watch some interesting sounds... Have fun with sounds, and stay healthy!

Step 5: New Version, Including Spectrogram Display

I added another type of display: the spectrogram. A spectrogram is a way to see the evolution of the frequency spectrum in time.

The usual frequency spectrum, as seen on the first 2 screens (see Step 3), applies the Fourier transform to the entire recording. In my case, the sampling period for this recording is calculated here:

sampling_period_us = round(1000ul * (1.0 / MAX_FREQ));

The number of samples and the max frequency are in the params.h file: 256 samples and 20 kHz. With MAX_FREQ expressed in kHz, the line above gives a sampling period of 50 µs, so recording 256 samples takes 12.8 ms. Displaying the spectrum takes roughly 10 ms. This means that a new frequency spectrum is displayed about every 23 ms (more than 40 times per second!). Over such a short time, the frequency content of the sound is assumed not to change much.
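The arithmetic behind these figures can be checked with a few lines of C++. The numbers are the ones quoted above (256 samples, 20 kHz, about 10 ms of display time); the struct and function names are only illustrative.

```cpp
#include <cassert>

// Values quoted in the text: 256 samples, MAX_FREQ = 20 (kHz), about 10 ms
// to draw one spectrum. The struct and function names are illustrative.
struct Timing {
    double samplingPeriodUs;  // time between two samples
    double recordingMs;       // time to fill the sample buffer
    double framesPerSecond;   // spectra displayed per second
    double binWidthHz;        // frequency step between two FFT bins
};

Timing computeTiming(int samples, double maxFreqKHz, double displayMs) {
    assert(samples > 0 && maxFreqKHz > 0.0 && displayMs >= 0.0);
    Timing t;
    t.samplingPeriodUs = 1000.0 / maxFreqKHz;
    t.recordingMs = samples * t.samplingPeriodUs / 1000.0;
    t.framesPerSecond = 1000.0 / (t.recordingMs + displayMs);
    t.binWidthHz = maxFreqKHz * 1000.0 / samples;
    return t;
}
```

Calling computeTiming(256, 20.0, 10.0) gives a 50 µs sampling period, 12.8 ms of recording, a little under 44 spectra per second, and a step of 78.125 Hz between two neighboring frequency bins.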

But it may happen that over a longer time frame, the frequency changes. This actually happens every time you say a word... The spectrogram is a way to see this frequency change. It displays a 2D color graph of the amplitude of the sound, in a time-frequency space.

The sound is recorded over a 700 ms duration, which is roughly the length of a word (not too long a word, of course). This time frame is divided into slices, and a Fourier transform is computed for each slice. Then the amplitude of each spectrum is converted into colours and displayed, together with the colour scale. Black and blue zones show low amplitude, orange and red mean high amplitude. The image above displays the spectrogram when I say 'Hello'.
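The slicing idea can be sketched in a few lines. The example below cuts a recording into consecutive, non-overlapping slices and computes the magnitude spectrum of each one. To stay short it uses a plain (slow) DFT rather than an FFT, and the slice length, the absence of overlap and the function names are all assumptions made for the illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one slice with a plain DFT (slow but short; a real
// sketch would use an FFT). Returns bins 0 .. n/2 - 1.
std::vector<double> sliceSpectrum(const std::vector<double>& x, std::size_t start, std::size_t n) {
    assert(n >= 2 && start + n <= x.size());
    const double pi = std::acos(-1.0);
    std::vector<double> mag(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double a = 2.0 * pi * static_cast<double>(k * i) / static_cast<double>(n);
            re += x[start + i] * std::cos(a);
            im -= x[start + i] * std::sin(a);
        }
        mag[k] = std::sqrt(re * re + im * im);
    }
    return mag;
}

// Spectrogram: one spectrum per consecutive, non-overlapping slice.
// result[t][k] is the amplitude at time slice t and frequency bin k.
std::vector<std::vector<double>> spectrogram(const std::vector<double>& x, std::size_t sliceLen) {
    assert(sliceLen >= 2);
    std::vector<std::vector<double>> result;
    for (std::size_t start = 0; start + sliceLen <= x.size(); start += sliceLen) {
        result.push_back(sliceSpectrum(x, start, sliceLen));
    }
    return result;
}
```

Each row of the result is one vertical strip of the image: the time slice gives the horizontal position, the bin gives the vertical position, and the amplitude gives the colour.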

Note that the program needs to catch the word you say, so it waits until a sound comes that is above the background noise level. This is emphasized by a 'WAITING FOR SOUND' message.
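A trigger of this kind can be sketched as a simple threshold test. In the example below, restLevel is the reading when the room is silent and noiseThreshold is how far a sample must move away from it to count as a sound. Both parameters and the function name are assumptions for the illustration, not values from params.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the first sample whose deviation from the resting level exceeds
// the noise threshold, or samples.size() if the sound never starts.
// restLevel and noiseThreshold are illustrative parameters.
std::size_t findSoundStart(const std::vector<double>& samples, double restLevel, double noiseThreshold) {
    assert(noiseThreshold >= 0.0);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (std::fabs(samples[i] - restLevel) > noiseThreshold) return i;
    }
    return samples.size();
}
```

On the device, the same test would run on each new reading while the waiting message is shown, and the 700 ms recording would start as soon as it succeeds.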

This version also corrects a few mistakes that I have found: please use it instead of the one above. And enjoy watching sounds even more!

Step 6: Another New Version

This newer version adds another presentation for the spectrogram: it displays iso-amplitude lines in the time-frequency domain. Red is the lowest amplitude, then cyan, blue and green.
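One simple way to draw iso-amplitude lines is to sort every cell of the spectrogram into amplitude bands and to mark the cells where the band changes between neighbors. The sketch below follows that idea; it is only an assumption about how such lines can be found, not the method used in the actual code, and the names bandOf and contourAt are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Band index of an amplitude: the number of thresholds it reaches.
// `levels` must be sorted in increasing order.
int bandOf(double amplitude, const std::vector<double>& levels) {
    int band = 0;
    for (double level : levels) {
        if (amplitude >= level) ++band;
    }
    return band;
}

// A cell lies on an iso-amplitude line when its right or lower neighbour
// falls in a different band. Returns 0 for "no line", otherwise the highest
// band involved, which can then be used to pick the line colour.
int contourAt(const Grid& grid, std::size_t row, std::size_t col, const std::vector<double>& levels) {
    assert(row < grid.size() && col < grid[row].size());
    const int here = bandOf(grid[row][col], levels);
    int line = 0;
    auto compare = [&](double neighbour) {
        const int other = bandOf(neighbour, levels);
        if (other != here) line = std::max(line, std::max(here, other));
    };
    if (col + 1 < grid[row].size()) compare(grid[row][col + 1]);
    if (row + 1 < grid.size() && col < grid[row + 1].size()) compare(grid[row + 1][col]);
    return line;
}
```

Each non-zero result would then be drawn as one pixel in the colour assigned to that amplitude level.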

Also, both buttons are now active, letting you move up or down through the displays.

Participated in the 1000th Contest

23 Comments

DidLef

Question 5 weeks ago on Introduction

Hello,
I built this nice circuit and everything works correctly. I would have liked to adapt it to tune a bagpipe quickly. (In my pipe band, we spend almost 30 minutes tuning our bagpipes to match each other before we can play... temperature and humidity change with the weather!) That means one drone at 240 Hz, the other two at 480 Hz, and checking the chanter, which ranges roughly from 400 to 1000 Hz. I am not an Arduino pro; would you be kind enough to suggest a bit of code to reach my goal?
Thank you, best regards,
Didier

FabriceA6

Answer 5 weeks ago

Hello Didier, with pleasure, but what kind?

Hambo79

6 months ago

failed to compile, issue with:
-- float complex - changed to: float _Complex
-- double complex - changed to: double _Complex
Now works in Arduino IDE and in platformio

langoni

7 months ago

Dear Fabrice,
I have a problem with your sketch.
I only get some graduations and numbers, as shown in the photo.
Could you help me with this?

Thank you,

Cesar

IMG_6068.jpg

FabriceA6

Reply 7 months ago

Hi Cesar, sorry to hear that you have a problem. Did you check all the wiring? Did you change the sketch? Have you tried other sketches such as those on the TTGO github?

langoni

Reply 7 months ago

Hi Fabrice,
I have tried those basic checks.
I am just starting with the TTGO and I have two problems: 1- on the Mac I have problems with the port; 2- on the PC I have problems with the library installation.
Could you help me with some information:
Which system do you use, MacOS or PC?
Which version of the Arduino IDE do you use?
Thank you for your help

Cesar

FabriceA6

Reply 7 months ago

I use a PC (Windows 10) and Arduino IDE version 1.8.19.
Try removing and reinstalling the TFT_eSPI library. Did you successfully try the examples from the TTGO github?

langoni

Reply 7 months ago

OK, I have done this several times.
However, I am not sure how to install a library on the PC. It is different from the Mac.
On the Mac, when I install a library everything ends up in the right place. When I try to install a library on PC/Windows, it appears as "Invalid Library". Then I try to install it by hand, and I am not sure I am doing it correctly. Sometimes the .h files disappear.
I have a PC with Windows 10 and Arduino 1.8.18.
The changes in TFT_eSPI are OK. This is not the problem.
However, I am not sure that all parts of the library are in the right place to work.

Thank you anyway,

Cesar

langoni

Reply 7 months ago

Thanks for the advice, but it still doesn't work.
However, I was thinking about the power supply. Does the setup work with USB power only, or does it need an external battery?

FabriceA6

Reply 7 months ago

I used a small LiPo battery.

langoni

Reply 7 months ago

Hi Fabrice,
I found the problem.
The problem is with GPIO15. When I remove the TTGO from the protoboard and leave pin 15 in the air, the system works perfectly.
What happens in the system when we touch pin 15?
How can I decrease this sensitivity?

Thank you for your patience...

Cesar

langoni

Reply 7 months ago

Hi Fabrice,
It is OK now. The sensitivity can be adjusted, or the touch on pin 15 removed completely, in the last 3 lines of the sketch.
Your code is fantastic. It works very well.
Thank you for your help and advice.
Thanks a lot.

Cesar

FabriceA6

Reply 7 months ago

Bravo ! I'm glad you made it. Have fun!

jacevedo9

8 months ago

Just to let you know that my project is now almost finished. It works well. My appreciation for providing the code that allowed me to do this.

FabriceA6

Reply 8 months ago

Good to know, thanks for the update! Have fun !

jacevedo9

Question 1 year ago on Step 6

On functions.h
void ffti_evaluate...
Line 152 double complex Wm = re + I * im;
where is I defined?
Line 483 for (int j = 0....I * 0.0f; again "I" is not defined?

I have managed to get spectrum 1 & 2 to work on a 1.8" TFT display,
although I don't quite understand what I am doing.
Being 80 years old and not a programmer, it is difficult for me to find
what I don't need for my purposes, so as to have a clean "functions.h" file.
All I am trying to do is get the first spectrum to work from 50 or 63 Hz to 16 kHz.
I don't know if it can be done with your code. I have seen "SAMPLES = 8192 and
MAX-FREQ = 40kHz" in other implementations with the ESP32, but my limited understanding does not allow me to port it to the 1.8" TFT display.
I am making 13 of these to show the spectrum after the crossovers in my home theater setup.
I have a bi-amped 5.2 system. L&R Subs, L&R Lower Woofer, L&R Mid-Range,
L&R Ribbon, L&R Surround and Center Channel. See attached PDF.

jacevedo9

Answer 1 year ago

I found this example, which somewhat clarifies the imaginary I of "complex.h".
It says this is a "C program" thing, whereas "C++" uses something different.
I had compiled and uploaded under Arduino, and
when I tried to use PlatformIO for the first time it told me that "I" was not defined,
and the program would not compile or give me a flag. Obviously, I was/am confused.

#include <complex.h>
#include <stdio.h>

int main(void)
{
    // Creates a complex number with 3.2 and 4.1
    // as real and imaginary parts
    double complex z = 3.2 + 4.1 * I;
    printf("z = %.1f + %.1fi\n", creal(z), cimag(z));
    return 0;
}
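For comparison, here is how the same thing can be written in C++ with std::complex, where the imaginary unit is spelled out explicitly instead of relying on the C macro I. The second function builds a value as re + i * im, in the same spirit as the Wm line quoted above. Both function names are made up for this illustration, and the sign convention of the angle is a common choice, not necessarily the one used in functions.h.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// C++ counterpart of the C99 `double complex` example: std::complex<double>
// with the imaginary unit written out instead of the C macro I.
std::complex<double> makeComplex(double re, double im) {
    const std::complex<double> i(0.0, 1.0);  // plays the role of C's I
    return re + i * im;
}

// FFT "twiddle factor" exp(-2*pi*i*k/n), built as re + i * im.
std::complex<double> twiddle(unsigned k, unsigned n) {
    assert(n > 0);
    const std::complex<double> i(0.0, 1.0);
    const double angle = -2.0 * std::acos(-1.0) * k / n;
    return std::cos(angle) + i * std::sin(angle);
}
```

The real and imaginary parts are then read with z.real() and z.imag(), which correspond to creal(z) and cimag(z) in C.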