Introduction: The Best Way for Sampling Audio With ESP32
I will describe three ways of sampling audio with an ESP32 microcontroller.
1. Direct (Sequential) Readout
2. Interrupt Driven Readout
3. I2S Driven Readout
But first, let me tell you a bit about sampling in general and the Nyquist Theorem.
Supplies: an ESP32 board, such as a DevKit.
Step 1: Sampling Audio With a Microcontroller
When you want to sample an audio signal with a microcontroller, it is important to know what you will be doing with the digitized information. Will you be sending the audio to another device, recording it, running an FFT analysis on it, or doing something entirely different? This matters for making the right choice of parameters. More on that later.
Let's say we have an audio signal of 10 kHz that we want to digitize. The signal is connected to an analog input (ADC) pin of the microcontroller and, for now, we assume that this signal is in the positive domain only (it will not go below zero).
For demonstration purposes, we will be digitizing an audio signal with the following parameters:
- Sine wave
- Vpp = 1 V (peak to peak)
- Frequency is 10 kHz
- We will take 9 samples at a sampling frequency of 10 kHz
All of this is shown in the picture. The vertical lines represent the 9 sampling points. The dots on each sine wave represent the actual samples being stored in memory.
The first sine wave in the picture has a frequency of 2 kHz. If we connect the sampled dots with straight lines (in yellow), we can recreate the signal from these samples.
The second sine wave has a frequency of 5 kHz and, as we can see, when the dots are connected we can again recreate our signal from the stored samples.
The third sine wave has a frequency of 10 kHz. As you can see, when we connect the red dots all we get is a straight line, so the signal can no longer be recreated from the audio samples.
This is precisely why we apply the Nyquist Theorem. Nyquist stated that, in order to recreate an audio signal, the sampling frequency has to be at least twice as high as the highest frequency in your audio signal. Applied to our audio signal, that means our 10 kHz signal has to be sampled at a frequency of 20 kHz or more in order to get a good result.
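The effect in the picture can be reproduced numerically. The following is a minimal sketch that runs on a PC rather than on the ESP32; the function names `sampleSine`, `spread` and `minSampleRateHz` are made up for this illustration, and the signal is assumed to be a sine centred on 0.5 V with 1 V peak to peak.

```cpp
#include <cmath>
#include <vector>

// Sample a sine wave of signalHz at sampleHz and return `count` samples.
// The sine is offset so it stays positive: centred on 0.5 V, 1 V peak to peak.
std::vector<double> sampleSine(double signalHz, double sampleHz, int count) {
    const double kPi = 3.14159265358979323846;
    std::vector<double> samples;
    samples.reserve(count);
    for (int n = 0; n < count; ++n) {
        double t = n / sampleHz;  // time of sample n in seconds
        samples.push_back(0.5 + 0.5 * std::sin(2.0 * kPi * signalHz * t));
    }
    return samples;
}

// Difference between the largest and the smallest sample.
// A value near zero means the samples form a flat line.
double spread(const std::vector<double>& samples) {
    double lo = samples.front();
    double hi = samples.front();
    for (double s : samples) {
        if (s < lo) lo = s;
        if (s > hi) hi = s;
    }
    return hi - lo;
}

// Lowest sampling frequency the Nyquist rule allows for a given signal.
double minSampleRateHz(double highestSignalHz) {
    return 2.0 * highestSignalHz;
}
```

With these helpers, `spread(sampleSine(10000, 10000, 9))` comes out at practically zero (the straight line), while `spread(sampleSine(2000, 10000, 9))` is close to the full 1 V swing. A signal at exactly half the sampling frequency is a borderline case: what you get depends on where in the cycle the samples happen to land.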
If you want to know more about this, check out the following video by Scott Marley:
In the example with the three sine waves, you can see that the signal we reproduced from our samples does not have the same smooth shape as the original sine wave. How close it gets depends on the number of samples we take per second, in other words on the sampling frequency. If we were to sample a 5 kHz audio signal at a sampling frequency of 44.1 kHz, samples would be taken at an interval of about 22.7 µs (t = 1/f). Since the period of a 5 kHz signal is 200 µs, we would take almost 9 samples per period.
Step 2: Sequential Sampling
In order to sample an audio signal, you always have to take and store a chunk of the signal and then process it before moving on to the next chunk. Processing can be as simple as storing or sending the data, but it can also mean running an FFT analysis.
Let's say we will be taking 8 samples at a sampling frequency of 10 kHz.
This is the easiest way to sample your audio signal but it is also the one that is most time consuming.
Basically, what you do is you take your first sample and put it in a buffer. Then you wait as long as it takes until it is time to take the next sample, you take that sample and wait again. You repeat this process until your buffer is full. When it is full, you process your buffer. When the processing is done you start all over again.
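The loop described above could look roughly like the sketch below. It is not the author's sketch: to keep it runnable on a PC, the ADC read and the microsecond clock are passed in as the hypothetical hooks `readSample` and `nowUs`. On an ESP32 with the Arduino core you would presumably hand in small wrappers around `analogRead(pin)` and `micros()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Fill `buffer` with `count` samples taken `periodUs` microseconds apart.
// readSample: returns one ADC reading (hypothetical hook).
// nowUs:      returns a free-running microsecond counter (hypothetical hook).
void sampleBlocking(uint16_t* buffer, std::size_t count, uint32_t periodUs,
                    const std::function<uint16_t()>& readSample,
                    const std::function<uint32_t()>& nowUs) {
    uint32_t next = nowUs();  // the first sample is due right away
    for (std::size_t i = 0; i < count; ++i) {
        // Busy-wait until this sample is due. The unsigned subtraction keeps
        // the comparison correct when the microsecond counter wraps around.
        while (static_cast<int32_t>(nowUs() - next) < 0) {
            // nothing else can run while we wait
        }
        buffer[i] = readSample();
        next += periodUs;  // schedule the following sample
    }
}
```

For 8 samples at 10 kHz you would call it with `count = 8` and `periodUs = 100`. The call only returns once the buffer is full, which is exactly the waiting this method suffers from.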
This is time consuming because you have to wait until all samples are taken, and then wait until all processing is done, before you can take the next set of samples. During that wait your audio signal is lost, so you will be missing a chunk of data. If you are doing FFT analysis for the purpose of visualizing the data (a spectrum analyzer or VU meter), that might not be a problem at all and this method might work just fine.
However, it becomes more and more critical if you plan on taking more than 8 samples, for example 256, 512 or 1024. Then the wait will be long, and the processing time of the buffer data will be so long that it becomes noticeable in your visualization.
For those who would like to try sequential sampling anyway, here is the code.
Step 3: Interrupt Driven Sampling
A better way to proceed is to use a timer that runs at your desired sampling frequency and generates an interrupt each time it expires.
During that interrupt, as long as the buffer is not full, one sample from the ADC is added to the buffer.
If the buffer is full, the interrupt sets a flag so that the main loop knows that the buffer is full and can be processed.
In your main loop, whenever the flag is not set, that particular section is skipped and the next piece of code is processed. However, if the flag is set, the main loop copies the buffer into a second buffer, clears the flag and resets the first buffer (or its buffer counter). Next, it starts processing the data from the second buffer, and during that time the first buffer is already being filled again by the interrupt routine.
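The interplay between the interrupt and the main loop can be sketched as follows. This is a hardware-independent illustration, not the author's sketch: the struct `InterruptSampler`, its member names and the buffer size of 8 are all assumptions. On real hardware, `onTimerTick` would be called from the timer interrupt with a fresh ADC reading.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBufferSize = 8;  // samples per buffer (assumed value)

struct InterruptSampler {
    std::array<uint16_t, kBufferSize> capture{};  // filled by the timer interrupt
    std::array<uint16_t, kBufferSize> work{};     // processed by the main loop
    volatile std::size_t count = 0;               // samples currently in `capture`
    volatile bool full = false;                   // set by the interrupt, cleared by the main loop

    // Call from the timer interrupt with one ADC reading.
    void onTimerTick(uint16_t sample) {
        if (full) {
            return;  // buffer not collected yet: this sample is dropped
        }
        capture[count] = sample;
        count = count + 1;
        if (count == kBufferSize) {
            full = true;  // tell the main loop the buffer is ready
        }
    }

    // Call from the main loop. Returns true when `work` holds a new buffer.
    bool takeIfFull() {
        if (!full) {
            return false;  // nothing ready, carry on with other work
        }
        work = capture;  // copy into the second buffer
        count = 0;       // reset the buffer counter
        full = false;    // let the interrupt start filling `capture` again
        return true;
    }
};
```

In the main loop you would call `takeIfFull()` on every pass and only run the processing on `work` when it returns true. Because the interrupt leaves `capture` alone while the flag is set, the copy cannot be disturbed halfway through.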
This way, you can save a lot of time because you don't always need to wait for the buffer to fill: it is already partly filled while your code was doing other things, like updating the LED screen or handling the user interface.
And remember, while processing the buffer data, another buffer is being filled up as we speak.
There is one downside: if you are sampling at a high frequency, like 44 kHz, the generated interrupts cause a lot of interruptions of your main program. This takes up a lot of processor power and it might affect your other pieces of code.
Here is an example of how to use it.
Step 4: I2S Driven Sampling
The third and best option is to sample your audio using the I2S bus, if it is available. The ESP32 has its I2S bus directly connected to the internal ADC, and that is perfect!
The principle behind I2S sampling is similar to the one for interrupt driven sampling.
However, you don't need to worry about timing and interruptions. The I2S peripheral has direct memory access (DMA), so it can write samples straight into memory without help from the CPU core. This makes it much faster than interrupt driven sampling.
I2S uses 2 or more buffers that are handled by the I2S peripheral and its driver. You don't need to worry about it; it is all taken care of. Whenever a buffer is full, it moves on to the next buffer. You don't need to worry about which buffer to read either, as this is done for you automatically. All you need to do is set a sampling frequency, choose the number of buffers and decide how many samples fit in each buffer.
Of all the mentioned methods, this is the fastest.
The Arduino sketch is included.
Step 5: And a Video Explaining All of the Above
Finally there is a video telling you all of the above: