Introduction: How to Do Automatic Song Classification With AI!

About: My goal is to showcase how you can use simple AI with some tools to build projects that would be challenging to do by hand (sound classification, gesture recognition, event detection, ...)

In this tutorial we will collect microphone data to get chunks of songs and use NanoEdge AI Studio (a free tool) to automatically create an AI model able to classify our songs.

Do not worry, you don't need knowledge in AI to follow this tutorial :)


Here is the plan:

  1. Setup
  2. Collect microphone data
  3. Create the classification model
  4. Add the model in our Arduino code
  5. Use the LED matrix to display the song detected


Let's go!

Supplies

Hardware:

  • Arduino UNO R4 WiFi (the WiFi itself is not needed)
  • MAX4466 microphone
  • A USB-C data cable to connect the Arduino board to your desktop machine


If you don't have an Arduino R4 WIFI, you can check the list of compatible boards in NanoEdge AI Studio.

Concerning the LED matrix, it is only used in the last step, so you don't need it to follow the rest of the tutorial.

If you have another microphone and you know how to use it, you can use that instead!


Software:

  • NanoEdge AI Studio (free)
  • Arduino IDE

Step 1: Setup

First, we need to connect the microphone to the Arduino board.

Use jumper wires to connect:

  • OUT (mic) to A0 (board)
  • GND to one of the GND on the board
  • VCC to 3.3V

Make sure that you have a USB data cable connecting the board to the PC.


In Arduino IDE:

Make sure the right COM port is selected: Tools > Port.

Select the right board:

  • Tools > Boards > Arduino Renesas UNO R4 boards > Arduino UNO R4 WIFI
  • If you don't find it, click on Tools > Boards > Boards Manager..., look for the UNO R4 and install the package
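
Optional check: before moving on to data collection, you can verify the wiring with a minimal test sketch (my own addition, not part of the final project). It simply prints the raw analog readings from the microphone, assuming OUT is wired to A0 as described above. Open Tools > Serial Plotter at 115200 baud and make some noise: the trace should react.

/* Minimal wiring test: print raw microphone readings ----------------------*/
int const AMP_PIN = A0;   // MAX4466 OUT connected to A0

void setup() {
  Serial.begin(115200);
}

void loop() {
  Serial.println(analogRead(AMP_PIN)); // raw ADC value, reacts to sound
  delay(2);                            // slow down the output a little
}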

Step 2: Logging Data With Microphone

Reading the microphone through the analog pin gives a very high data rate compared to what we need.

We will collect chunks of the music by filling buffers with values from the microphone, and we will reduce the data rate by keeping only 1 value out of every 32 collected.

We collect buffers of music rather than single notes to classify them: even for a human, it is impossible to recognize a song from one random note taken out of it.


To accomplish this:

  • Define AMP_PIN as A0, because our microphone sends its data on the A0 pin
  • Define a buffer called neai_buffer to store the collected values
  • In our case, the buffer has a size of 1024 (SENSOR_SAMPLES)
  • Initialize the serial port in setup()
  • Create a get_microphone_data() function that fills the buffer with microphone data, keeping only 1 value out of every 32
  • Print the buffer to send it via serial


The code:

/* Defines  ----------------------------------------------------------*/
#define SENSOR_SAMPLES   1024 //buffer size
#define AXIS              1 //microphone is 1 axis
#define DOWNSAMPLE        32 //microphone has a very high data rate, we downsample it


/* Prototypes ----------------------------------------------------------*/
void get_microphone_data(); //function to collect buffer of sound

/* Global variables ----------------------------------------------------------*/
static uint16_t neai_ptr = 0; //pointer used to fill the sound buffer
static float neai_buffer[SENSOR_SAMPLES * AXIS] = {0.0}; //sound buffer
int const AMP_PIN = A0;       // Preamp output pin connected to A0

/* Setup function ----------------------------------------------------------*/
void setup() {
  Serial.begin(115200);
  delay(10);
}

/* Infinite loop ----------------------------------------------------------*/
void loop() {
  get_microphone_data();
}


/* Functions declaration ----------------------------------------------------------*/
void get_microphone_data()
{
  static uint16_t temp = 0; //temporary variable for discarded samples
  int sub = 0; //increment to downsample
  //while the buffer is not full
  while (neai_ptr < SENSOR_SAMPLES) {
    //we only get a value every DOWNSAMPLE (32 in this case)
    if (sub > DOWNSAMPLE) {
      /* Fill neai buffer with new microphone data */
      neai_buffer[neai_ptr] = analogRead(AMP_PIN);
      /* Increment neai pointer */
      neai_ptr++;
      sub = 0; //reset increment
    }
    else {
      //we still read (and discard) the sample, otherwise the loop
      //runs instantly and we would not really be downsampling
      temp = analogRead(AMP_PIN);
    }
    sub ++;
  }
//print the buffer values to send them via serial
  for (uint16_t i = 0; i < SENSOR_SAMPLES; i++) {
    Serial.print(neai_buffer[i]);
    Serial.print(" ");
  }
  Serial.print("\n");
  neai_ptr = 0; //reset the beginning position
}


To use this code, copy and paste it into the Arduino IDE. If you have followed the setup part, you only need to click UPLOAD (the little arrow at the top).

In the next step, we will use this code to collect data in NanoEdge AI Studio and create an AI library to classify songs.

Step 3: Classification Model

With the code from the previous step, we can use NanoEdge to collect a dataset for each song that we want to classify:

  1. Open NanoEdge
  2. Create a N-class classification project
  3. Select the Arduino R4 WIFI board as target (other boards are compatible)
  4. Select Microphone 1axis as sensor
  5. Click Next

Then we will collect data for each song. In the SIGNAL STEP:

  1. Click ADD SIGNAL
  2. Then FROM SERIAL (USB)
  3. First start playing the song (on a phone, for example)
  4. Then click START/STOP to collect data (make sure the right COM port is selected)
  5. Collect buffers while the song plays through at least twice. Avoid empty buffers (pause the collection if you need to)
  6. Click CONTINUE then IMPORT
  7. Rename the file if you want
  8. Repeat for each song

Once you have everything that you want, go to the BENCHMARK STEP.

The more songs you have, the harder it gets, so start simple.

  1. Click on NEW BENCHMARK
  2. Select all the songs and click START

The benchmark will search for the best preprocessing and model for your data, in order to find a model that is able to classify the songs.

Within a few tens of minutes, you should have a score above 80%. Otherwise, you may need to go back to the previous step and collect new data with a longer buffer or a bigger downsample factor, something like this:

/* Defines  ----------------------------------------------------------*/
#define SENSOR_SAMPLES   2048 //buffer size
#define DOWNSAMPLE        64 //microphone has a very high data rate, we downsample it

Then repeat the whole process.


OPTIONAL:

Go to the EMULATOR STEP to make sure your model is working:

  1. Click INITIALIZE EMULATOR
  2. Click on SERIAL (USB)
  3. Click on START/STOP while playing a song

What you see at the top right is the probability of each class. If your model works, the highest probability should correspond to the song that you are playing. It may alternate a bit with other classes at times.

At the bottom right is simply a count of the classes that were detected (the class with the maximum probability at the top right is selected as the detected class).


Then go to the COMPILATION STEP:

Click COMPILE, fill in the short form, and save your AI library (.zip).

In the next step, we will add this library to the Arduino IDE to classify songs directly on the board.

Step 4: Classification Integration

Now that we have the classification library, we need to add it to our Arduino code:

  • Extract the .zip you obtained; it contains an Arduino folder with another .zip inside
  • Import the library into the Arduino IDE: Sketch > Include library > Add .ZIP library... and select the .zip found in the Arduino folder

IF YOU ALREADY USE A NANOEDGE AI LIBRARY IN ARDUINO IDE:

Go to Documents/Arduino/libraries and delete the NanoEdge one. Then follow the instructions above to import the new library.


IMPORTANT:

If you get an error because of RAM, it may be because the NanoEdge library is too large. Go back to the VALIDATION STEP in NanoEdge and select a smaller library (click the crown icon on the right), then compile it and replace it in the Arduino IDE.


Copy the code below and paste it into the Arduino IDE. It contains the previous code, plus everything needed for NanoEdge:

  • The library
  • Some NanoEdge variables
  • The initialization of the library in the setup
  • The classification after collecting sound data
  • The output class
/* Libraries ----------------------------------------------------------*/
#include "NanoEdgeAI.h"
#include "knowledge.h"

/* Defines  ----------------------------------------------------------*/
#define SENSOR_SAMPLES    1024 //buffer size
#define AXIS              1    //microphone is 1 axis
#define DOWNSAMPLE        32   //microphone has a very high data rate, we downsample it

/* Prototypes ----------------------------------------------------------*/
void get_microphone_data(); //function to collect buffer of sound

/* Global variables ----------------------------------------------------------*/
static uint16_t neai_ptr = 0; //pointer used to fill the sound buffer
static float neai_buffer[SENSOR_SAMPLES * AXIS] = {0.0}; //sound buffer
int const AMP_PIN = A0;       // Preamp output pin connected to A0

/* NEAI PART*/
uint8_t neai_code = 0; //initialization code
uint16_t id_class = 0; // Point to id class (see argument of neai_classification fct)
float output_class_buffer[CLASS_NUMBER]; // Buffer of class probabilities
const char *id2class[CLASS_NUMBER + 1] = { // Buffer for mapping class id to class name
  "unknown",
  "up",   // example class names: replace them with your own,
  "down", // as explained in the IMPORTANT note at the end of this step
};


/* Setup function ----------------------------------------------------------*/
void setup() {
  Serial.begin(115200);
  delay(10);

  /* Initialize NanoEdgeAI AI */
  neai_code = neai_classification_init(knowledge);
  if (neai_code != NEAI_OK) {
    Serial.print("Not supported board.\n");
  }
}

/* Infinite loop ----------------------------------------------------------*/
void loop() {
  get_microphone_data();
  neai_classification(neai_buffer, output_class_buffer, &id_class);
  /* DISPLAY THE DETECTED CLASS ID */
  Serial.println(id_class);
}

/* Functions declaration ----------------------------------------------------------*/
void get_microphone_data()
{
  static uint16_t temp = 0; //temporary variable for discarded samples
  int sub = 0; //increment to downsample
  //while the buffer is not full
  while (neai_ptr < SENSOR_SAMPLES) {
    //we only get a value every DOWNSAMPLE (32 in this case)
    if (sub > DOWNSAMPLE) {
      /* Fill neai buffer with new microphone data */
      neai_buffer[neai_ptr] = analogRead(AMP_PIN);
      /* Increment neai pointer */
      neai_ptr++;
      sub = 0; //reset increment
    }
    else {
      //we still read (and discard) the sample, otherwise the loop
      //runs instantly and we would not really be downsampling
      temp = analogRead(AMP_PIN);
    }
    sub ++;
  }
  neai_ptr = 0; //reset the beginning position
}


In this code, the detected class is simply written to the serial port. If you want to print the name of the song instead, you can use a switch, like this:


/* Infinite loop ----------------------------------------------------------*/
void loop() {
  get_microphone_data();
  neai_classification(neai_buffer, output_class_buffer, &id_class);
  /* DISPLAY THE SONG NAME */
  switch (id_class) {
    case 1:
      Serial.println("your song 1");
      break;
    case 2:
      Serial.println("your song 2");
      break;
    //continue if you have more songs
    default:
      Serial.println("default");
      break;
  }
}
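
If you also want to see the per-class probabilities on the serial monitor (the same values shown in the emulator in Step 3), you can print output_class_buffer after each classification. Here is a minimal sketch of such a helper; it assumes the usual NanoEdge mapping where output_class_buffer[i] corresponds to id2class[i + 1] (index 0 being "unknown"), so check the comments in your generated NanoEdgeAI.h to confirm. Call it in loop() right after neai_classification().

/* Optional helper: print the detected class name and all class probabilities */
void print_classification_result() {
  Serial.print("Detected: ");
  Serial.print(id2class[id_class]);      // name of the detected class
  Serial.print(" | probabilities:");
  for (uint16_t i = 0; i < CLASS_NUMBER; i++) {
    Serial.print(" ");
    Serial.print(id2class[i + 1]);       // assumed mapping: buffer index i -> class id i + 1
    Serial.print("=");
    Serial.print(output_class_buffer[i]);
  }
  Serial.print("\n");
}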



IMPORTANT:

You need to check that the id2class variable in the Arduino code is the same as the one in the NanoEdgeAI.h file from the library we imported earlier:

  • in Documents/Arduino/libraries/nanoedge/src/NanoEdgeAI.h (commented at the end of the file)

In my case it was this:

const char *id2class[CLASS_NUMBER + 1] = { // Buffer for mapping class id to class name
    "unknown",
    "magic fs32",
    "cheriecoco fs32",
    "zouglou fs32",
    "gaou fs32",
    "ambiance fs32",
};

Step 5: Use the LED Matrix to Print the Name of the Song

If you want to display the predicted class on the LED matrix of the Arduino R4 WIFI, use the attached code.

You need to include the following libraries via Sketch > Include Library:

  • ArduinoGraphics
  • Arduino_LED_Matrix

The only things added to the code are a matrix object, two variables (frame and text), and some code in setup(), that's it! A rough sketch of these additions is shown below.
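
If you don't have the attached code at hand, here is what those additions can look like. It is based on the standard ArduinoGraphics scrolling-text example for the UNO R4 LED matrix, not on the exact attached code, and the 30-character text buffer is simply the size used in this tutorial:

#include "ArduinoGraphics.h"    // must be included before Arduino_LED_Matrix.h
#include "Arduino_LED_Matrix.h"

ArduinoLEDMatrix matrix;        // the matrix object
char text[30] = " waiting ";    // updated by the switch in loop()

void setup() {
  Serial.begin(115200);
  matrix.begin();               // start the LED matrix
}

void loop() {
  /* ...collect data, run neai_classification() and update "text"
     in the switch, as in Step 4... */

  // scroll the current text across the matrix
  matrix.beginDraw();
  matrix.stroke(0xFFFFFFFF);
  matrix.textScrollSpeed(50);
  matrix.textFont(Font_5x7);
  matrix.beginText(0, 1, 0xFFFFFF);
  matrix.println(text);
  matrix.endText(SCROLL_LEFT);
  matrix.endDraw();
}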


IMPORTANT:

  • Change the *id2class as in the previous step
  • In the loop(), edit the switch to display what you want
 switch(id_class){
  case 1:
    strcpy (text, " song1 ");
    break;
  case 2:
    strcpy (text, " song2 ");
    break;
  default:
    strcpy (text, " check switch in code ");
    break;
 }

The variable text is declared before the matrix. It has a size of 30; change it if you need to (for example, if you want to display a string longer than 30 characters in the switch).