"SoundDetector" is a simple voice recognition device.
Step 1: Background and Purpose
- AI speakers are popular now.
- So I wanted to build something that recognizes voice commands.
- I am also interested in STMicroelectronics CPUs.
- I tried to build a voice recognition system that meets the following requirements:
- Limited resources (an ARM Cortex-M3 class CPU).
- It works offline.
Step 2: How to Use
- Turn on the system.
- Say "turn on light" into the microphone.
- The system turns on the light. It's that simple. (In the demo, it turns on an LED on the breadboard.)
Step 3: System Components
- System components:
- Atollic TrueStudio (IDE)
- STM32F3xx_HAL_Driver (NOTE: This software is licensed by STMicroelectronics.)
- STMicro Nucleo STM32F303K8 (72 MHz, ROM 64 KB, RAM 12 KB)
- ECM (Electret Condenser Microphone) × 1
- Resistors (470 Ω × 2, 1 kΩ × 5, 100 kΩ × 1)
- Capacitors (0.47 uF × 2 or 1 uF × 1)
- Operational amplifier (LM358N)
- USB cable, breadboard (for debugging)
Step 4: Voice Recognizing Algorithm
In general, speech recognition extracts feature quantities from the speech signal by frequency analysis and classifies them as vowels or consonants. However, this requires high signal processing speed and dictionary data for each vowel and consonant, so most AI speakers do most of the processing in the cloud.
Since this system needs to work offline, I had to find an algorithm that could run entirely within the embedded system. I adopted the following method.
- Calculate volume from ECM output voltage
- Generate ON / OFF pattern from volume
- Compute the matching rate with the correct pattern
(For details, see Figure 3-1 and Figure 3-2.)
This method has several merits.
- It does not require a fine sampling cycle.
- It saves memory.
- It does not require heavy arithmetic such as an FFT. (It needs only basic arithmetic operations and a logarithm calculation.)
On the other hand, it has the following disadvantages.
- The thresholds must be adjusted.
- Erroneous detection is possible. (Because only the volume pattern is compared, a completely different word can be recognized as the target phrase.)
Step 5: Hardware Design
For the details of the hardware, see the attached wiring diagram and circuit diagram.
The key point is the amplifier circuit built around the LM358N, which amplifies the output voltage of the microphone. I connected the amplified microphone signal to the Nucleo's A0 pin (used as an ADC input) and the LED to the Nucleo's A1 pin (used as a GPIO output) through a resistor. Since the main goal of this project was to verify the algorithm, I built it on a breadboard.
Step 6: Software Design
Although there is not much code, I designed it with UML diagrams so that a design document remains. The class diagram shows the module structure of the software. The state machine diagram and the sequence diagrams explain its behavior.
The Timer, ADC, GPIO, and UART peripherals of the CPU are used. The timer generates a periodic interrupt every 500 μs. Each time the interrupt handler is called, the ADC converts the microphone signal (12 bits) and the result is stored in a data array. When the buffer becomes full, speech recognition runs on it using my algorithm. The result is indicated by the LED. The UART is provided for debugging.
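The sampling flow described above (periodic tick, one ADC reading per tick, recognition once the buffer is full) could look roughly like the following sketch. It is hardware-independent so that it can also be tried on a PC. The names, the BUFFER_LENGTH value, and fake_adc_read() are assumptions for illustration and are not taken from the attached project. On the real board, on_timer_tick() would be called from the timer interrupt handler and the stand-in would be replaced by a read from the ADC peripheral.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_PERIOD_US  500u   /* from the text: 500 us periodic interrupt */
#define BUFFER_LENGTH     4000u  /* hypothetical: about 2 s of audio at 500 us */

typedef uint16_t (*adc_read_fn)(void);

typedef struct {
    uint16_t     data[BUFFER_LENGTH];
    size_t       count;
    volatile int full;           /* set in the tick, polled by the main loop */
} sample_buffer_t;

static void sample_buffer_reset(sample_buffer_t *buf)
{
    buf->count = 0;
    buf->full = 0;
}

/* Called once per timer tick: store one 12-bit sample until the buffer is full. */
static void on_timer_tick(sample_buffer_t *buf, adc_read_fn read_adc)
{
    if (buf->full) {
        return;                  /* wait until the main loop has processed the data */
    }
    buf->data[buf->count++] = (uint16_t)(read_adc() & 0x0FFFu);
    if (buf->count == BUFFER_LENGTH) {
        buf->full = 1;           /* the main loop can now run the recognition */
    }
}

/* PC-side stand-in for the ADC: returns a repeating ramp of 12-bit values. */
static uint16_t fake_adc_read(void)
{
    static uint16_t value = 0;
    value = (uint16_t)((value + 37u) & 0x0FFFu);
    return value;
}
```

Keeping the interrupt handler limited to storing one sample, and running the recognition from the main loop when the full flag is set, keeps each 500 μs tick short.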
Step 7: Coding
Please refer to the attached zip file for the complete source code. Unzip it and open the project in TrueStudio by double-clicking the .cproject file.
The CPU's Timer, ADC, GPIO, and clock settings were configured with the STM32CubeMX tool. STM32CubeMX supports the STM32 CPUs manufactured by STMicroelectronics and is a convenient free tool that lets you set up the initialization of the peripheral registers from a GUI. By using this tool, I did not have to implement the register initialization myself and could concentrate on the application layer.
(NOTE: Code generated by the STM32CubeMX tool is licensed by STMicroelectronics.)
Step 8: Debugging
Two parameters essential to the voice recognition algorithm need to be adjusted. One is the threshold (THRESHOLD_VOLUME_ON) at which the volume is judged to be ON, and the other is the threshold (THRESHOLD_DETECT_COUNT) used to judge a pattern match.
To adjust these parameters on a PC, I implemented a function that sends the audio signal recorded by the device to the PC via the COM port, so that I can validate the parameters in Excel. The COM port settings are Baudrate: 38400 bps, Databit: 8, StopBit: 1, Parity: None, FlowControl: None. Since the Nucleo STM32F303K8 has a USB port for development, no additional hardware was needed for this.
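One simple way to get the recorded samples into Excel is to send them as CSV text, one "index,value" line per sample. The sketch below shows that idea under stated assumptions: the line format, the function names, and the write callback are hypothetical and are not taken from the attached project. On the board, the callback would forward the bytes to the UART (for example with the HAL's HAL_UART_Transmit), while on a PC the stand-in write_stdout() prints them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*uart_write_fn)(const char *data, size_t len);

/* Format one sample as "index,value\r\n". Returns the length, or -1 if it does not fit. */
static int format_sample_csv(char *line, size_t size, unsigned index, uint16_t sample)
{
    int n = snprintf(line, size, "%u,%u\r\n", index, (unsigned)sample);
    return (n > 0 && (size_t)n < size) ? n : -1;
}

/* Send the whole buffer line by line through the given write callback. */
static void dump_samples_csv(const uint16_t *samples, size_t n, uart_write_fn write)
{
    char line[24];               /* enough for a 10-digit index and a 5-digit value */
    for (size_t i = 0; i < n; i++) {
        int len = format_sample_csv(line, sizeof line, (unsigned)i, samples[i]);
        if (len > 0) {
            write(line, (size_t)len);
        }
    }
}

/* PC-side stand-in for the UART: print the bytes to standard output. */
static void write_stdout(const char *data, size_t len)
{
    fwrite(data, 1, len, stdout);
}
```

The text captured by a terminal program can be saved as a .csv file and opened in Excel to plot the waveform and try out different threshold values.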
As a result of the adjustment, THRESHOLD_VOLUME_ON was set to 40 [dB]. With this value, the voice is recognized from about 300 mm away from the microphone. Since this figure depends on the sensitivity of the microphone and the gain of the amplifier circuit, the appropriate threshold is expected to vary with the hardware.
THRESHOLD_DETECT_COUNT was set to 3100, which corresponds to a pattern match rate of about 75%. This value was found by repeated trial and error. However, the optimal value probably differs for each user, so it may be better to apply machine learning.
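The relation between the count threshold and the match rate can be written as a small helper. This is a sketch under an assumption: it treats THRESHOLD_DETECT_COUNT as the number of matching samples out of the whole pattern. If that is the case, a count of 3100 at about 75% implies a pattern of roughly 4133 samples. PATTERN_LENGTH and both function names are hypothetical and do not come from the attached project.

```c
#include <stdint.h>

/* Hypothetical pattern length: 3100 / 0.75 is about 4133 samples. */
#define PATTERN_LENGTH 4133u

/* Count threshold for a target match rate in percent, rounded to the nearest count. */
static uint32_t detect_count_for_rate(uint32_t pattern_length, uint32_t percent)
{
    return (pattern_length * percent + 50u) / 100u;
}

/* Match rate in percent for a given count, rounded to the nearest percent. */
static uint32_t match_rate_percent(uint32_t count, uint32_t pattern_length)
{
    return (count * 100u + pattern_length / 2u) / pattern_length;
}
```

Expressing the threshold as a percentage makes it easier to keep the same strictness if the pattern length changes.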
Step 9: Watch the Movie!!
Step 10: Challenges for the Future
The system built this time has room for improvement.
Completing the hardware (mounting on a printed circuit board, making a case)
Improving the algorithm (better accuracy)
Applying it to systems such as the following:
- Outputting an infrared signal to turn on the TV.
- Serving as a secure key to open and close a door.
- etc ...
I would be pleased to receive your advice and opinions about this system.