Intro: Encoded Audio With Radial Basis Function
The main goal of this project is to build a system that encodes sound signals using a Radial Basis Function (RBF) network embedded on a Zybo FPGA board. After encoding, the RBF weights are transferred through a Pmod RF2 to a similar system, which decodes the message so that it can be played back. The code is intended to be secure and hard to decipher. Another goal is to minimize the error and the loss of quality between the originally recorded audio signal and the played-back audio signal.
Step 1: Materials
1) Xilinx Vivado Design Suite
2) Matlab 2015b
3) Xilinx System Generator 2016.2
4) 1x Digilent Zybo Zynq-7000 ARM/FPGA SoC Trainer Board
5) 2x Digilent Pmod RF2: IEEE 802.15 RF Transceiver
6) Microphone (or other recording device with 3.5 mm standard audio jack)
7) Audio output device with 3.5 mm standard audio jack (headphone, speaker)
Step 2: System Schematic
As shown in the image, all of the system's modules are connected to the FPGA's embedded processor, which keeps the modules synchronized. The processor also serves as a debugger, so problems within a module are easy to identify. The design uses a pipeline architecture: the modules form a stream so that the data can be processed in real time.
The audio signal is recorded with a microphone, and the audio module converts it to digital form. The digital data arrives at the Signal Coder module through a Microphone Interface. Here the RBF neural network learns the signal in real time and outputs its weights, along with error signals for debugging. If the code turns out not to be secure enough, an additional coder is planned for transmission over a public channel. The coded message is forwarded through an RF Transceiver Interface to the RF module (Pmod RF2), which sends the data to another RF module. On the receiving side, the data passes through a similar RF Transceiver Interface to the decoder module, which calculates the output of the RBF network using the same structure as the coder network. This output is the decoded signal, which is forwarded to the Audio Output module through an Audio Output Interface. The audio output module performs the digital-to-analog conversion, and the resulting analog signal can be played back through a playback device plugged into the system.
Step 3: Radial Basis Function
A Radial Basis Function (RBF) network is a neural network that can learn signals by updating the weights of its basis functions so that their weighted sum matches the reference signal. Like every other neural network, it needs to be trained. Gaussian functions were used as the basis functions of the system. With this network the signals can be coded with an error that can be reduced by finding the best parameters for the network but can never reach 0, which means that the data cannot be coded without loss.
To implement the algorithm on an FPGA, it needed some refining. The Delta algorithm was used for training the weights, and the algorithm was optimized so that the number of calculations was reduced significantly: in the formulas below, each sample involves only two basis functions.
Calculating the output:
y[k] = w[l1[k]] * g(m1[k], s) + w[l2[k]] * g(m2[k], s), where y is the output vector, k is the current sample index, w is the weight vector, l1 and l2 are the weight index vectors, m1 and m2 are the distance vectors, s is the deviation and g is the Gaussian function, g(distance, s) = e^(-(distance/s)^2).
Calculating the error:
E[k] = d[k] - y[k], where E is the error vector, k is the current sample index, d is the reference sample vector and y is the output vector.
Updating the weights:
w[l1[k]] = w[l1[k]] + nu * E[k] * g(m1[k],s)
w[l2[k]] = w[l2[k]] + nu * E[k] * g(m2[k],s), where nu is the training coefficient
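The three formulas above can be sketched in Python as follows. This is a minimal floating-point illustration, not the FPGA implementation: the function names, the table layout (centers every SN samples, with each sample using the center to its left and the one to its right) and the zero initial weights are assumptions made for the example.

```python
import math

def gaussian(distance, s):
    """Gaussian basis value e^(-(distance/s)^2), as defined in the text."""
    return math.exp(-(distance / s) ** 2)

def build_tables(n_samples, sn):
    """Assumed table layout: centers every `sn` samples; sample k uses the
    center to its left (l1) and the center to its right (l2)."""
    l1 = [k // sn for k in range(n_samples)]
    l2 = [i + 1 for i in l1]
    m1 = [k - l1[k] * sn for k in range(n_samples)]   # distance to left center
    m2 = [l2[k] * sn - k for k in range(n_samples)]   # distance to right center
    return l1, l2, m1, m2

def train(d, sn, s, nu, cycles):
    """Train the weights on reference samples `d` with the delta rule.
    Returns the weights and the mean squared error of each cycle."""
    l1, l2, m1, m2 = build_tables(len(d), sn)
    w = [0.0] * (max(l2) + 1)          # assumption: weights start at 0
    history = []
    for _ in range(cycles):
        squared_error = 0.0
        for k in range(len(d)):
            g1 = gaussian(m1[k], s)
            g2 = gaussian(m2[k], s)
            y = w[l1[k]] * g1 + w[l2[k]] * g2   # output
            e = d[k] - y                        # error
            w[l1[k]] += nu * e * g1             # weight updates
            w[l2[k]] += nu * e * g2
            squared_error += e * e
        history.append(squared_error / len(d))
    return w, history
```

Calling train(d, sn=10, s=2.0, nu=0.1, cycles=10) on a list of samples d returns one weight per basis function center plus the per-cycle error, which is the quantity one would watch to see whether further cycles still help.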
Step 4: Circular Buffer
To code the samples in real time, the learning cycle must not be interrupted; otherwise the output quality may suffer or a clearly audible break may be heard during playback. For this purpose a circular buffer was used, which provides a steady stream of input data for the training. At the start the buffer is empty, and the learning process does not begin until the buffer is full. The samples arrive in the buffer one by one. Once the buffer is full, learning begins: for all the samples in the buffer, the network updates its weights over a given number of cycles to minimize the error. After that, the first element of the buffer is replaced by a new sample and the network is retrained on the updated samples. This continues until all the samples related to a weight have been replaced. That weight is then sent to the output and replaced with a new weight initialized to 0.
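The buffer behaviour described above can be sketched in a few lines of Python. The class and its names are hypothetical, and two simplifications are assumed: a deque drops the oldest sample automatically instead of overwriting it in place, and a weight is considered finished once SN samples have been replaced since the last weight was sent out.

```python
from collections import deque

class StreamBuffer:
    """Hypothetical sketch of the sample buffer that feeds the training."""

    def __init__(self, size, sn):
        self.samples = deque(maxlen=size)  # oldest sample drops out when full
        self.sn = sn                       # assumed samples per finished weight
        self.replaced = 0                  # samples replaced since last emit

    def push(self, sample):
        """Add one sample and return (ready, emit).
        ready: the buffer is full, so a training pass may run.
        emit:  enough samples were replaced to send out the oldest weight."""
        was_full = len(self.samples) == self.samples.maxlen
        self.samples.append(sample)
        ready = len(self.samples) == self.samples.maxlen
        emit = False
        if was_full:
            self.replaced += 1
            if self.replaced == self.sn:
                self.replaced = 0
                emit = True
        return ready, emit
```

In a full coder, each push that reports ready would be followed by a training pass over the buffer contents, and each emit would send the oldest weight to the output and reset it to 0.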
Step 5: Finding the Basis Function Parameters Based on Signal Power Spectrum Analysis
For the neural network to learn the input sounds, it is crucial to find its parameters: the number of samples between the centers of the basis functions, the basis function width parameter (sigma), the buffer size and the training coefficient. The tested parameters were chosen semi-empirically. Looking at the plotted sound waves gives an estimate of the width of the basis functions and the number of samples between them. Using the test parameters, the power spectrum of the original sound (red wave) was compared with that of the trained signal (blue wave). The network was tested for the following numbers of samples between basis functions (SN): 10, 20, 30, 40. For each SN, different width parameters were tested: SN/2, SN/4, SN/5, SN/6.
The first image shows the overall best result, where SN was 10 and sigma was SN/5. The power spectrum of the trained signal aligns closely with the spectrum of the original signal.
The second image shows a poor result. The power spectrum of the trained signal differs significantly from the power spectrum of the original signal. Here SN was 30 and sigma was SN/2.
The third image shows an even poorer result. The network skips most of the detail in the input data: instead of adding noise, it removes important data sequences. The test parameters were SN: 40 and sigma: SN/2.
Step 6: Choosing the Best Parameters
The power spectra were compared by calculating the error between the power spectrum of the originally recorded sound and the power spectrum of the trained sound. The parameters that gave the minimal error were picked for the neural network.
The first image above shows the best test result, with an SN of 10, a sigma of SN/5, a training coefficient of 0.1 and a buffer size of 160 samples.
The second image shows how the neural network learns in each cycle. After the 10th cycle the error no longer improves, which means that the network has learned the signal as well as it can with the parameters given above.
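The spectrum comparison used to pick the parameters can be sketched as follows. The write-up does not say which error measure was used, so the mean squared difference between the two power spectra is an assumption here, as are the function names. A naive DFT keeps the example free of external libraries; a real test would more likely use an FFT routine.

```python
import cmath
import math

def power_spectrum(x):
    """One-sided power spectrum of a real signal via a naive O(n^2) DFT."""
    n = len(x)
    spectrum = []
    for f in range(n // 2 + 1):
        acc = sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def spectrum_error(original, trained):
    """Assumed error measure: mean squared difference of the power spectra."""
    p_original = power_spectrum(original)
    p_trained = power_spectrum(trained)
    diffs = [(a - b) ** 2 for a, b in zip(p_original, p_trained)]
    return sum(diffs) / len(diffs)

# Parameter grid from the text: SN in {10, 20, 30, 40}, sigma = SN / {2, 4, 5, 6}.
CANDIDATES = [(sn, sn / div) for sn in (10, 20, 30, 40) for div in (2, 4, 5, 6)]
```

Each (SN, sigma) pair in CANDIDATES would be used to train the network on the same recording, and the pair with the smallest spectrum_error against the original would be kept.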
Step 7: Coder Module
The coder module was built in System Generator. The lookup tables (l1, l2, m1, m2) are stored in Block-RAMs, as are the weights of the network (W) and the samples to be learned. To synchronize the data, a “logic” block written in MCode was used. This block controls which sample is being coded in which cycle of the learning process, and it also controls the write enables of the weight Block-RAM and the sample Block-RAM. The “logic” block has 5 outputs. The first is “sel”, which controls the multiplexers that determine which set of LUTs is currently in use (l1, m1 or l2, m2). The second is “addr”, which sets the read address for all the Block-RAMs (except the W Block-RAM, which gets its address from l1 or l2 depending on “sel”). The third is “we”, which enables and disables writing to the W Block-RAM. The fourth is “weData”, which enables and disables writing to the Samples Block-RAM. The last is “output”, which enables writing to the output of the system.
It is important to mention the arithmetic used: fixed-point arithmetic. Because the numbers are close to 1, a total of 20 bits was used, 16 of which are fractional bits (bits after the binary point). 16 fractional bits were chosen because the accuracy of the calculations directly affects the quality of the output audio.
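To get a feel for what a 20-bit word with 16 fractional bits can represent, the format can be modelled in Python. This sketch assumes a signed two's complement format with round-to-nearest on conversion and saturation on overflow; the write-up does not state the signedness, rounding or overflow settings used in System Generator, and the helper names are made up for the example.

```python
TOTAL_BITS = 20
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS                    # 65536 steps per unit
RAW_MIN = -(1 << (TOTAL_BITS - 1))        # assumed signed: -8.0
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1     # just under +8.0

def saturate(raw):
    """Clamp a raw integer to the 20-bit signed range."""
    return max(RAW_MIN, min(RAW_MAX, raw))

def to_fixed(x):
    """Convert a float to the raw fixed-point integer (round, then saturate)."""
    return saturate(round(x * SCALE))

def to_float(raw):
    """Convert a raw fixed-point integer back to a float."""
    return raw / SCALE

def fixed_mul(a, b):
    """Multiply two raw values; the shift drops the extra fractional bits
    (rounding toward negative infinity), then the result is saturated."""
    return saturate((a * b) >> FRAC_BITS)
```

With 16 fractional bits the quantization step is 2^-16 (about 0.000015), so a value such as 0.1 is stored with an error of at most half that step under the rounding assumed here.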
Step 8: Decoder Module
This module is almost the same as the previous one, except that it lacks the feedback loop in which the weights were trained, so only the output calculation is present. The weights are received from the Radio Frequency Transceiver Interface module, stored in the W Block-RAM and rotated according to the circular buffer.
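Since the decoder only evaluates the output formula, it reduces to a single loop. The sketch below is a floating-point illustration with a hypothetical function name; it assumes the same table layout as the coder (centers every SN samples, two neighbouring centers per sample) and ignores the weight rotation.

```python
import math

def decode(w, n_samples, sn, s):
    """Rebuild n_samples output values from received weights w.
    Assumes len(w) covers the center to the right of the last sample."""
    y = []
    for k in range(n_samples):
        left = k // sn                 # index of the center to the left
        right = left + 1               # index of the center to the right
        g1 = math.exp(-((k - left * sn) / s) ** 2)
        g2 = math.exp(-((right * sn - k) / s) ** 2)
        y.append(w[left] * g1 + w[right] * g2)
    return y
```

For example, decode([1.0, 0.0], 10, 10, 2.0) traces out the right half of a single Gaussian of width 2 centered on sample 0.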
Step 9: Audio Input
An audio interface was built in System Generator to send the recorded data to the coder module. The primary module, shown in the first image, contains two subsystems connected with registers in order to separate the data output and data input from the clock handling. The inputs (reset, mute) and outputs (data_output_left, data_output_right) are generated/displayed in the jtag_clock_domain, shown in the second image. The third image shows the sys_clock domain, which generates the clock signals needed to receive the audio data from the microphone.
The fourth picture shows the 3 generated clock signals (Master clock, Bclock and PBLRC); the last signal is the Mute signal. The master clock is set to the default 12.288 MHz (12.5), which gives a 48 kHz sampling rate.
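The relationship between the master clock and the sampling rate is a simple division. The sketch below assumes the codec uses 256 master-clock cycles per sample, a common configuration that is consistent with the 12.288 MHz and 48 kHz figures above; the ratio itself is not stated in this write-up.

```python
MCLK_HZ = 12_288_000       # master clock from the text
MCLK_PER_SAMPLE = 256      # assumed master-clock cycles per audio sample

def sample_rate_hz(mclk_hz, ratio=MCLK_PER_SAMPLE):
    """Sampling rate obtained by dividing the master clock by the ratio."""
    return mclk_hz // ratio
```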
The fifth picture shows how the data is supposed to be received, but for some unknown reason, when the system is tested online, it does not receive any input data. This still needs further testing and improvement.
Step 10: Putting It Together
Work in progress
** Here I will synchronize the input data with the coder module, then with the decoder module and finally with the audio output module, which is almost the same as the audio input module (the data is shifted in the other direction, but the sampling rates and the clock frequencies are the same). **
Step 11: RF Module
Work in progress
**Here I will design a radio frequency transmitter and receiver interface to send/receive the coded signal.**