Introduction: Running Average for Your Microcontroller Projects
In this instructable I will explain what a running average is and why you should care about it, and show you how to implement it for maximum computational efficiency. Don't worry about complexity: it is very simple to understand, and I will also provide an easy-to-use library for your Arduino projects :)
Running average, also commonly referred to as moving average, moving mean or running mean, is the average of the last N values in a data series. It can be calculated just like a normal average, or you can use a trick that keeps its impact on the performance of your code minimal.
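Written as a formula, the definition looks like this. The symbols are my own choice for illustration: x_k is the newest value, N is the number of values averaged, and the bar marks the running average at step k.

```latex
\bar{x}_k = \frac{1}{N} \sum_{i=0}^{N-1} x_{k-i}
```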
Step 1: Use Case: Smoothing Out ADC Measurements
Arduino has a decent 10 bit ADC with very little noise of its own. However, when you measure a noisy source such as a potentiometer, a photoresistor or another high-noise component, it is hard to trust that any single measurement is correct.
One solution is to take multiple measurements every time you want to read your sensor and average them. In some cases this is a viable solution, but not always. If you wanted to read the ADC 1000 times per second, you would have to take 10 000 measurements per second if you averaged 10 measurements each time. A huge waste of computation time.
My proposed solution is to take measurements 1000 times a second, update the running average each time and use it as the current value. This method introduces some latency but reduces the computational load of your application, giving you a lot more time for additional processing.
In the picture above I used a running average of the last 32 measurements. You will see that this method is not 100% foolproof, but it improves accuracy significantly (it is no worse than averaging 32 samples each time). If you wanted to calculate an average of 32 fresh measurements each time, that would take over 0.25 ms on an Arduino UNO for the measurements alone!
Step 2: Use Case: Measuring DC Component of Microphone Signal
Arduino can measure voltages between 0 and Vcc (normally 5 V). An audio signal is pure AC, so if you want to measure it on a microcontroller, you have to bias it around 1/2 Vcc. In an Arduino UNO project that would mean roughly 2.5 V (DC) + audio signal (AC). With a 10 bit ADC and a 5 V power supply, a 2.5 V bias should give a measurement of 512. So to get the AC value of the signal, 512 should be subtracted from the ADC measurement and that is it, right?
In an ideal world, that would be true. Unfortunately real life is more complicated and our signal bias tends to drift. Very common is 50 Hz noise (60 Hz if you live in the US) from the electrical network. Usually it isn't too problematic, but it is good to know it exists. More problematic is the slow linear drift caused by components heating up. You carefully set the DC offset correction at the start and it slowly drifts away as your application is running.
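Here is a minimal sketch of the fixed subtraction described above, in plain C so it can be tried on a PC. The names FIXED_BIAS and remove_fixed_bias are made up for this illustration; on an Arduino the reading would typically come from analogRead().

```c
#include <stdint.h>

/* Assumed mid-scale value of a 10 bit ADC biased at 1/2 Vcc. */
#define FIXED_BIAS 512

/* Convert a raw 10 bit reading (0..1023) to a signed AC value
 * by subtracting a constant bias. */
int16_t remove_fixed_bias(uint16_t adc_reading)
{
    return (int16_t)((int16_t)adc_reading - FIXED_BIAS);
}
```

With a perfect bias, silence reads as 512 and comes out as 0. If the bias has drifted to, say, 530, the same silence comes out as 18, and that error stays in every sample until you correct the constant.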
I will illustrate this problem with a (music) beat detector. You set up your bias removal and beats are clear (picture 2). After some time, the DC bias moves and beats are barely noticeable to the microcontroller (picture 3). The beat detection algorithm will be explored in depth in a future instructable as it exceeds the scope of this article.
Fortunately there is a way to keep calculating the audio's DC offset continuously. It will come as no surprise that the running average, the topic of this instructable, provides a solution.
We know that the average value of any AC signal is 0. Using this knowledge we can deduce that the average value of an AC+DC signal is its DC bias. To remove it, we can take a running average of the last few values and subtract it from the current ADC reading. Note that you need to use a long enough running average. For audio, a tenth of a second (the number of samples depends on your sample rate) should suffice, but know that longer averages work better. In the first picture you can see an example of a real DC bias calculation using a running average with 64 elements at a 1 kHz sample rate (less than I recommended, but it still works).
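A minimal sketch of this bias removal, in plain C, could look like the following. It is not the code attached to this instructable: dc_tracker, DC_WINDOW and the two functions are hypothetical names, the window of 64 readings matches the example above, and pre-filling the window with a nominal bias is my own assumption to avoid a long start-up transient.

```c
#include <stdint.h>

#define DC_WINDOW 64u   /* assumed window length; a power of two so masking works */

typedef struct {
    uint16_t samples[DC_WINDOW]; /* last DC_WINDOW raw readings */
    uint32_t sum;                /* sum of everything in samples[] */
    uint8_t  pos;                /* slot holding the oldest reading */
} dc_tracker;

/* Pre-fill with the nominal bias so the estimate starts near 1/2 Vcc. */
void dc_tracker_init(dc_tracker *t, uint16_t nominal_bias)
{
    for (uint8_t i = 0; i < DC_WINDOW; i++) {
        t->samples[i] = nominal_bias;
    }
    t->sum = (uint32_t)nominal_bias * DC_WINDOW;
    t->pos = 0;
}

/* Feed one raw reading; returns the reading minus the estimated DC bias. */
int16_t dc_tracker_remove(dc_tracker *t, uint16_t raw)
{
    t->sum -= t->samples[t->pos];   /* drop the oldest reading */
    t->sum += raw;                  /* add the newest one */
    t->samples[t->pos] = raw;       /* newest overwrites oldest */
    t->pos = (uint8_t)((t->pos + 1u) & (DC_WINDOW - 1u));

    uint16_t bias = (uint16_t)(t->sum / DC_WINDOW);
    return (int16_t)((int16_t)raw - (int16_t)bias);
}
```

If the real bias sits at 530 instead of 512, the first output is off by 18, but after 64 readings the estimate has caught up and silence comes out as 0 again.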
Step 3: Calculation
You can imagine a running average as averaging the weight of people in a doctor's waiting room. The doctor finishes examining one patient and simultaneously a new one walks into the waiting room.
To find out the average weight of all patients in the waiting room, the nurse could ask each patient about their weight, add those numbers up and divide by the number of patients. Every time the doctor accepts a new patient, the nurse would repeat the whole process.
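In code, the nurse's "ask everyone again" approach is a loop over the whole window on every update. This is only a sketch with a made-up function name (naive_average), shown so you can see where the work goes.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive approach: add up every value in the window on each update. */
int32_t naive_average(const int16_t *window, size_t count)
{
    if (count == 0) {
        return 0;   /* nothing to average */
    }
    int32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += window[i];
    }
    return total / (int32_t)count;
}
```

For three patients weighing 80, 70 and 90 this returns 80, but it needs one addition per patient every single time, so the cost grows with the size of the waiting room.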
You might be thinking: "This doesn't sound all too efficient... There must be a better way to do this." And you would be correct.
To optimise this process, the nurse could keep a record of the total weight of the current group of patients. Once the doctor calls a patient in, the nurse would ask that patient about their weight, subtract it from the group total and let them go. The nurse would then ask the patient who just walked into the waiting room about their weight and add it to the total. The average weight of patients after each shift would be the sum of weights divided by the number of patients (yes, same as before, but now the nurse only asked two people about their weight instead of all of them). I realise this paragraph might have been a bit confusing, so please see the illustration above for additional clarity (or ask questions in the comments).
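The nurse's bookkeeping boils down to one subtraction and one addition. Here it is as a tiny C function with made-up numbers; update_total is a hypothetical name, and storing the individual values is deliberately left out here.

```c
#include <stdint.h>

/* One "shift" in the waiting room: one value leaves, one enters. */
int32_t update_total(int32_t total, int16_t leaving, int16_t entering)
{
    return total - leaving + entering;
}
```

Say the room holds patients weighing 80, 70 and 90, so the total is 240. The 80 patient is called in and a 60 patient arrives: the new total is 240 - 80 + 60 = 220, and the average is 220 / 3, about 73, without asking the other two patients anything.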
But even if the analogy was clear, you might still have questions, such as what should be in the accumulator at the beginning, or how to turn what you just read into actual C code. That will be addressed in the next step, where you will also get my source code.
Step 4: The Code
In order to calculate a running average, you first need a way to store the last N values. You could have an array with N elements and move the entire contents by one place each time you add an element (please don't do this), or you could overwrite the oldest element and adjust the pointer to the next element to be thrown out (please do this :)
The accumulator should start initialised to 0, and the same goes for all elements in the delay line. Otherwise your running average will always be wrong. You will see that delayLine_init takes care of initialising the delay line; you should take care of the accumulator yourself.
Adding an element to the delay line is as easy as decrementing the index of the newest element by 1 and making sure it doesn't point outside of the delay line array. When the index is 0 and you decrement it, it will loop around to 255 (because it is an 8 bit unsigned integer). The modulo (%) operator with the size of the delay line array will ensure the index points to a valid element.
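The index step on its own can be sketched like this. DELAY_LINE_SIZE and previous_index are names I made up for the example, not identifiers from the attached library. One assumption matters: the size should divide 256 (32 here), otherwise the wrap from 0 does not land on the last element. With a size of 10, for instance, 255 % 10 is 5 rather than 9, so part of the array would be skipped.

```c
#include <stdint.h>

#define DELAY_LINE_SIZE 32u   /* assumed size; should divide 256 for a clean wrap */

/* Step the index back by one slot, wrapping from 0 to the last slot. */
uint8_t previous_index(uint8_t index)
{
    index--;                          /* 0 wraps to 255 in 8 bit unsigned arithmetic */
    return index % DELAY_LINE_SIZE;   /* 255 % 32 == 31, the last valid slot */
}
```

Calling previous_index(5) gives 4, and previous_index(0) gives 31.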
Calculating a running average should be easy to understand if you followed my analogy in the previous step. Subtract the oldest element from the accumulator, add the newest value to the accumulator, push the newest value to the delay line, and return the accumulator divided by the number of elements.
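Put together, those four operations can be sketched in plain C as below. To be clear, this is my own illustration of the steps and not the attached library: running_average, ra_init, ra_update and RA_SIZE are hypothetical names, and the window length of 32 is an assumption chosen so that the 8 bit index wraps cleanly.

```c
#include <stdint.h>
#include <string.h>

#define RA_SIZE 32u   /* assumed window length; divides 256 so the index wraps cleanly */

typedef struct {
    int16_t line[RA_SIZE];  /* delay line holding the last RA_SIZE values */
    int32_t accumulator;    /* sum of everything in line[] */
    uint8_t newest;         /* index of the most recently stored value */
} running_average;

/* Zero the delay line and the accumulator so they agree from the start. */
void ra_init(running_average *ra)
{
    memset(ra->line, 0, sizeof ra->line);
    ra->accumulator = 0;
    ra->newest = 0;
}

/* Add one value and return the average of the last RA_SIZE values. */
int16_t ra_update(running_average *ra, int16_t value)
{
    /* Step back one slot; that slot still holds the oldest value. */
    uint8_t slot = (uint8_t)(ra->newest - 1u) % RA_SIZE;

    ra->accumulator -= ra->line[slot];  /* subtract the oldest value */
    ra->accumulator += value;           /* add the newest value */
    ra->line[slot] = value;             /* newest overwrites oldest */
    ra->newest = slot;

    return (int16_t)(ra->accumulator / (int32_t)RA_SIZE);
}
```

Starting from zeros, feeding the value 320 returns 10 on the first call (320 / 32) and climbs to 320 once all 32 slots hold real data. Each update costs the same handful of operations no matter how long the window is.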
Easy, right?
Please feel free to experiment with the attached code to better understand how all of this works. As it currently stands, the Arduino reads the analog value on analog pin A0 and prints "[ADC value] , [running average]" on the serial port at 115200 baud. If you open the Arduino serial plotter at the correct baud rate, you will see two lines: ADC value (blue) and smoothed out value (red).
Step 5: Extras
There are a few things that you don't necessarily need to know in order to use a running average in your project, but it won't hurt to know them.
delay: I will start with the illustration for this step. You will notice that a running average of more elements introduces a bigger delay. If your response time to a change in value is critical, you might want to use a shorter running average or increase the sample rate (measure more often).
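To put a rough number on that delay: for a steadily changing input, an N element running average lags behind by about (N - 1) / 2 samples. The helper below is a sketch with a made-up name (running_average_lag_us) that converts this into microseconds for a given sample rate.

```c
#include <stdint.h>

/* Approximate lag of an n element running average, in microseconds,
 * using lag = (n - 1) / 2 samples. */
uint32_t running_average_lag_us(uint32_t n, uint32_t sample_rate_hz)
{
    if (n == 0 || sample_rate_hz == 0) {
        return 0;   /* no meaningful answer for an empty window or zero rate */
    }
    return (uint32_t)(((uint64_t)(n - 1u) * 1000000u) /
                      (2u * (uint64_t)sample_rate_hz));
}
```

At a 1 kHz sample rate, 32 elements lag by about 15.5 ms and 64 elements by about 31.5 ms. Doubling the sample rate halves the lag for the same number of elements.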
Moving on.
initialising: When I talked about initialising the accumulator and delay elements, I said you should initialise them all to 0. Alternatively, you could initialise the delay line to anything you like, but the accumulator should then start as the sum of the newest N elements in the delay line (where N is the number of elements in your running average). If the accumulator starts as any other value, the calculated average will be wrong: either too low or too high, always by the same amount (assuming the same initial conditions). I suggest you try to learn why this is so by using some "pen and paper simulation".
accumulator size: You should also note that the accumulator has to be big enough to store the sum of all elements in the delay line when they are all at their positive or negative maximum. In practice that means the accumulator should be one data type larger than the delay line elements, and signed if the delay line elements are signed.
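You can check the worst case with a line of arithmetic. The function below is a hypothetical helper (sum_fits_uint16 is my name for it) that tells you whether n unsigned samples of a given maximum still fit in a 16 bit unsigned accumulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n samples, each at most max_sample, can be summed in a uint16_t. */
bool sum_fits_uint16(uint16_t max_sample, uint16_t n)
{
    return (uint32_t)max_sample * n <= UINT16_MAX;
}
```

For 10 bit ADC readings (maximum 1023), 64 samples just fit because 64 * 1023 = 65472, while 65 samples do not. Anything longer needs a 32 bit accumulator.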
trick: Long delay lines take up a lot of memory. That can quickly become a problem. If you are very memory restricted and don't care much about accuracy, you can approximate a running average by omitting the delay line entirely and doing this instead: subtract 1/N * accumulator from the accumulator and add the new value (for example, for a running average of length 8: accumulator = accumulator * 7 / 8 + newValue; the average is then accumulator / 8). This method does not give the exact running average (it is really an exponentially weighted average, in which old values fade out gradually instead of dropping out after N samples), but it is a decent approximation when you are running low on memory.
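A minimal sketch of this memory-saving trick follows. APPROX_N and approx_average are made-up names, and the length of 8 matches the example above. Note one deliberate difference: I subtract accumulator / N instead of multiplying by 7 / 8. On paper the two are the same, but with integer division this form lets the accumulator settle exactly at N times a constant input.

```c
#include <stdint.h>

#define APPROX_N 8   /* assumed averaging length */

/* Update the accumulator without a delay line and
 * return the approximate running average. */
int16_t approx_average(int32_t *accumulator, int16_t value)
{
    *accumulator = *accumulator - *accumulator / APPROX_N + value;
    return (int16_t)(*accumulator / APPROX_N);
}
```

Starting from an accumulator of 0 and feeding a constant 100, the first result is 12, and the output then creeps up until it settles at 100. The only state you need is a single accumulator variable, however long the averaging length is.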
linguistics: "running average/mean" is typically used when referring to real-time averaging, while "moving average/mean" usually means the algorithm is running on a static data set such as an Excel spreadsheet.
Step 6: Conclusion
I hope this instructable was easy enough to understand and that it will help you in your future projects. Please feel free to post questions in the comments below if anything is unclear.