Step 4: Decode the Audio
So, now we've got a bunch of audio on our device. How do we decode it? I based my code on an Android tutorial which shows how to record data and then play it back. In my case, I made sure to save that audio as 16bit PCM encoded. I sampled at 44100hz. On Android (and elsewhere, I suppose) 16bit PCM data means that each sample is a signed 16bit value. Since we only care about the frequency, we only need to care about how much time there is between "zero-crossings". A zero-crossing is when the signal goes from postive to negative or vice-versa. A 0 bit will be represented by the space between 2 crossings, and a 1 will have an extra crossing in approximately the same time period.
Card data in each track starts off with some (variable) number of 0s, to establish the base frequency. What I did was listen for the first sample above a certain "quiet" threshold, then count the number of samples between zero-crossings. That number becomes the base value for a 0. Since these cards are hand-swiped, the actual frequencies will change somewhat from the start of the scan to the end. So, I made a simple method which determines if the number of samples since the last zero-crossing is closer to the base frequency or twice the base frequency (half the base number of samples). It then adjusts the expected base frequency accordingly. This works well, so long as the changes between any two logical bits are fairly small. And they almost certainly will be.
To detect a zero-crossing, we need to look at the sign of each sample and compare it to the sign of the previous sample. If they differ (one positive, one negative) then the signal crossed 0 between those samples.
The basic algorithm is to iterate through the byte array, extracting samples. Count the number of samples between zero-crossings, and compare the count to the expected count for a 0 or 1.
Okay, after some hand-waving, we now have a binary sequence of data, which we want to turn back into ASCII. The most common encoding (and the only one I wrote a handler for) encodes each character as some number of bits, plus one parity bit. In the case of track 2, that's 4 bits for the character, and 1 for parity, making 5 bit groups. The bits are read from least significant to most, with the parity bit last. The parity bit is set to make the number of 1s in the group odd. In my implementation, I just disregard the parity bit, but it would help determine whether the read was good or not. In track 1, it's 6 bits for the character, plus the parity.
The character set of the tracks differ too, but both are subsets of ASCII with some offset. In the case of track 2, which only encodes some symbols and digits, the character set starts at 48, which is the ASCII code for "0". So if we get 0,0,0,0,1 as our character, we turn that into 0, add 48, and get 48. Similarly, 1,0,0,0,0 is 1. 1+48 = 49 = ASCII "1".
For track 1, the character set starts with " " (space) which is ASCII 32. So we add 32 to the decoded numeric value and get our ASCII character. After that, we have the data, so all that remains is hooking up the UI glue.