The idea is to represent each sample with 2 bits instead of 8, which cuts 75% off the size of an audio file that has already been reduced to a low sample rate. At 8000 Hz the result takes about 2KB of the Arduino's flash memory (program memory) for each second of audio. During playback, a buffer window slides through the data in flash memory and plays it.
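To make the 2-bit idea concrete, here is a minimal Python sketch of quantizing and packing. It assumes unsigned 8-bit input samples, keeps only the top two bits of each one, and stores four samples per byte with the first sample in the highest bits. The function names and the bit order are my assumptions for illustration, not necessarily what avr_sound uses.

```python
def quantize_to_2_bits(sample):
    """Map an unsigned 8-bit sample (0..255) to a 2-bit level (0..3)."""
    return sample >> 6  # keep only the two most significant bits


def pack_2bit(samples):
    """Pack 2-bit levels four per byte (assumed order: first sample in the top bits)."""
    levels = [quantize_to_2_bits(s) for s in samples]
    # Pad with level 0 so the length is a multiple of four.
    levels += [0] * (-len(levels) % 4)
    packed = bytearray()
    for i in range(0, len(levels), 4):
        a, b, c, d = levels[i:i + 4]
        packed.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(packed)


def unpack_2bit(packed):
    """Inverse of pack_2bit: return the list of 2-bit levels."""
    levels = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            levels.append((byte >> shift) & 0b11)
    return levels


# One second of 8000 Hz audio: 8000 samples * 2 bits = 16000 bits = 2000 bytes.
one_second = bytes(8000)
print(len(pack_2bit(one_second)))
```

The last line reproduces the size estimate: 8000 one-byte samples shrink to 2000 bytes, which is the roughly 2KB per second mentioned above.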
I wrote a Python script that converts a raw 8000Hz audio file into a C++ header file (.h) that my library can read.
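The header-generation step could look roughly like the sketch below. This is not the actual script from the library. The function name to_c_header, the array name, the _len constant and the row layout are all hypothetical; the only real pieces are the avr/pgmspace.h include and the PROGMEM attribute, which are the usual way to keep a constant array in flash on AVR.

```python
def to_c_header(packed, name="sample_data"):
    """Render already-packed bytes as a C++ header string (hypothetical layout)."""
    lines = [
        "#include <avr/pgmspace.h>",
        "",
        f"const unsigned int {name}_len = {len(packed)};",
        f"const unsigned char {name}[] PROGMEM = {{",
    ]
    # Emit twelve bytes per row to keep the header readable.
    for i in range(0, len(packed), 12):
        row = ", ".join(f"0x{b:02x}" for b in packed[i:i + 12])
        lines.append("  " + row + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


# Example with two packed bytes; a real run would pass the packed audio data.
print(to_c_header(bytes([0x1B, 0xFF]), name="beep"))
```

In a real conversion you would read the raw file in binary mode, pack it to 2 bits per sample, and write the returned string to a .h file.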
Step 1: Get the Library
Get my avr_sound library from GitHub and install it in your sketchbook/libraries/ folder.
You should be able to see the 3 included examples in your Arduino IDE.
Step 2: The Hardware
Connect digital pin 8 and digital pin 9 to an R-2R resistor ladder (I used 1K and 2K Ohm values).
Connect the output of the ladder to an earphone. If you want to drive a bigger speaker, you can use a generic NPN transistor: connect the emitter to ground, the base to the analog signal from the ladder, and the collector to one end of the speaker, then connect the other end of the speaker to +5.0V.
You can test your R-2R ladder by feeding its output to an analog input, reading it with the analogRead function, and checking the readings against the expected values.
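For reference, the expected values can be worked out ahead of time. The sketch below assumes an ideal, terminated 2-bit R-2R ladder with nothing loading its output, pins that swing fully between 0V and 5V, and the Arduino's default 10-bit ADC referenced to 5V. Under those assumptions the output is Vref * code / 4. The function names are mine, and real readings will be off by a few counts (more if the earphone is still connected).

```python
def ladder_voltage(code, bits=2, vref=5.0):
    """Ideal unloaded R-2R output voltage for a digital code (assumption: terminated ladder)."""
    return vref * code / (1 << bits)


def expected_adc(code, bits=2, vref=5.0):
    """Approximate 10-bit analogRead value for that voltage, clamped to 1023."""
    return min(1023, int(ladder_voltage(code, bits, vref) * 1024 / vref))


# Print the four 2-bit codes with their ideal voltage and ADC reading.
for code in range(4):
    print(code, ladder_voltage(code), expected_adc(code))
```

So with both pins low you should read close to 0, and each step up should add roughly 256 counts (about 1.25V). Which of pin 8 and pin 9 is the most significant bit depends on how you wired the ladder.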
Step 3: Going Further
We can increase quality by using more than 2 bits. The current implementation uses 2 bits per sample (i.e. we have only 4 values). For example, we could use 3 bits, 3 bits and 2 bits to represent 3 samples in a byte-aligned way.
We can use this to make text-to-speech by storing a small number of phonemes, in a way similar to Cantarino.
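Here is a rough sketch of how that 3-3-2 layout could be packed. This is only an illustration of the idea, not something the library does today. It assumes unsigned 8-bit input samples and puts the first sample in bits 7..5, the second in bits 4..2 and the third in bits 1..0. The function names are hypothetical.

```python
def pack_332(samples):
    """Pack three samples per byte as 3 + 3 + 2 bits (assumed bit layout)."""
    padded = list(samples) + [0] * (-len(samples) % 3)  # pad to a multiple of three
    packed = bytearray()
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        # a and b keep their top 3 bits, c keeps its top 2 bits.
        packed.append(((a >> 5) << 5) | ((b >> 5) << 2) | (c >> 6))
    return bytes(packed)


def unpack_332(packed):
    """Inverse of pack_332: return levels (0..7, 0..7, 0..3) for each byte."""
    levels = []
    for byte in packed:
        levels.append(byte >> 5)
        levels.append((byte >> 2) & 0b111)
        levels.append(byte & 0b11)
    return levels


# One second at 8000 Hz now needs ceil(8000 / 3) bytes instead of 2000.
print(len(pack_332(bytes(8000))))
```

The cost is about one third more flash per second (2667 bytes instead of 2000), every third sample still has only 4 levels, and the ladder would need a third output pin to play the 3-bit samples.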