Introduction: Designing With Discrete SPI Flash Memory
Designing with discrete flash is 1/10th the cost, uses a much smaller form factor, and requires significantly less specialized hardware than using SD flash cards.
This Instructable will show you how to add 1MB of discrete external flash memory to your microcontroller project with what I believe to be the least amount of effort possible. It is also a follow-on to my other two data-logging Instructables (an anemometer and a 3-axis wrist accelerometer), and it explains how to download the data from the logger's flash memory using age-old TTY command-line applications found in Linux.
Whenever I'm building an Atmel ATMega or Arduino project and I need to record data, I almost always reach for a single SPI WinBond W25Q80BV 1MB flash chip rather than an SD flash subsystem. Many reasons exist to choose a discrete flash chip over an SD subsystem, and vice versa, and you'll need to consider these tradeoffs for your design. The list below contains a few tradeoffs I think about when I need to decide if I want to use a single 8-pin DIP chip or a full-on SD solution:
Hardware Complexity (Choose: Discrete)
One way to add SD flash to an Arduino system is to use a shield, such as this one by Seeed Studio (three 'e's) I bought at my local Radio Shack for $15. While shields provide convenience for prototyping, the final production assembly might not have the budget or the space to include SD hardware. An 8-pin DIP package of a discrete flash chip is much easier to drop on a protoboard than an SD shield, assuming your development board even supports a shield.
Software Complexity (Choose: Discrete)
The SD flash subsystem commonly relies on the SDFat16/32 libraries. Although the cards themselves speak SPI, it makes sense to use FAT, since any PC/MAC can then read the card. These libraries are large and can take up precious program memory on smaller embedded controllers. Compatibility and integration into your build environment may require significant debugging. The software required to drive a discrete flash chip with an SPI interface is trivial and very small, as you will soon see. Maybe this says more about me than the SDFat libraries, but I find them cumbersome to work with.
Capacity & Portability (Choose: SD)
SD flash wins big here: simply pop a larger-capacity SD card into the existing design with no modifications. Discrete SPI flash has lower density limits in the 8-pin DIP format. And because the SDFat library writes a FAT filesystem, any PC/MAC can read the files on the card.
Cost (Choose: Discrete)
SD cards range in price dramatically, and with an SD flash shield, can set you back $20-$30. WinBond 1MB chips cost about $2 from Mouser or Digikey.
Power (Choose: Discrete)
Energy requirements of flash depend on the manufacturer, production lot, device density, and process technology. SD cards typically have higher leakage power due to their higher densities, and higher dynamic power due to their higher access speeds. The WinBond chips I focus on in this Instructable require very little power: 6uW standby, 60mW page program, and 60mW chip erase. I wasn't able to find power data on the high-end super-fast SD cards, but their write speed is about 100x that of the WinBond. Since dynamic power is proportional to frequency, I can't imagine their power would be lower.
Speed (Choose: SD)
I haven't had any need for very fast flash memory write performance, but SD flash comes in many different product SKUs based on speed (mostly due to the demands of digital photography and the use of raw image formats). The WinBond SPI chips can't really compare. Page program time is 0.7ms for 256 bytes, which translates to roughly 0.36MB/s. That is about 100x slower than Team Corp.’s fastest Micro SD cards at ~40MB/s. I suspect they have multiple devices or arrays writing in parallel to achieve those speeds.
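The 0.36MB/s figure is just the page size divided by the page program time. Below is a quick C++ check of that arithmetic. It ignores the time needed to shift the opcode, address, and 256 data bytes over SPI, so a real logger will be somewhat slower still. The 40MB/s card figure is the one quoted above.

```cpp
// Effective page-program throughput in decimal MB/s, ignoring SPI shift time
// and command/address overhead (an optimistic simplification).
double page_program_throughput_mb_per_s(double page_bytes, double program_ms) {
    double bytes_per_second = page_bytes / (program_ms / 1000.0);
    return bytes_per_second / 1.0e6;
}

// How many times faster a card rated at card_mb_per_s would be.
double speedup(double card_mb_per_s, double flash_mb_per_s) {
    return card_mb_per_s / flash_mb_per_s;
}
```

page_program_throughput_mb_per_s(256, 0.7) comes out at about 0.366. speedup(40, 0.366) comes out at about 109, which is where the "100x" comes from.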
While this analysis most likely represents my own lazy biases, I find my brand of laziness to be rather prolific. That being said, any one of these vectors may be more important for your project, but my goal here is to call out the tradeoffs, and then illustrate the simplicity of this wonderful flash chip. (And I haven't even discussed using larger capacity parallel flash chips.)
Step 1: What Is SPI Flash Memory?
I'm going to explain this next part painfully fast. My first job at Intel was in the flash memory group in 1993, and a lot has changed with the technology in the 20 years since then, but some concepts remain consistent.
Flash memory is a type of nonvolatile storage memory based on MOSFET technology. Nonvolatile means the device retains its value when it isn't powered-up.
If you aren't familiar with how a MOSFET transistor works, I'll try to explain it in one sentence: a slab of silicon with two terminals on either end doesn't conduct electricity if you place a potential difference between them, but if you stick another piece of metal on top of that slab, sandwich a dielectric between the two, and then apply a voltage to that piece of metal, it creates a field and current can flow between the two terminals. The terminals are called the source and drain, and the metal is called the gate. That's a super simple explanation that bulldozes 50 years of quantum physics, but from a Michael Faraday point of view, it is reasonably workable.
Flash memory operates by blasting a bunch of charge carriers onto the dielectric between the gate and the substrate. This is called programming, and it is typically done with a much higher voltage. It actually damages the material, and after roughly 100k program cycles, the gate will fail. To remove the charge carriers from the dielectric, an equally high voltage of reverse potential pulls the carriers off the gate. This is called erasing.
A programmed flash bit has value 0 and an erased bit has value 1, so an erased flash byte reads as 0xFF in hex. (Nowadays, flash memory can store multiple bits per cell using multiple voltage levels, but that gets really complicated.)
Typically, a flash memory contains a giant array of transistors that can be individually programmed, but only erased in groups (sectors, blocks, or the entire chip). This is simply a side effect of how the erase circuitry works: per-bit erase would require too much metal density, and isn't all that useful (in practice, erasing in larger chunks works just fine).
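To make the program/erase asymmetry concrete, here is a toy C++ model. It is my own illustration, not WinBond code. Erasing works on a whole sector and sets every bit to 1. Programming can only pull bits down to 0, so programming a byte that isn't erased yields the bitwise AND of the old and new values. The 4KB sector size is an assumption chosen for the model.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of NOR flash semantics (illustrative, not device-accurate).
constexpr std::size_t kSectorSize = 4096;  // assumed sector size for this model
constexpr std::size_t kNumSectors = 4;
constexpr std::size_t kSize = kSectorSize * kNumSectors;

struct ToyFlash {
    std::array<uint8_t, kSize> cells;

    ToyFlash() { cells.fill(0xFF); }  // a fresh chip reads back as erased

    // Erase granularity is a whole sector: every bit goes back to 1.
    void erase_sector(std::size_t sector) {
        std::size_t start = sector * kSectorSize;
        for (std::size_t i = 0; i < kSectorSize; ++i) cells[start + i] = 0xFF;
    }

    // Programming can clear bits (1 -> 0) but never set them back to 1.
    void program_byte(std::size_t addr, uint8_t value) { cells[addr] &= value; }

    uint8_t read(std::size_t addr) const { return cells[addr]; }
};
```

Programming 0x08 into an erased byte reads back 0x08. Programming 0xF0 over it then reads back 0x00, not 0xF0, until the sector is erased again. This is why a logger has to erase before it rewrites.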
Since programming a single transistor is slow, due to ramping up that high voltage and all of the control that goes along with it, flash is usually programmed in pages. Typically a flash device will have a small SRAM page buffer (256 bytes on the WinBond part). The host first rapidly fills the buffer with data and then issues a page write command, and the flash chip writes all the page bytes out in a large batch job. This batch circuitry amortizes the startup write latency across a larger number of bits. Offering two or more page buffers allows the host to use a double-buffer technique to hide the write latency of the flash device.
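One practical consequence of page buffering is that a single page-program command should stay inside one 256-byte page. As I understand the W25Q datasheets, if you clock in more bytes than remain in the current page, the address wraps to the start of that same page instead of moving on. Below is a minimal C++ sketch of splitting an arbitrary write into page-sized commands. The Chunk type and the split_into_pages name are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kPageSize = 256;  // bytes per program page

struct Chunk {
    uint32_t address;    // flash address where this chunk starts
    std::size_t offset;  // offset into the caller's source buffer
    std::size_t length;  // bytes to send in one page-program command
};

// Split a write of `length` bytes at `address` into pieces that never cross
// a page boundary, so each piece can be issued as one page-program command.
std::vector<Chunk> split_into_pages(uint32_t address, std::size_t length) {
    std::vector<Chunk> chunks;
    std::size_t done = 0;
    while (done < length) {
        std::size_t room = kPageSize - (address % kPageSize);  // space left in this page
        std::size_t n = (length - done < room) ? (length - done) : room;
        chunks.push_back({address, done, n});
        address += static_cast<uint32_t>(n);
        done += n;
    }
    return chunks;
}
```

For example, a 20-byte write starting at address 250 becomes two commands: 6 bytes at 250 and 14 bytes at 256.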
The Serial Peripheral Interface is a brilliant invention. It is a simple serial interface that uses a chip select, a clock, a data IN line, and a data OUT line. There are many kinds of SPI devices, as it is a very popular interface, and all of them can be driven with the same library: once you know how to talk to one SPI device, you can talk to any SPI device.
The advantage of SPI is its software simplicity: the code basically shifts data in and out of the DI and DO pins, respectively, on the rising edge of a clock. And since the clock is controlled by the host, it doesn't require a fancy clock circuit: the phases can be as asymmetric as you want, as long as you adhere to the minimum cycle width requirements of the device.
Flash SPI memory simply combines the best of both worlds. Note that SD cards can also be driven over SPI, just like this discrete chip. Surprise! The programming interface isn't very different, but the actual instructions and timings differ.
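To show how little the "shift bits on a clock edge" idea involves, here is a bit-banged transfer in C++ against a simulated set of pins. The Pins struct is a stand-in I made up; on an AVR, each assignment would be a digitalWrite() or digitalRead(). The sketch assumes SPI mode 0, MSB first: set up DI while the clock is low, raise the clock so the device samples, then lower it again.

```cpp
#include <cstdint>
#include <vector>

// Simulated pins (hypothetical stand-in for real GPIO).
struct Pins {
    bool clk = false;          // SPI mode 0: the clock idles low
    bool mosi = false;         // host -> device (the flash chip's DI)
    bool miso = true;          // device -> host (DO); held high in this simulation
    std::vector<int> sampled;  // DI level the device latches on each rising edge
};

// Bit-banged SPI mode 0 transfer, most significant bit first.
uint8_t soft_spi_transfer(Pins& p, uint8_t out) {
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; --bit) {
        p.mosi = ((out >> bit) & 1) != 0;     // set up DI while CLK is low
        p.clk = true;                         // rising edge
        p.sampled.push_back(p.mosi ? 1 : 0);  // device latches DI here
        in = static_cast<uint8_t>((in << 1) | (p.miso ? 1 : 0));  // host samples DO
        p.clk = false;                        // falling edge; phase lengths are up to the host
    }
    return in;
}
```

Sending 0xC7 leaves the sampled DI levels as 1,1,0,0,0,1,1,1, which is the same pattern read off the chip erase timing diagram later on.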
Step 2: The WinBond Device Interface
The pinout shown above is taken from the WinBond datasheet.
Pin 1: Chip Select (/CS, sometimes called /SS, for "slave select")
CS is the "Chip Select" pin. You assert the CS pin when you want to talk to that device, because you could have a dozen SPI devices all sharing the same bus, and you identify each one uniquely via its CS pin. The slash in front of CS means "active low": to talk to this device, pull this pin to logic level zero; to remove it from the shared bus, drive logic level one.
Pin 2: Data Out (DO)
Serial data is read from this pin. It will connect to the MISO (Master In / Slave Out) wire of the bus. Typically you write a command to the SPI device in a pre-determined sequence. After that sequence completes, and depending on the instruction in the sequence, data is then read off the DO pin.
Pin 3: Write Protect (/WP)
This pin disables writing. Sometimes you'll see a jumper attached to this pin in order to provide very strict control over the program/erase mechanism: if set low, the device cannot be programmed or erased. I usually hardwire it to Vdd and allow my software to control write enable/disable through serial commands (we'll talk about this later).
EDIT (2016-12-16) Thanks to user velsoft for catching a typo: I had the polarity mixed up.
Pin 4: Ground
This is simply the ground pin.
Pin 5: Data In (DI)
This is the input serial pin. It will connect to the MOSI (Master Out / Slave In) wire of the bus. Commands and data are written to this pin by the host system.
Pin 6: Clock (CLK)
The clock pin determines when data bits are transferred on the DI and DO pins. Data on DI is latched on the rising edge of the clock, and data on DO is shifted out on the falling edge.
Pin 7: Hold (/HOLD)
I've never used this pin, but it allows a host device to pause whatever transaction is in flight. You'll probably never have to use it either, so I leave it wired to VCC (it is active low, so tying it high keeps the hold function disabled).
Pin 8: VCC
This is simply the supply voltage.
Step 3: How to Read a Timing Diagram
Now that I've explained flash, SPI, and a specific implementation of an SPI flash device, the next things you need to understand are communication timing diagrams*. Timing diagrams explain the sequencing of the data across the pins to issue instructions to the device. Each SPI device responds to its own set of instructions (e.g., a flash device will have a read or erase instruction) and the timing diagram is the link between the conceptual behavior of the instruction and the actual hardware protocol to execute that instruction.
In the diagram for this section I copied the chip erase timing diagram from the datasheet because it is the easiest to understand.
The bottom axis is time, and the vertical axes represent four SPI pins and the sequence of data that should appear on them over time to execute an instruction. Note: "High impedance" means you can ignore that signal. It is driven to neither 0 nor 1 but presents an extremely high resistance, so it is effectively an open circuit. Where two lines appear (like DI), that simply means some kind of transition is happening but its value is unknown; a single line means a specific high or low value is present.
Let's look at the diagram from left to right and top to bottom.
In order to talk to any SPI device, its chip select must be brought high and then driven low (remember, /CS means active low). When /CS is brought low, note that the clock in the diagram is very explicitly drawn to show eight phases. This means you must pulse the clock eight times, once per bit. While the clock is strobing, DI goes from high to low to high. I think the DI diagram is erroneous, because if you draw a vertical line down through the rising edge of each clock and read off the binary value of DI at those points, you should get 11000111, or 0xC7. This is the instruction that tells the chip to erase itself. Once chip select is brought high, the internal circuitry begins executing the 0xC7 (Chip Erase) function. This instruction takes about 1 to 2 seconds to complete.
Keep in mind, you don't need to actually toggle the clock pin 8 times to send out the 8 bits of a byte; the SPI library does this for you when you call SPI.transfer(). You will still need to drive /CS manually with digitalWrite(), but SCK, MOSI, and MISO are all handled by the SPI functions.
You will notice in my source code a function called "not_busy()". This function repeatedly issues a "Read Status Register-1" instruction and checks bit 0 (BUSY), which indicates whether the internal operation has completed and the flash is no longer busy. The timing of this operation matches diagram 9.2.8 of the datasheet.
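The "vertical line at each rising edge" trick is just MSB-first bit assembly. Here is a tiny C++ helper that does the same thing with the levels you read off a diagram. The function name is my own.

```cpp
#include <cstdint>

// Rebuild a byte from the DI levels seen at each of the eight rising clock
// edges; the first edge carries the most significant bit.
uint8_t byte_from_edges(const int (&levels)[8]) {
    uint8_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value = static_cast<uint8_t>((value << 1) | (levels[i] ? 1 : 0));
    }
    return value;
}
```

Feeding it 1,1,0,0,0,1,1,1 returns 0xC7 (Chip Erase). Feeding it 0,0,0,0,0,1,1,0 returns 0x06 (Write Enable, per the datasheet).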
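Below is a hedged C++ sketch of the erase-then-poll sequence. It is written against a small Bus interface I made up so it can run off-target; on an Uno, select(true) would be digitalWrite(SS, LOW) and transfer() would be SPI.transfer(). One detail the diagram alone doesn't show: per the WinBond datasheet, Chip Erase (0xC7) is ignored unless a Write Enable (0x06) has first set the write-enable latch. The busy() helper re-issues Read Status Register-1 (0x05) on every poll, matching the description of not_busy() above. FakeFlashBus is a test double, not a model of real timing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bus abstraction so the sequence can run off-target.
struct Bus {
    virtual void select(bool active) = 0;       // true drives /CS low
    virtual uint8_t transfer(uint8_t out) = 0;  // clock one byte out, one byte in
    virtual ~Bus() = default;
};

constexpr uint8_t kWriteEnable = 0x06;
constexpr uint8_t kReadStatus1 = 0x05;
constexpr uint8_t kChipErase = 0xC7;

// One Read Status Register-1 transaction; BUSY is bit 0.
bool busy(Bus& bus) {
    bus.select(true);
    bus.transfer(kReadStatus1);
    uint8_t status = bus.transfer(0x00);  // dummy byte clocks the status out
    bus.select(false);
    return (status & 0x01) != 0;
}

// Spin until the chip reports that its internal operation has finished.
void not_busy(Bus& bus) {
    while (busy(bus)) {
    }
}

void erase_chip(Bus& bus) {
    bus.select(false);           // guarantee a 1 -> 0 edge on /CS
    bus.select(true);
    bus.transfer(kWriteEnable);  // set the write-enable latch
    bus.select(false);

    bus.select(true);
    bus.transfer(kChipErase);
    bus.select(false);           // the erase starts when /CS goes high
    not_busy(bus);
}

// Test double: records each /CS-framed transaction and reports BUSY for a
// few status polls after an enabled chip erase.
struct FakeFlashBus : Bus {
    std::vector<std::vector<uint8_t>> transactions;
    bool selected = false;
    bool wel = false;    // write-enable latch
    int busy_polls = 0;  // status reads still to answer with BUSY

    void select(bool active) override {
        if (active && !selected) transactions.emplace_back();
        if (!active && selected) on_cs_high(transactions.back());
        selected = active;
    }

    uint8_t transfer(uint8_t out) override {
        std::vector<uint8_t>& t = transactions.back();
        t.push_back(out);
        if (t.size() == 2 && t[0] == kReadStatus1) {
            if (busy_polls > 0) {
                --busy_polls;
                return 0x01;
            }
            return 0x00;
        }
        return 0xFF;
    }

    void on_cs_high(const std::vector<uint8_t>& t) {
        if (t.size() == 1 && t[0] == kWriteEnable) wel = true;
        if (t.size() == 1 && t[0] == kChipErase && wel) {
            busy_polls = 3;
            wel = false;
        }
    }
};
```

Against the fake, erase_chip() produces six /CS-framed transactions: Write Enable, Chip Erase, then four status reads (three BUSY, one ready).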
* Note I am not referring to the electrical timing diagrams, which explain to the nanosecond the setup and hold times for the internal digital logic; the diagrams I'm referring to are the logical diagrams that ignore nanoseconds and describe the sequence of logical events. The actual electrical timing of the SPI interface is handled by the Arduino SPI library. And to be honest, that code isn't very complex, and could be further simplified if you are designing to one specific device.
Step 4: Interfacing to an Arduino Uno With Level-Shifters
The Arduino Uno's digital outputs transmit 0V and 5V as logic levels low and high, respectively. The WinBond flash chip only operates between 2.7V and 3.6V. Whenever logic circuits on different voltage planes need to communicate, we have to use a level-shifter.
The easiest form of level-shifter is a simple Zener diode clamp. There are many other types of level shifters in the world (some are faster, some use less power), but the Zener clamp method is quick and easy.
All diodes have a reverse breakdown voltage at which point they begin to conduct. Zener diodes are specifically designed to breakdown at finely tuned voltages. In my case, I connected a 3.3V Zener diode in parallel with each of the chip's digital inputs (see the schematic). (As for the other four pins, ground is 0V, and the Uno board has a 3.3V supply for VCC, so these pins don't need a diode, and I hardwired /WP and /HOLD to 3.3V Vcc.)
UPDATE: I forgot to add the 330 Ohm resistors in series with the output of the Uno drivers. Normally, if you were connecting the digital output of the Uno to a digital input of another device, a simple wire would suffice (since you are connecting one digital logic signal to another, see the ATmega328 datasheet, section 13.1 "I/O Pin Equivalent Schematic"). But since the output path now branches through the Zener, you need a resistor to limit the maximum current driven by the logic output of the Uno/ATmega chip. Without the resistor, this path to ground may exceed the max output current of the device. Which would be bad, Ray.
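The resistor's job is easy to put numbers on. The sketch below assumes the Zener behaves like an ideal 3.3V clamp; real Zeners have a soft knee, so treat the result as an estimate. Under that assumption, the current is just the 1.7V difference across 330 ohms.

```cpp
// Current (mA) through the series resistor while the Uno drives v_drive and
// the Zener clamps the node at v_clamp. Assumes an ideal clamp.
double series_current_ma(double v_drive, double v_clamp, double r_ohms) {
    return (v_drive - v_clamp) / r_ohms * 1000.0;
}

// Power (mW) dissipated in the Zener while it is clamping: V * mA = mW.
double zener_power_mw(double v_clamp, double current_ma) {
    return v_clamp * current_ma;
}
```

That works out to about 5.2mA through the resistor and about 17mW in the Zener. Both are comfortably under the 40mA per-pin absolute maximum listed in the ATmega328 datasheet. Without the resistor, nothing but the driver's own output impedance limits that current.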
Now, whenever the Uno drives a 5V logic-high into, say, the /CS pin, the Zener diode switches to breakdown mode, clamping the voltage to 3.3V, thus protecting the input logic of the flash chip.
I connected the Arduino Uno's digital pin 10 (SS) to /CS, pin 11 (MOSI) to DI, pin 12 (MISO) to DO, and pin 13 (SCK) to CLK, with the clamps on the lines the Uno drives into the chip. (Note that the pins of the ATmega328 are NOT the same as the Uno's pin numbers, e.g., ATmega pin #19 is Uno pin #13.) The SPI software library assumes pin 10 = SS, etc.
Step 5: Code Code Code!
I wrote a sketch that lets me communicate with the Uno over a serial TTY, using the Serial Monitor (or even a Unix prompt, as you will see). This is a helpful method for debugging new hardware, as I can issue commands interactively.
The "serialEvent()" function is a built-in callback that the Arduino core calls after each pass through "loop()" whenever data is waiting on the default Serial object. I use this callback to build up a command string and set a boolean flag. The byte-by-byte construction of the string completes when the callback reads a semicolon (";") from the stream. I use a semicolon instead of a newline because the Serial Monitor doesn't send a newline by default. When the callback has built the string and set the flag, "loop()" runs a decoder. The decoder determines which function to call based on the command string, parses any additional parameters from the command string, and calls that function.
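Here is a minimal, host-runnable C++ model of that split between serialEvent() and loop(). It is a hedged illustration, not the code from this project. The struct and function names are hypothetical, and it uses std::string and istringstream so it can run on a PC; an AVR sketch would more likely use a fixed char buffer or Arduino's String. Characters accumulate until a ';' arrives. loop() then takes the command and splits it into a name and integer arguments.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Models serialEvent(): append characters until ';' marks the command complete.
struct CommandAccumulator {
    std::string buffer;
    bool complete = false;

    void feed(char c) {
        if (complete) return;  // this simple model ignores input until loop() calls take()
        if (c == ';') complete = true;
        else buffer += c;
    }

    // Models loop(): hand over the finished command and reset for the next one.
    std::string take() {
        std::string cmd = buffer;
        buffer.clear();
        complete = false;
        return cmd;
    }
};

struct ParsedCommand {
    std::string name;
    std::vector<long> args;
};

// Split e.g. "write_byte 0 2 8" into a name and integer arguments.
ParsedCommand parse_command(const std::string& text) {
    ParsedCommand pc;
    std::istringstream in(text);
    in >> pc.name;  // skips leading whitespace, such as a stray newline
    long value;
    while (in >> value) pc.args.push_back(value);
    return pc;
}
```

Feeding "write_byte 0 2 8;" one character at a time sets complete. parse_command(acc.take()) then yields the name "write_byte" with arguments 0, 2, and 8.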
Each function is essentially a wrapper around a low-level implementation of a WinBond SPI functional timing diagram. I used a wrapper so that the low-level functions remain generic: I can use them again in other sketches with a simple cut-and-paste. Plus, the wrapper prints some feedback to the user, which is very useful for debug.
The screenshot above shows an interactive session with the Serial Monitor. I have issued four commands: "get_jedec_id;", "read_page 0;", "write_byte 0 2 8;", and "read_page 0;". You don't actually see the commands (the Serial Monitor doesn't echo them, and I didn't print the exact command; I probably should have), but you do see the responses. It should be clearest when I read/write/read page 0. The "read_page" command simply dumps the specified page (given in decimal). The "write_byte" function is a little weird, as its parameters specify a page number, an offset into that page, and then the byte. Since there is no native 32-bit register in the 8-bit ATmega (where an int is only 16 bits), I didn't bother doing logical-to-physical translation, but you'll need to consider this translation at some point. Anyway, notice that the third byte of page zero is now "08h".
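Because the parameters are a page number and an offset, the logical-to-physical translation is one multiply and one add, provided you do it in 32 bits. With a 16-bit int, page * 256 overflows long before the end of a 1MB chip. Below is a minimal C++ sketch; the function names are hypothetical. The three address bytes are sent most significant first after the opcode.

```cpp
#include <cstdint>

constexpr uint32_t kPageSize = 256;  // bytes per program page

// Logical (page, offset) -> 24-bit flash address, computed in 32 bits so the
// multiply cannot overflow the way a 16-bit int would on the AVR.
uint32_t flash_address(uint16_t page, uint8_t offset) {
    return static_cast<uint32_t>(page) * kPageSize + offset;
}

// Split the address into the three bytes that follow the opcode, MSB first.
void address_bytes(uint32_t addr, uint8_t out[3]) {
    out[0] = static_cast<uint8_t>(addr >> 16);
    out[1] = static_cast<uint8_t>(addr >> 8);
    out[2] = static_cast<uint8_t>(addr);
}
```

For the "write_byte 0 2 8;" example, flash_address(0, 2) is simply 2. The last byte of the chip, flash_address(4095, 255), is 0xFFFFF.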
I could have also issued "chip_erase;" and then "read_page 0;" to illustrate an erase cycle, but hopefully you get the picture.
The low level functions start with "_", and are named "_read_page" or "_write_page" or "_erase_chip". These functions explicitly sequence out the SPI commands found in the datasheet timing diagrams. Each function ends with a call to "not_busy()" to prevent execution from proceeding before the chip has completed its internal operation.
EDIT (11-MAR-2014): There was an issue with the _read_page low-level function: I had forgotten to pull CS HIGH before pulling it LOW at the start of the function, as the other functions do. This means that if _read_page is the first function you call, CS may not already be high, and without a valid /CS 1->0 transition, _read_page will not function properly the first time it is called. The second time, it works fine because the first call leaves /CS at 1. Small but annoying bug.
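Below is a hedged C++ sketch of a page read with that fix in place. It uses the Read Data instruction (0x03) followed by a 24-bit address. As before, Bus is a made-up stand-in for digitalWrite() plus SPI.transfer(). FakeReadBus is a test double whose /CS line powers up low, so the chip only listens after a genuine 1 -> 0 edge.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bus abstraction; on an Uno, select(true) would be
// digitalWrite(SS, LOW) and transfer() would be SPI.transfer().
struct Bus {
    virtual void select(bool active) = 0;  // true drives /CS low
    virtual uint8_t transfer(uint8_t out) = 0;
    virtual ~Bus() = default;
};

constexpr uint8_t kReadData = 0x03;

// Read one 256-byte page with the Read Data instruction.
void read_page(Bus& bus, uint16_t page, uint8_t out[256]) {
    uint32_t addr = static_cast<uint32_t>(page) * 256;
    bus.select(false);  // force /CS high first: the fix from the EDIT note
    bus.select(true);   // the chip now sees a real 1 -> 0 transition
    bus.transfer(kReadData);
    bus.transfer(static_cast<uint8_t>(addr >> 16));  // address, MSB first
    bus.transfer(static_cast<uint8_t>(addr >> 8));
    bus.transfer(static_cast<uint8_t>(addr));
    for (int i = 0; i < 256; ++i) {
        out[i] = bus.transfer(0x00);  // dummy bytes clock the data out on DO
    }
    bus.select(false);  // leave /CS high afterwards
}

// Test double: /CS powers up low; transfers are ignored until a falling edge.
struct FakeReadBus : Bus {
    std::vector<uint8_t> mem = std::vector<uint8_t>(4 * 256, 0xFF);
    bool cs_high = false;
    bool framed = false;          // valid falling edge seen, transaction live
    std::vector<uint8_t> header;  // opcode + 3 address bytes
    uint32_t cursor = 0;

    void select(bool active) override {
        bool new_high = !active;
        if (cs_high && !new_high) {
            framed = true;
            header.clear();
        }
        if (new_high) framed = false;
        cs_high = new_high;
    }

    uint8_t transfer(uint8_t out) override {
        if (!framed) return 0xFF;  // chip is not listening
        if (header.size() < 4) {
            header.push_back(out);
            if (header.size() == 4) {
                cursor = (uint32_t(header[1]) << 16) | (uint32_t(header[2]) << 8) | header[3];
            }
            return 0xFF;
        }
        if (header[0] != kReadData) return 0xFF;
        return mem[cursor++ % mem.size()];
    }
};
```

If you delete the first bus.select(false) line, the fake never sees a falling edge on the first call and every byte comes back as 0xFF. That mirrors the first-call failure described in the EDIT.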
Step 6: Downloading the Data With a TTY
The real reason for this Instructable is to demonstrate how to download the entire flash memory to a single file. To do this, I used the Unix command "tail -f" and a redirect.
The Unix command "tail" prints the last 10 lines of a text file. When given the "-f" (follow) option, "tail" keeps the file (here, the serial device) open and prints new data as it arrives, until it catches a SIGINT (e.g., Ctrl-C). Redirecting its output with ">" saves everything it prints to a file.
There are three windows open in this screenshot: the Arduino IDE on the left, the Serial Monitor on the upper-right, and an OSX POSIX terminal in the lower-right. In OSX/POSIX land, the USB controller of the Uno shows up as a /dev/tty device, in this case "/dev/tty.usbmodem1411". I connect "tail -f" to this device and redirect the output to a file.
I then issued a "read_page 0;" command in the serial monitor, and the output is sent through "tail" since it is connected to the output of the TTY, and then sent to the file. I then "cat" the file to prove the serial stream was captured.
Now all I need to do to dump the ENTIRE flash chip is to type this in the terminal prompt:
% tail -f /dev/tty.usbmodem1411 > 1MB_of_flash.txt
And then type this in the Serial Monitor window:
Then type CTRL-C in the terminal window to stop the "tail" process.
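The firmware side of a full dump has to walk every page: 1MB divided into 256-byte pages is 4096 read_page calls. Below is a hedged C++ sketch of such a loop, emitting one line of hex per page so the captured file stays line-oriented. The page reader and line writer are passed in as callbacks. The names and the output format are hypothetical, not the format of the original read_page command.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

constexpr uint32_t kPageSize = 256;
constexpr uint32_t kTotalBytes = 1UL << 20;              // 1MB (8Mbit) part
constexpr uint32_t kNumPages = kTotalBytes / kPageSize;  // 4096 pages

using PageReader = std::function<void(uint16_t page, uint8_t* buf)>;
using LineWriter = std::function<void(const std::string& line)>;

// Hypothetical dump loop: read every page and emit it as one line of hex.
void dump_all(const PageReader& read_page, const LineWriter& write_line) {
    uint8_t buf[kPageSize];
    char hex[3];
    for (uint32_t page = 0; page < kNumPages; ++page) {
        read_page(static_cast<uint16_t>(page), buf);
        std::string line;
        line.reserve(kPageSize * 2);
        for (uint32_t i = 0; i < kPageSize; ++i) {
            std::snprintf(hex, sizeof hex, "%02X", static_cast<unsigned>(buf[i]));
            line += hex;
        }
        write_line(line);
    }
}
```

On the Uno, write_line would be a Serial.println(), and the host-side "tail -f" capture would collect the 4096 lines into the output file.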
Done and done! This is why Unix is so vastly superior to any other operating system, IMHO.
Step 7: Conclusion
This concludes the long-awaited sequel to my data logger Instructables. I had promised a method for pulling the data out of the flash chip, and here it is.
I hope you found this Instructable useful: 99% of what I know was learned from reading things like this on the web that other people took the time to write, and I'm very grateful for their efforts.