There are already a few tutorials out there showing how to control a 32x32 RGB LED display.

This project has some slight variations with the following features:

Based on a STM32F401RE eval board
Software BSP generated by STM32CubeMX (v1.4.0 used here, newer versions might need changes in the source code)
Built with Eclipse
Pixel data fed via SPI (fewer cables to run)
High frame refresh rate, rather than doing a PWM cycle on each single line (for a more steady picture)
Only 16 brightness levels per color though
Old skool plasma effect for demonstration purposes

The only source code provided is a patch to the auto-generated STM32CubeMX files as well as the Eclipse project.

Needed software:

Windows (yeah, sorry about that. But STM32CubeMX runs in Windows)
STM32CubeMX
STM32F4 DSP and standard peripherals library
GCC ARM embedded (e.g. 4.9-2014-q4)
OpenOCD 0.8.0
Git for Windows
Eclipse

Step 1: Hooking Up the RGB LED Matrix to the STM32F401RE.

The 32x32 display has two 16 pin headers as well as a 4 pin power port. The power port is connected as usual, most probably with the supplied connector.

The header labeled 'INPUT' is used to supply the pixel data and line selection. The header labeled 'OUTPUT' can be used to chain multiple displays together. Since we only want to use a single data wire and the SPI driver to supply display data to the two selected lines at once, some creative wiring is necessary.

The display electronics consist mainly of shift registers with a common clock, and anode drivers to select the currently powered lines. There are always two lines powered at once, and the 32*3=96 LEDs per line are triggered individually through the shift registers.

In an ideal scenario you supply 6 bit streams of data on R0, G0, B0, R1, G1, B1 and load the shift registers using the common clock. Using STB, you then can latch the data of the shift registers to the outputs to drive the LEDs. There is also an output enable signal, labeled 'OE', that you use to have no side effects when changing lines and latching data. But more on that later...

The 'OUTPUT' header in fact carries the serial data output of the shift registers; just imagine it as the last bit that was dropped from the shift register on the last clock. So if you e.g. want to chain two displays together, you would need to clock in 64 bits of data per shift register chain, where the first bit in the bit stream ends up at the last LED of the row, and the last bit at the first LED.

Here we only want one display and a minimal number of wires, so we connect the outputs back to the inputs. In fact we do:

out R0 -> in G0
out G0 -> in B0
out B0 -> in R1
out R1 -> in G1
out G1 -> in B1

We will use the input for R0 for the bitstream. Using this setup, we will need to stream 32*3*2=192 bits per two lines. We will do this via the SPI output of the STM32F4. Calculating the needed bandwidth for a desired color depth as well as refresh rate will be done later.

To complete the picture we will need to look at the pins labeled A, B, C, D. These are used to select the two lines that are being powered. Those four signals will be decoded into the lines 0-15, and the line drivers will enable the selected line in the top half as well as bottom half.
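
The line selection can be sketched as a small helper. This is only a sketch under assumptions: A is taken to be the least significant address bit, A to D are assumed to sit on bits 0 to 3 of one GPIO port (as with PC0 to PC3 in this project), and the function names are made up for illustration. On the STM32F4, the lower half of a port's BSRR register sets pins and the upper half resets them, so all four pins can be changed with one write.

```c
#include <stdint.h>

/* Hypothetical helper: value to write to the GPIO BSRR register so that
 * pins 0..3 of the port carry the 4-bit line address.
 * Assumption: A = bit 0 (least significant) ... D = bit 3. */
uint32_t line_select_bsrr(uint8_t line)
{
    uint32_t set = line & 0x0Fu;       /* pins that must go high */
    uint32_t reset = (~set) & 0x0Fu;   /* pins that must go low  */
    return set | (reset << 16);        /* low half sets, high half resets */
}

/* The two physical rows powered for one address: one row in the top half
 * (0..15) and the matching row in the bottom half (16..31). */
void powered_rows(uint8_t line, uint8_t *top, uint8_t *bottom)
{
    *top = (uint8_t)(line & 0x0Fu);
    *bottom = (uint8_t)(*top + 16u);
}
```

With the pin assignment used here, the result of line_select_bsrr() would be written to the BSRR register of port C.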

Having all this information already gives a clear picture of what the display driver will need to do.

Step 2: Design Your BSP Using STM32Cube.

In this step we configure the peripherals and clocks of the STM32F4.

To jump start that, you can also just open the 32x32.ioc file in the hw/ sub-folder with STM32CubeMX.

We basically need the following IOs:

SPI output data and clock (SPI1_MOSI and SPI1_SCK)
Four GPIOs to drive the line selection (A,B,C,D)
Two GPIOs to drive OE and STB

Note that the pins for the SPI port are fixed. The other ports can be chosen to be any free GPIO. I used PC0 to PC3 for A to D, and PB0, PB1 for OE and STB.

In terms of clock configuration, we clock the little STM32F4 at 64MHz, just to be on the safe side. This can be done by setting the main PLL and prescaler values to something that works.

We also need to include the SPI driver, enable interrupts, and also enable DMA. Only the first tab is shown; the rest is pretty self-explanatory. To get the DMA to do the right thing we also have to configure that. We basically want the DMA to step through the source memory, and always write to the same SPI register. That's why we check "increment address" for the "Memory", and keep it unchecked for "Peripheral".

To later do the PWM to get different brightness levels and do the screen refresh, we use TIM3. We need to enable interrupts on the second tab (not shown here).

Using the configuration set in the first tab, we can calculate the interrupt rate:

(64000000Hz / 32 (prescaler)) / 130 = 15384Hz

We will understand why we chose that number when discussing the display driver.

Step 3: What the Display Driver Will Do.

As presented before, the shift registers are now coupled as follows:

R0 -> G0 -> B0 -> R1 -> G1 -> B1

Each of those entries is 32 bits wide. The input is placed at R0. If we now push 32 * 3 * 2 = 192 bits via SPI to the display, we in fact fill those two lines. Based on the wiring, we push the data in the order B1, G1, R1, B0, G0, R0, as the bits pushed first travel furthest down the chain.

Ignoring color shades for the moment, we need to cycle through all the lines by doing the following over and over:
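
To make the push order concrete, here is a minimal sketch of how one pair of lines could be serialised into the 24 bytes handed to the SPI. It is not the code from the project: the function name and the input layout (six 32-bit words in chain order, bit x set meaning LED x on) are assumptions, and it assumes the SPI sends 8-bit frames with the most significant bit first.

```c
#include <stdint.h>
#include <stddef.h>

#define PLANES_PER_LINE_PAIR 6u   /* R0, G0, B0, R1, G1, B1 */
#define BYTES_PER_LINE_PAIR  24u  /* 6 * 32 bits = 192 bits */

/* Hypothetical packer. planes[] is indexed in chain order:
 * 0=R0, 1=G0, 2=B0, 3=R1, 4=G1, 5=B1.
 * Assumption: bit x of a word controls LED x of that row and color. */
void pack_line_pair(const uint32_t planes[PLANES_PER_LINE_PAIR],
                    uint8_t out[BYTES_PER_LINE_PAIR])
{
    size_t n = 0;
    /* The end of the chain goes out first: B1 first, R0 last. */
    for (int p = (int)PLANES_PER_LINE_PAIR - 1; p >= 0; --p) {
        uint32_t w = planes[p];
        /* Highest bit first, so LED 31 is shifted in first and LED 0 last. */
        out[n++] = (uint8_t)(w >> 24);
        out[n++] = (uint8_t)(w >> 16);
        out[n++] = (uint8_t)(w >> 8);
        out[n++] = (uint8_t)w;
    }
}
```

The resulting buffer is what the DMA would stream to the SPI data register for one pair of lines.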

Drive output enable (OE) high to disable the LED
Drive GPIO-pins for line selection
Latch data in the shift registers to the outputs by pulling STB low
Drive output enable (OE) low to enable the LED
Start SPI transfer to upload next line into the shift registers

I will refer to this later as a 'frame scan'.

In our case we assume that the SPI transfer will finish before we strobe in the next line. This is easy to check with a best-case assumption (SPI bit rate = pixel clock): the 192 bits must be shifted out within one timer period.
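
The timing check can be written down as a few lines of arithmetic. This is a sketch with made-up function names; the timer values (64MHz, prescaler 32, period 130) come from Step 2, while the SPI bit rates used in the usage note are only illustrative, since the actual SPI prescaler is not stated here.

```c
#include <stdint.h>

#define BITS_PER_LINE_PAIR (32u * 3u * 2u)  /* 192 bits */

/* Best-case time to shift out one pair of lines, in nanoseconds. */
uint64_t spi_transfer_ns(uint32_t spi_hz)
{
    return ((uint64_t)BITS_PER_LINE_PAIR * 1000000000ull) / spi_hz;
}

/* Time between two timer interrupts, in nanoseconds. */
uint64_t timer_period_ns(uint32_t clk_hz, uint32_t prescaler, uint32_t period)
{
    return ((uint64_t)prescaler * period * 1000000000ull) / clk_hz;
}

/* Returns 1 if the transfer finishes before the next timer interrupt. */
int spi_fits(uint32_t spi_hz, uint32_t clk_hz, uint32_t prescaler, uint32_t period)
{
    return spi_transfer_ns(spi_hz) < timer_period_ns(clk_hz, prescaler, period);
}
```

With the values from Step 2, the timer period is 65us. An assumed SPI bit rate of 4MHz needs 48us for 192 bits and fits; 2MHz would need 96us and would not.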

To now enable different shades of colors (and so be able to mix colors in a slightly more sophisticated way), we need to turn LEDs on and off pretty fast. In this example we use 4 bits to encode intensity values. That makes 15 intensities + off. For each 'frame scan' (see above), we have a bit mask that we AND with each 4-bit pixel value to decide whether the LED should be enabled. The pseudo-code is as follows:

bitmask = bit_angle_modulation_lookup[ctr];
for each pixel in both lines:
	if ( pixel_value & bitmask ) LED[pixel] = ON; else LED[pixel] = OFF;
ctr = (ctr+1)%15;

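The bit mask (bit_angle_modulation_lookup[ctr] in the pseudo-code) holds a single set bit per 'frame scan', with bit n used in 2^n of the 15 scans (e.g. 1, 2, 2, 4, 4, 4, 4, then eight times 8). It should have the following properties:

If pixel_value is 0, then the LED will be off in all 'frame scans' (the logical AND is always 0)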
If pixel_value is 1, then over the 15 cycles, the LED shall be on once
...
If pixel_value is 15, then over the 15 cycles, the LED shall always be on

As a side note: Using a 'logical AND' rather than a 'compare equal' for the mask test was a somewhat arbitrary decision, but it saves masking each value before the comparison.
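
The properties listed above can be checked with a small sketch. The table contents are an assumption (one possible table with single-bit masks, where bit n appears 2^n times); the table in the project may be ordered differently, and the names are made up.

```c
#include <stdint.h>

#define BAM_SCANS 15u

/* Assumed lookup table: one single-bit mask per 'frame scan'.
 * Bit 0 is used once, bit 1 twice, bit 2 four times, bit 3 eight times. */
static const uint8_t bam_mask[BAM_SCANS] = {
    0x1,
    0x2, 0x2,
    0x4, 0x4, 0x4, 0x4,
    0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8
};

/* Decision for one LED in one 'frame scan': on if the masked bit is set. */
int led_on(uint8_t pixel_value, unsigned ctr)
{
    return (pixel_value & bam_mask[ctr % BAM_SCANS]) != 0;
}

/* Number of 'frame scans' out of 15 in which the LED is lit. */
unsigned on_count(uint8_t pixel_value)
{
    unsigned n = 0;
    for (unsigned ctr = 0; ctr < BAM_SCANS; ++ctr) {
        n += (unsigned)led_on(pixel_value, ctr);
    }
    return n;
}
```

With this table, on_count(v) equals v for every 4-bit value v, so the on-time is proportional to the pixel value.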

The refresh rate for a complete image (all 15 'frame scans') can now be calculated as:

15384Hz / 16 Lines / 15 Intensity levels = 64.1Hz

We only have 15 intensity levels, as black is given as 'always off'.
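
The same arithmetic, written as a sketch with made-up function names, also shows why the color depth was kept at 4 bits: every extra bit roughly halves the refresh rate at a given timer rate. Rates are kept in millihertz to stay in integer math.

```c
#include <stdint.h>

/* Timer interrupt rate in millihertz (1/1000 Hz). */
uint64_t timer_rate_millihz(uint32_t clk_hz, uint32_t prescaler, uint32_t period)
{
    return ((uint64_t)clk_hz * 1000u) / ((uint64_t)prescaler * period);
}

/* Refresh rate of a complete image in millihertz.
 * Each refresh needs (2^bits - 1) 'frame scans' over all line pairs. */
uint64_t refresh_rate_millihz(uint64_t timer_millihz, uint32_t lines, uint32_t bits)
{
    uint32_t scans = (1u << bits) - 1u;
    return timer_millihz / ((uint64_t)lines * scans);
}
```

For 64MHz, prescaler 32 and period 130 this gives about 15384.6Hz for the timer and about 64.1Hz for 4 bits on 16 line pairs. With 5 bits (31 'frame scans') the same timer rate would only reach about 31Hz.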

Step 4: Interrupts, SPI and DMA, Clocks, GPIOs.

The configuration for the peripherals is generated by the CubeMX software using the "Generate Code" button. The initialization code for the System Clock, SPI, Timer (TIM3), DMA and GPIO is generated into the main.c file.

Some more initialization code for the pin functions, e.g. those belonging to SPI, is generated into the stm32f4xx_hal_msp.c file.

The callbacks that are later called to execute the interrupt code are defined as 'weak' function definitions in the HAL drivers. They are overridden in main.c by:

void HAL_SPI_TxCpltCallback(SPI_HandleTypeDef *hspi) { ... }
void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim) { ... }

Just in case you were searching for a 'register callback' or similar.

Step 5: V-Sync and Buffer Switching.

Just a short explanation what those are and why we need them:

As noted before, the display refresh rate is 64.1Hz. During each refresh, the image gets redrawn multiple times. We need to make sure that the content doesn't change during the redraws of one cycle, as this would:

mess up color reproduction
introduce artifacts (tearing)

We will indicate the start of a new refresh cycle by triggering a V-Sync signal. The terminology comes from the days of CRT displays, where it marked the electron beam returning from the bottom right to the top left.

In theory we would just need to make sure that new display content is available at the time of the V-Sync. To reduce copy overhead, we also make use of double buffering: There is always one active frame buffer that is drawn to the display, and a second back buffer that is used for rendering the graphics, in our case the plasma effect.

In practice we have two executing code paths at a time:

Main thread that renders the plasma
Display thread that toggles LEDs

The main thread will do the following steps:

Wait for V-Sync
Draw plasma effect to back buffer
Go to step 1

The display thread, triggered by the timer interrupts, will do the following steps:

If new frame cycle, trigger V-Sync and switch buffers
Toggle LEDs for current 'frame scan'
Increment 'frame scan' counter
Go to step 1
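
The interplay of the two code paths can be sketched as follows. This is a simplified model and not the code from the project: the struct, the function names and the assumption that one timer interrupt handles one pair of lines are mine, and the actual GPIO and SPI work is only hinted at in a comment.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 16u  /* line pairs per 'frame scan' */
#define NUM_SCANS 15u  /* 'frame scans' per refresh cycle */

typedef struct {
    uint8_t line;         /* current line pair, 0..15 */
    uint8_t scan;         /* current 'frame scan', 0..14 */
    uint8_t front;        /* index of the buffer being displayed, 0 or 1 */
    volatile bool vsync;  /* set by the display side, cleared by main */
} display_state_t;

/* Display side: called once per timer interrupt. */
void display_tick(display_state_t *d)
{
    if (d->line == 0 && d->scan == 0) {
        d->front ^= 1u;   /* new refresh cycle: switch buffers */
        d->vsync = true;  /* tell the main loop it may render again */
    }
    /* Here the real driver would do: OE high, select line, latch,
     * OE low, start the SPI transfer for the next line pair. */
    if (++d->line == NUM_LINES) {
        d->line = 0;
        if (++d->scan == NUM_SCANS) {
            d->scan = 0;
        }
    }
}

/* Main side: returns true once per V-Sync, then clears the flag. */
bool take_vsync(display_state_t *d)
{
    if (d->vsync) {
        d->vsync = false;
        return true;
    }
    return false;
}

/* Main side: the buffer that may be drawn to right now. */
uint8_t back_buffer(const display_state_t *d)
{
    return d->front ^ 1u;
}
```

The main loop would spin on take_vsync() and then render the plasma into the buffer returned by back_buffer(). In this model it has one refresh cycle (16 * 15 = 240 timer interrupts) to finish before the buffers are switched again.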

Step 6: The Plasma Effect.

There are many tutorials for plasma effects on the web. This one is also just a wild mix of sinusoids put together to do something. Some things to note though:

It uses floating point math on the Cortex-M4. I did not bother with fixed-point math.
There is no gamma correction. As there are only 16 intensities, this sounded like overkill.
The layout of the frame buffer is a little awkward, and mostly built to make the code easy, but not necessarily readable. But you'll figure it out :)

Step 7: Putting It All Together.

Wire everything up, then download the projects from

https://github.com/cpmetz/32x32-instructable

Building will be hard, but after everything compiles you'll be a master of the STM32F4. Which is kind of nice.

In the end, it should look like this:

http://youtu.be/u-XY1kCQB6s

Introduction: 32x32 RGB LED Plasma W/ STM32F4.