Bit Banging Step-by-step: Arduino Control of WS2811, WS2812, and WS2812B RGB LEDs




Introduction: Bit Banging Step-by-step: Arduino Control of WS2811, WS2812, and WS2812B RGB LEDs


Disclaimer: over the past year, fellow makers from Adafruit, PJRC, and the FastSPI project have written a few different libraries for controlling these ubiquitous RGB LEDs. The libraries work great, and they are all worth trying out. Recently, a few people asked us how the low-level code really works. With the hope that others find the explanation useful, we put together this Instructable with a detailed answer.

When we want a microcontroller to send/receive data to/from devices using some form of digital logic, we often do so by way of standard protocols such as SPI, I2C/TWI, UART, etc. However, there comes a time in every embedded hardware programmer's life when it is convenient or necessary to roll up her sleeves and crank out her own protocol. Such is the case for controlling the ubiquitous RGB LEDs from WorldSemi: the WS281X series.

It should be noted that there have been successful attempts to use the SPI protocol for controlling these LEDs.  Nevertheless, given the nature of their communication protocol (described below) this is a perfect setting for implementing a custom solution using a programming technique known as bitbanging.  This technique allows us to mimic different functions of specialized hardware using software.  In this case, we'll use it to toggle a digital output pin on the ATMega328p microcontroller in a highly precise manner, so that the digital signal created allows us to turn on and off a 1-by-60 array of WS2812 RGB LEDs.

Difficulty level: Beginner+ (some familiarity with Arduino programming)
Time to completion: 15-30 Minutes

Step 1: List of Materials

Inside the WS2812 and WS2812B packages resides an embedded version of the WS2811 constant-current LED driver, as well as 3 individually controlled LEDs: one red, one green, and one blue. In a compact package, the WS2811 includes:
- An internal oscillator
- A signal reshaping and amplification circuit
- A data latch
- A 3-channel, programmable constant current output drive
- 2 digital ports (serial output/input)

Despite all the intricacies under the hood, we only need to worry about one thing (besides providing power, of course): sending ones (1s) and zeros (0s) to the serial input port according to what we'd like the LEDs to do. And so, we first need to be clear about what we want the LEDs to do before moving on to how we'll do it.


This Instructable focuses on the firmware to communicate with the WS281X, so all we really need to follow along is a computer running the latest version of the Arduino programming interface (available for OS X, GNU/Linux, and Windows):
Arduino IDE V1.0.5

If we want to see the code controlling an actual WS2812 RGB LED, then we need the following parts:
1 x WS2812 RGB LED (pre-soldered onto a tiny breakout board)
1 x Solderless Breadboard
Solid Core Wire (assorted colors; 28 AWG)
1 x Arduino Uno R3
1 x Breakaway Pin Connector, 0.1" Pitch, 8-Pin Male

Step 2: Principles of Operation

At a high level, we want to control the intensity and color of the WS2812. Whereas intensity is an intuitive concept (fully on, fully off, and some range of intermediate values), color deserves a short explanation for those unfamiliar with RGB LEDs. As mentioned a few lines ago, the WS2812 contains 3 tiny LEDs that are very close to one another. When we turn them on simultaneously, our eyes perceive a combination of red, green, and blue light, which we interpret as different colors (this is also the principle behind the pixels on our computer screens). By changing the intensity of each LED relative to the other two, we can get a wide range of colors; for example, if we set the red, green, and blue LEDs to their maximum intensity, our eyes perceive a whitish color.

We have a good high-level view of what we'd like the LEDs to do, but we need to translate it into something that the WS2812 can understand. It turns out that this is not difficult to do, and is similar to how colors work on most digital displays (e.g., the screen on which you're reading this!). The intensity of each of the 3 LEDs inside the WS2812 can be set independently to a value ranging from 0 (fully off) to 255 (fully on). So to set the color to whitish as mentioned above, we need to tell the embedded WS2811 driver chip: 

“Hey! Set the red LED to an intensity of 255, the blue to an intensity of 255, and the green to an intensity of 255.” (as illustrated by the video demo below)

But how exactly are we going to send this message to the WS2811? We need to delve a little bit into digital logic (pun intended) to know exactly how to communicate these and any other allowable intensity values. After a couple of steps, we'll be able to break down these values into their constituent 1s and 0s, and send them serially to the digital input port of the WS2811.

Step 3: From Decimal to Binary: Breaking Down Numbers Into 1s and 0s

Breaking a number down into 1s and 0s really means using its binary representation. Remember that in a binary representation (e.g., 1101 in binary represents the number 13 in decimal), each position has a 'weight' that increases from right to left by a factor of 2. Starting with the first position on the right, the 'weights' are: 2^0, 2^1, 2^2, 2^3... This is analogous to the decimal system, where, starting with the first digit on the right, the 'weights' increase by a factor of 10: 10^0, 10^1, 10^2, 10^3... Unlike the decimal system, where each position can hold any digit from 0 to 9, in the binary system each position can hold only a 1 or a 0. 

Say we want to find the binary representation of the decimal number 23.  We first notice that 23 is a combination of the number 3 set in the 10^0 position, and the number 2 set in the 10^1 position, which means that when we weigh each number according to its position (2*10^1+3*10^0) we get the number 23.  If we tried to do the same in binary we would come up with the number 10111 because 1*2^4+0*2^3+1*2^2+1*2^1+1*2^0 = 23.  Of course, finding the binary representation of a relatively small number such as 23 can be done without much calculation.  But for larger numbers it becomes necessary to use the following algorithm:

- Increasing from 2^0, find the first power of 2 that's larger than the decimal number we have
- Starting with the power of 2 immediately below the one we found in the first step, divide the decimal number by the powers of 2 in decreasing order, all the way down to 2^0
- After each division step, we should get either a 1 or a 0 as the quotient, and some remainder value.  The remainder eventually should go to 0 (this can occur prior to the last division step)
- The 1s and 0s obtained as the quotients give the binary representation of the decimal number
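The four steps above translate almost line by line into C. The sketch below is only an illustration: `to_binary` is our own name and is not part of the Instructable's sketch. It writes the quotients out as a string of '0' and '1' characters, and it reproduces the 117 and 48 examples worked through next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write the binary representation of `value` into `out` (room for 33 chars:
 * up to 32 digits plus the terminating NUL), following the division steps.
 */
void to_binary(uint32_t value, char *out)
{
    assert(out != NULL);

    /* Step 1: increasing from 2^0, find the first power of 2 larger than
     * value (64-bit so that 2^32 does not overflow). */
    uint64_t power = 1;
    while (power <= value) {
        power *= 2;
    }

    /* Only 0 stops at 2^0: there is no lower power to divide by. */
    if (power == 1) {
        strcpy(out, "0");
        return;
    }

    /* Steps 2-4: divide by every lower power of 2 down to 2^0, even after
     * the remainder reaches 0. Each quotient (0 or 1) is one digit. */
    uint64_t remainder = value;
    size_t i = 0;
    for (power /= 2; power >= 1; power /= 2) {
        uint64_t quotient = remainder / power;
        remainder %= power;
        out[i++] = (char)('0' + quotient);
    }
    out[i] = '\0';
}
```

For example, `to_binary(117, buf)` leaves "1110101" in `buf`, and `to_binary(48, buf)` leaves "110000". Real firmware would normally shift and mask (`(value >> k) & 1`) instead of dividing. Dividing by a power of 2 and shifting give the same quotients.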

Well, if we haven't gone through the process before, it all sounds like gibberish.  Nothing like going through an example to clear things up.  Say we want to find the binary representation of the decimal number 117.  Let's try to follow the algorithm above (I'll use some personal tweaks):

- We start with 2^0, which is smaller than 117, so we keep increasing. 2^1 is also smaller, keep going... Okay, so we get to 2^6 (64), which is still smaller than 117, but as soon as we hit 2^7 (128) we notice that it is the “first power of 2 that's larger than the decimal number we have”
- So we know we need to start with the power of 2 immediately below 2^7, which is 2^6. [Personal tweaks] Since we know that we're going to be dividing by all the powers of 2 from 2^6 down to 2^0, I write them all down beforehand so I don't forget. I also remind myself that the remainder should end at 0 (although it could reach 0 along the way).
Divider   | Remainder | Quotient
2^6 (64)  |           |
2^5 (32)  |           |
2^4 (16)  |           |
2^3 (8)   |           |
2^2 (4)   |           |
2^1 (2)   |           |
2^0 (1)   |           |
- With everything set, we start the division steps:
Divider   | Remainder | Quotient
2^6 (64)  | 117       | 1
2^5 (32)  | 53        |
2^4 (16)  |           |
2^3 (8)   |           |
2^2 (4)   |           |
2^1 (2)   |           |
2^0 (1)   |           |
117 divided by 64 gives a quotient of 1 and a remainder of 53.  Thus we know that in the 7th position of our binary representation of 117, there'll be a 1 (i.e., 1XXXXXX).  To get the other positions we simply continue the division process:
Divider   | Remainder | Quotient
2^6 (64)  | 117       | 1
2^5 (32)  | 53        | 1
2^4 (16)  | 21        | 1
2^3 (8)   | 5         | 0
2^2 (4)   | 5         | 1
2^1 (2)   | 1         | 0
2^0 (1)   | 1         | 1
- And so, we get that the binary representation of the decimal number 117 is 1110101. We need to remember that even if the remainder goes to 0 before the last division step, we need to continue the process all the way down to 2^0. Thus, in the case of the decimal number 48:
Divider   | Remainder | Quotient
2^5 (32)  | 48        | 1
2^4 (16)  | 16        | 1
2^3 (8)   | 0         | 0
2^2 (4)   | 0         | 0
2^1 (2)   | 0         | 0
2^0 (1)   | 0         | 0
The binary representation is 110000, as opposed to 110, which we would get if we stopped dividing when the remainder first reached 0. Knowing how to break down a number into its constituent 1s and 0s is necessary for transmitting data to the WS281X.

Step 4: From Binary Numbers to Digital Logic

Okay, so now that we're familiar with the binary representation of decimal numbers, we can communicate the intensity values we want to the WS2811 LED driver IC. Since the values go from 0 to 255 for each LED, we need 8 positions (called bits in digital logic) to cover the entire range: 255 is 11111111 in binary. And we'll need 24 bits to transmit the values for all 3 LEDs inside each WS2812. But how exactly can we tell the WS2811 that we want a 0 or a 1? It turns out that we need to manipulate the timing of a pulse wave signal to do this.
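To make the 24-bit figure concrete, here is a small C sketch that packs one LED's three intensities into a single 24-bit word and reads back the bit sent at each position. The byte order is an assumption taken from WorldSemi's WS2812 datasheet: green, then red, then blue, each most-significant bit first. That order is consistent with the green LED lighting up when a single byte is sent. `pack_grb` and `bit_on_wire` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack three 0-255 intensities into one 24-bit word in transmission order.
 * Assumption: GRB order (green first), each byte sent MSB first.
 */
uint32_t pack_grb(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)green << 16) | ((uint32_t)red << 8) | (uint32_t)blue;
}

/* Value (0 or 1) of the n-th bit on the wire: n = 0 goes first, n = 23 last. */
int bit_on_wire(uint32_t grb, unsigned n)
{
    assert(n < 24);
    return (int)((grb >> (23u - n)) & 1u);
}
```

For the whitish color, `pack_grb(255, 255, 255)` is 0xFFFFFF, i.e., 24 ones in a row.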

Disclaimer: There is a small variation in the timing described below depending on whether you're using an actual WS2811 IC or the version embedded in the WS2812/WS2812B. The numbers used below correspond to the latter case (WS2812/WS2812B). If you're using the WS2811 IC, consult its datasheet for the slightly different numbers (other than that, everything described below is the same).

Principle of operation
The WS2811 expects two things:
1)  A pulse (i.e., rectangular) wave signal with a frequency of around 800kHz (other frequencies work as well, but we'll stick to 800kHz in this tutorial) that sets the intensity values in an internal shift register. Note, however, that the WS2811 behaves differently than a standard shift register: each chip keeps the first 24 bits it receives and passes the remaining bits on to the next chip in the chain.
2)  After the data are shifted into place, the WS2811 expects a low signal lasting at least 50μs in order to latch the data to their respective outputs.

Shifting the data
Those unfamiliar with the term 'pulse wave' might have heard of its special case: the square wave. These types of non-sinusoidal signals alternate between a fixed maximum and a fixed minimum amplitude at a constant frequency. When the alternation is symmetric, that is, when the signal spends the same time at its maximum as at its minimum, we have the special case of a square wave. At 800kHz, each period of the pulse wave is 1.25μs long (1/1.25μs = 800kHz). To communicate with the WS2811, we adjust the time during which the signal is high or low in order to signal a 0 or a 1. There's a mistake in the datasheet from WorldSemi, so the real values should be as follows (credit to the folks over at Adafruit for catching this):

Transmitting a 1:
Time for the signal to remain high (T1H): 0.8μs
Time for the signal to remain low (T1L): 0.45μs

Transmitting a 0:
Time for the signal to remain high (T0H): 0.4μs
Time for the signal to remain low (T0L): 0.85μs

Latching the data
After sending all the bits corresponding to the intensity values of all the LEDs that we want to control, we simply need to hold the pulse wave at its minimum value for at least 50μs.

Transmitting a 'latch command':
Time for the signal to remain low (TL): >=50μs
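The timings above are easier to work with when expressed in clock cycles. The following is a minimal C sketch that assumes the Arduino Uno's 16MHz clock (`CPU_HZ` is our name for it). It rounds each duration to the nearest whole cycle. That gives 13 HIGH + 7 LOW cycles for a 1 and 6 + 14 for a 0, which is 20 cycles (1.25μs) per bit either way. The 50μs latch comes out at 800 cycles of LOW.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_HZ 16000000ULL  /* Arduino Uno: one clock cycle = 62.5 ns */

/* Nearest whole number of clock cycles for a duration in nanoseconds. */
uint32_t ns_to_cycles(uint32_t ns)
{
    /* cycles = ns * CPU_HZ / 1e9, rounded; 64-bit math avoids overflow */
    return (uint32_t)(((uint64_t)ns * CPU_HZ + 500000000ULL) / 1000000000ULL);
}

/* Duration in nanoseconds of a whole number of clock cycles. */
double cycles_to_ns(uint32_t cycles)
{
    return (double)cycles * 1e9 / (double)CPU_HZ;
}
```

Going back the other way, 13 cycles is 812.5ns and 7 cycles is 437.5ns, close to the 0.8μs and 0.45μs targets.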

This type of signal has the special properties of being self-clocked and non-return-to-zero (NRZ). So, what remains is to see how we can program our ATMega328p to produce a precisely timed signal for transmitting data to the array of WS2812 RGB LEDs. [Spoiler alert!] We'll be using the bitbanging technique.

Step 5: Bitbanging a Pulse Wave on an ATMega328p Microcontroller

One of the advantages of using a microcontroller as opposed to, say, a computer's CPU is that we have very tight control over the timing of the instructions we program into it. In fact, to show how precise that control can be, we'll be using assembly instructions instead of typical high-level functions such as digitalWrite. Using assembly lets us know exactly how many clock cycles each instruction takes to execute.

Since the Arduino Uno R3 development board drives the onboard ATMega328p with a 16MHz external clock, the microcontroller executes a 1-clock-cycle instruction in exactly 62.5ns (1/16MHz = 62.5ns). And since we can find out how many clock cycles each instruction takes, we can work out precisely which instructions we need to generate our signal.

As we saw previously, to transmit a 1 to the WS281X chip we need a signal that stays at its maximum (HIGH) value for 0.8μs and then at its minimum (LOW) value for 0.45μs. Thus, we want to write a list of instructions that:

- Sets the digital pin to HIGH
- Waits 0.8μs
- Sets the digital pin to LOW
- Waits 0.45μs

In assembly language, this can be achieved by the following code:

  asm volatile(
    // Instruction       Clock  Description  Phase     Bit transmitted
    "sbi  %0, %1\n\t"    // 2   PIN HIGH     (T =  2)
    "rjmp .+0\n\t"       // 2   nop nop      (T =  4)
    "rjmp .+0\n\t"       // 2   nop nop      (T =  6)
    "rjmp .+0\n\t"       // 2   nop nop      (T =  8)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 10)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 12)
    "nop\n\t"            // 1   nop          (T = 13)
    "cbi  %0, %1\n\t"    // 2   PIN LOW      (T = 15)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 17)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 19)
    "nop\n\t"            // 1   nop          (T = 20)  1
    ::                                  // no output operands
    // Input operands (PORT and PORT_PIN are macros defined in the sketch)
    "I" (_SFR_IO_ADDR(PORT)),           // %0
    "I" (PORT_PIN)                      // %1
  );

The first column holds the assembly instruction, followed by linefeed and tab characters that make the assembler listing generated by the compiler more readable.

The second column shows the number of clock cycles each instruction takes. For this set of simple instructions there is only one possible value; we'll see later how some instructions (e.g., conditional ones) may take 1, 2, or 3 cycles. Remember that each clock cycle on the 16MHz Arduino Uno takes 62.5ns.

The third column shows a very brief description of what each operation does.

The fourth column, Phase, uses the term a bit loosely: it gives the cumulative sum (T) of clock cycles taken by the instructions executed so far.
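To check the Phase column without counting by hand, one can rebuild it on a computer. The C sketch below is a host-side helper, and the `Instr` type and `high_low_cycles` are our own names. It adds up the cycle counts of the "send a 1" listing and derives the HIGH and LOW times. It assumes the pin changes at the same point within `sbi` and `cbi`, which take the same number of cycles, so measuring both edges at the end of the instruction gives the right difference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One line of a listing: its mnemonic and its cycle count on the ATmega328p. */
typedef struct {
    const char *mnemonic;
    int cycles;
} Instr;

/* The "send a 1" listing above, one entry per instruction. */
static const Instr SEND_ONE[] = {
    {"sbi", 2},  {"rjmp", 2}, {"rjmp", 2}, {"rjmp", 2},
    {"rjmp", 2}, {"rjmp", 2}, {"nop", 1},  {"cbi", 2},
    {"rjmp", 2}, {"rjmp", 2}, {"nop", 1},
};

/*
 * Rebuild the Phase column and derive how long the pin is HIGH (end of sbi
 * to end of cbi) and LOW (end of cbi until the next pass's sbi ends, i.e.
 * the rest of the period).
 */
void high_low_cycles(const Instr *seq, size_t n, int *high, int *low)
{
    int t = 0, t_sbi = -1, t_cbi = -1;
    for (size_t i = 0; i < n; i++) {
        t += seq[i].cycles;                      /* Phase after this line */
        if (strcmp(seq[i].mnemonic, "sbi") == 0) t_sbi = t;
        if (strcmp(seq[i].mnemonic, "cbi") == 0) t_cbi = t;
    }
    assert(t_sbi >= 0 && t_cbi > t_sbi);         /* both edges present */
    *high = t_cbi - t_sbi;
    *low = t - *high;                            /* the period is t cycles */
}
```

For the listing above it yields 13 HIGH and 7 LOW cycles. Fed the ten instructions of the "send a 0" listing below, it yields 6 and 14.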

In order to send a single 255 value (11111111 in binary) to the WS281X, we need to repeat this set of instructions 8 times. If we then insert a pause of 50μs or more after the 8-bit sequence, the WS281X latches the transmitted data to its output register. Once the data are latched, the first LED (green) of the WS281X should turn on at maximum brightness. The attached Arduino sketch demonstrates this operation.

To send a 0, we change the code that produces a 1 by decreasing the time during which the signal is HIGH (maximum) and increasing the time during which it is LOW (minimum). We should also note that the value for each LED must always be sent as 8 bits. For instance, if we wanted to send a value of 105 (1101001 in binary), we would need to send the 8 bits 01101001, including the leading 0. The code that produces a 0 looks like:

  asm volatile(
    // Instruction       Clock  Description  Phase     Bit transmitted
    "sbi  %0, %1\n\t"    // 2   PIN HIGH     (T =  2)
    "rjmp .+0\n\t"       // 2   nop nop      (T =  4)
    "rjmp .+0\n\t"       // 2   nop nop      (T =  6)
    "cbi  %0, %1\n\t"    // 2   PIN LOW      (T =  8)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 10)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 12)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 14)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 16)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 18)
    "rjmp .+0\n\t"       // 2   nop nop      (T = 20)  0
    ::                                  // no output operands
    // Input operands (PORT and PORT_PIN are macros defined in the sketch)
    "I" (_SFR_IO_ADDR(PORT)),           // %0
    "I" (PORT_PIN)                      // %1
  );

We can use the attached Arduino sketch to generate the signal shown in the oscilloscope screen captures attached to this step.

Now, for the WS281X to display the whitish color we want, we need to send not one but three 255 values (in which case our signal consists of 24 ones) before waiting the 50μs for the data to latch. We could do this by copy-pasting the eleven assembly instructions that send a 1 another 23 times (you can give it a try by modifying the bitbang_255.ino sketch), but the code would be impractical for sending values to more than one WS281X chip. A better solution is to write a loop that iterates through the 8-bit values until all three of them have been sent.
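The loop idea can be sketched in plain C before looking at the assembly. The model below runs on a computer rather than on the ATMega328p and deliberately ignores timing. `serialize_bytes` is a hypothetical name. It only shows the order in which the bits go out: 8 per byte, most-significant bit first, leading zeros included, with one counter for bits and one for bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Host-side model of the byte loop: for each byte, send 8 bits MSB first by
 * testing bit 7 and shifting left, counting bits down from 8 and bytes down
 * from nbytes. Instead of toggling a pin, each bit is stored in `bits`
 * (as 0 or 1); the function returns how many bits were stored.
 */
size_t serialize_bytes(const uint8_t *p, size_t nbytes, uint8_t *bits)
{
    assert(nbytes == 0 || (p != NULL && bits != NULL));
    size_t written = 0;
    while (nbytes > 0) {                          /* bytes left to send */
        uint8_t val = *p++;                       /* next byte */
        for (int nbits = 8; nbits > 0; nbits--) {
            bits[written++] = (val & 0x80u) ? 1 : 0;  /* test bit 7 (MSB) */
            val = (uint8_t)(val << 1);                /* next bit into bit 7 */
        }
        nbytes--;
    }
    return written;
}
```

Fed three 255 values, it produces 24 ones. Fed 105, it produces 0, 1, 1, 0, 1, 0, 0, 1.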

The attached sketch includes a clear description of the steps taken to achieve the desired outcome. Its main section, written in assembly following the logic described above, looks as follows:

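  asm volatile(
    // Instruction        Clock  Description                       Phase
   "nextbit:\n\t"         // -    label                             (T =  0)
    "sbi  %0, %1\n\t"     // 2    signal HIGH                       (T =  2)
    "sbrc %4, 7\n\t"      // 1-2  if MSB set                        (T =  3, or 4 if it skips)
     "mov  %6, %3\n\t"    // 0-1  tmp'll set signal high            (T =  4)
    "dec  %5\n\t"         // 1    decrease bitcount                 (T =  5)
    "nop\n\t"             // 1    nop (idle 1 clock cycle)          (T =  6)
    "st   %a2, %6\n\t"    // 2    set PORT to tmp                   (T =  8)
    "mov  %6, %7\n\t"     // 1    reset tmp to low (default)        (T =  9)
    "breq nextbyte\n\t"   // 1-2  if bitcount == 0 -> nextbyte      (T = 10, or 11 if taken)
    "rol  %4\n\t"         // 1    shift next bit into the MSB       (T = 11)
    "rjmp .+0\n\t"        // 2    nop nop                           (T = 13)
    "cbi  %0, %1\n\t"     // 2    signal LOW                        (T = 15)
    "rjmp .+0\n\t"        // 2    nop nop                           (T = 17)
    "nop\n\t"             // 1    nop                               (T = 18)
    "rjmp nextbit\n\t"    // 2    bitcount != 0 -> nextbit          (T = 20)
   "nextbyte:\n\t"        // -    label                             (T = 11)
    "ldi  %5, 8\n\t"      // 1    reset bitcount                    (T = 12)
    "ld   %4, %a8+\n\t"   // 2    val = *p++                        (T = 14)
    "cbi  %0, %1\n\t"     // 2    signal LOW                        (T = 16)
    "rjmp .+0\n\t"        // 2    nop nop                           (T = 18)
    "nop\n\t"             // 1    nop                               (T = 19)
    "dec  %9\n\t"         // 1    decrease bytecount                (T = 20)
    "brne nextbit\n\t"    // 2    if bytecount != 0 -> nextbit      (T = 22)
    :: // Input operands (C variables set up by the sketch before this block)
    "I" (_SFR_IO_ADDR(PORT)),  // %0  I/O address of the port
    "I" (PORT_PIN),            // %1  bit number of the data pin
    "e" (&PORT),               // %a2 pointer to the port, used by st
    "r" (high),                // %3  port value with the data pin set
    "r" (val),                 // %4  byte being sent
    "r" (nbits),               // %5  bit counter
    "r" (tmp),                 // %6  scratch register
    "r" (low),                 // %7  port value with the data pin cleared
    "e" (p),                   // %a8 pointer to the next byte
    "w" (nbytes)               // %9  byte counter
    // Strictly, GCC wants operands that the asm modifies (val, nbits, tmp,
    // p, nbytes) declared as read-write outputs ("+r") rather than inputs.
  );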

The best way to understand how this section works is to consider different scenarios and follow the assembly code line by line. For instance, we know that in order to send a value of 255, we need to send 8 bits with the timing of a 1. In other words, the digital pin connected to the WS281X should remain HIGH for 13 cycles (0.8125μs) and LOW for 7 (0.4375μs). Does the code above achieve this? Let's see what happens when we first start transmitting:

  asm volatile(
   "nextbit:\n\t"         // Only a label, the target of the jumps below.
    "sbi  %0, %1\n\t"     // The signal is set to HIGH; uses 2 cycles (T = 2).
    "sbrc %4, 7\n\t"      // Sending 255, so the current MSB is 'set' (=1): no skip.
     "mov  %6, %3\n\t"    // This is executed: “tmp” is set to HIGH (T = 4).
    "dec  %5\n\t"         // A bit is being transmitted: decrease the bit counter.
    "nop\n\t"             // Idle 1 cycle (padding toward the 13 HIGH cycles).
    "st   %a2, %6\n\t"    // Write “tmp” to the PORT; the pin stays HIGH (T = 8).
    "mov  %6, %7\n\t"     // Set “tmp” to low for the next pass through the loop.
    "breq nextbyte\n\t"   // False: bit counter isn't 0; uses 1 cycle, continue.
    "rol  %4\n\t"         // Shift the byte left so the next bit becomes the MSB.
    "rjmp .+0\n\t"        // Idle for 2 clock cycles. Phase reached T = 13.
    "cbi  %0, %1\n\t"     // Set signal to LOW (T = 15): 13 cycles HIGH.
    "rjmp .+0\n\t"        // Idle for 2 clock cycles.
    "nop\n\t"             // Idle for 1 clock cycle.
    "rjmp nextbit\n\t"    // Bit counter wasn't 0, so jump to the next bit. T = 20.
    // ... nextbyte section and operand list as in the full listing above
  );

So the instructions that actually get executed generate a signal on the data pin that is 13 cycles HIGH (0.8125μs) and 7 LOW (0.4375μs), thus sending a bit with a value of 1 to the WS281X.  If we continue to study what the code does when the rest of the bits are sent, and what it does when values other than 255 are used, we'll get a deeper understanding of this particular implementation of bitbanging. 
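One way to continue that study is to model the loop's timing on a computer. The C sketch below uses the hypothetical names `bit_timing` and `byte_cycles` and is based on the AVR instruction-set cycle counts. On the ordinary path it reproduces the 13/7-cycle result above for a 1 and gives 6/14 for a 0. The path through `nextbyte` is longer, because a taken `breq` costs 2 cycles and `ldi` adds one more before `ld` and `cbi`. By this count the last bit of each byte lasts 22 cycles, and a 1 sent there stays HIGH for 14 cycles (0.875μs). Treat these figures as a model to check against an oscilloscope, not as measurements.

```c
#include <assert.h>

/* HIGH and LOW time of one bit, in 16 MHz clock cycles (62.5 ns each). */
typedef struct {
    int high;
    int low;
} BitTiming;

/*
 * Timing of one bit sent by the nextbit/nextbyte loop, using the Phase
 * convention (cycles elapsed from the nextbit label when an instruction ends):
 *   - sbi ends at T = 2: the pin goes HIGH;
 *   - a 0 bit: st writes tmp (= low) at T = 8, so the pin drops there;
 *   - a 1 bit: the pin stays HIGH until cbi ends, at T = 15 on the ordinary
 *     path or T = 16 on the nextbyte path;
 *   - the pass ends at T = 20 (rjmp nextbit) or T = 22 (taken brne).
 * LOW lasts until the next pass's sbi ends, i.e. the rest of the period.
 * (The very last bit of a frame leaves the loop through a not-taken brne.)
 */
BitTiming bit_timing(int bit, int last_in_byte)
{
    assert(bit == 0 || bit == 1);
    const int t_high = 2;
    const int t_drop = bit ? (last_in_byte ? 16 : 15) : 8;
    const int period = last_in_byte ? 22 : 20;
    BitTiming bt = { t_drop - t_high, period - (t_drop - t_high) };
    return bt;
}

/* Cycles to shift out one byte: seven ordinary bits plus the nextbyte bit. */
int byte_cycles(void)
{
    int total = 0;
    for (int k = 0; k < 8; k++) {
        BitTiming bt = bit_timing(1, k == 7);   /* same total for a 0 or a 1 */
        total += bt.high + bt.low;
    }
    return total;
}
```

By this model, a byte takes 162 cycles (10.125μs), so one LED's 24 bits take a little over 30μs.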

I personally hope that you find this tutorial useful for getting started with bitbanging your own communication protocols whenever it's necessary!

5 months ago

You made a very detailed explanation; I bet most readers without prior knowledge of the WS2812 now understand it. Thank you. In my bench tests, I calculated all the possible timings from the datasheet and came to a nice combination: thinking of each NRZ bit as 8 timing slots, the magic numbers are 5 and 3. The off (0) bit will be 3 slots on and 5 off; the on (1) bit will be 5 on and 3 off.

Use the figure below as reference:
The WS2811 has a processor inside. Whenever it sees the data line rise to 1 (T0), it starts two timers: (1) a ~520ns timer and (2) a 60000ns (60µs) timer. Once timer1 expires, somewhere between T1 and T2, it reads the data pin again and takes that level as the logic 1 or 0 for that bit. Then it needs a little time to store the bit and to prepare its logic for the next bit; this is the time after T2, and it must be a minimum of 200ns or so.

Until the data pin goes high again, the chip does nothing; it just waits. If timer2 (60µs) expires during this wait, the chip latches the data. If the data pin rises before timer2 expires, it treats that as another T0 and does it all again: it starts timer1 and timer2, samples the data pin when timer1 expires, etc.

[time diagram here]

It means the time between T2 and the next T0 can be as long as you want: a minimum of 200~300ns for safety, and up to as long as the universe. If it is shorter than 60µs, the next rise is considered a new bit T0; if it is 60µs or longer, it is a latch command.

The NRZ reading is that simple. IBM used it in several machines, and still does, I guess. From 1970 on, IBM used it for reading magnetic media (8" floppy disks) and such. I worked there at the time.

There is no information about the Timer1 value, but from the information in the datasheet we could guess around 520~540ns; at least it is higher than 500ns.

The datasheet also says the minimum start bit is 350ns ±150ns, so it can go from 200ns to 500ns. So 200ns (T0~T1) is the minimum time for the chip to read and sense the start bit reliably.

For the minimum time from T2 to the next T0, the datasheet states 600ns ±150ns, i.e., from 450ns to 750ns. So the chip needs a minimum of 450ns to prepare to receive a new start bit.

So my rule of 3 and 5 works nicely: 3 "t" times cannot be higher than 480ns, and 5 "t" times cannot be lower than 520ns. I went for "t" = 125ns, so 3t = 375ns and 5t = 625ns; they are well apart from 500ns, and they add up to 8t (remember a byte?). "t" can go up to 150ns to reach 1.2µs, closer to the 800kHz from the datasheet, but 150ns is not a multiple of the 62.5ns cycle when running at 16MHz, but...

To transmit a bit 1, it will be HHHHHLLL and bit zero will be HHHLLLLL
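The commenter's rule can be checked numerically. Below is a small C sketch with hypothetical names that uses only the limits quoted in the comment: 3t at most 480ns, 5t at least 520ns, and 8 slots per bit. It confirms that t = 125ns (2 cycles at 16MHz) and t = 150ns both satisfy the rule. The 150ns slot gives a 1.2μs bit but works out to 2.4 cycles at 16MHz.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The "3 and 5" rule: a bit is 8 slots of length t; a 0 is HHHLLLLL and a 1
 * is HHHHHLLL. Limits quoted in the comment: 3t <= 480 ns and 5t >= 520 ns.
 */
bool slot_length_ok(double t_ns)
{
    assert(t_ns > 0.0);
    return 3.0 * t_ns <= 480.0 && 5.0 * t_ns >= 520.0;
}

/* One bit lasts 8 slots. */
double bit_period_ns(double t_ns)
{
    return 8.0 * t_ns;
}

/* Slot length in CPU clock cycles (e.g. 125 ns at 16 MHz is 2 cycles). */
double slot_cycles(double t_ns, double cpu_hz)
{
    return t_ns * cpu_hz / 1e9;
}
```

A 62.5ns slot (1 cycle at 16MHz) fails the rule, because 5t is only 312.5ns.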

Then, in assembly, I loaded the byte to be transmitted into a register (say R10) and loaded another register (say R16) with the value 8 (for 8 bits).

Then, for each bit to be transmitted, we need to send the NRZ start bit (a 1), so a set-bit instruction, SBI PORTD,5 (for example), raises the data line. Then rotate R10 left to get the highest bit to be transmitted. When rotating, bit 7 falls into the carry bit, and the carry bit can be tested with a conditional instruction: BRCC (branch if carry is clear) or BRCS (branch if carry is set).

Using BRCC or BRCS, you can decide from bit 7 (now in carry) whether it is a zero or a one. Then, at T1, if the carry bit is zero, you need to drop the port pin, so a CBI PORTD,5 is enough. If it is one, you don't need to do anything, since the port pin is already up from the start bit. In either case, you then wait until T2. At that point, just make the port pin low with CBI PORTD,5 (clear bit), and wait again before restarting everything to send the original bit 6 of R10 (now in bit 7 due to the previous rotate).

Here is the funny part. If carry is 1, you don't need to do anything at T1, just keep the port pin up, but you need to drop the port pin at T2. If carry is zero, you need to drop the port pin at T1, but you don't need to do anything at T2; just keep the port pin down.

When the new T0 arrives, decrement R16 (originally 8) and check whether it is zero; if not, repeat the sequence; if it is zero, all 8 bits were sent.

One could also unroll the sequence eight times instead of looping on R16.

For a super easy solution, run it at 8MHz on an ATtiny13 chip using its internal clock; it's very low cost, just for that. Use the ATtiny13 serial port to receive the bytes from another processor or microcontroller, so the ATtiny13 works as a WS2812 controller.

But at 8MHz, the instruction clock period is 125ns, the same as the "t" time, which leaves little room for other processing instructions.

As you increase the number of LEDs in the string, you will need a larger AVR with more SRAM to store the data from the serial port, so you may migrate to an ATmega8, 328, or even larger. Then you can add commands to the serial protocol, telling the ATmega controller to change colors automatically, fade, etc. The master on the serial port will have less and less work to control the LEDs; it just sends packed commands to the slave AVR. I have already tested this with an ATmega328 at 16MHz.

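LDI  R16, 8          ; bit counter: 8 bits per byte
; R10 is loaded with the byte to transfer
; cycle counts (T) at 16MHz, where t = 125ns = 2 cycles
NEXTBIT:
SBI  PORTD, 5        ; 2  T=2   start bit: set data high
ROL  R10             ; 1  T=3   bit 7 -> carry
BRCS ONE             ; 1  T=4   (2 if taken) carry set: send a 1
NOP                  ; 1  T=5   sending a 0: wait until T1
NOP                  ; 1  T=6
CBI  PORTD, 5        ; 2  T=8   T1: set data low (3t high)
RJMP PC+1            ; 2  T=10  data line already low, just wait
RJMP TAIL            ; 2  T=12
ONE:                 ;    T=5   sending a 1: do nothing, data line is already up
RJMP PC+1            ; 2  T=7
RJMP PC+1            ; 2  T=9
NOP                  ; 1  T=10
CBI  PORTD, 5        ; 2  T=12  T2: set data low (5t high)
TAIL:
NOP                  ; 1  T=13
DEC  R16             ; 1  T=14
BRNE NEXTBIT         ; 2  T=16  8t per bit, back to the next start bit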

Then, just repeating:
No matter how long the low level lasts (as long as it is under ~50us), the chip waits for the line to go high, then times the high level to decide whether the bit is a 1 or a 0: high for less than ~450ns is a zero, and longer than ~600ns is a one.
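The decision rule in this comment can be written as a small classifier, for example for decoding a captured waveform on a computer. The thresholds are the commenter's approximate figures (450ns, 600ns) plus the 50μs latch time from the article, not datasheet limits. The names are hypothetical. Pulses that fall between the two thresholds are reported as undetermined rather than guessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Approximate thresholds from the comment and the article (not datasheet limits). */
#define ZERO_MAX_HIGH_NS 450u    /* HIGH shorter than this: a 0 */
#define ONE_MIN_HIGH_NS  600u    /* HIGH longer than this: a 1 */
#define LATCH_MIN_LOW_NS 50000u  /* LOW this long or longer: latch */

typedef enum { PULSE_ZERO, PULSE_ONE, PULSE_UNSURE } PulseClass;

/* Classify a measured HIGH time the way the comment describes the chip doing it. */
PulseClass classify_high_ns(uint32_t high_ns)
{
    if (high_ns < ZERO_MAX_HIGH_NS) return PULSE_ZERO;
    if (high_ns > ONE_MIN_HIGH_NS) return PULSE_ONE;
    return PULSE_UNSURE;   /* in between: too close to call */
}

/* A LOW period that is long enough counts as the latch command. */
bool is_latch_low_ns(uint32_t low_ns)
{
    return low_ns >= LATCH_MIN_LOW_NS;
}
```

With the article's 16MHz timings, a 0 (6 cycles, 375ns HIGH) and a 1 (13 cycles, 812.5ns HIGH) land clearly on either side of the gap.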

Question 1 year ago

How could I use this to get the number of chips used? I was thinking of using interrupts and that if I sent a value to the 10th chip but there were only 9, the return signal would trigger an interrupt and I could infer the length, but it seems the sketch works based on a known length. What machine code could I use in a loop to send a single value to a specific chip index?


1 year ago

thank you so much for this amazing tutorial...


Question 1 year ago on Step 5

how to apply this to ESP8266?

How many LEDs can be controlled with an Arduino UNO using this beautiful library??

I can theoretically handle thousands, as long as I am not limited by the Arduino UNO's 2KB of RAM. Right??


Reply 7 years ago on Introduction

Hi, Hely. In theory you can handle thousands indeed! The main limitations are supplying enough power to them (each one can take up to 60mA but will happily run with less), and not running out of RAM.

One thing to keep in mind is that you may start seeing flickering once the # of LEDs is big enough. Setting each LED takes around 30us (which can't be changed), so if you need to set 1000, your code will spend 30ms setting the colors. The colors are displayed only after all of them have been set.

Our eyes start noticing the flicker if the refresh rate is below ~20-30Hz, so you'll need to get creative with speeding up the way you set the colors (e.g., by connecting them in parallel to your Arduino and performing port writes).
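The arithmetic in this reply is easy to tabulate. The C sketch below uses the reply's figures: about 30μs per LED (24 bits × 1.25μs), plus a latch of at least 50μs and 3 bytes of RAM per LED. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the reply: ~30 us per LED (24 bits x 1.25 us) and a latch
 * of at least 50 us after the last LED. */
#define US_PER_LED 30u
#define LATCH_US   50u

/* Microseconds to update a strip of n_leds LEDs once on a single data pin. */
uint32_t frame_time_us(uint32_t n_leds)
{
    assert(n_leds < 100000000u);     /* keeps n_leds * 30 within 32 bits */
    return n_leds * US_PER_LED + LATCH_US;
}

/* Highest whole number of full updates per second for n_leds LEDs. */
uint32_t max_refresh_hz(uint32_t n_leds)
{
    return 1000000u / frame_time_us(n_leds);
}

/* RAM for a frame buffer: one byte per color, three colors per LED. */
uint32_t buffer_bytes(uint32_t n_leds)
{
    return 3u * n_leds;
}
```

For 1,000 LEDs on one pin this gives 30,050μs per update, so at most 33 updates per second. It also needs a 3,000-byte frame buffer, which is already more than the Arduino Uno's 2KB of SRAM.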


Kml_s Lab

Reply 2 years ago

Somewhere I found that 5,000 LEDs can be driven with an ATmega8 (low RAM) instead of an ATmega328 (2KB RAM).

At first, ignore power consumption, wiring, and frame rate.

:- for storage, external flash memory is used (GIF converted to raw RGB and written to the chip)
:- read those hex values and display them on a large matrix
:- 2 options are available: #1 single-wire output, #2 8-pin (full port, parallel) output (625*8 = 5000)

In theory, we need 5000 LEDs * 3 bytes (15KB) of RAM for the buffer, plus other local usage.

How is that even possible?

What I think: without storing the values read from flash in a buffer, just toggle the pin accordingly in real time.

Ex: calculate the read time for a single byte from flash, and count that time as part of the delay for the WS2811.

Real-time pin toggling may work without storing data in RAM.


Question 2 years ago

Hello, is it possible to decode WS2811 data in assembly with an ATmega328p and send the data to the USB port? The purpose is to send the data over USB to other systems, like computers. Good job


2 years ago

finally I solved my problem ...
thank you for your code that really helped me.


2 years ago

That is great, thanks a lot. I've read this page more than 3 times and tried to write my own code. I use the CodeVisionAVR compiler, and that's where my problems started!
Your code is GCC inline assembly! In other words, this kind of assembly code doesn't work in CodeVisionAVR; it gives too many errors. For example, "rjmp .+0\n\t" doesn't exist in CodeVision. I used rjmp pc+1 without the " " and some other changes!
Can anyone help me convert these lines to CodeVision?
"sbrc %4, 7\n\t" , "mov %6, %3\n\t" , "dec %5\n\t" ,"st %a2, %6\n\t" ....
Another problem is how to use my variables, like nbits, in the assembly instructions, because instructions act on registers, not on variables. How did you relate your variables to the assembly instructions? For example, "r" (val), "w" (nbytes): how can I change them for use in CodeVision?
Thank you for your help.


3 years ago

Amazing! Simply amazing, thank you for the amazing work


5 years ago

Thanks for everything in here, very nice.

But I'm having a difficulty. I want to use your bit-banging code to control a "skylite RGB floodlight" shipped from China. Well, I can detect the bit-bang signal with my oscilloscope, but the receiver on the floodlight's RGB driver is 10V. How do I send the bit-bang signal at 10V?

I've tried a DC step-up converter, but the signal from the Arduino is lost T_T.


Reply 5 years ago

That's easy. You can find an op-amp and boost the 3.3V IO voltage directly up to 10 V.


6 years ago

Very good, nice and clear :)

Counting instruction cycles and padding with NOPs, takes me back!

For the beginner, who ought not be confused.... I've NEVER heard of a 'pulse wave' (other than in sci-fi films); One may send 'a pulse' or 'several pulses', or 'a train of pulses'. We use square waves in digital electronics (although the edges may get degraded).

A 'pulse wave' does not imply square wave, and in this context is unhelpful and degrades the nomenclature (IMHO).

It says that acrobotic is a "bloke" in the profile, although references above imply a female hand. The latter, I hope, as this piece is a strong role model for young women considering a career in electronics..... how it should be done ;)


8 years ago on Introduction

I have adjusted and tested the code for 8MHz; I tried this on an ATtiny85 and it works a treat

asm volatile(
  "startbit:\n\t"          // label
  "ldi  %5, 8\n\t"         // reset nBits
  "nextbit:\n\t"           // label
  "sbi  %0, %1\n\t"        // SET OUTPUT HIGH
  "sbrc %4, 7\n\t"         // skip next instruction if HiBit in value is clear
  "rjmp bitset\n\t"        // jump if HiBit is set
  "cbi  %0, %1\n\t"        // clear output bit
  "rol  %4\n\t"            // shift value left to get to next bit
  "dec  %5\n\t"            // decrement nBits (after rol, which also changes Z)
  "brne nextbit\n\t"       // branch if bits not finished
  "rjmp nextbyte\n\t"      // jump to next byte
  "bitset:\n\t"            // label
  "rol  %4\n\t"            // shift value left to get to next bit
  "dec  %5\n\t"            // decrement nBits
  "cbi  %0, %1\n\t"        // clear output bit
  "brne nextbit\n\t"       // branch back if bits not finished
  "nextbyte:\n\t"          // label
  "ld   %4, %a8+\n\t"      // val = *p++
  "cbi  %0, %1\n\t"        // clear output bit
  "dec  %9\n\t"            // decrease bytecount
  "brne startbit\n\t"      // if bytes not finished, start again
  :: // Input operands       Operand Id
  "I" (_SFR_IO_ADDR(PORT)), // %0
  "I" (PORT_PIN),           // %1
  "e" (&PORT),              // %a2
  "r" (high),               // %3
  "r" (val),                // %4
  "r" (nbits),              // %5
  "r" (tmp),                // %6
  "r" (low),                // %7
  "e" (p),                  // %a8
  "w" (nbytes)              // %9
);


Reply 6 years ago

Is that all the code that you need? If not, could I please have it, because I'm having trouble.


Reply 8 years ago on Introduction

Although I am very confused about what this bit of your code does...

if((rgb_arr = (uint8_t *)malloc(NUM_BYTES)))
//memset(rgb_arr, 0, NUM_BYTES);

Can you explain? I would like to make a really mini version of this for an ATtiny13 (1k/60 bytes), and the malloc command uses quite a bit of flash to implement. Thanks


Reply 8 years ago on Introduction

Memory allocation is not always necessary, but you need to be careful with accessing the array. Here's a good explanation on Stack Overflow
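To illustrate the reply, here is a C sketch of two ways to get a zeroed color buffer. The quoted line asks `malloc` for `NUM_BYTES` bytes of heap memory and tests the returned pointer inside `if ((...))`, where the double parentheses mark the assignment as intentional. `malloc` returns NULL on failure and leaves the contents indeterminate, which is what `memset` fixes. A static array needs neither. It is zero-initialized at startup and does not pull the heap code into flash. `NUM_LEDS = 8` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_LEDS  8                /* hypothetical strip length */
#define NUM_BYTES (NUM_LEDS * 3)   /* 3 color bytes per LED */

/* Option 1: a static array, zero-initialized before main() runs. */
static uint8_t rgb_static[NUM_BYTES];

uint8_t *static_rgb(void)
{
    return rgb_static;
}

/* Option 2: allocate at run time, as the quoted sketch does. The cast is
 * needed in C++ (Arduino sketches) and harmless in C. */
uint8_t *alloc_rgb(void)
{
    uint8_t *rgb_arr = (uint8_t *)malloc(NUM_BYTES);
    if (rgb_arr != NULL) {
        memset(rgb_arr, 0, NUM_BYTES);   /* malloc does not clear memory */
    }
    assert(rgb_arr == NULL || rgb_arr[0] == 0);
    return rgb_arr;
}
```

On a part as small as the ATtiny13 from the question, the static array is usually the simpler choice, since the buffer size is known at compile time anyway.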