First off, I would like to say:
Good on you, Alan Burlison.
This is not meant to be bagging you in any way. Your code did what it needed to do. Great success. My initial response in a forum comment was actually directed at the people who were offering non-working ideas of using a UART to get some hardware help.
My first suggestion of using a timer to help out is partly fleshed out below, but not fully functional. The reason it is not complete is that when I started to fill out the code it became obvious that, with a bit more optimization, there are plenty of clocks to do the full monty as bit banging without having to unroll any loops.
The second bit of action shown here is my other suggestion. One of the "use a UART" people said that you could use an inverter to fix up the START-BIT problem. I thought "Well - if you are going to throw a 74XX at it, why not use the SPI and have 140 clock cycles free." Again this is not a complete solution, but is a "proof of concept" to show how the hardware can help.
Finally, the third piece is a version of bit banging out to a WS2811 that I came up with. Sans a WS2811, because I don't have any. It does not do anything better than Alan's code. It is just a bit more optimized (1/2 the size) and easier to read due to no loop unrolling and path lengthening.
It does not break any new ground, and there is no magic in it that nobody has used before. It is just a little bit of me showing off and a little bit of practice for me. I have been away from assembler for several years and am just trying to build up my confidence a little bit.
Anyways - On with the show
Step 1: Using TC0 to generate the waveform
Sorry guys, but I can't work out how to add <code></code> to this thing.
So I have added a quite useless picture of the code instead.
It at least has the code/comments in glorious technicolor. If anyone wants the ASM file then send me a mail on here with your real email address and I will FWD it to you.
But back to the point.
This method of generating the pulses is actually slower (by one clock) than pure bit banging. However, it has one big advantage: all your free clock cycles (14 of them) are in one contiguous block. The bit banging version has a total of 15 free clocks, but they are broken up into two blocks AND the output test must go at the start, which limits some of the other tricks you could have used.
The astute out there will notice that the scope shows the waveform at 400 kHz. My AVR on the desk here is clocked at 8 MHz, not 16. Half the clock and half the bit rate gives the same 20 clocks per bit as 800 kHz at 16 MHz, so it is apples for apples.
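For anyone who wants to check the clock budget, here is a rough sketch of the arithmetic in C. It is not part of my ASM file; the function names are made up for illustration, and it assumes the comparison is a 16 MHz AVR sending 800 kHz data against my 8 MHz AVR sending 400 kHz. The "overhead" figure is just the bit budget minus the free clocks quoted above.

```c
#include <stdint.h>

/* Hypothetical helper: CPU clocks available for each output bit. */
uint32_t cycles_per_bit(uint32_t f_cpu_hz, uint32_t bit_rate_hz)
{
    return f_cpu_hz / bit_rate_hz;
}

/* Hypothetical helper: clocks spent producing the bit itself,
 * i.e. the per-bit budget minus the clocks left free for other work. */
uint32_t overhead_cycles(uint32_t budget, uint32_t free_cycles)
{
    return budget - free_cycles;
}
```

Both 16 MHz at 800 kHz and 8 MHz at 400 kHz come out at 20 clocks per bit. With 14 clocks free the timer version spends 6 clocks per bit, and with 15 free the bit banging version spends 5, which matches the one clock difference mentioned above.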