Help me make shiftOut faster (Arduino Function)

I'm making a clock and I need an ATtiny85 to output a "big" amount of data to a series of two shift registers. Using the shiftOut function that comes with the Arduino slows the whole program down, making the clock annoying to the eye and reducing its brightness.
I managed to reduce the lag a bit by replacing the digitalWrite calls throughout the program and by creating a new function called shiftOutFast to replace shiftOut. In it I replaced two of the digitalWrite calls with direct port writes, since I couldn't figure out how to do more.

void shiftOutFast(uint8_t dataPin, uint8_t clockPin, uint8_t bitOrder, byte val)
{
     uint8_t i;
     for (i = 0; i < 8; i++)  {                                                 //could I change this to 16?
           if (bitOrder == LSBFIRST)
                 digitalWrite(dataPin, !!(val & (1 << i)));
           else
                 digitalWrite(dataPin, !!(val & (1 << (7 - i)))); //I guess here I should replace the 7 by a 15?
           PORTB |= _BV(PB0);                                         //there was a digitalWrite here (clock high)
           PORTB &= ~_BV(PB0);                                     //and also here (clock low; I believe this is called Port Manipulation)
     }
}

This is how I normally use the function. The whole operation is repeated 5 times: once for each digit position on the display and once more for the dots in the middle. As you can see, I call it twice, since shiftOut can only output 8 bits at a time.

PORTB &= ~_BV(PB1); //latchPin low
shiftOutFast(dataPin, clockPin, MSBFIRST, 8); //the values for the second shift register are sent first, this determines the digit that will be lit (1,2,3,4th position or the dots)
shiftOutFast(dataPin, clockPin, MSBFIRST, numbers2[m]); //then pushed in place by the first one, this determines the number
PORTB |= _BV(PB1); //LatchPin high

//maybe I could reduce it to only one shiftOutFast (see 2nd question)

I've reached the limit of what my programming skills let me do, so I'll simply ask two questions:

  • Can I further optimize this function? I could remove the if, since I won't be using LSBFIRST.
  • If I changed the 8 to a 16 and the 7 to a 15 in the "for" loop inside the shiftOutFast function written above, could I get away with a single call that sends all 16 bits?

Here's the whole program for the clock for further reference (this is actually a timer, since ATtinys aren't reliable when counting time). I would be extremely grateful if you found some way to optimize it or sent me a hint or two.


Thank you!

-max- 1 year ago

I've heard that digitalWrite on the Arduino is notoriously slow, and that there are ways to control the hardware ports and registers for the pins of your specific microcontroller in C or assembly. Here is one Instructable that covers that for the popular ATmega328 and 128.


He was able to improve the time by over an order of magnitude, more than 10 times faster!

-max- -max- 1 year ago

Reading your question again, I see you have already done that. Dedicated hardware will generally be faster than doing the work in software, so as Steve said, SPI should be faster. However, I don't think speed is your issue (clocks only need to update once per second at the most, right?).

Since you are using a shift register with a high-Z output, you can implement a "blanking" interval, so that while data is being put into the shift registers they are not outputting any signals to the LED display. That way the digits that are supposed to be dark will remain completely dark while the screen updates. There will still be a short period during which the lit LED digits are dark, and that can be eliminated by adding small-value (<1 uF) capacitors to each output of the shift register. Capacitors resist changes in voltage, so when the shift register goes into high-impedance mode on all pins, the capacitor will keep the lit LEDs on for a very short time until the next update.

Victor805 (author)  -max- 1 year ago

Thanks to that Instructable I was able to replace the digitalWrite function and gained some time, but most of the time is consumed by the shiftOut function. Replacing the digitalWrite calls inside it has allowed me to improve it noticeably.

"However, I don't think speed is your issue, (clocks only need to update once per second at the most, right?)"
The problem is that the clock is multiplexed, so the digits are constantly changing in order for persistence of vision to work. I don't have any problems with the operations themselves, but the refresh rate must be high in order to avoid flickering.

I just realized the shift register can be put into high-impedance mode by sending a HIGH signal to the output enable pin. On my board I connected that pin directly to ground, so I can't change that unless I redo the circuit. Thanks for pointing that out, it might be very useful in my next projects.

The capacitor fix seemed like a great idea, and I even tried it by connecting capacitors to the 1, 2, 3, 4 and : outputs, but it never worked: since I haven't connected the Output Enable pin, the energy stored in them gets drained by the shift register when the output goes low. And even then, the stored energy would be wasted on displaying a different number. For example:

Output 1 is high and has a capacitor attached to it while the number 8 is being displayed, which means (a, b, c, d, e, f, g) are low. Then output 2 goes high and the number changes to 1, which means b and c are low but the rest are high. Output 1 will remain high for a little while thanks to the capacitor, but now the energy of the capacitor will drain into different LEDs, since the LOW pins of the display have changed.

I'm going to try to modify the function to send 16 bits at a time. The current time between each refresh is 10.2 ms; let's see if I can improve it.
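A rough worked example of what that refresh time means for flicker. This is only a sketch under an assumption: that the 10.2 ms is the time spent on one position, and that the 5 positions from the question (four digits plus the dots) are scanned in turn. The helper name fullRefreshHz is made up for illustration.

```c
/* Hypothetical helper: refresh rate of the whole multiplexed display in Hz,
   given how many positions are scanned and how long each position takes. */
static double fullRefreshHz(unsigned positions, double msPerPosition)
{
    /* One full pass over the display takes positions * msPerPosition ms. */
    return 1000.0 / (positions * msPerPosition);
}
```

Under that assumption, fullRefreshHz(5, 10.2) comes out at roughly 19.6 Hz, which is low enough to flicker visibly. If the 10.2 ms already covers all 5 positions, the same formula with one position gives about 98 Hz instead.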

That's too bad, hopefully you don't have to re-spin the board for a hardware solution. If you were able to squeeze in the capacitors, maybe you could also bodge in some resistors in series between the registers and LEDs, so that when the shift register goes low or changes state, the capacitor charge does not change instantaneously. Some diodes may also help, so that the shift registers can only sink or only source current, depending on the direction of the diode.

You can pinpoint exactly where all the latency is by using the micros() function, which returns the number of microseconds the program has been running for. Define two unsigned long variables, set one to micros() right before the code under test and the other right after it; the difference between the two should be a close approximation of the time it took to execute. (I think that method is most accurate when the code takes a relatively long time to complete.)
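A minimal sketch of that timing approach. Since micros() returns an unsigned long, the two variables are unsigned long as well, and the subtraction still gives the right elapsed time if the counter wraps around in between. timeCall and codeUnderTest are hypothetical names, and the #else branch is only a stand-in so the example also compiles off the Arduino.

```c
#ifdef ARDUINO
#include <Arduino.h>
#else
/* Host stand-in (assumption, for testing only): a fake microsecond
   counter that advances by 4 every time it is read. */
static unsigned long fakeMicros;
static unsigned long micros(void) { fakeMicros += 4; return fakeMicros; }
#endif

/* Hypothetical placeholder for the section being measured, e.g. latch low,
   the two shiftOutFast calls, latch high. */
static void codeUnderTest(void)
{
}

/* Hypothetical helper: returns how many microseconds one call of fn took. */
static unsigned long timeCall(void (*fn)(void))
{
    unsigned long before = micros();
    fn();
    unsigned long after = micros();
    return after - before;   /* unsigned subtraction survives a wrap-around */
}
```

In the sketch this could be used as unsigned long elapsed = timeCall(codeUnderTest); and the result shown on the display or sent out some other way. micros() only has a resolution of a few microseconds (4 us on a 16 MHz Arduino), so very short sections are better timed over many repetitions.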

Victor805 (author)  Victor805 1 year ago

No, it didn't work. I tried this:

void shiftOutFast(uint16_t dataPin, uint16_t clockPin, uint16_t bitOrder, byte val)
{
     for (i = 0; i < 16; i++) {
           digitalWrite(dataPin, !!(val & (1 << (15 - i))));
           PORTB |= _BV(PB0);
           PORTB &= ~_BV(PB0);
     }
}

I don't know if it'd make much difference but since the dataPin is defined at compile time, why not try replacing the digitalWrite() calls in your shiftOutFast() function with direct access? E.g.

if (val & (1 << (15 - i))) { PORTB |= _BV(PB2); } else { PORTB &= ~_BV(PB2); }
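Putting the two ideas together, a 16-bit version might look like the sketch below. One likely reason the attempt above did not work is that val is still declared as byte, so bits 8 to 15 are always zero; the value has to be a uint16_t, while the pin arguments can stay 8-bit (or disappear, since the pins are hard-coded). Assumptions: clock on PB0 and latch on PB1 as in the question, data on PB2 as in the line above. shiftOut16 and writeDisplay are made-up names, and the #else branch only exists so the code compiles off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#else
/* Host stand-ins (assumption, for testing only). */
static volatile uint8_t PORTB;
#define PB0 0
#define PB1 1
#define PB2 2
#define _BV(bit) (1 << (bit))
#endif

#define CLOCK_BIT PB0   /* clock pin, as in the question */
#define LATCH_BIT PB1   /* latch pin, as in the question */
#define DATA_BIT  PB2   /* data pin, assumed from the reply above */

/* Shift 16 bits out MSB first using direct port access only. */
static void shiftOut16(uint16_t val)
{
    uint16_t mask;
    for (mask = 0x8000; mask != 0; mask >>= 1) {
        if (val & mask)
            PORTB |= _BV(DATA_BIT);    /* data high */
        else
            PORTB &= ~_BV(DATA_BIT);   /* data low */
        PORTB |= _BV(CLOCK_BIT);       /* clock pulse: high... */
        PORTB &= ~_BV(CLOCK_BIT);      /* ...then low */
    }
}

/* Send the digit-select byte first and the segment byte second,
   in the same order as the two shiftOutFast calls in the question. */
static void writeDisplay(uint8_t digitSelect, uint8_t segments)
{
    PORTB &= ~_BV(LATCH_BIT);          /* latch low */
    shiftOut16(((uint16_t)digitSelect << 8) | segments);
    PORTB |= _BV(LATCH_BIT);           /* latch high */
}
```

With this, a single writeDisplay(8, numbers2[m]); would stand in for the four lines in the question. Walking a mask instead of computing 1 << (15 - i) on every pass also avoids a variable shift, which the AVR has to perform one bit at a time.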

Victor805 (author)  mostlyglue 1 year ago

Thanks, I wanted to replace that digitalWrite but I didn't know how. I'll try that in a moment.

Change display driver to a MAX7221...

Victor805 (author)  steveastrouk 1 year ago

Thanks, I'll also look into it.

-max- 1 year ago

Also, I was recently working on interfacing a large DAC with a 16-bit parallel interface, and since my original plans require 4 of these DACs, that is 64 GPIO pins!

I similarly ran into the issue of getting 16 bits of data into 2 registers quickly. I decided to use 74C164 shift registers, which only have a two-state output (no high-impedance mode), and it would have been much harder to use 16 decent transistors or FETs to disconnect them from the DAC while updating the data in the buffers. So I had to do some analog blanking of the signal: since there is only one analog output from the DAC, only one analog switch is needed to blank the output while data is being pushed into the registers. Luckily my application does not need to be fast, just accurate.

-max- -max- 1 year ago

One last comment: if this is for a desktop clock or something, you may find yourself correcting the time on it quite often. The built-in oscillators on most Arduino chips are not that good or accurate, and in some cases can even lead to data corruption on UART and other asynchronous data.

Also, have you considered using dedicated 7-segment LED driver chips? Something like the old faithful 74LS47, or MC4511, or something similar. These basically let you input 4 bits to control one 7-segment digit. There is also the LM4026, which is a 7-segment LED counter: pulse the clock pin X times and the counter will count and display that! Simple! In fact, you can use TTL/CMOS building blocks to make a clock without even using a microcontroller! Check out this video below! *THAT'S* how you make a REAL digital clock lol!

Victor805 (author)  -max- 1 year ago
"The built-in oscillators on most Arduino chips are not that good or accurate, and in some cases can even lead to data corruption on UART and other asynchronous data."

I knew this. This was just a proof of concept and an excuse to use 595 ICs; maybe I'll eventually build one with an external oscillator with temperature control for extra precision.

"Also, have you considered using dedicated 7 segment LED drivers chips? Something like the old faithful 74LS47, or MC4511, or something similar."

No, but mainly because I didn't know about the popular drivers. I'll take note of those, many thanks!

I watched that video too. I might be wrong, but I recall the clock signal was taken from the grid, and then the logic did the rest. Really nice; it's cool not to have to rely on microcontrollers and just use a bunch of chips to do the same job.

It is certainly amazing what cool things you can make even without microcontrollers! He actually gets the clock signal from the mains, tapping off the AC voltage from the transformer to get a 50 Hz clock. Mains power has surprisingly good frequency stability; things like microwaves and ovens actually use mains for timing!

iceng 1 year ago

Hand-written machine code can run much faster than C code.

Victor805 (author)  iceng 1 year ago

I know, but I have to rely on C because it's much easier. I'll eventually learn a better way to program microcontrollers, since I'm constantly reading and learning new things, but I still get lost easily, so C will do for now.

Use the SPI pins, and this library to call them


Victor805 (author)  steveastrouk 1 year ago

Thanks for the information, the main problem I have is there's a lot of content and I don't know where to start looking.

I'm not 100% sure it works on an ATtiny as well, but you could move the values into EEPROM storage and read them from there.
Or, if the values are fixed, create a table with the values.