Arduino Is Slow - and How to Fix It!




Introduction: Arduino Is Slow - and How to Fix It!

Arduino is slow? What? This instructable will show just how slow a part of Arduino is, and how to fix it.
It’s true. More specifically, Arduino’s digitalWrite command takes a considerable amount of time. If you are just switching on an LED once or something, you won’t be able to notice it. However, I realized how slow it was while I was trying to use a TLC5947 PWM driver. That chip requires the microcontroller to shift in 288 bits each time (24 channels of 12 bits each)! Each bit required about 12 digitalWrites, for a total of 3456 digitalWrites each time I wanted to shift new data into the TLC5947.
How long did that take? 30 seconds of just digitalWrite!
But there is a solution: using “true c” style commands, which is what AVR GCC (the GNU Compiler Collection for AVR) uses. The brains behind Arduinos are ATMega168s or ATMega328s. The AVR community typically programs these chips with “true c” commands, using AVR Studio 4. The advantage of these “true c” commands is that they do exactly what you tell them to do.
But before we get into these commands, we must get familiar with port and pin definitions in the next step!

(If you predict you will like this instructable, feel free to vote for the Arduino contest!)

Step 1: The Truth About Pins

Arduino users know that the pins are labeled as digital 0-13 and analog 0-5. The makers behind Arduino used this for simplicity. The actual ATMega chip’s pins are labeled differently, however. Each pin has an assigned letter and number. The numbers range from 0-7. For example, pins can be A0-A7, B0-B7, and so on. All of the AVR 8-bit pins are labeled this way.
To help you clarify which digital/analog pin corresponds to which AVR pin, see the chart below.
My Seeeduino has an LED built in on digital pin 13. So, looking at the chart, its real pin would be B5.
Next I will show you what the actual C command is.

Step 2: A and B Watch Out, C Is Here

Now it is time for the “true c” style statements! The thing is, there are quite a few of them and they all do the same thing. It gets confusing, but basically it boils down to changing a port register in some way.
There are many right ways to do it, but here is the way that I found to be the simplest. To turn a pin high, use this command:
PORT{letter} |= _BV(P{letter}{number});
To turn a pin low, use this command:
PORT{letter} &= ~_BV(P{letter}{number});
Replace {letter} and {number} with the corresponding pin letter and number. For example, for pin B5 with the LED, turning it high and then low would be:
PORTB |= _BV(PB5);
PORTB &= ~_BV(PB5);
Note that | can be found to the left of the backspace key, on the same key as the backslash.

So, basically, you can replace the entire digitalWrite() command with the above to get a faster response!
But how much faster is it, really? It is time for an experiment!

Step 3: Exspearimintation

In this experiment, I sought to find out how long it took the digitalWrite() command to execute 1000 times, and then how long it took the “true c” style command to execute 1000 times. The code is fairly simple, and shown below:
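void setup() {
  Serial.begin(9600);        // baud rate is an assumption; match your serial monitor
  pinMode(13, OUTPUT);       // digital pin 13 is PB5
}
void loop() {
  // micros() returns an unsigned long; a 16-bit int would overflow
  unsigned long initial = 0;
  unsigned long final = 0;
  initial = micros();
  for(int i = 0; i < 500; i++) {
    digitalWrite(13, HIGH);
    digitalWrite(13, LOW);
  }
  final = micros();
  Serial.print("Time for digitalWrite(): ");
  Serial.println(final - initial);
  initial = micros();
  for(int i = 0; i < 500; i++) {
    PORTB |= _BV(PB5);
    PORTB &= ~_BV(PB5);
  }
  final = micros();
  Serial.print("Time for true c command: ");
  Serial.println(final - initial);
  delay(1000);               // pause between runs
}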
Feel free to try this out yourself. Here are the results I got:
(Seeeduino with ATMega168)
Time for digitalWrite(): 3804
Time for true c command: 348
So each style turned on the pin 500 times and turned it back off 500 times. digitalWrite() took 3804 microseconds, while the true c commands took just 348 microseconds. The true c commands are more than 10 times faster than digitalWrite()!

All you need is an Arduino and a computer.

Step 4: Overview

Now you know that digitalWrite() takes significantly more time to execute than true c style commands. If you want to use Arduino and time is important, I would highly recommend you use these true C style commands in lieu of digitalWrite(). While digitalWrite() is more convenient to change pins with, it takes more than 10 times longer!

I have attached a little cheat sheet below. Print it out so it is handy whenever you need it.
I hope you have gotten something useful from this.
Vote for the Arduino contest! Thanks!

Step 5: Going Further

Are you the type of person who wants to know how everything works?

Basically, what happens when you tell a pin to go high or low is that you modify a 8-bit register. (Remember how the pins go from A0-A7, B0-B7? 8 pins per letter, so those 8 pins are toggled by that one register). A register holds 8 bits (Each bit can be 0 or 1).

When you execute the command to put a pin high, the appropriate bit in the register is set to 1. (0 would be low).

Having 8 pins toggled by one register can also have its advantages, mainly that you can toggle any of the 8 pins nearly simultaneously.

For example, if I wanted to turn pin C7 high and pins C0 through C6 low, the command would be:
PORTC = 0b10000000;

Note how the first pin number coming after the "b" is pin 7, and it goes down from there until pin 0.

0b10000000 is an 8-bit binary number; you can convert it to hex for a cleaner look. Doing it manually is a pain (but useful knowledge). An easier method is to google "0b10000000 to hex", which results in "0x80".
PORTC = 0x80;

For further reading, see here:
(thanks gmoon and westfw for the links)




49 Discussions

I have a problem getting a stepper motor to rotate. You have probably solved my problem; Thanks!

When I'm talking about PIND = _BV(PD7) code, the PIND is usually used to read the port, but if the pin is an OUTPUT pin it will flip it on a write. Typo... thanks

Nice way to stimulate people... However I think the comparisons are apples and oranges. Doing a direct write to memory then comparing it to a polished function call, is apples and oranges. Just the call itself generates more code than a write to memory. For example, I don't have a boot-loader, I write directly to the 328p, which makes looking at the assembly created a little easier.

If you code

sei(); // enable interrupts
PORTD = 0;
PIND = _BV(PD7); // more on this code later
digitalWrite(13, HIGH);

This is the generated code (from the .lss listing file). The hexadecimal numbers on the left are where it lives in memory when loaded.

PORTD = 0;
178: 1b b8 out 0x0b, r1 ; 11
PIND = _BV(PD7);
17a: 80 e8 ldi r24, 0x80 ; 128
17c: 89 b9 out 0x09, r24 ; 9
digitalWrite(13, HIGH);
17e: 61 e0 ldi r22, 0x01 ; 1
180: 8d e0 ldi r24, 0x0D ; 13
182: 32 d0 rcall .+100 ; 0x1e8 <digitalWrite>
184: 80 e8 ldi r24, 0x80 ; 128

This isn't tough to follow.

PORTD = 0; causes machine code of '1b b8' (hexadecimal). The mnemonic is out 0x0b, r1, which is 'out' to (port) 0x0b (PORTD); avr-gcc keeps r1 set to zero, so no load is needed. For PIND = _BV(PD7), we have to load the value of _BV(PD7) -> (1 << PD7) -> (1 << 7) -> 0x80 into r24, which the compiler (and preprocessor) handle creating.

I'm going to skip PIND = 1, as it pertains to other parts of this, but suffice it to say that the code normally generated for these writes is like that generated by the PIND = 1 code segment. In other words, it loads a value into a register, then writes it out to the proper port.

Here is the call to digitalWrite

It loads the two values to be passed into r22 and r24, then calls the digitalWrite function. Note we already increased the code size by half, from 4 bytes to 6 bytes, a 50% increase in the amount of code you are going to execute, not even counting what happens when it gets to the function itself. By the way, my code size was about 334 bytes; adding the one call to digitalWrite increased code size by about 80%, to 607 bytes. Why is this? I'm glad you asked...

Here is the code that gets executed when you invoke digitalWrite(....)

000001e8 <digitalWrite>:
1e8: 0f 93 push r16
1ea: 1f 93 push r17
1ec: cf 93 push r28
1ee: df 93 push r29
1f0: 1f 92 push r1
1f2: cd b7 in r28, 0x3d ; 61
1f4: de b7 in r29, 0x3e ; 62
1f6: 28 2f mov r18, r24
1f8: 30 e0 ldi r19, 0x00 ; 0
1fa: f9 01 movw r30, r18
1fc: e8 59 subi r30, 0x98 ; 152
1fe: ff 4f sbci r31, 0xFF ; 255
200: 84 91 lpm r24, Z
202: f9 01 movw r30, r18
204: e4 58 subi r30, 0x84 ; 132
206: ff 4f sbci r31, 0xFF ; 255
208: 14 91 lpm r17, Z
20a: f9 01 movw r30, r18
20c: e0 57 subi r30, 0x70 ; 112
20e: ff 4f sbci r31, 0xFF ; 255
210: 04 91 lpm r16, Z
212: 00 23 and r16, r16
214: c1 f0 breq .+48 ; 0x246 <digitalWrite+0x5e>
216: 88 23 and r24, r24
218: 19 f0 breq .+6 ; 0x220 <digitalWrite+0x38>
21a: 69 83 std Y+1, r22 ; 0x01
21c: bc df rcall .-136 ; 0x196 <turnOffPWM>
21e: 69 81 ldd r22, Y+1 ; 0x01
220: e0 2f mov r30, r16
222: f0 e0 ldi r31, 0x00 ; 0
224: ee 0f add r30, r30
226: ff 1f adc r31, r31
228: ec 55 subi r30, 0x5C ; 92
22a: ff 4f sbci r31, 0xFF ; 255
22c: a5 91 lpm r26, Z+
22e: b4 91 lpm r27, Z
230: 9f b7 in r25, 0x3f ; 63
232: f8 94 cli
234: 8c 91 ld r24, X
236: 61 11 cpse r22, r1
238: 03 c0 rjmp .+6 ; 0x240 <digitalWrite+0x58>
23a: 10 95 com r17
23c: 81 23 and r24, r17
23e: 01 c0 rjmp .+2 ; 0x242 <digitalWrite+0x5a>
240: 81 2b or r24, r17
242: 8c 93 st X, r24
244: 9f bf out 0x3f, r25 ; 63
246: 0f 90 pop r0
248: df 91 pop r29
24a: cf 91 pop r28
24c: 1f 91 pop r17
24e: 0f 91 pop r16
250: 08 95 ret

Notice the call in there to turnOffPWM; this turns off pulse width modulation on pins that support it. The turnOffPWM function is much longer than the digitalWrite function. Since you have the potential to go through all of that code, these types of comparisons are pretty useless. Also notice the overhead of the function itself, where it has to push (store on the stack) 5 register values and restore them before it returns. This is common, as these 5 registers can now be used by the function and are restored to their previous values (popped off the stack) before returning.

I find the Arduino, and really all the Atmel chips, fun to play with. It's neat to find new things such as the code that shows you how to flip a bit. Atmel has designed the hardware in a way that supports bit manipulation, especially in ports.

I see many code examples that show reading ports like PORTD, where the actual read should be from PIND. Since this 'reads' the port, you don't normally write to it. However, if the pin is an output, you can write to PIND and flip any bits that you wish.


17a: 80 e8 ldi r24, 0x80 ; 128
17c: 89 b9 out 0x09, r24 ; 9

PORTD &= ~_BV(PD7) generates a 'cbi' or clear bit on port but is complex looking to people

Lots of times you just want to flip it:)

Bottom line, this comparison is pretty useless unless you want to show the overhead. Don't think digitalWrite just writes to memory. Functions relieve you of dealing with the very detailed parts of programming. If you want speed and small code size, you pay for it by needing to know more about the item you are working on and generally coding it by hand (i.e. assembly). For example, I see many demo code sequences that use a pull-up or pull-down resistor to make a switch or something work properly. The 328p has internal pull-ups, reducing the component count; you just have to enable them. I encourage people to read the data sheets. They may seem intimidating, but you get used to them and get quicker at figuring out what you can do with the device.
The level of obfuscation by functions is usually a choice between what's easier to do and how much detail you want. The time spent doing assembly is generally not worth it (unless you like doing it :). But it is worth knowing and looking at when you come to some problems. Just be aware.

I use eclipse and have the listings turned on. If you have the Arduino IDE I think you have to enable the listing file in the preferences.txt file in the default directory for the IDE. I 'think' the line you need to add is "build.verbose=true".

Hope this helps some anyway....


2 years ago

When writing directly into the hardware registers, you lose readability and portability.

I've published on Github a tool I called HWA that lets you use an object-oriented interface to the hardware that does not require a C++ compiler and produces high efficiency binary code.

It is there:

3 replies

using HWA , can I run Arduino at speeds similar to that of low level programmed Arduino?


Because it is not a library, HWA helps produce highly optimized binary code, most often the same as a clever low-level programmer would have obtained.


The type of variable we use also matters a lot. Obviously it will take the Arduino longer to handle a long than a short int or an unsigned char, so this should also be taken into account when writing our sketches so that they run faster without straining the Arduino.

Here is an example of how long the Arduino takes to execute a sketch; I found it in a book, and I think it is good for many people to know about. What RazorConcepts says is true: the Arduino takes less time to execute actions when it is programmed in "Real C programming" mode, which is how AVRs are programmed. But since one of its main purposes is to be usable by people who are sometimes inexperienced, some speed is sometimes given up to provide a programming environment that is friendlier to people who don't know much (like me).
Here is the link so you can try it if you wish; it is on codebender:

Here you can see how fast Arduino boards are (turn on English subtitles):

Thanks and very useful to know.

In my case, it is 20 times faster. Below are the results when I ran the same code:

Time for digitalWrite(): 5220

time for true c command: 252

I used an ATMEGA328 and I got:

digitalWrite(): 5224
true c : 256

digitalRead() already exists in wiring_digital.c; it's buried within the Arduino hardware folders.

For more details on arduino registers about port manipulation and what PORTB really is, refer to:


2 years ago

Another way is to use single-instruction macros with CBI and SBI...

on bottom of post...

An example (assuming an inductor of proper value is connected to the CLKIN pin and corresponding IO pin) ...

PORTB |= 0b00100000; // if needed repeat this operation for total of 32 cycles
PORTB &= ~0b00100000; // if needed repeat for 32 cycles and so on ...

Then (in theory) after flashing this simple code the uC should be clocked to the RFID's carrier and also powered by it as well :D

Does anyone know how many actual clock cycles it takes to set pin states using this 'pure c' method? I found a very interesting post on emulating RFID tags using nothing but a PIC and a single radial coil. The same could be done with an ATTiny. It exploits the uC's internal capacitance and clamping diodes on the IO pins. Essentially, the RF modulation of the RFID reader supplies the oscillator frequency and just enough RF/induced current to even parasitically power the chip! One end of the coil is connected to the GP/CLKIN and the other end to a GP/IO pin. If the right number of clock cycles is used in between switching pin states, it emulates an RFID card (simply switching low/high/low). There must be 32 clock cycles between each state per Manchester encoding. So, if you know how many cycles a pin state operation takes, then you can achieve this kind of extremely simple emulation. Here's the article -

If you count down rather than up in the loop, it will be even faster! On my Arduino Mega, changing the loop to count down for the True C commands reduced the time taken from 288 microseconds to 192 microseconds. That is a big difference if you need it to be as fast as possible!

1 reply

Counting down is more efficient, but only if you are comparing to zero. It is always more efficient to compare a value to zero than to another value! Similarly, do{}while(); loops are more efficient than for() loops. There are lots of pre-optimization tricks that you can do to squeeze your code into the smallest of chip spaces.