Arduino IDE 1.6.x Compiler Optimisations = Faster Code

Intro: Arduino IDE 1.6.x Compiler Optimisations = Faster Code

After downloading the latest Arduino IDE (1.6.1) I was rather disappointed that some of my sketches ran significantly slower than the same sketch compiled under IDE 1.0.6. This was particularly noticeable on one of my sketches that drove a TFT display.

The good news, however, was that the 1.6.1 IDE produced a sketch that was 20% smaller. This was great, as I was beginning to run out of FLASH space on my UNO for different fonts.

To solve the mysteries of the compiled code sizes and speed differences I decided to investigate further. Ideally I wanted the speed back that the older IDE gave me, whilst still being able to save program FLASH space!

Step 1: Compilers

A search on the internet will tell you much more than I can about compilers; here is just a brief summary.

Arduino sketches are written in a high-level language, namely C or C++, but the micro-controller executes machine code instructions. The job of the compiler is to convert (translate) the human-readable code into a sequence of instructions that the micro-controller can execute. Essentially, the compiler converts one language to another.

When the compiler is called upon to create the executable code there are a number of options that can be invoked. In the case of the GCC compiler used by the Arduino IDE there are 5 speed/size optimisation options (-O0, -O1, -O2, -O3 and -Os), as detailed in the GCC documentation. The different options cause the compiler to make more or less effort to optimise the executable code for size or speed.

The default optimisation used in the Arduino IDE is for size; this is option "-Os" on the command line. The code sizes and speeds generated by the two Arduino IDEs are so different because a much newer version of the GCC compiler is used in the latest IDE. Clearly this new version creates a significantly smaller executable, but the penalty, it appears, is significantly slower execution.

A few mild words of caution

It is worth noting that, in rare cases, changing the optimisation level can affect the way a program behaves when running. This is because optimisation tries to "rewrite" the software to make a "time" and "size" efficient executable. The probability of the software function being affected depends on how "aggressively" the compiler modifies the code.

The default optimisation level for the Arduino IDE (-Os) is already pretty aggressive so it is very unlikely that you will see new behaviour problems introduced if the optimisation level is changed.

Because the compiler follows a set of rules, we sometimes need to include "compiler directives" within the software itself to avoid problems during the optimisation process. A classic problem is the failure to declare variables as "volatile" when they are used by both the main program and an interrupt routine. If you don't know what "volatile" means then Google "why is volatile needed".
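To make the "volatile" point concrete, here is a minimal sketch in plain C++ that can be run on a PC. It is an analogue, not Arduino code: a standard signal handler stands in for the interrupt routine, and the names eventFlag, onEvent and waitForEvent are made up for illustration. On an Arduino the handler would instead be registered with attachInterrupt().

```cpp
#include <csignal>

// Flag shared between the "interrupt" handler and the main program.
// volatile tells the compiler the value can change outside the normal
// program flow, so every read in the loop below really goes to memory.
volatile std::sig_atomic_t eventFlag = 0;

// Stand-in for an interrupt service routine (hypothetical name).
extern "C" void onEvent(int) {
    eventFlag = 1;
}

// Fires the simulated interrupt, then polls the flag as a main loop would.
bool waitForEvent() {
    eventFlag = 0;
    std::signal(SIGINT, onEvent);   // install the handler
    std::raise(SIGINT);             // simulate the interrupt firing
    while (eventFlag == 0) {
        // In a real interrupt-driven program the flag changes while this
        // loop is running. Without volatile, an optimising compiler is
        // allowed to read the flag once and never look at it again.
    }
    std::signal(SIGINT, SIG_DFL);   // restore the default handler
    return eventFlag == 1;
}
```

The more aggressive the optimisation level, the more likely a missing "volatile" is to show up as a loop that never notices the flag changing.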

Step 2: Size and Speed Differences

The sketch I used for testing was a graphics speed test.

Here are the results of tests for a sketch compiled for an Arduino Mega; similar results would be expected for an UNO:

IDE 1.0.6 :

  • Compiled size: 26,620 bytes
  • Execution time: 13.3 seconds

IDE 1.6.1:

  • Compiled size: 19,558 bytes
  • Execution time: 17.8 seconds

These results showed why I noticed such a dramatic speed difference...

We can also see the 1.6.1 IDE produces a FLASH image 7,062 bytes smaller (about 27%). That is significant when you consider it could make the difference between getting it running on an UNO or needing an upgrade to a Mega.

Unfortunately the execution time has increased by 34% (17.8 s against 13.3 s), which is not helpful. The question I wanted to answer was:

Can we have the best of both worlds, a fast execution time and a smaller sketch?

Step 3: Results of Changing the Compiler Optimisation

Bear in mind that I have just tested one sketch and different options may be better in some circumstances.

These are the results I obtained when using the IDE 1.6.1 and changing the compiler optimisation directive:

-Os (Arduino IDE default)

  • Compiled size: 19,558 bytes
  • Execution time: 17.8 seconds

-O0 (no optimisation at all!)

  • Compiled size: 31,382 bytes
  • Execution time: 44.7 seconds

-O1

  • Compiled size: 20,428 bytes
  • Execution time: 17.0 seconds

-O2

  • Compiled size: 20,500 bytes
  • Execution time: 12.7 seconds

-O3

  • Compiled size: 25,550 bytes
  • Execution time: 12.2 seconds

As I am using an Arduino Mega I am not particularly concerned about the FLASH size, so option -O3 gives a better speed (shorter run time) and the sketch is smaller than the IDE 1.0.6 gave me. However I have decided to set the 1.6.1 IDE to optimisation -O2 as that looks like a good compromise between better speed and smaller FLASH code.

Your own sketches may well show larger or smaller size and speed changes, and a different compiler option may suit them better.
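The trade-off between the options is easier to judge as percentages relative to the -Os default. The following is a minimal sketch in plain C++, meant to be run on a PC rather than the Arduino. It uses only the figures listed in this step; the struct and function names are illustrative and not from any library.

```cpp
#include <cstdio>

// One row of the Step 3 measurements (IDE 1.6.1, Arduino Mega).
struct Result {
    const char* option;
    double bytes;
    double seconds;
};

// Figures copied from the results in this step; index 0 is the -Os default.
const Result results[] = {
    {"-Os", 19558, 17.8},
    {"-O0", 31382, 44.7},
    {"-O1", 20428, 17.0},
    {"-O2", 20500, 12.7},
    {"-O3", 25550, 12.2},
};

// Percentage change of value relative to baseline.
// Negative means smaller (for size) or quicker (for run time).
double percentChange(double value, double baseline) {
    return (value - baseline) / baseline * 100.0;
}

// Prints the size and run-time change of every option against -Os.
void printComparison() {
    const Result& base = results[0];
    for (const Result& r : results) {
        std::printf("%s: size %+.1f%%, time %+.1f%%\n", r.option,
                    percentChange(r.bytes, base.bytes),
                    percentChange(r.seconds, base.seconds));
    }
}
```

For this one test sketch, -O2 costs roughly 5% more FLASH than -Os for a run time that is roughly 29% shorter, which is why it looks like a good compromise.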

Step 4: How to Change the Optimisation Level...

The compiler command lines are contained within a text file buried within the Arduino application image. It is necessary to burrow down through a few directory levels to find a text file called "platform.txt".

In a Windows environment you need to open the folder where arduino.exe is and find the file at this path:

arduino-1.6.1\hardware\arduino\avr\platform.txt

If you are using a later 1.6.x IDE, see Step 6 (IDE 1.6.2 to 1.6.3) or Step 7 (IDE 1.6.6 and 1.6.7) of this Instructable, as the file path to platform.txt has changed!

If you are nervous about messing something up then make a copy of the file somewhere!

Open the platform.txt file in WordPad (Notepad will not work due to the way the file is structured). Turn off "Word wrap" so lines can be counted more easily.

Find this line, about 16 lines down from the top:

compiler.c.flags=-c -g -Os -w -ffunction-sections -fdata-sections -MMD

Change the -Os to -O2 as below:

compiler.c.flags=-c -g -O2 -w -ffunction-sections -fdata-sections -MMD

Next find a second line a little further down the file, about 23 lines from the top:

compiler.cpp.flags=-c -g -Os -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD

Again, change the -Os to -O2 as below:

compiler.cpp.flags=-c -g -O2 -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD

In practice it is just a case of changing the "s" to a "2". Note that it is a letter "O", not a zero, in the command line.
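If you would rather script the change than edit by hand, the substitution itself is simple. Below is a minimal sketch of it in plain C++ for a PC; the helper name swapOptimisationLevel is hypothetical, and reading and writing platform.txt is left out. It only replaces "-Os" when it appears as a whole flag, so other flags that happen to contain those letters are left alone.

```cpp
#include <string>

// Returns a copy of a platform.txt flags line with the first standalone
// "-Os" flag replaced by newLevel (for example "-O2").
// Lines that do not contain "-Os" as a separate flag are returned unchanged.
std::string swapOptimisationLevel(const std::string& line,
                                  const std::string& newLevel) {
    const std::string oldLevel = "-Os";
    std::string::size_type pos = line.find(oldLevel);
    while (pos != std::string::npos) {
        const std::string::size_type end = pos + oldLevel.size();
        // A flag starts after a space or '=' and ends at a space or end of line.
        const bool startsFlag =
            (pos == 0) || line[pos - 1] == ' ' || line[pos - 1] == '=';
        const bool endsFlag = (end == line.size()) || line[end] == ' ';
        if (startsFlag && endsFlag) {
            std::string edited = line;
            edited.replace(pos, oldLevel.size(), newLevel);
            return edited;
        }
        pos = line.find(oldLevel, pos + 1);
    }
    return line;
}
```

Passing the original compiler.c.flags line and "-O2" gives back the edited line shown above; the same call works for the compiler.cpp.flags line.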

Now save the file and don't worry about any format warning. Next time it will open in Notepad OK!

Changing the compiler options will have no effect while the Arduino IDE is open. You must close all the Arduino windows and open the IDE again for the change to be recognised.

I found my sketch ran a tiny weeny bit (a few microseconds!) faster with the first line changed to -O1 but the difference was far too small to notice when the sketch is running.

Step 5: Result!

Mission accomplished! Smaller code TICK, faster speed TICK, so it was a win-win for me!

Have fun!

Step 6: Arduino IDE 1.6.2 to 1.6.3 - Platform.txt Location

The latest 1.6.x IDE is now available on the Arduino website. The same methods can be used to change the compiler optimisation but the "platform.txt" file is in a different location.

If you open the Arduino IDE "File" menu and select "Preferences", then at the bottom of the window you will see the file path to help find it. On my Windows setup this is:

C:\Users\XXXX\AppData\Roaming\Arduino15\

where XXXX is your user name.

There is already a platform.txt at that directory level but I see no way to change the compiler options in that one. You need to burrow down through a few more directory levels to find this platform.txt file:

C:\Users\XXXX\AppData\Roaming\Arduino15\packages\arduino\hardware\avr\1.6.x\platform.txt

Now open the file, edit the compiler option, and save as described in Step 4. On my copy of the txt file the lines to modify are 17 and 24.

Hopefully in future IDE versions the file will be in a similar location.

Annoyingly the new 1.6.x IDE version over-writes some files used by other older IDE versions that may be resident! So if you load a new IDE you will need to change the compiler options yet again.

Step 7: IDE 1.6.6 and 1.6.7

To quote McCoy from Star Trek "I know engineers, they love to change things". So the platform.txt file is back in the IDE folder for the latest versions (a good move)! For example:

C:\Users\xxx\xxxx\Arduino\arduino-1.6.6\hardware\arduino\avr

I am not sure if it is because my software programming skills have improved or for some other unfathomable reason, but the speed gains from using the -O2 optimisation level instead of -Os seem somewhat lower these days, at a few percent. So it might be that the gains are very dependent on what the software is doing and may not be worth the effort. I suspect that I got lucky on some of my sketches and some tight loops were made significantly faster by -O2.

    27 Discussions

    Instead of having to modify the platform.txt file, an alternate way to get *almost* the same effect is to simply place this at the top of your code: #pragma GCC optimize ("-O2"). Replace -O2 with the optimization level you want. If you are using libraries, place this at the top of the .h file for the library, or else it won't apply this optimization level to the library, even if you have it at the top of your main Arduino sketch.

    Sources:

    https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Function-Specific-Option-Pragmas.html

    https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

    http://stackoverflow.com/questions/30038172/adding-unused-elements-to-c-c-structure-speeds-up-and-slows-down-code-executio
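As an illustration of the pragma approach described in this comment, here is a minimal sketch. It uses the "O2" spelling of the argument, which is the form shown in the GCC manual, and the function sumOfSquares is a made-up example of a tight loop. The pragma is GCC-specific and only affects functions defined after it in the same file.

```cpp
// Ask GCC to compile the functions that follow in this file at -O2,
// whatever level was given on the compiler command line.
#pragma GCC optimize ("O2")

// Hypothetical tight loop of the kind that tends to benefit from -O2.
unsigned long sumOfSquares(unsigned int n) {
    unsigned long total = 0;
    for (unsigned int i = 1; i <= n; ++i) {
        total += static_cast<unsigned long>(i) * i;
    }
    return total;
}
```

The result of the function is the same at every optimisation level; only the size and speed of the generated code should differ, so timing it with and without the pragma is a quick way to compare levels.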

    3 replies

    Thank you! I did not know that was possible. I will update the Instructable with a new step to include this useful information. This is going to be a very convenient method of testing different optimisation levels on some sketches and avoids the tedious editing of the platform.txt file. Cheers.

    Bodmer, until 12 hrs of research over the last few days trying to solve my own problem, on the stackoverflow link I posted, I didn't know either! Thank you for this instructable or else I never would have figured it out. Until 3 days ago I didn't even know what a pragma was, nor what compiler optimization levels were, and now I'm already sharing that info. I've learned lots this week. This instructable was key, so thanks again. In case you wanted to cite me for how you figured out your additional info for your instructable, I always appreciate link-backs using my name (Gabriel Staples) and website (http://www.electricrcaircraftguy.com/). Also, this #pragma directive is not quite the same thing as editing the platform.txt file, but that's why I searched so hard for it. I really wanted something faster & more convenient, and that could be easily changed (and kept on a fixed setting), for individual codes. Do some comparisons of #pragma vs changing the platform.txt file and you'll see the size results (and prob. speed too) are not necessarily identical. This #pragma only modifies functions, so perhaps the "command-line" modification in the txt file also modifies global variables and the #pragma does not... I'm not 100% sure. I plan on using the #pragma option simply for convenience though, but need to do some more experimenting myself.

    Thanks for the update, I have had a play and the code sizes did change indicating the optimisation was different, I am a bit busy with work at the moment so have not had much time to look at this further. Thanks again.

    Bodmer

    3 years ago on Introduction

    Out of interest I took the simplest example sketch "Blink", took out the delays and looked at the LED logic line with an oscilloscope. The toggle rate was 15% faster with option O2 compared to Os!

    It would be interesting to see if there are any significant speed differences for computationally intense sketches such as Sine/Cosine, and floating point maths...

    The graphics test sketch has a lot of short loops that are run at a high frequency, so even optimising out a few machine code instructions will show significant speed improvements.

    dmwatkins

    3 years ago

    Thanks! This is great! Good work!

    AlphaOmega1

    7 months ago

    Very good post, about to play.
    Thanks for your effort

    BTW rather than using notepad or wordpad, try Notepad++
    A nice quick lightweight package that you'll find you can't do without ;)

    2 replies
    Bodmer (replying to AlphaOmega1)

    Reply 6 months ago

    Yes, I do use Notepad++. It is very good. I have tried Atom too but prefer Notepad++

    AlphaOmega1 (replying to AlphaOmega1)

    Reply 7 months ago

    I should add that Notepad++ will make the distinction between 0 & O (zero and the letter "O").

    azo747

    2 years ago

    I'm working on a clock based on led matrix. It works great on compiler ver. 1.0.5r2 but with 1.6.9 the leds flicker. Only with -O3 parameter, the update function timing reduced from 1124 us (micro seconds) without optimization to 788 us, faster than the old 1.0.5r2 with 888 us. Thanks a lot !!

    1 reply
    Bodmer (replying to azo747)

    Reply 2 years ago

    That is interesting. Thanks for the feedback.

    hedoluna

    2 years ago

    Thanks, man! This guide opens a lot of new possibilities!

    LGROBBINS

    3 years ago on Introduction

    Hi,

    I have a sketch that compiled with 1.0.x takes 30,696 bytes of flash (out of 30,720 max) on a Nano - just 24 bytes shy of filling all the PROGMEM. When compiled with 1.6.3 and the Os compiler option, it takes up 29,174 - a nice saving. I haven't checked execution speed yet because I will have to delete something to add Serial printing under 1.0.6, but I did try O2 and O3 under 1.6.3 (and 1.6.2). Unfortunately, with both of these options the file is "too big". So, whether the machine code is larger or smaller than under 1.0.x with O2 or O3 depends on the source - in your test it was smaller, for this program it is larger. In other words, YMMV. Wonder if there are other options (sub-options) that might help.

    Ciao,

    Lenny

    2 replies
    Bodmer (replying to LGROBBINS)

    Reply 3 years ago on Introduction

    Hi Lenny, yes, these variances are dependent on so many factors, hence all the caveats stated. I have found that simpler coding styles seem to help the compiler better optimise the code for size and speed. I used a Mega for the test simply so the test code would fit with the "no optimisation at all" option, which really shows for comparison how good a job the compiler does.
    There are many, many sub-options using other compiler flags; these are well documented in the GCC man pages and online documentation.

    LGROBBINS (replying to Bodmer)

    Reply 3 years ago on Introduction

    To emphasize what Bodmer wrote, here's the results of a little further testing with a slightly smaller sketch. This Nano is part of a CAN network, and was tested in three states with slightly different CAN messaging loads. Times are for 1000 loop cycles, percentages in parentheses are vs. Arduino 1.0.6:

    Master56 loop times:
    Arduino 1.0.6 -- 27,134 bytes flash
    drive - 3563
    seat - 3046
    lights - 3044

    Arduino 1.6.3, Os -- 25,270 bytes flash (93%), globals = 635 bytes
    drive - 3545 (99.5%)
    seat - 3023 (99.2%)
    lights - 3025 (99.4%)

    Master56 loop times:
    Arduino 1.6.3, O2 -- 27,250 (100.4%) bytes flash, globals = 635 bytes
    drive - 3412 (95.8%)
    seat - 2925 (96.0%)
    lights - 2924 (96.1%)

    Master56 loop times:
    Arduino 1.6.3, O3 -- Sketch too big

    These results surprised me a bit. The default Os option with 1.6.3 not only gives more compact code, but is slightly (insignificantly) faster, not slower, than 1.0.6. The O3 option, which increased flash use only a bit in Bodmer's test case, here goes over the max. available, while O2, which was the most flash hungry in Bodmer's test case, uses only a few bytes more than 1.0.6 and is faster by a very few percent.

    Bottom line seems to be that for me the choice of Os by the Arduino team seems to have been a good one - compact code with no speed penalty - but that if size or speed are critical one has to test each of the possibilities, and as Bodmer has said there are many more than just Os, O2 and O3.

    Ciao,

    Lenny