
After downloading the latest Arduino IDE (1.6.1) I was rather disappointed that some of my sketches ran significantly slower than when compiled under IDE 1.0.6. This was particularly noticeable on one of my sketches that drove a TFT display.

The good news, however, was that the 1.6.1 IDE produced a sketch that was 20% smaller. This was great, as I was beginning to run out of FLASH space on my UNO for different fonts.

To solve the mysteries of the compiled code size and speed differences, I decided to investigate further. Ideally I wanted to get back the speed that the older IDE gave me, whilst still saving program FLASH space!

Step 1: Compilers

A search on the internet will tell you much more than I can about compilers, so here is just a brief summary.

Arduino sketches are written in a high-level language, namely C++ or C, but the microcontroller executes machine code instructions. The job of the compiler is therefore to translate the human-readable code into a sequence of instructions that the microcontroller can execute. Essentially, the compiler converts one language into another.

When the compiler is called upon to create the executable code, a number of options can be invoked. The GCC compiler used by the Arduino IDE has five main speed/size optimisation options: -O0, -O1, -O2, -O3 and -Os. These options tell the compiler how much effort to put into optimising the executable code, and whether to favour size or speed.

The default optimisation used in the Arduino IDE is for size; this is option "-Os" on the command line. The code sizes and speeds produced by the two Arduino IDEs differ so much because the latest IDE uses a much newer version of the GCC compiler. In my tests this new version creates a significantly smaller executable, but the penalty, it appears, is a significantly slower execution speed.

A few mild words of caution

It is worth noting that, in rare cases, changing the optimisation level can affect the way a program behaves when running. This is because optimisation tries to "rewrite" the software to produce a time- and size-efficient executable. The probability of the software's function being affected depends on how "aggressively" the compiler modifies the code.

The default optimisation level for the Arduino IDE (-Os) is already fairly aggressive, so it is very unlikely that changing the optimisation level will introduce new behaviour problems.

Because the compiler follows a set of rules, we sometimes need to include "compiler directives" within the software itself to avoid problems during the optimisation process. A classic problem is failing to declare a variable as "volatile" when it is shared by the main program and an interrupt routine. Without "volatile", the compiler may assume the variable cannot change unexpectedly and keep working with a stale copy. If you don't know what "volatile" means, then Google "why is volatile needed".
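To illustrate the "volatile" point, here is a minimal sketch in standard C++. On an Arduino the routine would be an interrupt handler attached with attachInterrupt(); here a desktop signal handler stands in for the interrupt, which is an assumption made so the example runs without hardware. The names eventFlag, onEvent and waitForEvent are hypothetical.

```cpp
#include <csignal>

// Shared between the main program and an asynchronous handler.
// volatile tells the compiler the value can change outside the normal
// program flow, so every read must fetch it from memory again.
volatile std::sig_atomic_t eventFlag = 0;

// Stand-in for an interrupt service routine: it only sets the flag.
extern "C" void onEvent(int) { eventFlag = 1; }

// Hypothetical main-program helper: install the handler, trigger it,
// then wait for the flag. Without volatile, if the event arrived while
// the loop was running, an optimising compiler could read the flag once,
// keep it in a register and spin forever.
bool waitForEvent() {
  eventFlag = 0;
  std::signal(SIGINT, onEvent);
  std::raise(SIGINT);  // simulates the interrupt firing
  while (eventFlag == 0) {
    // busy-wait; eventFlag is re-read on every pass
  }
  return eventFlag == 1;
}
```

The same rule applies on the Arduino: any global written inside an interrupt routine and read in loop() should be declared volatile.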

Step 2: Size and Speed Differences

The sketch I used for testing was a graphics speed test.

Here are the results of tests for a sketch compiled for an Arduino Mega; similar results would be expected for an UNO:

IDE 1.0.6 :

  • Compiled size: 26,620 bytes
  • Execution time: 13.3 seconds

IDE 1.6.1:

  • Compiled size: 19,558 bytes
  • Execution time: 17.8 seconds

These results showed why I noticed such a dramatic speed difference...

We can also see that the 1.6.1 IDE produces a FLASH image 7,062 bytes (about 27%) smaller. That is significant when you consider it could make the difference between a sketch fitting on an UNO or needing an upgrade to a Mega.

Unfortunately the execution time has increased by 34% (equivalent to a speed drop of about 25%), which is not helpful. The question I wanted to answer was:
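The percentages above can be checked with a couple of lines of C++. The helper names are hypothetical; the point is that speed is proportional to 1/time, so a 34% longer run time corresponds to roughly a 25% lower speed, while the FLASH saving works out at about 27%.

```cpp
// Percentage change from a baseline value to a new value
// (a negative result means a decrease).
double percentChange(double baseline, double value) {
  return (value - baseline) / baseline * 100.0;
}

// Speed is proportional to 1 / execution time, so the change in speed
// is not simply the negative of the change in execution time.
double speedChangePercent(double baselineSeconds, double newSeconds) {
  return percentChange(1.0 / baselineSeconds, 1.0 / newSeconds);
}
```

For example, percentChange(13.3, 17.8) gives about 33.8 (the longer run time), speedChangePercent(13.3, 17.8) about -25.3 and percentChange(26620, 19558) about -26.5 (the FLASH saving).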

Can we have the best of both worlds, a fast execution time and a smaller sketch?

Step 3: Results of Changing the Compiler Optimisation

Bear in mind that I have just tested one sketch and different options may be better in some circumstances.

These are the results I obtained when using the IDE 1.6.1 and changing the compiler optimisation directive:

-Os (Arduino IDE default)

  • Compiled size: 19,558 bytes
  • Execution time: 17.8 seconds

-O0 (no optimisation at all!)

  • Compiled size: 31,382 bytes
  • Execution time: 44.7 seconds

-O1

  • Compiled size: 20,428 bytes
  • Execution time: 17.0 seconds

-O2

  • Compiled size: 20,500 bytes
  • Execution time: 12.7 seconds

-O3

  • Compiled size: 25,550 bytes
  • Execution time: 12.2 seconds

As I am using an Arduino Mega I am not particularly concerned about FLASH size, so option -O3 gives a better speed (shorter run time) and the sketch is still smaller than the one IDE 1.0.6 gave me. However, I have decided to set the 1.6.1 IDE to optimisation -O2, as that looks like a good compromise: it is only 0.5 seconds slower than -O3 but 5,050 bytes smaller, and it still beats IDE 1.0.6 on both speed and size.
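The trade-off can be made explicit: given a FLASH budget, pick the fastest option that fits. The helper below is a hypothetical illustration using the Step 3 figures, not a feature of the IDE; the budget is whatever space your board leaves free for the sketch.

```cpp
#include <string>
#include <vector>

// One measured build of the test sketch.
struct BuildResult {
  std::string option;  // compiler optimisation flag, e.g. "-O2"
  long sizeBytes;      // compiled sketch size in bytes
  double seconds;      // execution time of the graphics test
};

// Hypothetical helper: return the fastest build whose size fits within
// flashBudget bytes, or a result with an empty option if none fits.
BuildResult fastestThatFits(const std::vector<BuildResult>& builds, long flashBudget) {
  BuildResult best{"", 0, 0.0};
  for (const BuildResult& b : builds) {
    if (b.sizeBytes > flashBudget) continue;  // too big for the board
    if (best.option.empty() || b.seconds < best.seconds) best = b;
  }
  return best;
}
```

With the Step 3 results, a 21,000-byte budget selects -O2, a 20,000-byte budget falls back to -Os, and with no practical limit -O3 wins.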

Your own sketches may show bigger or smaller improvements, and a different compiler option may suit them better.
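To compare options on your own sketch, time the same piece of work under each setting. This is a desktop C++ sketch of such a timing harness (timeMicros is a hypothetical name); on an Arduino you would read micros() before and after the loop instead of using std::chrono.

```cpp
#include <chrono>
#include <cstdint>

// Run `work` the given number of times and return the elapsed time
// in microseconds, measured with a monotonic clock.
template <typename Work>
std::int64_t timeMicros(Work work, std::uint32_t iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (std::uint32_t i = 0; i < iterations; ++i) {
    work();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Rebuild with each optimisation level and compare the returned times, keeping the work and the iteration count identical between builds so that only the compiler option changes.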

Step 4: How to Change the Optimisation Level...

The compiler command lines are contained in a text file buried within the Arduino application folder; you need to burrow down through a few directory levels to find a text file called "platform.txt".

In a Windows environment, open the folder containing arduino.exe and find the file at this path:

arduino-1.6.1\hardware\arduino\avr\platform.txt

If you are using IDE 1.6.2 or later, the file path to platform.txt has changed: see Step 6 (IDE 1.6.2 to 1.6.3) and Step 7 (IDE 1.6.6 and 1.6.7) of this Instructable!

If you are nervous about messing something up, make a backup copy of the file first!

Open the platform.txt file in WordPad (Notepad will not display it properly because the file uses Unix-style line endings). Turn off "Word wrap" so that lines can be counted more easily.

Find this line, about 16 lines down from the top:

compiler.c.flags=-c -g -Os -w -ffunction-sections -fdata-sections -MMD

Change the -Os to -O2 as below:

compiler.c.flags=-c -g -O2 -w -ffunction-sections -fdata-sections -MMD

Next find a second line a little further down the file, about 23 lines from the top:

compiler.cpp.flags=-c -g -Os -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD

Again, change the -Os to -O2 as below:

compiler.cpp.flags=-c -g -O2 -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD

In practice it is just a case of changing the "s" to a "2". Note that the flag uses a capital letter "O", not a zero.
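If you switch levels often, the edit can be scripted. The hypothetical helper below (setOptLevel is not part of the Arduino tools) rewrites a single platform.txt line, changing -Os only on the compiler.c.flags and compiler.cpp.flags lines and leaving others, such as compiler.c.elf.flags, untouched. Reading and writing the file itself is left out of this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: swap the -Os flag for `level` (e.g. "-O2") on the
// two compile lines edited above; any other line is returned unchanged.
std::string setOptLevel(const std::string& line, const std::string& level) {
  const bool isCompileLine =
      line.rfind("compiler.c.flags=", 0) == 0 ||    // starts with this key
      line.rfind("compiler.cpp.flags=", 0) == 0;
  if (!isCompileLine) return line;

  const std::string from = " -Os ";
  const std::size_t pos = line.find(from);
  if (pos == std::string::npos) return line;  // already changed, or no -Os

  std::string out = line;
  out.replace(pos, from.size(), " " + level + " ");
  return out;
}
```

Applied to the first line shown above with "-O2", it produces exactly the edited line given in this step.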

Now save the file; don't worry about any format warning. Next time it will open in Notepad OK!

Changing the compiler options has no effect while the Arduino IDE is open: you must close all the Arduino windows and reopen the IDE for the change to be recognised.

I found my sketch ran a tiny weeny bit (a few microseconds!) faster with the first line changed to -O1, but the difference was far too small to notice while the sketch was running.

Step 5: Result!

Mission accomplished! Smaller code TICK, faster speed TICK, so it was a win-win for me!

Have fun!

Step 6: Arduino IDE 1.6.2 to 1.6.3 - Platform.txt Location

The latest 1.6.x IDE is now available on the Arduino website. The same method can be used to change the compiler optimisation, but the "platform.txt" file is in a different location.

If you open the Arduino IDE "File" menu and select "Preferences", you will see a file path at the bottom of the window that helps you find it. On my Windows setup this is:

C:\Users\XXXX\AppData\Roaming\Arduino15\

where XXXX is your user name.

There is already a platform.txt at that directory level, but I see no way to change the compiler options in that one. You need to burrow down through a few more directory levels to find this platform.txt file:

C:\Users\XXXX\AppData\Roaming\Arduino15\packages\arduino\hardware\avr\1.6.x\platform.txt

Now open the file, edit the compiler options and save it as described in Step 4. In my copy of the file the lines to modify are 17 and 24.

Hopefully in future IDE versions the file will be in a similar location.

Annoyingly, installing a new 1.6.x IDE overwrites some files used by any older IDE versions that are also installed! So if you install a new IDE you will need to change the compiler options again.

Step 7: IDE 1.6.6 and 1.6.7

To quote McCoy from Star Trek: "I know engineers. They love to change things." So the platform.txt file is back in the IDE folder for the latest versions (a good move)! For example:

C:\Users\xxx\xxxx\Arduino\arduino-1.6.6\hardware\arduino\avr

I am not sure whether it is because my programming skills have improved or for some other unfathomable reason, but the speed gains from using the -O2 optimisation level instead of -Os now seem much smaller, at a few percent. So it may be that the gains depend heavily on what the software is doing and may not be worth the effort. I suspect that I got lucky with some of my sketches, where some tight loops were made significantly faster by -O2.

<p>Very impressive. It works, I finally found it out :-) Thank you!!!</p>
Great, thanks for your feedback. I hope your project is a success.
<p>Instead of having to modify the platform.txt file, an alternate way to get *almost* the same effect is to simply place this at the top of your code: #pragma GCC optimize (&quot;-O2&quot;). Replace -O2 with the optimization level you want. If you are using libraries, place this at the top of the .h file for the library, or else it won't apply this optimization level to the library, even if you have it at the top of your main Arduino sketch. </p><p>Sources:</p><p><a href="https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Function-Specific-Option-Pragmas.html" rel="nofollow">https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Function-Specific-Option-Pragmas.html</a></p><p><a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html" rel="nofollow">https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</a></p><p><a href="http://stackoverflow.com/questions/30038172/adding-unused-elements-to-c-c-structure-speeds-up-and-slows-down-code-executio" rel="nofollow">http://stackoverflow.com/questions/30038172/adding-unused-elements-to-c-c-structure-speeds-up-and-slows-down-code-executio</a></p>
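A minimal sketch of the #pragma approach described above, assuming GCC (which the Arduino IDE uses); other compilers may ignore the pragma. The function sumTo is only a placeholder.

```cpp
// Place this before the functions it should affect: GCC applies the
// setting to every function defined after this point in the same file,
// as if -O2 had been given on the command line for those functions.
#pragma GCC optimize ("-O2")

// Placeholder function that will be compiled with the -O2 setting.
unsigned long sumTo(unsigned long n) {
  unsigned long total = 0;
  for (unsigned long i = 1; i <= n; ++i) {
    total += i;
  }
  return total;
}
```

As noted above, a library compiled from its own files needs the pragma in its own source, because the setting only applies to code that follows it.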
<p>Thank you! I did not know that was possible. I will update the Instructable with a new step to include this useful information. This is going to be a very convenient method of testing different optimisation levels on some sketches and avoids the tedious editing of the platform.txt file. Cheers.</p>
<p>Bodmer, until 12 hrs of research over the last few days trying to solve my own problem, on the stackoverflow link I posted, I didn't know either! Thank you for this instructable or else I never would have figured it out. Until 3 days ago I didn't even know what a pragma was, nor what compiler optimization levels were, and now I'm already sharing that info. I've learned lots this week. This instructable was key, so thanks again. In case you wanted to cite me for how you figured out your additional info for your instructable, I always appreciate link-backs using my name (Gabriel Staples) and website (<a href="http://www.electricrcaircraftguy.com/" rel="nofollow">http://www.electricrcaircraftguy.com/</a>). Also, this #pragma directive is not quite the same thing as editing the platform.txt file, but that's why I searched so hard for it. I really wanted something faster &amp; more convenient, and that could be easily changed (and kept on a fixed setting), for individual codes. Do some comparisons of #pragma vs changing the platform.txt file and you'll see the size results (and prob. speed too) are not necessarily identical. This #pragma only modifies functions, so perhaps the &quot;command-line&quot; modification in the txt file also modifies global variables and the #pragma does not... I'm not 100% sure. I plan on using the #pragma option simply for convenience though, but need to do some more experimenting myself.</p>
<p>Thanks for the update, I have had a play and the code sizes did change indicating the optimisation was different, I am a bit busy with work at the moment so have not had much time to look at this further. Thanks again.</p>
<p>Really impressive. Thanks!</p>
<p>Out of interest I took the simplest example sketch &quot;Blink&quot;, took out the delays and looked at the LED logic line with an oscilloscope. The toggle rate was 15% faster with option O2 compared to Os!</p><p>It would be interesting to see if there are any significant speed differences for computationally intense sketches such as Sine/Cosine, and floating point maths...</p><p>The graphics test sketch has a lot of short loops that are run at a high frequency, so even optimising out a few machine code instructions will show significant speed improvements.</p>
<p>Very nice.</p>
Thanks! This is great! Good work!
<p>I'm working on a clock based on an LED matrix. It works great on compiler ver. 1.0.5r2, but with 1.6.9 the LEDs flicker. Only with the -O3 parameter was the update function time reduced, from 1124 us (microseconds) without optimization to 788 us, faster than the old 1.0.5r2 with 888 us. Thanks a lot!!</p>
That is interesting. Thanks for the feedback.
<p>Thanks, man! This guide opens a lot of new possibilities!</p>
<p>huge thanks for this article.</p>
<p>Hi,</p><p>I have a sketch that compiled with 1.0.x takes 30,696 bytes of flash (out of 30,720 max) on a Nano - just 24 bytes shy of filling all the PROGMEM. When compiled with 1.6.3 and the Os compiler option, it takes up 29,174 - a nice saving. I haven't checked execution speed yet because I will have to delete something to add Serial printing under 1.0.6, but I did try O2 and O3 under 1.6.3 (and 1.6.2). Unfortunately, with both of these options the file is &quot;too big&quot;. So, whether the machine code is larger or smaller than under 1.0.x with O2 or O3 depends on the source - in your test it was smaller, for this program it is larger. In other words, YMMV. I wonder if there are other options (sub-options) that might help.</p><p>Ciao,</p><p>Lenny</p>
Hi Lenny, yes, these variances depend on so many factors, hence all the caveats stated. I have found that simpler coding styles seem to help the compiler better optimise the code for size and speed. I used a Mega for the test simply so that the test code would fit with the 'no optimisation at all' option, which really shows, for comparison, how good a job the compiler does.<br>There are many, many sub-options using other compiler flags; these are well documented in the GO man pages and online documentation.
<p>To emphasize what Bodmer wrote, here are the results of a little further testing with a slightly smaller sketch. This Nano is part of a CAN network, and was tested in three states with slightly different CAN messaging loads. Times are for 1000 loop cycles; percentages in parentheses are vs. Arduino 1.0.6:</p><p>Master56 loop times:<br>Arduino 1.0.6 -- 27,134 bytes flash<br>drive - 3563<br>seat - 3046<br>lights - 3044<br><br>Arduino 1.6.3, Os -- 25,270 bytes flash (93%), globals = 635 bytes<br>drive - 3545 (99.5%)<br>seat - 3023 (99.2%)<br>lights - 3025 (99.4%)<br><br>Master56 loop times:<br>Arduino 1.6.3, O2 -- 27,250 (100.4%) bytes flash, globals = 635 bytes<br>drive - 3412 (95.8%)<br>seat - 2925 (96.0%)<br>lights - 2924 (96.1%)<br><br>Master56 loop times:<br>Arduino 1.6.3, O3 -- Sketch too big</p><p>These results surprised me a bit. The default Os option with 1.6.3 not only gives more compact code, but is slightly (insignificantly) faster, not slower, than 1.0.6. The O3 option, which increased flash use only a bit in Bodmer's test case, here goes over the max. available, while O2, which was the most flash hungry in Bodmer's test case, uses only a few bytes more than 1.0.6 and is faster by a very few percent.</p><p>Bottom line seems to be that for me the choice of Os by the Arduino team seems to have been a good one - compact code with no speed penalty - but that if size or speed are critical one has to test each of the possibilities, and as Bodmer has said there are many more than just Os, O2 and O3.</p><p>Ciao,</p><p>Lenny</p>
Hi Lenny, thanks for taking the time to post your results. I totally agree that -Os should be the default for the IDE as FLASH space is in somewhat short supply. The performance results you found may not truly reflect any processing speed gains, as the CAN bus transaction rate is more bit rate dependent (and the compiler will not change that) and the processor is probably sat around twiddling its thumbs waiting for a bus cycle to complete; twiddling thumbs twice as fast would not shorten the wait. If I can find time from my paid work (!) then I will see if there are some standard benchmark tests that might show the strengths and weaknesses of the various compiler options.
<p>You're quite right that my example is a use case and not by any means a benchmark. </p><p>However, I don't think that the CANbus bit rate is involved. That is determined by the MCP2515 controller, not by the MCU and, once past setup, none of the CAN messaging is blocking. If a message needs to be sent, which is happening once in a while, more often in drive mode, it is sent to the 2515 via SPI and the loop goes merrily on its way. The 2515 is also queried by SPI a couple of times per loop to see if there's anything in its receive buffers, and if there is something in either buffer the message is retrieved, again via SPI. Some of the outgoing messages are just broadcast, others require a response, but that's handled as a kind of asynchronous collaborative multi-tasking.</p><p>So what is blocking, as far as CAN is concerned, are the 8 MHz SPI transfers. The other time-consuming, and blocking, items are averaging of multiple reads on two analog pins and some Serial.print actions. All of these together may indeed mean raw instruction execution rate may not be very important in this example. A long time ago, when the program was in a much more primitive state, I had checked the effect of that analog averaging (it is substantial) and CANbus bitrate (rather little effect) and SPI rate (noticeable), but I haven't gone back to dig up the actual numbers; that's just my fallible memory speaking.</p><p>Ciao,</p><p>Lenny</p>
GO = GCC... drat the auto-correction!
<p>It looks like the browser you are using does not have the plug-in support for leaving messages...</p><p>Your message did get emailed to me:</p><p>&quot;I noticed that there is another line containing Os</p><p>compiler.c.elf.flags=-w -Os -Wl,--gc-sections</p><p>Should this be changed as well?&quot;</p><p>I tried changing this but it made zero difference to the sketch size or speed. I think this is because it is used by the linking step that produces the executable image, so I left it as -Os.</p>
<p>The Instructable has now been updated to cover adaptions to IDE 1.6.2, see Step 6.</p>

About This Instructable

More by Bodmer: Arduino - TFT display of icons and images from FLASH memory; Arduino - TFT display of bitmap images from an SD Card; Arduino analogue 'ring' meter on colour TFT display