
Why is the digital I/O in Arduino slow and what can be done about it?

16 May 2014 | CPOL | 25 min read
This article explains why the Arduino digital I/O functions are slow and compares them with the faster implementation used in the Wiring framework.

Introduction

I like Arduino and I use it quite often for prototyping. It is great how quickly you can have things working using the wide code base available elsewhere on the internet. But when I looked into the sources of the Arduino library, I was a bit disappointed. There are some things which could be handled better. Some people even describe it in a less diplomatic way, see here.

Why did I look under the hood? It started with the discovery that the functions for writing and reading digital pins are rather slow. I wanted to know why and what can be done to speed them up. If you are interested in this kind of thing, read on.

In this article I describe how the functions for reading and writing digital I/O are implemented in Arduino and compare that implementation with the Wiring framework, which was the original inspiration for Arduino and seems to be better written. It does not seem to be widely known that Wiring is still being developed and that, besides its own hardware, it also supports Arduino hardware, so it can be used as an alternative software framework for programming Arduino.

At the end I also present some results of experiments measuring the speed of those two implementations. Some background in microcontroller programming is assumed, otherwise some parts may be hard to understand. However, I hope the main points will be understandable for anyone.

Disclaimer
I do like Arduino and think it is a great project. By this article I do not mean to say that it is bad and you should stop using it. For most users the Arduino library will work just fine as is. Some CPU cycles and memory wasted do not matter. If you are using it for a serious project where every cycle/byte/milliWatt of consumed energy matters, don’t take it for granted that everything is perfect under the hood.

What’s new (updated in May 2014)

When I wrote this article about a year ago, I considered only the two options described in detail below. I also mentioned a third option, “to somehow encode the register address into the pin number”, but I dismissed it as impossible to do with the plain pin numbers which identify the pins in Arduino. I still wanted to create a faster alternative to the Arduino digital I/O functions, though, and when I experimented with various options, this originally dismissed option proved to be the best one. It turned out that even with the cost of converting the plain pin number into a “pin code” with an encoded register address, this approach is faster than the original I/O.

So I created an alternative version of the digital I/O functions which has the advantage of the Wiring approach (see option 1 below), that is, the functions compile into a single instruction for pins known at compile time, but they also work faster for pin numbers stored in variables. The details and the code for download can be found here: http://www.codeproject.com/Articles/732646/Fast-digital-I-O-for-Arduino.

 

How Arduino handles digital I/O

The Arduino library defines functions digitalRead and digitalWrite for reading and writing an I/O pin. These functions take the number of the pin (an integer) as their input parameter. For example, to turn on an LED connected to digital pin 7, you would use this code:

digitalWrite(7, HIGH);  

To read the state of digital pin 8 you could write:
 

state = digitalRead(8);

In other words, the Arduino programming model uses a single integer to identify an I/O pin. On the Arduino boards these pins are named Digital pin 0, Digital pin 1, etc. This is a useful abstraction of the hardware. The MCU itself (Atmel’s ATmega328 in the case of the Arduino Uno) organizes the pins into ports named A, B, C, etc. Each port has up to 8 pins. So there are, for example, pins 0 to 7 on port A (referred to as PA0 through PA7), pins 0 to 7 on port B (PB0 through PB7), etc. The ATmega328 itself provides only ports B, C and D. The picture below (from Arduino.cc) shows the relationship between Arduino pin numbers and MCU ports.






Each port is controlled by several registers. For example, to set pin 7 of port D to logical 1 you need to set bit 7 in the data register of port D (PORTD). In C the code can look like this:

PORTD |= 0x80;  
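For readers less used to bit manipulation, the same idea can be sketched for setting, clearing and testing a bit. This is only a host-side illustration: the helper names (withBitSet, withBitCleared, isBitSet) are made up for this example and work on a plain byte value instead of a real hardware register.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of Arduino or avr-libc.
// Each one computes a value from a register byte instead of touching hardware.

// Register value after setting the given bit (0..7) to 1.
constexpr uint8_t withBitSet(uint8_t reg, uint8_t bit) {
    return static_cast<uint8_t>(reg | (1u << bit));
}

// Register value after clearing the given bit (0..7) to 0.
constexpr uint8_t withBitCleared(uint8_t reg, uint8_t bit) {
    return static_cast<uint8_t>(reg & ~(1u << bit));
}

// True if the given bit (0..7) is 1.
constexpr bool isBitSet(uint8_t reg, uint8_t bit) {
    return (reg & (1u << bit)) != 0;
}
```

On the AVR itself the corresponding statements would be PORTD |= (1 << 7); to set the pin, PORTD &= ~(1 << 7); to clear it, and (PIND & (1 << 7)) to read it. The constant 0x80 used above is simply 1 << 7.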

So, when we write digitalWrite(8, HIGH), the Arduino software needs to translate this into setting some bit in some register. This is something every software library which wants to abstract hardware pins has to deal with. How do you translate the digital pin number into an MCU register?

I can think of two options:

  1. Use conditional expressions (C conditional operator)
  2. Use an array which maps pin numbers onto peripheral registers

There is actually a third option: to somehow encode the register address into the pin number. This is not possible in the Arduino library, which uses the "raw" pin numbers to identify the pins in the program. This is probably an unfortunate design decision, which leads to the performance problems this article is about. On the other hand, raw pin numbers are very intuitive for the user.
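To make the idea of an encoded pin identifier a bit more concrete, here is a minimal sketch. It is not taken from Arduino, Wiring or any existing library; the function names and the layout (port index in the upper nibble, bit number in the lower nibble) are assumptions chosen only for illustration, and it encodes a port index rather than a full register address.

```cpp
#include <cstdint>

// Illustrative encoding (an assumption of this sketch):
// upper 4 bits = port index, lower 4 bits = bit number within the port.
constexpr uint8_t makePinCode(uint8_t portIndex, uint8_t bit) {
    return static_cast<uint8_t>((portIndex << 4) | (bit & 0x0F));
}

// Port index stored in a pin code.
constexpr uint8_t pinCodePort(uint8_t code) {
    return static_cast<uint8_t>(code >> 4);
}

// Bit mask (1 << bit number) for the bit stored in a pin code.
constexpr uint8_t pinCodeMask(uint8_t code) {
    return static_cast<uint8_t>(1u << (code & 0x0F));
}
```

With such a code, decoding the port and the bit mask costs only a shift and a mask, with no table lookup and no chain of comparisons. The price is that the user no longer writes the plain number printed on the board.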

Option 1 is arguably the more intuitive one. In the digitalWrite function there will be a condition saying that if the pin number is between 0 and 7, the port register is PORTD (see the pin mapping picture above). If it is above 7, the register is PORTB, etc. This approach has one big advantage: if the pin number is a compile-time constant (it is known to the compiler at the time when it compiles the program), the resulting code can be as efficient as if the “native” MCU registers were manipulated directly.

If the pin number is not a compile-time constant, the conditions are not processed during compilation by the compiler but become a “normal” program code with conditional expressions. The size (and speed) of the code then depends on how many pins/registers there are, i.e. how many conditions must be evaluated.
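The conditional approach can be sketched as follows. This is not the Wiring source code, just a minimal illustration of the technique for the Arduino Uno pin layout (pins 0 to 7 on port D, 8 to 13 on port B, 14 to 19 on port C); the enum and function names are invented for the example.

```cpp
#include <cstdint>

// Port identifiers, made up for this sketch.
enum Port : uint8_t { PORT_B, PORT_C, PORT_D, PORT_NONE };

// Chain of conditional operators mapping an Uno pin number to its port.
constexpr Port portForPin(uint8_t pin) {
    return (pin <= 7)  ? PORT_D
         : (pin <= 13) ? PORT_B
         : (pin <= 19) ? PORT_C
         : PORT_NONE;
}

// Bit number of the pin within its port (0 for pins outside 0..19).
constexpr uint8_t bitForPin(uint8_t pin) {
    return static_cast<uint8_t>((pin <= 7)  ? pin
                              : (pin <= 13) ? pin - 8
                              : (pin <= 19) ? pin - 14
                              : 0);
}

// With a constant argument the whole chain is resolved by the compiler:
static_assert(portForPin(13) == PORT_B, "pin 13 is on port B");
static_assert(bitForPin(13) == 5, "pin 13 is bit 5 of port B");
```

When the argument is a variable, the same functions still work, but the comparisons are then executed one after another at run time.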

With option 2, we map the pin number to the corresponding register using an array. This is the approach used in Arduino. One array maps the pin number to a “port number”:

const uint8_t PROGMEM digital_pin_to_port_PGM[] = {
    PD, /* 0 */
    PD,
    PD,
    PD,
    PD,
    PD,
    PD,
    PD,
    PB, /* 8 */
    PB,
… 

And a second array, which maps port numbers to the actual registers for controlling this port. There are several such arrays for different types of registers involved, but this is not important for us now. In the following listing there is the array of output registers for Arduino Uno.

const uint16_t PROGMEM port_to_output_PGM[] = {
    NOT_A_PORT,
    NOT_A_PORT,
    (uint16_t) &PORTB,
    (uint16_t) &PORTC,
    (uint16_t) &PORTD,
};  

As can be seen from the code snippets, the process of obtaining the register based on given digital pin number involves two stages. First, the pin number is converted to port number (represented by constant PD, PB etc.) using the digital_pin_to_port_PGM array. Then this port number is used as an index to the port_to_output_PGM array to obtain the register itself.
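The two-stage lookup can be tried out on a PC with ordinary arrays. In the sketch below the tables are plain constexpr arrays instead of PROGMEM data, the port numbers (PB = 2, PC = 3, PD = 4) are inferred from the positions in the listing above, and the register addresses are the ATmega328P data-space addresses of PORTB, PORTC and PORTD written as plain numbers. The function name is made up for the example.

```cpp
#include <cstdint>

// Port numbers; the values follow the index positions in the listing above.
enum : uint8_t { NOT_A_PORT = 0, PB = 2, PC = 3, PD = 4 };

// Stage 1 table: digital pin number -> port number (Arduino Uno layout).
constexpr uint8_t pinToPort[] = {
    PD, PD, PD, PD, PD, PD, PD, PD,   // pins 0..7
    PB, PB, PB, PB, PB, PB,           // pins 8..13
    PC, PC, PC, PC, PC, PC            // pins 14..19
};

// Stage 2 table: port number -> address of the output register.
// 0x25, 0x28 and 0x2B are PORTB, PORTC and PORTD on the ATmega328P.
constexpr uint16_t portToOutput[] = { 0, 0, 0x25, 0x28, 0x2B };

// Both stages combined. No range check is done, as in the code discussed here.
constexpr uint16_t outputRegisterForPin(uint8_t pin) {
    return portToOutput[pinToPort[pin]];
}
```

On the real MCU each table access would be a read from Flash, and a further per-pin lookup is needed to obtain the bit mask before the register can be modified.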

This two-stage process may seem complicated, but if the pin number were used directly as the index into an array containing registers, this array would need as many members as there are pins on the given Arduino board (about 50 in the case of the Arduino Mega). As several such arrays are needed for the various registers, this would take up a lot of memory. In the two-stage version only one array needs to be this big. The register arrays are then only as big as the number of ports involved.
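The memory argument can be checked with a little arithmetic. The helper functions below are hypothetical, and the numbers in the example are assumptions: 50 pins (the rough figure used above), three register tables (direction, output and input registers), 12 entries per register table, and 16-bit entries as in port_to_output_PGM.

```cpp
#include <cstddef>

// Bytes needed if every register table is indexed directly by pin number.
// Each entry is a 16-bit register address (2 bytes).
constexpr std::size_t directTableBytes(std::size_t pins,
                                       std::size_t registerTables) {
    return pins * registerTables * 2;
}

// Bytes needed by the two-stage scheme: one byte per pin for the port number,
// plus one small table of 16-bit addresses per register type.
constexpr std::size_t twoStageTableBytes(std::size_t pins,
                                         std::size_t portEntries,
                                         std::size_t registerTables) {
    return pins * 1 + portEntries * registerTables * 2;
}

// Example with the assumed numbers: 300 bytes versus 122 bytes.
static_assert(directTableBytes(50, 3) == 300, "direct indexing");
static_assert(twoStageTableBytes(50, 12, 3) == 122, "two-stage lookup");
```

The sketch counts only the tables discussed here; any additional per-pin tables would add the same amount to both variants.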

With option 2, a decision needs to be made about where to store the arrays: in data memory (RAM) or in program memory (Flash). Given the limited size of RAM in the MCU, it makes sense to store the tables in Flash memory. This is the case in Arduino. The disadvantage is slower access to the memory, although it is not as bad as it might seem. The 8-bit AVR CPU needs 3 clock cycles to read one byte from Flash memory (LPM instruction) and 2 clock cycles to read one byte from RAM (LDS instruction).

Comparing the two options

In this part I try to compare the two options. Just to remind you, the option 2 is used in Arduino, the option 1 in the Wiring framework.

Speed

Option 1 can be compiled into direct manipulation of the appropriate register if the pin is compile time constant. In this case it is executed very fast. If it is not compile time constant, the resulting code is a series of conditional branches. Executing this code, of course, takes much longer time.

Option 2 results in the same code no matter if the pin number is compile time constant or not. As we will see later, the execution time in this case is very similar to that of the option 1 with non-constant pin number.

In summary, option 1 can be very fast (2 CPU cycles) or slow (about 50 cycles or even more, depending on the number of pins). Option 2 is always slow, but may be slightly faster than the slow version of option 1, and its speed does not depend on the number of pins.

If we could assume that in most cases the pin number is known at compile time, then option 1 would be much better. Such an assumption, of course, is not easy to prove. I will deal with this in more detail a little later.

Speed consistency

Someone may consider it an advantage if the code executes at the same speed in all cases, rather than having fast and slow versions, even if this single-speed version is much slower than the fast version could be. There is also valid concern of breaking existing code which may rely on the slow implementation of the standard functions. Here it is not easy to decide for one of the options. The problem is also closely related with the last criterion I consider – in how many programs the fast version can be used.

Implementation and understandability of the code

Arguably, the option 2 is easier to understand than the option 1. The nested C conditional statements can be confusing and hard to “decode”. However, if we agree that the core library should be maintained (and eventually ported to other platforms) by experienced programmers, this is not an important argument for using the option 2.

In how many cases is (can be) the pin number a compile time constant

This question seems to be of highest importance when comparing the two options, because, obviously, if the fast version of option 1 will be used rarely, then it makes little sense to have it. It also seems to be the main argument why the option 1 is not used in official Arduino distribution – that the faster version will be rarely used, because it is not possible to pass the pin number to a “library” as a compile time constant.

Passing pin numbers to a library

First we should clarify the word library, which may be used either for precompiled binary or a set of functions in source code form. In case of precompiled binary, it is obvious that there is no way to pass compile time constant to the code as the library code is compiled beforehand. The build process in Arduino IDE is configured so that first all the “libraries” are built into a static library and then the user program is linked with this library. This way it is not possible to take advantage of the faster functions and the argument against option 1 is valid. However, it would be possible to skip creating the static library and just build all the sources together, without any performance loss. Actually, the build could even be faster this way, because the build tools are able to build only changed files.

If we assume that the “libraries” can be used in source form and built together with the user program, in which cases we know the pin number at compile time?

Constant pin numbers in user programs

It seems to me that with embedded systems in most cases we do know the pin numbers when writing the program. The pins to which e.g. an LED or a push-button are connected will not change at run time. I can think of some cases when it is advantageous to use a non-constant expression referring to the pin, such as toggling a chain of LEDs in a loop or handling a matrix keyboard or display. In these cases I would say we must accept some tradeoffs between the intuitive implementation of digital I/O in Arduino and the speed of the code. We can have efficient code in our program, which leads to inefficient code in the I/O library, or we write not-so-efficient code in our program which leads to efficient code in the library. Or we abandon the abstraction and manipulate the pins directly.

Basically, in the other cases, if we know the pin at compile time we could use it. But problems may arise if we want to pass this number to a function or a class in C++. We can illustrate it on two cases:
  • A function which manipulates a pin (and we want it to be flexible and pass the pin number to the function as an argument).
  • A C++ object which manipulates a pin

Passing pin number to a function

Imagine you want to write your own function which takes a pin number as an input parameter. An argument passed to such a function will not be a compile-time constant unless the compiler is smart enough to "trace" back the origin of the argument and find out that it is constant. This is possible for inline functions and is used in the Wiring implementation of the digital I/O functions (because we are in fact also passing a pin number to a function when calling e.g. digitalWrite), but it seems hard (if at all possible) to implement in general cases.

There may be some confusion about using const keyword for the function parameters - this could hint the compiler that the argument is a compile time constant. However, as I verified with the AVR-GCC compiler used in Arduino, this is not the case. It makes sense after all. The function arguments in C are passed by value, so a copy of the argument is created and the compiler needs to reserve memory (or a register) for this argument. Even if you call the function with simple constant, e. g. 5, the number will be copied into a variable and inside the function there is no way to tell if this variable contains what used to be compile-time constant or not. Also, the const keyword just means that the argument will not be modified inside the function. It does not force us to pass a const variable to the function.
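The difference between a const parameter and a true compile-time constant can be shown in a few lines. The function names are invented for this illustration, and the constexpr keyword (C++11) is used here only as a convenient way to let the compiler prove what it can evaluate at compile time; it is not what Arduino or Wiring use.

```cpp
#include <cstdint>

// Can be evaluated at compile time when the argument is a constant.
constexpr bool isOnPortD(uint8_t pin) {
    return pin <= 7;   // Arduino Uno: pins 0..7 are on port D
}

// `const` only promises that the body will not modify `pin`.
// The value may still come from a run-time variable, so inside this
// function `pin` is not a constant expression.
bool checkAtRunTime(const uint8_t pin) {
    // static_assert(isOnPortD(pin), "");  // would not compile
    return isOnPortD(pin);                 // evaluated when the program runs
}

// At a call site with a literal, the argument is a constant expression:
static_assert(isOnPortD(5), "pin 5 is on port D");
```

If checkAtRunTime is inlined into a caller that passes a literal, the optimizer may still fold the condition, which is the effect the inline functions mentioned above rely on, but the language does not guarantee it.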

Pin numbers and C++ objects

Example of the second case is the Servo class in Arduino. You can create several instances of this class to control several servo motors. For each instance you call function attach and give it the pin number. In this case it is easy to imagine that the pin number has to be stored as a variable in the servo object and therefore will not be compile time constant. I thought if the pin number was defined as const variable (and initialized in the constructor) things could get better, but the compiler still refused to use the fast version, even for inline member function. Here the Arduino's design decision to use "raw" pin number to refer to a pin in the program seems to really get in our way.
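One way around this in C++ is to make the pin number a template parameter instead of a data member. This is a different technique from the one used by the Arduino Servo class, and the class names below are hypothetical. The sketch only computes the bit mask for the Arduino Uno pin layout and does no range checking.

```cpp
#include <cstdint>

// Bit mask of a pin within its port (Uno layout, pins 0..19 assumed).
constexpr uint8_t maskForPin(uint8_t pin) {
    return static_cast<uint8_t>(
        1u << ((pin <= 7) ? pin : (pin <= 13) ? (pin - 8) : (pin - 14)));
}

// Pin stored in the object: member functions see it as a run-time value,
// even though the member is declared const.
class RuntimeLed {
public:
    explicit RuntimeLed(uint8_t pin) : pin_(pin) {}
    uint8_t mask() const { return maskForPin(pin_); }
private:
    const uint8_t pin_;
};

// Pin as a template parameter: it is part of the type and therefore
// always a compile-time constant.
template <uint8_t Pin>
struct StaticLed {
    static constexpr uint8_t mask = maskForPin(Pin);
};

static_assert(StaticLed<13>::mask == 0x20, "pin 13 is bit 5 of its port");
```

The tradeoff is that StaticLed<12> and StaticLed<13> are different types, so such objects cannot be kept in one array or passed to the same non-template function, and the code is instantiated once per pin.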

Which option is better?

I still believe in most programs the pin numbers can be used in such a way that they are compile time constants. There are valid cases in which this will not be possible, especially if we want to use object oriented approach, but that in my opinion does not justify using the slow version in all cases as it is done in Arduino library.

In general, it would be better to have a different design of the digital I/O interface, so that it leads to efficient code even with non-constant pin numbers. Such implementations are possible, for examples look at mbed or Atmel Software Framework.

Given the restrictions inherited from the now standard Arduino I/O functions, I still think that it is better to have the choice. If I need fast I/O and write my program so that it can be fast, using constant pin numbers, I get the fast result automatically in the native Arduino software, rather than having to use direct port manipulation or some third-party libraries. For me this is the reason I prefer the Wiring framework for programming my Arduino hardware: it gives me this option.

Just as a note, I use the Eclipse IDE instead of the default IDE shipped with Arduino, so switching to a different software framework is rather easy. I haven’t tried to use the Wiring framework in the Arduino IDE, nor to program the Arduino hardware with the Wiring IDE directly, although I think the latter option should work natively. There is the option to select Arduino as the target hardware in the Wiring IDE.

Practical experiments

Finally, we get into some coding. Here are the results of experiments I did with the Arduino and Wiring libraries. As I mentioned earlier, I use Eclipse to program Arduino and the experimental programs with both Wiring and Arduino frameworks were built using this IDE and the AVR tools distributed with Arduino version 1.0.3. Same configuration was used for building Arduino and Wiring code, the target hardware was Arduino Uno. Compiler optimizations were set to optimize for size.

First, I compiled the following code and looked at the result in assembly.

void loop()
{
    digitalWrite(13, HIGH);
    delay(500);
    digitalWrite(13, LOW);
    delay(500);
} 

Using the Wiring framework the following was the compiler output in assembly.

00000c5c <_Z4loopv>:
 c5c:   2d 9a           sbi 0x05, 5 ; 5
 c5e:   64 ef           ldi r22, 0xF4   ; 244
 c60:   71 e0           ldi r23, 0x01   ; 1
 c62:   80 e0           ldi r24, 0x00   ; 0
 c64:   90 e0           ldi r25, 0x00   ; 0
 c66:   0e 94 f6 00     call    0x1ec   ; 0x1ec <delay>
 c6a:   2d 98           cbi 0x05, 5 ; 5
 c6c:   64 ef           ldi r22, 0xF4   ; 244
 c6e:   71 e0           ldi r23, 0x01   ; 1
 c70:   80 e0           ldi r24, 0x00   ; 0
 c72:   90 e0           ldi r25, 0x00   ; 0
 c74:   0e 94 f6 00     call    0x1ec   ; 0x1ec <delay>
 c78:   08 95           ret

The call to digitalWrite is replaced by single instruction (SBI - set bit in I/O register or CBI - clear bit in I/O register) directly in the loop. Naturally, we get the same result if instead of directly writing the pin number (13) we use #define or define a const int variable to hold the pin number (e.g. const int pin = 13;).

If the pin number is non-constant int, e.g. defined as int pin = 13; and then used in call to digitalWrite(pin, HIGH); the following code results (incomplete listing):

00000d70 <_Z4loopv>:
 d70:   80 91 42 01     lds r24, 0x0142
 d74:   61 e0           ldi r22, 0x01   ; 1
 d76:   0e 94 66 01     call    0x2cc   ; 0x2cc <_pinWrite>  

The call to digitalWrite is replaced by call to _pinWrite and the pin number and HIGH value (1) are passed into the function in registers R24 and R22. In the first line the pin number is loaded from a variable into R24.

Using the Arduino framework the same code in C results in this (partial listing).

00000b52 <loop>:
 b52:   8d e0           ldi r24, 0x0D   ; 13
 b54:   61 e0           ldi r22, 0x01   ; 1
 b56:   0e 94 55 05     call    0xaaa   ; 0xaaa <digitalWrite>
 b5a:   64 ef           ldi r22, 0xF4   ; 244
 b5c:   71 e0           ldi r23, 0x01   ; 1
 b5e:   80 e0           ldi r24, 0x00   ; 0
 b60:   90 e0           ldi r25, 0x00   ; 0
 b62:   0e 94 82 04     call    0x904   ; 0x904 <delay>

The result is very similar to the previous one. The pin number in this case was constant (13) and is loaded into R24 in the first line. We get the same code no matter if the pin number is written directly or defined as a const int or an int variable.

I do not show the assembly listings for the _pinWrite and digitalWrite functions as this would take up too much space and is anyway hard to understand without experience with assembly programming. As a simple measure, it can be said that the Arduino version contains about 80 instructions and the Wiring version (_pinWrite) about 70. So, in size the two versions are pretty much the same. As for the speed, detailed analysis would be necessary, as both versions contain branches and not all instructions will be executed. I guess that the speed of the Arduino version will not vary much, as it takes the same time to load a value from a table no matter what the index is. On the other hand, the Wiring version with its series of compare instructions should take longer for higher pin numbers.

Speed comparison

Now let’s look at the speed. It would, in fact, be more proper to analyze the assembly listings generated by the compiler and give the numbers of clock cycles needed to execute the functions in all the cases (various pin numbers resulting in different numbers of conditional statements executed in the Wiring version etc.). But this would be rather time consuming and boring. If we accept some inaccuracy we can simply compare the execution times. Here is the program used for the speed tests. It is a simple loop which toggles a digital pin at full speed. The time it took to execute the loop is measured using Arduino’s/Wiring’s micros function.

i = 255;
start = micros();
while ( i-- > 0)
{
    digitalWrite(LED_PIN, HIGH);
    digitalWrite(LED_PIN, LOW);
}
end = micros();

One version of the program used an 8-bit variable set to 255 and the other a 16-bit variable set to 1000.
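The time per call follows from the two timestamps and the loop count. The helper below is a sketch under the assumption that the elapsed time is simply divided by the number of digitalWrite calls (two per loop pass); the function name is hypothetical and the elapsed time used in the example is an invented value, not one of my measurements.

```cpp
// Average time of one digitalWrite call in microseconds.
// elapsedUs:  end - start, as returned by micros()
// iterations: number of loop passes; each pass calls digitalWrite twice
constexpr double timePerCallUs(unsigned long elapsedUs,
                               unsigned long iterations) {
    return static_cast<double>(elapsedUs) /
           (2.0 * static_cast<double>(iterations));
}

// Invented example: 2040 us for 255 passes is 510 calls, i.e. 4.0 us per call.
static_assert(timePerCallUs(2040, 255) == 4.0, "2040 / 510");
```

Keep in mind that micros has a resolution of 4 microseconds on 16 MHz Arduino boards, so the loop has to run many times for the average to be meaningful.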

The following table shows how long it took to execute the digitalWrite function once. The symbol “us” is used for microsecond.

Time per digitalWrite    Arduino    Wiring, const pin*    Wiring, non-const pin
8-bit loop variable      4.09 us    0.23 us               4.53 us
16-bit loop variable     4.20 us    0.25 us               4.63 us

* In this case we are really measuring the loop control code rather than the I/O operation itself, see below.

As can be seen, the Wiring version takes about 0.23 us if the pin number is known at compile time and about 4.5 us for a non-constant pin number. The Arduino version always takes about 4.1 to 4.2 us.

Note that the times presented in the table are not the real time of the function execution. They include also the time needed for executing the loop itself and for getting the start and stop timestamps. The error brought by this supporting code is relatively negligible for the slow versions of the digitalWrite, but in case of the fast version used in Wiring when pin number is known at compile time the error may exceed the duration of the measured operation itself. At 16 MHz clock used in Arduino, execution of the CBI and SBI instructions (which each take 2 cycles) should be 0.125 us while we obtain 0.23 us.
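The conversion between clock cycles and time at 16 MHz is simple enough to write down. The two helper names are made up for this sketch; the input values are the ones from the table and the text above.

```cpp
// Time in microseconds taken by a number of CPU cycles at the given clock.
constexpr double cyclesToUs(double cycles, double clockMHz) {
    return cycles / clockMHz;
}

// Number of CPU cycles that fit into a time in microseconds.
constexpr double usToCycles(double us, double clockMHz) {
    return us * clockMHz;
}

// One SBI or CBI instruction: 2 cycles at 16 MHz.
static_assert(cyclesToUs(2, 16) == 0.125, "2 cycles at 16 MHz");
```

Going the other way, the measured 0.23 us corresponds to roughly 3.7 cycles per write, of which 2 are the SBI or CBI instruction itself and the rest is loop and timing overhead. The 4.09 us measured for the Arduino version corresponds to roughly 65 cycles.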

Conclusion

It is rather well known that the functions for manipulating digital I/O in Arduino are slow. The reasons are also known and discussed in the community and solutions are offered (see here). It is not that well known that the Wiring framework contains implementation of these functions which does exactly what the solutions for Arduino propose – it is very fast if the pin number is compile time constant and only if it is not known, it becomes slow. The Wiring framework thus offers a tool to compare the two approaches to abstracting the digital pins easily. It also offers alternative software framework for programming the Arduino hardware without the need of using direct port manipulation (which will not work with different Arduino hardware versions), or using third-party libraries for the fast I/O.

I attempted to compare the implementations used in Arduino and Wiring above, so in this conclusion I will just summarize the pros and cons of both versions and present my opinion.

Arduino implementation uses table (an array) located in program memory to map the digital pin number to the MCU registers. Wiring version uses conditional operators to perform this mapping, which results in very efficient code if the pin number is known at compile time and the conditions are resolved by the compiler. If the pin number is not constant, the resulting program will contain series of branches (condition evaluations).

Arduino pros:

  • Easy to understand and write (easy to port)
  • Consistent speed (although slow); does not depend on whether the pin number is compile time constant and/or the value of the pin number itself.

Arduino cons:

  • Slow; what could be a single instruction in “native” C code for the MCU becomes 50 or more instructions.
  • Unsafe code: exceeding the boundaries of the arrays is not handled. This could be fixed at the cost of slowing the code down a little more.
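As an illustration of the last point, a range check on the table index could look like the following sketch. The table is a host-side stand-in for the pin-to-port array shown earlier (Arduino Uno layout, port numbers 2, 3 and 4 for ports B, C and D); the names kPinToPort, kNoPort and checkedPortForPin are hypothetical and not part of the Arduino core.

```cpp
#include <cstddef>
#include <cstdint>

// Value returned for pins that do not exist (an assumption of this sketch).
constexpr uint8_t kNoPort = 0;

// Pin number -> port number for the 20 digital pins of the Arduino Uno.
constexpr uint8_t kPinToPort[] = {
    4, 4, 4, 4, 4, 4, 4, 4,   // pins 0..7   (port D)
    2, 2, 2, 2, 2, 2,         // pins 8..13  (port B)
    3, 3, 3, 3, 3, 3          // pins 14..19 (port C)
};

constexpr std::size_t kPinCount = sizeof(kPinToPort) / sizeof(kPinToPort[0]);

// Table lookup with a range check: one extra comparison per call.
constexpr uint8_t checkedPortForPin(uint8_t pin) {
    return (pin < kPinCount) ? kPinToPort[pin] : kNoPort;
}
```

The caller then only needs to test the result against kNoPort before touching any register.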

Wiring pros:

  • Very fast if we know the pin number at compile time

Wiring cons:

  • Harder to understand and write (nested conditional operators)
  • The execution speed is not consistent; the functions may run at different speed in different programs – depends on using the pin as compile time constant. Also for higher pin numbers which are evaluated later in the series of evaluations, the function can be little slower than for low pin numbers.

This list can provide some overview. There are certainly some pros and cons I missed. Anyway, in my opinion the crucial question for preferring one version to the other is the number of cases in which the pin number is (or can be) known at compile time. In other words, in how many cases can we benefit from the fast version? I did not perform any serious research on this question, so the following lines represent just my opinion. Note also that I take the design of the Arduino/Wiring API as a given thing. That is, we need to have functions which take the pin number as a simple number equal to the number of the pin in its name. If another interface were used, that could certainly change the situation.

I believe that in most cases the pin number can be defined as a constant. I assume that the target projects can use the Arduino/Wiring framework in source code form. It may be more comfortable to use it as a pre-built library, but this makes it impossible to benefit from the compile-time constant optimization. Even the Arduino IDE in the current version (1.0.3) rebuilds the library every time the user program is built, so there seems to be no difficulty in building the library with the user program in source form instead of first building the library and then linking it with the user program.

There are some cases when it is not possible to have pin number constant at compile-time without sacrificing the efficiency of the program (but in this case the efficiency of the digitalRead/Write API should also be considered). There are also some cases when using constant pin number seems to clash with the principle of object oriented programming – using multiple instances of one class which each operate on a different pin or set of pins. Arduino itself is designed in an object-oriented way even though this seems to be a bit of an overkill for most use-cases. For example, the widely used Serial and Wire objects are in fact single “built-in” instances of their respective classes. While it is possible to imagine project which will use more than one serial or I2C interface, such cases are not common and it is arguable whether we need universal code, capable of using several instances of the class.

Similarly, in case of the LCD class, it is nice to allow the user to specify any set of pins to interface the display, but is it worth the performance problems? It seems reasonable to me to impose some restrictions on the design of the user hardware to allow for more efficient code in the software library.

Another case when the pin number is not constant is bad user program design, such as, unfortunately, still the examples included in the Arduino IDE. Using int pin = 13; will, of course, result in inefficient code, while adding the single word const could allow the compiler to produce optimized code at almost no cost in terms of brain capacity of the programmer.
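The effect of that single word can be demonstrated with the compiler itself. The variable names below are made up; static_assert (C++11) is used here only to show which of the two declarations the compiler treats as a compile-time constant.

```cpp
// Ordinary variable: its value may change, so it is only known at run time.
int ledPinVariable = 13;

// const int with a constant initializer: usable in constant expressions.
const int ledPinConstant = 13;

static_assert(ledPinConstant == 13, "known at compile time");
// static_assert(ledPinVariable == 13, "");  // would not compile
```

Whether the call using the constant is then reduced to a single instruction still depends on the I/O functions being written so that the compiler can resolve them, as they are in Wiring.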

Conclusion (of the conclusion)

For the above reasons I believe that the Wiring approach is better and should be adopted in Arduino. It seems that in most programs the pin numbers are (or could easily be) defined as constant. The main argument for not adopting this approach seems to be that only user programs and few libraries with hard-coded pin numbers could use the fast version. This is true, but as mentioned above most of the libraries could probably use hard-coded pin numbers and the user programs seem to be the most important use-case to me.

On the other hand, I admit that for most users it is not important whether the digitalWrite function takes about 0.125 or 4.5 microseconds to execute. From this point of view it is possible to understand that the effort needed to switch the implementation in Arduino from memory-based tables to conditional macros may not be worth the trouble. For the majority of users there would be no difference, and the minority who need a faster core or just cannot cope with the idea that toggling a pin should take 50 CPU cycles can either use one of the fast I/O solutions available, write their own code, or use the Wiring framework to program their Arduino hardware.

Let me conclude that the purpose of this article is not to convince the Arduino team to change the implementation. Rather I wanted to provide overview of the problem and summarize the pros and cons of the solutions. Arduino will remain a useful tool no matter if the digital I/O functions are slow or fast, but as with any other tool one should be aware of its limitations. For those who think about writing their own software framework for embedded systems this Arduino problem shows how important it is to consider all the use cases when designing the API.

The solution

The above text, which I wrote about a year ago, actually does not give any satisfactory solution. It basically says that you can use the Wiring library if you need fast digital I/O and your pins are compile-time constants. But that is not easy, and for pin numbers stored in variables it may result in even slower digital I/O than with the original Arduino library. After a year I returned to the problem and tried to find a better solution. And I think I found it. Please see my new article for details: http://www.codeproject.com/Articles/732646/Fast-digital-I-O-for-Arduino.

Just to give you some overview, this solution is easy to add to the normal Arduino IDE (just copy a few files). It works as fast as the Wiring (macros) version for pins known at compile time, and for pins stored in variables it executes in about half the time of the standard Arduino digital I/O functions.

History

2013-05-10 First version.

2014-05-16 Added information about faster version of digital I/O for Arduino.


 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)