Embedded systems were traditionally the domain of microcontrollers. You programmed them in C on bare metal, directly poking values into registers and hooking into interrupt vectors. Only if it was really necessary you would include some kind of a light-weight operating system. Times are changing though. These days it's becoming more and more common to see full Linux systems and high-level languages in this area. It's not surprising: if I can just pop open a shell, see what exceptions my Python script is throwing and fix them on the fly, I'm not going to bother with microcontrollers and the whole in-circuit debugger thing. Some even say it won't be long before we will all be just running web browsers on our devices.
It seems to be common knowledge that the traditional approach really excels at latency. If you're moderately careful with your code, you can get your system to react very quickly and consistently to events. Common embedded Linux systems don't have real-time features. They seem to address this deficiency with some combination of "don't care", "it's good enough" and throwing raw CPU power at the problem. Or as the author of RPi.GPIO library puts it:
If you are after true real-time performance and predictability, buy yourself an Arduino.
I was wondering what kind of performance you could expect from these modern systems. I tend to be very conservative in my work: I have a pile of embedded Linux-running boards, but they are mostly gathering dust while I stick to old-fashioned Cortex M3s and AVRs. So I thought it would be interesting to do some experiments and get some real data about these things.
To test how fast a program can respond to an event, I chose a very simple task: Raise an output digital line whenever a rising edge happens on an input digital line. This allowed me to very simply measure response times in an automated fashion using an USB-connected oscilloscope and a signal generator.
I tested two devices: An Arduino Uno using a 16 MHz ATmega328 microcontroller and an Raspberry Pi Zero using a 1 GHz ARM-based CPU running Raspbian Jessie. I tried several approaches to implementing the task. On Arduino, I implemented it with an interrupt and a polling loop. On Raspberry Pi, I tried a kernel module, a native binary written in C and a Python program. You can see exact source code on GitHub.
For all of these, I chose the most obvious approach possible. My implementations were based as much as possible on the preferred libraries mentioned in the documentation or whatever came up on top of my web searches. This meant that for Arduino, I was using the Arduino IDE and the library that comes with it. For Raspberry Pi, I used the RPi.GPIO Python library, the GPIO sysfs interface for native code in user space and the GPIO consumer interface for the kernel module (based on examples from Stefan Wendler). Definitely many of these could be further hand-optimized, but I was mostly interested here in out-of-the-box performance you could get in the first try.
Here is a histogram of 500 measurements for the five implementations:
As expected, Arduino and the Raspberry Pi kernel module were both significantly faster and more consistent than the two Raspberry Pi user space implementations. Somewhat shocking though, the interpreted Python program was considerably faster than my C program compiled into native code.
If you check the source, RPi.GPIO library maps the hardware registers directly into its process memory. This means that it does not need any syscalls for controlling the GPIO lines. On the other hand, my C implementation uses the kernel's sysfs interface. This is arguably a cleaner and safer way to do it, but it requires calls into the kernel to change GPIO states and these require expensive context switches. This difference is likely the reason why Python was faster.
Here is the zoomed-in left part of the histogram. Raspberry Pi kernel module can be just as fast as the Arduino, but is less consistent. Not surprising, since the kernel has many other interrupts to service and not that impressive considering 60 times faster CPU clock.
Arduino itself is not that consistent out-of-the-box. While most interrupts are served in around 9 microseconds (so around 140 CPU cycles), occasionally they take as long as 15 microseconds. Probably Arduino library is to blame here since it uses the timer interrupt for delay functions. This interrupt seems to be always enabled, even when a delay function is not running, and hence competes with the GPIO interrupt I am using.
Also, this again shows that polling on Arduino can sometimes be faster than interrupts.
Another interesting result was the effect of CPU load on Raspberry Pi response times. Somewhat counter intuitively, response times are smaller on average when there is some other process consuming CPU cycles. This happens even with the kernel module, which makes me think it has something to do with power saving features. Perhaps this is due to CPU frequency scaling or maybe the kernel puts an idle CPU into some sleep mode from which it takes longer to wake up.
In conclusion, I was a bit impressed how well Python scores on this test. While it's an order of magnitude slower than Arduino, 200 microseconds on average is not bad. Of course, there's no hard upper limit on that. In my test, some responses took two times as much and things really start falling apart if you increase the interrupt load (like for instance, with a process that does something with the SD card or network adapter). Some of the results on Raspberry Pi were quite surprising and they show once again that intuition can be pretty wrong when it comes to software performance.
I will likely be looking into more details regarding some of these results. If you would like to reproduce my measurements, I've put source code, raw data and a notebook with analysis on GitHub.