Last week I wrote about some typical interrupt response times you get from an Arduino and Raspberry Pi, if you follow basic examples from documentation or whatever comes up on Google. I got some quite unexpected results, like for instance a Python script that responds faster than a compiled C program. To check some of my guesses as to what caused those results, I did another set of measurements.
For Arduino, most response times were grouped around 9 microseconds, but there were a few outliers. I checked the Arduino library source and it indeed always enables AVR timer/counter0 overflow interrupt. If timer interrupt happens at the same time as the GPIO interrupt I was measuring, the GPIO interrupt can get delayed. Performing the measurement with the timer interrupt masked out indeed removes the outliers:
With timer off, all measured response times are between 9.1986 to 8.9485 μs. This is a 0.2501 μs long interval. It fits perfectly with theory - at 16 MHz CPU clock and instruction length between 1 and 5 cycles, uncertainty for interrupt latency is 0.25 μs.
The second weird thing was the aforementioned discrepancy between Python and C on Raspberry Pi. The default Python library uses an ugly hack to bypass the kernel GPIO driver and control GPIO lines directly from user space: it mmaps a range of physical memory containing GPIO registers into its own process memory space using
/dev/mem. This is similar to how X servers on Linux (used to?) access graphics hardware from user space. While this approach is very unportable, it's also much faster since you don't need to do context switches into kernel for every operation.
To check just how much faster mmap method is on Raspberry Pi, I copied the GPIO access code from the RPi.GPIO library into my test C program:
As you can see, the native program is now faster than the interpreted Python script. This also demonstrates just how costly context switches are: the sysfs version is more than two times slower on average. It's also worth noting that both RPi.GPIO and my C program still use
select() on a sysfs file to wait for the interrupt. Just output pin change can be done with direct memory accesses.
Finally, Raspberry Pi was faster when the CPU was loaded which seemed counterintuitive. I tracked this down to automatic CPU frequency scaling. By default, Raspberry Pi Zero seems to be set to run between 700 MHz and 1000 MHz using
ondemand governor. If I switch to
performance governor, it keeps the CPU running at 1 GHz at all times. In that case, as expected, the CPU load increases the average response time:
It's interesting to note that Linux kernel comes with pluggable idle loop implementations (CONFIG_CPU_IDLE). The idle loop can be selected through
/sys/devices/system/cpu/cpuidle in a similar way to the CPU frequency governor. The Raspbian Jessie release however has that disabled. It uses the default idle loop for ARMv6 processors. Assembly code has been patched though. The ARM Wait For Interrupt WFI instruction in the vanilla kernel has been replaced with some mcreq (write to coprocessor?) instructions. I can't find any info on the JIRA ticket referenced in the comment and the change has been added among other BCM-specific changes in a single 6400-line commit. Idle loop implementation is interesting because if it puts the CPU into a power saving mode, it can affect the interrupt latency as well.
As before, source code and raw data is on GitHub.