SPI interrupts versus polling
The SPI peripheral on the ATmega328 microcontroller has only a single-byte buffer. Since this microcontroller family doesn't have a DMA controller, the CPU must refill the buffer each time 8 bits have been transferred over the line whenever it wants to send more than one byte. There are two straightforward mechanisms to do that: the software can poll a flag in the SPI status register and wait in a busy loop until the hardware peripheral has emptied the buffer, or it can install an interrupt routine for the serial transfer complete interrupt request, which refills the buffer when necessary without any polling.
Arduino ships with an SPI library that uses the former approach: it transfers one byte at a time and uses a busy loop to wait for the transfer to finish. While playing with the OLED shield I wondered whether an interrupt request would be more efficient. The usual SPI use case with such a display is that the CPU calculates some pixel values and stores them in a buffer. This buffer is then pushed through the SPI to the display controller and the CPU goes on to calculate another buffer's worth of pixels. I thought it might be possible to get a higher frame rate if the CPU could do some calculations while the transfer was in progress and only get interrupted when it needed to fill the transmit buffer.
Hence I came up with two implementations of the seps525_dataBurst() function:
    /* polling method */
    void seps525_dataBurst(const uint16_t* values, int len)
    {
        uint8_t *p = (uint8_t*) values;

        for(; len > 0; --len) {
            /* start the transfer */
            SPDR = *(p+1);
            /* wait for the transfer to finish */
            while (!(SPSR & _BV(SPIF)));
            SPDR = *p;
            p += 2;
            while (!(SPSR & _BV(SPIF)));
        }
    }
and
    /* interrupt request method */
    static uint8_t* volatile burstData;
    static volatile int burstLen = -1;

    ISR(SPI_STC_vect)
    {
        burstLen--;
        if(burstLen < 0) {
            /* finished, turn off the interrupt request */
            SPCR &= ~_BV(SPIE);
        } else {
            /* start the transfer of the next byte */
            SPDR = *burstData;
            burstData++;
        }
    }

    void seps525_dataBurst(uint8_t* values, int len)
    {
        /* wait if the software buffer is not empty */
        while(burstLen >= 0);

        burstLen = len;
        burstData = values;

        /* enable interrupt */
        SPCR |= _BV(SPIE);
        /* start the transfer of the first byte */
        SPI_STC_vect();
    }
(The interrupt request routine here is somewhat simpler and doesn't correct for the CPU byte order.)
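To make the byte order difference concrete: the polling version sends the high byte of each 16-bit value first, which on a little-endian CPU means bytes 1, 0, 3, 2, ... of the buffer. A minimal host-side sketch of that reordering follows. The helper name `wire_order` and its parameters are hypothetical and not part of the code I measured; it only illustrates that flipping the lowest bit of the byte index gives the same order, assuming an even number of bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Copies `len` bytes from `mem` (16-bit values as stored by a
 * little-endian CPU) into `wire` in the order the polling routine
 * puts them on the SPI bus: high byte of each value first.
 * Assumption: `len` is even. */
void wire_order(const uint8_t *mem, size_t len, uint8_t *wire)
{
    for (size_t i = 0; i < len; i++) {
        /* i ^ 1 swaps each pair of neighbouring indices: 0<->1, 2<->3, ... */
        wire[i] = mem[i ^ 1u];
    }
}
```

An interrupt routine that kept a byte index instead of a moving pointer could apply the same index flip, at the cost of a few more cycles per byte.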
Surprisingly, the polling function outperforms the interrupt request by almost a factor of 2! On the plot below, the upper trace shows the performance of the polling routine while the lower trace shows the interrupt routine. Both were called from an identical animation event loop. When the signal connected to the oscilloscope is low, the SPI transfer of a single display frame is in progress, so the period of the signal shows the frame rate.
As shown, the polling routine achieves around 15 frames per second while the interrupt routine gets to around 8 frames per second.
In hindsight this result might not be so surprising. SPI in this case works at half the CPU frequency, so each bit takes 2 CPU clock cycles and each byte takes 16. That leaves the CPU room for fewer than 16 instructions per byte sent. The polling loop wastes less time per transfer than the overhead of an interrupt call.
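A rough cycle budget makes this easier to see. The sketch below is plain C that runs on a PC, not on the microcontroller. The function names are made up for illustration, and the three overhead figures are assumptions taken from my reading of the ATmega328 datasheet (minimum interrupt response, the jump in the vector table, and the return from interrupt). They do not include the registers the compiler saves and restores inside the routine, so the real cost is higher.

```c
#include <stdint.h>

/* Assumed fixed interrupt costs in CPU cycles (datasheet figures as I
 * understand them; check them against your own part). */
enum {
    IRQ_RESPONSE_CYCLES = 4, /* minimum interrupt response time */
    VECTOR_JMP_CYCLES   = 3, /* jmp from the vector table to the handler */
    RETI_CYCLES         = 4  /* return from the interrupt routine */
};

/* CPU cycles that pass while one byte is shifted out, where
 * SPI clock = CPU clock / spi_clock_divider. */
uint32_t cycles_per_byte(uint32_t spi_clock_divider)
{
    return 8u * spi_clock_divider;
}

/* Cycles spent just entering and leaving the interrupt routine. */
uint32_t isr_fixed_overhead(void)
{
    return IRQ_RESPONSE_CYCLES + VECTOR_JMP_CYCLES + RETI_CYCLES;
}

/* Cycles left for the body of the routine before the next byte could
 * already be on its way; negative means the SPI bus sits idle. */
int32_t cycles_left_for_isr_body(uint32_t spi_clock_divider)
{
    return (int32_t)cycles_per_byte(spi_clock_divider)
         - (int32_t)isr_fixed_overhead();
}
```

With a divider of 2 this leaves only a handful of cycles for the body of the routine, which has to save registers, decrement a 16-bit counter and load a pointer. Under these assumptions the interrupt version cannot keep the bus busy, while the polling loop can start the next byte almost immediately.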
What is the lesson here? Certainly that a more complicated solution is not always better. The interrupt routine was tricky to get right, has more than twice the amount of code and is harder to understand for someone looking at the code for the first time (not to mention that it requires global variables). In embedded software, just like with any other kind of programming, it makes sense to do some profiling to see whether it is actually worth complicating the code with performance optimizations.
Hi Tomaž,
Thanks for the writeup. I know this is an older post, but I found it through a Google search while wondering the same thing about SPI transfers. I do remember reading in one of the Atmel docs that the ATmega328 takes 5 clock cycles going into and out of an ISR, so maybe that's a contributing factor as well?
Thanks!
-Josh