SPI interrupts versus polling

17.06.2012 9:12

The SPI peripheral on the ATmega328 microcontroller has only a single-byte buffer. Since this microcontroller family doesn't have a DMA controller, the CPU must refill the buffer each time 8 bits have been transferred over the line whenever it wants to transfer multiple bytes. There are two straightforward mechanisms to do that: the software can poll a flag in the SPI status register and wait in a busy loop until the hardware peripheral has emptied the buffer, or it can install an interrupt routine for the serial transfer complete interrupt request, which fills the buffer when necessary without the need for polling.

Arduino ships with an SPI library that uses the former approach: it transfers one byte at a time and uses a busy loop to wait for the transfer to finish. While playing with the OLED shield I was wondering if an interrupt request would be more efficient. The usual SPI use case with such a display is that the CPU calculates some pixel values and stores them in a buffer. This buffer is then pushed through the SPI to the display controller and the CPU goes on to calculate another buffer's worth of pixels. I thought it might be possible to get a higher frame rate if the CPU could do some calculations while the transfer was in progress and only get interrupted when it needs to fill the transmit buffer.

Hence I came up with two implementations of the seps525_dataBurst() function:

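/* polling method */
#include <avr/io.h>
#include <stdint.h>

void seps525_dataBurst(const uint16_t* values, int len)
{
  /* view the 16-bit values as bytes; the AVR stores the low byte first */
  const uint8_t *p = (const uint8_t*) values;
  for(; len > 0; --len) {
    /* start the transfer of the high byte */
    SPDR = *(p+1);
    /* wait for the transfer to finish */
    while (!(SPSR & _BV(SPIF)));
    /* then send the low byte and move on to the next value */
    SPDR = *p;
    p += 2;
    while (!(SPSR & _BV(SPIF)));
  }
}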

and

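/* interrupt request method */
#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

/* next byte to send and number of bytes left; both are shared with the ISR */
static uint8_t* volatile burstData;
static volatile int burstLen = -1;  /* negative means no transfer in progress */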

ISR(SPI_STC_vect) {
  burstLen--;
  if(burstLen < 0) {
    /* finished, turn off the interrupt request */
    SPCR &= ~_BV(SPIE);
  } else {
    /* start the transfer of the next byte */
    SPDR = *burstData;
    burstData++;
  }
} 

void seps525_dataBurst(uint8_t* values, int len)
{
  /* wait if the software buffer is not empty */
  while(burstLen >= 0);
  burstLen = len;
  burstData = values;

  /* enable interrupt */
  SPCR |= _BV(SPIE);

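  /* start the transfer of the first byte by calling the handler directly;
   * note that the handler returns with RETI, which also sets the global
   * interrupt enable flag */
  SPI_STC_vect();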
}

(The interrupt request routine here is somewhat simpler: it takes a plain byte buffer and doesn't correct for the CPU byte order.)
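For completeness, here is a minimal sketch of how the byte-order correction could be added to an interrupt-driven transfer. It is not the code that was measured in this post. The names burst_state_t, burst_begin() and burst_next_byte() are made up for illustration, and the helper is written as plain C with shifts instead of pointer casts so that it does not depend on the CPU byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer state, not part of the original code. */
typedef struct {
    const uint16_t *values; /* 16-bit value currently being sent */
    size_t remaining;       /* bytes still to send, two per value */
} burst_state_t;

/* Prepare a transfer of `words` 16-bit values. */
void burst_begin(burst_state_t *s, const uint16_t *values, size_t words)
{
    assert(s != NULL);
    s->values = values;
    s->remaining = words * 2;
}

/* Fetch the next byte in high-byte-first order.
 * Returns 1 and stores the byte in *out, or returns 0 when nothing is left. */
int burst_next_byte(burst_state_t *s, uint8_t *out)
{
    assert(s != NULL && out != NULL);
    if (s->remaining == 0) {
        return 0;
    }
    if (s->remaining % 2 == 0) {
        /* an even number of bytes is left: send the high byte first */
        *out = (uint8_t)(*s->values >> 8);
    } else {
        /* then the low byte, and advance to the next value */
        *out = (uint8_t)(*s->values & 0xFF);
        s->values++;
    }
    s->remaining--;
    return 1;
}
```

In an ISR the handler would write the returned byte to SPDR and clear SPIE once burst_next_byte() returns 0; the state would have to be declared so that the compiler treats it as shared with the main program. The extra bookkeeping costs additional cycles per byte, so it would probably make the interrupt version slower still.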

Surprisingly, the polling function outperforms the interrupt request by almost a factor of 2! On the plot below, the upper trace shows the performance of the polling routine while the lower trace shows the interrupt routine. Both were called from an identical animation event loop. When the signal connected to the oscilloscope is low, the SPI transfer of a single display frame is in progress, so the period of the signal shows the frame rate.

As shown, the polling routine achieves around 15 frames per second while the interrupt routine gets to around 8 frames per second.

Performance comparison between IRQ and polling methods

In hindsight this result might not be so surprising. SPI in this case works at half the CPU frequency, so shifting out one byte takes only 16 CPU clock cycles and the CPU can execute fewer than 16 instructions per byte sent. The polling loop wastes less time per transfer than the overhead of an interrupt call.
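The cycle budget can be put into numbers with a rough back-of-the-envelope sketch like the one below. The function names are made up for illustration. The fixed costs of 4 cycles for the interrupt response, 3 cycles for the jump in the vector table and 4 cycles for the return are my recollection of the ATmega328 datasheet and should be checked there. The cost of the compiler-generated register saving and of the ISR body is passed in as a parameter because I did not measure it.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles needed to shift out one byte when the SPI clock is
 * F_CPU / divider (8 bits, `divider` CPU cycles each). */
uint32_t spi_cycles_per_byte(uint32_t divider)
{
    assert(divider > 0);
    return 8u * divider;
}

/* Assumed cost of one interrupt in CPU cycles. The three constants are
 * assumptions taken from memory of the datasheet; `body_cycles` stands for
 * the register save/restore and the ISR body, which were not measured. */
uint32_t isr_cost_cycles(uint32_t body_cycles)
{
    const uint32_t response = 4;    /* interrupt response time (assumed) */
    const uint32_t vector_jump = 3; /* jump from the vector table (assumed) */
    const uint32_t reti = 4;        /* return from interrupt (assumed) */
    return response + vector_jump + reti + body_cycles;
}

/* Cycles left over for the main program per transferred byte.
 * A negative result means the ISR cannot keep the SPI busy. */
int32_t spare_cycles_per_byte(uint32_t divider, uint32_t isr_cycles)
{
    return (int32_t)spi_cycles_per_byte(divider) - (int32_t)isr_cycles;
}
```

With a divider of 2, spi_cycles_per_byte() gives 16, while isr_cost_cycles() is already 11 before a single instruction of the handler has run, so any realistic handler overshoots the budget. With a divider of 128 the budget grows to 1024 cycles per byte and most of it would be left for the main program.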

What is the lesson here? Certainly that a more complicated solution is not always better. The interrupt routine was tricky to get right, has more than twice the amount of code and is harder to understand for someone looking at the code for the first time (not to mention that it requires global variables). In embedded software, just like with any other kind of programming, it makes sense to do some profiling to see if it is actually worth complicating the code with performance optimizations.

Posted by Tomaž | Categories: Code

Comments

Hi Tomaž,

Thanks for the writeup. I know this is an older post, but I found it through a Google search while wondering the same thing about SPI transfers. I do remember reading in one of the Atmel docs that the ATmega328 takes 5 clock cycles going into and out of an ISR, so maybe that's a contributing factor as well?

Thanks!
-Josh

Posted by Josh

Josh, there's always some overhead associated with interrupt routines. In this case, the act of entering and exiting the ISR seems to take about the same time as the useful work the ISR does (hence the frame rate dropping to one-half when using interrupts). I haven't checked exactly how many cycles were lost and where.

Posted by Tomaž

Thank you for the measurement and article. Very informative.

A factor of 2 is not that bad after all, but for high-speed transfers the interrupt is obviously not useful.

So if you have a slow transfer with the clock divider set to 128, it would be a good idea to use the interrupt method, as it does not block the main program.

Also, if you have some analog data to read and want to send it over SPI, the interrupt can be useful.

Thanks for the study: I'm currently writing an Arduino driver for the AD9850 Direct Digital Synthesizer (DDS) chip, using the SPI interface. There are a few implementations already available, but they all use bit-banging techniques and disregard the ATmega SPI controller completely! Using SPI instead of bit-banging leads to faster DDS frequency updates and less CPU time wasted shifting bits. I was considering using interrupts too, but your results show that it is not worth the hassle.

Thanks Avian, very informative.

ISR has an advantage of not blocking your main loop.

Thanks for this study.

I wonder what you are actually looking at with your scope:
* polling: function duration
* ISR: function call, interrupt disable.
Is that correct?

Of course there is an overhead with an ISR (ISR latency), and there is also the use of global variables, which have to be loaded from memory at each execution, whereas in polling mode the CPU keeps the data in registers. The CPU also uses its call stack for context switching.
But using an interrupt you can lower the response latency for other events: if you send a big buffer over SPI using polling, your application is locked sending data, and this may be a problem.

At the design stage we must always weigh this overhead against what each transaction accomplishes.
