Measuring interrupt response times, part 2

27.04.2016 11:40

Last week I wrote about some typical interrupt response times you get from an Arduino and Raspberry Pi, if you follow basic examples from documentation or whatever comes up on Google. I got some quite unexpected results, like for instance a Python script that responds faster than a compiled C program. To check some of my guesses as to what caused those results, I did another set of measurements.

For Arduino, most response times were grouped around 9 microseconds, but there were a few outliers. I checked the Arduino library source and it indeed always enables the AVR timer/counter0 overflow interrupt. If the timer interrupt happens at the same time as the GPIO interrupt I was measuring, the GPIO interrupt can get delayed. Performing the measurement with the timer interrupt masked out indeed removes the outliers:

Effect of timer interrupt on Arduino response time.

With the timer off, all measured response times are between 8.9485 and 9.1986 μs, a 0.2501 μs long interval. This fits the theory perfectly: at a 16 MHz CPU clock and with instructions taking between 1 and 5 cycles, the worst-case difference of 4 cycles gives a 0.25 μs uncertainty in interrupt latency.
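Masking just that one interrupt source on the ATmega328 takes a single register write. This is a minimal sketch of the idea, not necessarily the exact code I used in the test:

// Disable the timer/counter0 overflow interrupt that the Arduino core
// enables for millis()/delay(); timekeeping stops while it is masked.
TIMSK0 &= ~_BV(TOIE0);

// ... latency-critical measurement runs here ...

// Re-enable the interrupt so millis()/delay() work again.
TIMSK0 |= _BV(TOIE0);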

The second weird thing was the aforementioned discrepancy between Python and C on the Raspberry Pi. The default Python library uses an ugly hack to bypass the kernel GPIO driver and control GPIO lines directly from user space: it mmaps a range of physical memory containing the GPIO registers into its own process memory space using /dev/mem. This is similar to how X servers on Linux (used to?) access graphics hardware from user space. While this approach is very unportable, it's also much faster, since you don't need a context switch into the kernel for every operation.
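For illustration, the core of that approach looks roughly like the sketch below. This is not the actual RPi.GPIO code; it assumes the BCM2835 GPIO block of the Pi Zero at physical address 0x20200000 and GPIO 17 as the output pin:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define GPIO_BASE 0x20200000UL   /* BCM2835 GPIO registers (Pi Zero) */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *gpio = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, GPIO_BASE);
    if (gpio == MAP_FAILED) { perror("mmap"); return 1; }

    /* Set GPIO 17 as an output: 3 function select bits per pin in GPFSELn. */
    gpio[17 / 10] = (gpio[17 / 10] & ~(7 << ((17 % 10) * 3)))
                                   |  (1 << ((17 % 10) * 3));

    gpio[7]  = 1 << 17;   /* GPSET0: raise the line, no syscall needed */
    gpio[10] = 1 << 17;   /* GPCLR0: lower it again */

    munmap((void *)gpio, 4096);
    close(fd);
    return 0;
}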

To check just how much faster the mmap method is on the Raspberry Pi, I copied the GPIO access code from the RPi.GPIO library into my test C program:

Response times using sysfs and mmap methods on Raspberry Pi.

As you can see, the native program is now faster than the interpreted Python script. This also demonstrates just how costly context switches are: the sysfs version is more than two times slower on average. It's also worth noting that both RPi.GPIO and my C program still use epoll() or select() on a sysfs file to wait for the interrupt. Only the output pin change is done with direct memory accesses.

Finally, the Raspberry Pi was faster when the CPU was loaded, which seemed counterintuitive. I tracked this down to automatic CPU frequency scaling. By default, the Raspberry Pi Zero seems to be set to run between 700 MHz and 1000 MHz using the ondemand governor. If I switch to the performance governor, it keeps the CPU running at 1 GHz at all times. In that case, as expected, CPU load increases the average response time:

Effect of cpufreq governor on Raspberry Pi response time.

It's interesting to note that the Linux kernel comes with pluggable idle loop implementations (CONFIG_CPU_IDLE). The idle loop can be selected through /sys/devices/system/cpu/cpuidle in a similar way to the CPU frequency governor. The Raspbian Jessie release however has that disabled and uses the default idle loop for ARMv6 processors. The assembly code has been patched though: the ARM Wait For Interrupt (WFI) instruction in the vanilla kernel has been replaced with some mcreq (conditional write to a coprocessor register) instructions. I can't find any info on the JIRA ticket referenced in the comment, and the change was added among other BCM-specific changes in a single 6400-line commit. The idle loop implementation is interesting because if it puts the CPU into a power saving mode, it can affect the interrupt latency as well.

As before, source code and raw data is on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times

18.04.2016 15:13

Embedded systems were traditionally the domain of microcontrollers. You programmed them in C on bare metal, directly poking values into registers and hooking into interrupt vectors. Only if it was really necessary would you include some kind of light-weight operating system. Times are changing though. These days it's becoming more and more common to see full Linux systems and high-level languages in this area. It's not surprising: if I can just pop open a shell, see what exceptions my Python script is throwing and fix them on the fly, I'm not going to bother with microcontrollers and the whole in-circuit debugger thing. Some even say it won't be long before we will all be just running web browsers on our devices.

It seems to be common knowledge that the traditional approach really excels at latency. If you're moderately careful with your code, you can get your system to react very quickly and consistently to events. Common embedded Linux systems don't have real-time features. They seem to address this deficiency with some combination of "don't care", "it's good enough" and throwing raw CPU power at the problem. Or as the author of RPi.GPIO library puts it:

If you are after true real-time performance and predictability, buy yourself an Arduino.

I was wondering what kind of performance you could expect from these modern systems. I tend to be very conservative in my work: I have a pile of embedded Linux-running boards, but they are mostly gathering dust while I stick to old-fashioned Cortex M3s and AVRs. So I thought it would be interesting to do some experiments and get some real data about these things.

Measuring interrupt response times on Arduino.

To test how fast a program can respond to an event, I chose a very simple task: raise an output digital line whenever a rising edge happens on an input digital line. This allowed me to very simply measure response times in an automated fashion using a USB-connected oscilloscope and a signal generator.

I tested two devices: an Arduino Uno using a 16 MHz ATmega328 microcontroller and a Raspberry Pi Zero using a 1 GHz ARM-based CPU running Raspbian Jessie. I tried several approaches to implementing the task. On the Arduino, I implemented it with an interrupt and with a polling loop. On the Raspberry Pi, I tried a kernel module, a native binary written in C and a Python program. You can see the exact source code on GitHub.
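To give an idea of the complexity involved, the interrupt version on the Arduino fits in a few lines. This is only a minimal sketch of that approach; pin numbers are chosen for illustration and the exact code is in the repository:

const int inPin = 2;    // INT0 on the Uno (illustrative choice)
const int outPin = 3;

void onEdge()
{
    digitalWrite(outPin, HIGH);      // raise the output as soon as possible
}

void setup()
{
    pinMode(inPin, INPUT);
    pinMode(outPin, OUTPUT);
    attachInterrupt(digitalPinToInterrupt(inPin), onEdge, RISING);
}

void loop()
{
    if (digitalRead(inPin) == LOW)
        digitalWrite(outPin, LOW);   // reset the output for the next edge
}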

Measuring interrupt response times on Raspberry Pi.

For all of these, I chose the most obvious approach possible. My implementations were based as much as possible on the preferred libraries mentioned in the documentation or whatever came up on top of my web searches. This meant that for Arduino, I was using the Arduino IDE and the library that comes with it. For Raspberry Pi, I used the RPi.GPIO Python library, the GPIO sysfs interface for native code in user space and the GPIO consumer interface for the kernel module (based on examples from Stefan Wendler). Many of these could definitely be further hand-optimized, but I was mostly interested in the out-of-the-box performance you get on the first try.

Here is a histogram of 500 measurements for the five implementations:

Histogram of response time measurements.

As expected, the Arduino and the Raspberry Pi kernel module were both significantly faster and more consistent than the two Raspberry Pi user space implementations. Somewhat shockingly though, the interpreted Python program was considerably faster than my C program compiled to native code.

If you check the source, the RPi.GPIO library maps the hardware registers directly into its process memory. This means that it does not need any syscalls for controlling the GPIO lines. On the other hand, my C implementation uses the kernel's sysfs interface. This is arguably a cleaner and safer way to do it, but it requires calls into the kernel to change GPIO states, and these require expensive context switches. This difference is likely the reason why Python was faster.
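For comparison, waiting for the interrupt through sysfs looks roughly like the sketch below. It assumes the pin has already been exported and its edge file set to rising; the actual test program is on GitHub:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/class/gpio/gpio17/value", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[8];
    read(fd, buf, sizeof(buf));   /* clear any already pending edge */

    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
    if (poll(&pfd, 1, -1) > 0) {
        lseek(fd, 0, SEEK_SET);
        read(fd, buf, sizeof(buf));
        /* ... write "1" to the output pin's value file here ... */
    }

    close(fd);
    return 0;
}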

Histogram of response time measurements (zoomed)

Here is the zoomed-in left part of the histogram. The Raspberry Pi kernel module can be just as fast as the Arduino, but is less consistent. That's not surprising, since the kernel has many other interrupts to service, and not that impressive considering the 60 times faster CPU clock.

The Arduino itself is not that consistent out-of-the-box. While most interrupts are served in around 9 microseconds (so around 140 CPU cycles), occasionally they take as long as 15 microseconds. The Arduino library is probably to blame here, since it uses the timer interrupt for its delay functions. This interrupt seems to be always enabled, even when a delay function is not running, and hence competes with the GPIO interrupt I am using.

Also, this again shows that polling on Arduino can sometimes be faster than interrupts.

Effect of CPU load on response time.

Another interesting result was the effect of CPU load on Raspberry Pi response times. Somewhat counterintuitively, response times are shorter on average when there is some other process consuming CPU cycles. This happens even with the kernel module, which makes me think it has something to do with power saving features. Perhaps this is due to CPU frequency scaling, or maybe the kernel puts an idle CPU into some sleep mode from which it takes longer to wake up.

In conclusion, I was a bit impressed by how well Python scores on this test. While it's an order of magnitude slower than the Arduino, 200 microseconds on average is not bad. Of course, there's no hard upper limit on that. In my test, some responses took twice as long, and things really start falling apart if you increase the interrupt load (for instance, with a process that does something with the SD card or network adapter). Some of the results on the Raspberry Pi were quite surprising and they show once again that intuition can be pretty wrong when it comes to software performance.

I will likely be looking into more details regarding some of these results. If you would like to reproduce my measurements, I've put source code, raw data and a notebook with analysis on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Another hard drive failure

07.02.2015 21:41

Earlier today one of my hard drives died. It was a fairly old 750 GB "Caviar GP" drive from a Western Digital "My Book" external enclosure. All it does now is emit an impressively loud metallic clicking noise.

I should have seen this coming, of course. At this point I have a pile of failed drives stashed in a box somewhere. I remember that this particular one has been unusually slow to start and mount for the last couple of times I used it. Also, smartd has previously reported "2 Currently unreadable (pending) sectors". Both of which I ignored, because I assumed this was yet another problem with the power supply. I had a "My Book" 12V external power supply fail before with similar symptoms.

I only used this drive for backups recently, so except for some archival copies of machines I no longer own, probably nothing of value was lost. Having at least a listing of contents before it failed would be nice though.

Disassembled Western Digital "My Book" external drive.

Of course, I opened it up to see if there's anything obviously wrong with it. The "My Book" USB interface board and the power supply are not the cause, because the drive has the same problem even when it is connected directly to a SATA port. I can hear the platters spinning and the clicking noise can only be caused by the heads thrashing around, so those are not stuck either.

Corrosion of surface finish on the controller PCB.

The only thing that immediately looks wrong is the unusual amount of corrosion on the hard drive controller PCB. It's bad enough that on some exposed test points both the immersion gold and the copper layer are completely gone. I'm not quite sure what could have caused that. As far as I can remember, this drive was sitting somewhere around my desk the whole time, so it hasn't been exposed to any hostile environments. It might be a manufacturing defect of some sort - maybe the board was not rinsed well enough after processing.

Bottom side of the hard drive controller PCB.

I cleaned the pads where the motor and the head connect to the circuit board, but that didn't make any difference.

The copper below the green solder mask looks fine though. The bottom side of the PCB contains one large BGA chip. Maybe that one developed some bad connections, if the problem is indeed in the controller board. Just as an experiment, I also tried the disk-in-the-freezer trick, but that did not make the disk behave any differently.

Posted by Tomaž | Categories: Digital | Comments »

CubieTruck UDMA CRC errors

18.10.2014 20:07

Last year I bought a CubieTruck, a small, low-powered ARM computer, to host this web site and a few other things. Combined with a Samsung 840 EVO SSD on the SATA bus, it proved to be a relatively decent replacement for my aging Intel box.

One thing that has been bothering me right from the start though is that every once in a while, there were problems with the SATA bus. Occasionally, isolated error messages like these appeared in the kernel log:

kernel: ata1.00: exception Emask 0x10 SAct 0x2000000 SErr 0x400100 action 0x6 frozen
kernel: ata1.00: irq_stat 0x08000000, interface fatal error
kernel: ata1: SError: { UnrecovData Handshk }
kernel: ata1.00: failed command: WRITE FPDMA QUEUED
kernel: ata1.00: cmd 61/18:c8:68:0e:49/00:00:02:00:00/40 tag 25 ncq 12288 out
kernel:          res 40/00:c8:68:0e:49/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
kernel: ata1.00: status: { DRDY }
kernel: ata1: hard resetting link
kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: configured for UDMA/133
kernel: ata1: EH complete

At the same time, the SSD reported increased UDMA CRC error count through the SMART interface:

UDMA CRC weekly error count on CubieTruck.

These errors were mostly benign. Apart from the cruft in the log files they did not appear to have any adverse effects. Only once or twice in the last 10 months or so did they cause the kernel to remount filesystems on the SSD as read-only, which required some manual intervention to get the CubieTruck back on-line.

I've seen some forum discussions that suggested this might be caused by a bad power supply. However, checking the power lines with an oscilloscope did not show anything suspicious. On the other hand, I did notice during this test that the errors seemed to occur when I was touching the SATA cable. This made me think that the cable or the connectors on it might be the culprit - something that was also suggested in the forums.

The CubieTruck originally comes with a custom SATA cable that combines both power and data lines for the hard drive and has special connectors (at least compared to what you usually see in the context of SATA cabling) on the motherboard side.

Over the last few weeks it appeared that the errors were getting increasingly common, so I decided to try replacing the cable. Instead of ordering a new CubieTruck SSD kit I improvised a bit: I didn't have proper connectors for CubieTruck's power lines at hand, so I just soldered the cables directly to the motherboard. On the SSD drive I used the standard 15-pin SATA power connector.

For the data connection, I used an ordinary SATA data cable. The shortest one I could find was about three times as long as necessary, so it looks a bit uglier now. The connector on the motherboard side also needed some work with a scalpel to fit into CubieTruck's socket. The original connector on the cable that came with CubieTruck is thinner than those on standard SATA cables I tried.

Replacement SATA cables for CubieTruck.

So far it seems this fixed the CRC errors. In the past few days since I replaced the cable I haven't seen any new errors pop up, but I guess it will take a month or so to be sure.

Posted by Tomaž | Categories: Digital | Comments »

GA 7VT600 lmsensors settings

02.05.2014 17:35

Recently I've put into use a relatively ancient Gigabyte GA 7VT600 1394 motherboard that's been gathering dust on the top shelf of my wardrobe. I used it to replace an even older MSI board which, while still working perfectly, was getting a bit slow.

After replacing the dead lithium battery for RTC and NVRAM, it seems to work just fine with stock Debian Wheezy and passes a few ad-hoc stress tests.

One thing I noticed though is that the sensors tool from the lm-sensors package isn't very useful by default:

it87-isa-0290
Adapter: ISA adapter
in0:          +1.70 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +1.33 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.25 V  (min =  +0.00 V, max =  +4.08 V)
in3:          +2.86 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +3.23 V  (min =  +0.00 V, max =  +4.08 V)
in5:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in7:          +3.01 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +0.00 V  
fan1:        3308 RPM  (min =    0 RPM, div = 8)
fan2:           0 RPM  (min =    0 RPM, div = 8)
temp1:        +35.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +31.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp3:        +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
intrusion0:  OK

The board obviously has an IT87-series chip that provides some hardware monitoring functionality (you need the it87 kernel module). Apart from the lack of useful labels, some voltages also seem to be scaled down by voltage dividers before being measured by the IT87. I would expect at least the 5 V and 12 V lines there.

Figuring out which fan is which was trivial. For finding out other things, I compared the printout above with what the BIOS setup utility says. I picked out the most logical divider values for voltages. Since these also seem to fit the order of sensors, I'm relatively confident they are correct.
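As a sanity check on those factors, the readings above work out to plausible values: in3 gives 2.86 V × 1.679 ≈ 4.8 V for the +5 V line, in7 gives 3.01 V × 1.679 ≈ 5.1 V for 5VSB and in4 gives 3.23 V × 3.973 ≈ 12.8 V for the +12 V line.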

PC Health Status screen on GA 7VT600 motherboard.

The in5 and in6 readings are very unstable and don't seem to be shown in the BIOS screen. I'm guessing they are not connected on this board. temp2 is also not shown, but seems to give reasonable values, so I'm guessing there is a temperature sensor connected there; I just don't know where it is.

So, for future reference, put this into /etc/sensors.d/ga-7vt600 to get nicely labeled and properly calculated values for this hardware:

chip "it87-isa-0290"
    label temp1 "Sys Temp"
    label temp2 "Aux Temp"
    label temp3 "CPU Temp"

    label fan1 "CPU Fan"
    label fan2 "Sys Fan"

    label in0 "Vcore"
    label in1 "DDR Vtt"
    label in2 "+3.3V"
    label in3 "+5V"
    label in4 "+12V"
    label in7 "5VSB"

    compute in3 @*1.679, @/1.679
    compute in4 @*3.973, @/3.973
    compute in7 @*1.679, @/1.679
Posted by Tomaž | Categories: Digital | Comments »

CubieTruck Perl performance

23.01.2014 22:57

Two months ago I bought a CubieTruck, one of the many cheap, bare-bones ARM-based computers that keep popping up everywhere these days. My idea was to replace the aging x86 server that is running this website with something more power-efficient. So I was looking for a reasonably powerful board with a proper SATA interface and a decent amount of RAM. Raspberry Pi was out of the question, but the latest incarnation of CubieBoard with a dual-core 1 GHz ARM Cortex-A7, 2 GB of RAM, SATA 2.0 and Gigabit Ethernet seemed to fit the bill.

Unfortunately I could not find any reliable benchmarks I could use to estimate how ARM SoCs perform in comparison with my existing setup. So before I decided to migrate I took a while to do some performance tests and get to know this hardware.

CubieTruck

The software setup I'm interested in benchmarking is somewhat archaic in these days of Node.js and NoSQL. I'm using Perl 5, with HTML::Template doing most of the heavy lifting (at least according to the Devel::NYTProf profiler). Most parts are statically generated and some are dynamic, using a handful of Speedy CGI Perl 5 scripts. These are combined into the consistent website you see here by a somewhat convoluted Apache configuration using the threaded worker MPM.

In the following benchmarks I'm comparing:

  • An AMD Duron at 700 MHz, 1.2 GB RAM running stock x86 Debian Squeeze. Root filesystem is mounted from an IDE hard drive.
  • A CubieTruck A20 running armhf Debian Wheezy and the kernel supplied for the CubieTruck Ubuntu Server installation. Root filesystem is mounted from an SD card.

Both machines were connected through a 100 Mb/s Ethernet switch to a laptop which was running the remote end of the benchmarks.


First, to see how fast the static part of the web site is generated, I ran the full (single threaded) HTML rebuild. I measured the required user space CPU time with the time utility. This is the fastest run of three on each machine:

                                    AMD Duron    CubieTruck
CPU time to rebuild static pages    45.3 s       61.8 s

Then, to check that the network was operating at the bit rate I thought it was, I ran iperf to measure TCP throughput between the server and the laptop:

                         AMD Duron    CubieTruck
iperf throughput test    94.0 Mb/s    94.5 Mb/s

Finally, I ran a suite of tests using the Apache benchmarking tool. I measured how many requests per second the server can handle for different types of content and different numbers of concurrent requests. Numbers in parentheses show the size of the HTTP body (without headers).

CubieTruck requests per second for a static HTML page.

CubieTruck requests per second for a dynamic HTML page.

CubieTruck requests per second for an image.

CubieTruck requests per second for API call.

The site rebuild is, somewhat disappointingly, almost one third slower than on a 10-year-old PC. However, the single-threaded Apache performance is on par with it. With more concurrent users the CubieTruck of course has an advantage because of the additional CPU core. In fact, in both static-content cases the CubieTruck managed to saturate the line when there was more than one concurrent request.

I tried to design these tests so that the slow SD card in the CubieTruck would affect their outcome as little as possible. All of the data should fit into the buffer cache, which is why in the first test I only took into account the fastest run and only the user space CPU time. However, I now suspect that the SD card still affected the numbers somehow (the rebuild operation is the heaviest of the tests in terms of filesystem I/O). I don't know for sure how the kernel computes the time returned by the time utility.

These results are good enough that I can't dismiss the CubieTruck based on performance. If a proper SATA drive doesn't speed it up, I could probably parallelize the build process without much work. That should cut down on the time if it's really Perl performance on ARM that is slowing it down. On the other hand, I have some other concerns about using the CubieTruck as a personal server, so I'm not completely decided yet about putting it on my rack.

Posted by Tomaž | Categories: Digital | Comments »

Repairing the Happy Hacking Keyboard

29.09.2013 15:40

My trusty old Happy Hacking Keyboard has been working pretty reliably for the last four years. After fixing a botched plastic mold and strategically placing a piece of cardboard in its innards, that is. Regarding typing feel it is still my favorite keyboard that doesn't take a lot of space on a crowded desk, and I only switch to a regular-sized Logitech when I'm working with EDA programs where I need the function keys a lot.

So I was pretty disappointed when it stopped working a week ago. Checking the kernel log revealed all sorts of random USB bus errors:

usb 6-1.1: USB disconnect, device number 14
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device not accepting address 13, error -71
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device firmware changed
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: activate --> -19
usb 6-1: USB disconnect, device number 13
usb 6-1: new full-speed USB device number 15 using uhci_hcd
usb 6-1: string descriptor 0 read error: -71
usb 6-1: New USB device found, idVendor=04fe, idProduct=0008
usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 6-1: can't set config #1, error -71
hub 6-0:1.0: port 1 disabled by hub (EMI?), re-enabling...

This looked like something systemic. Either the controller was resetting continuously or there was something wrong with the USB wiring between the controller and the computer.

A new, identical HHKB model goes for more than $110 today so I opened it up to see if there's anything I can do. After checking the cable with an ohm-meter my suspicion fell on the power supply which seems to consist of a 3.3 V LDO regulator and some capacitors. I could see no obvious transients on the power rails when the controller switched on after the keyboard was plugged in. One interesting thing I did see was that if negotiation with USB host fails the controller switches itself off completely, including its quartz oscillator.

Happy Hacking Keyboard Lite2 USB controller

Since I had some problems with flaky USB cables before, I removed the original cable and soldered a new cable directly to the circuit board. This fixed the problem! After poking around some more, it turned out that after re-soldering the connector to the PCB the original cable worked as well.

I removed the membrane before poking around the controller board since a hot soldering iron and plastics don't mix well. Re-inserting the soft matrix tails into the (non-ZIF) connectors was somewhat tricky. I resorted to using pliers plus a bit of paper to protect the delicate wires.

Re-inserting flexible cables into connectors using pliers

I also noticed that the silver wires on the keyboard matrix itself seem to be developing a kind of dark oxide on the outer edges. I don't remember for sure whether they looked like this from the start though. If something is eating away at the wires, that puts a definite limit on this keyboard's longevity.

Possible oxidation on the keyboard matrix

In conclusion, just because a cable checks out with a multimeter doesn't mean there isn't a bad solder joint somewhere on a high-speed bus. The wisdom of first checking for bad RoHS soldering on mechanically (and thermally) stressed components has confirmed itself again. Also, did you know there are a couple of alternative, open source Happy Hacking Keyboard controllers out there?

Posted by Tomaž | Categories: Digital | Comments »

Some notes about CC chips

20.05.2013 12:31

Here are two unusual things I noticed while working with Texas Instruments (used to be Chipcon) CC2500 and CC1101 integrated transceivers.

It appears that the actual bit rate in continuous synchronous serial mode can differ significantly from what Texas Instruments' SmartRF Studio 7 calculates. See for example the following measurements, taken on a CC2500 chip using a 27 MHz crystal oscillator as the reference clock.

MDMCFG4 value    RF studio [baud]    measured [baud]
0x8a             50.0                38.5
0x8b             100.0               77.2
0x8c             200.0               154.0
0x8d             400.0               305.0

These bit rates were measured using an oscilloscope attached to the clock output of the transceiver, so I trust them to be correct. Bit rates I measured on a CC1101 agree with what SmartRF Studio predicts.

Update: I revisited this issue and the problem was a bug in my code that caused the MDMCFG3 register (which also affects the data rate) not to be properly programmed on the CC2500. Accounting for this bug, the data rates are within 1% of those calculated by SmartRF Studio 7 or from the formula given in the datasheet.
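For reference, the data rate formula from the CC2500/CC1101 datasheets is the following, where DRATE_E is the lower nibble of MDMCFG4 and DRATE_M is the value of MDMCFG3, so an unprogrammed MDMCFG3 directly scales the resulting bit rate:

R_DATA = (256 + DRATE_M) * 2^DRATE_E / 2^28 * f_XOSC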

The other issue I saw is the symbol mapping for 4-FSK modulation in the CC1101, which appears to depend on the configured bit rate. For example, with 200 baud, the symbol-to-frequency mapping appears to be as follows:

symbol    Δf
00        −0.33 fdev
01        −1.00 fdev
10        +0.33 fdev
11        +1.00 fdev

However, with 45 baud, the mapping is different, with symbol bit order apparently switched around:

symbol    Δf
00        −0.33 fdev
01        +0.33 fdev
10        −1.00 fdev
11        +1.00 fdev

Update: It's possible this difference has something to do with when exactly the radio samples the data line in relation to the clock. Either I don't understand exactly what is going on or the radio isn't sampling the data when it is supposed to. Also, the factors of fdev in the tables above were wrong (the symbol frequencies are equally spaced, with the maximum deviation from the central frequency equal to fdev).

Of course, this doesn't matter if you are using two identically configured CC1101 chips on both ends of the radio link. But it is important if you want to use it to communicate with some other hardware.

Posted by Tomaž | Categories: Digital | Comments »

VESNA and signal synthesis

26.04.2013 20:49

Experimental radio equipment on VESNA can be used for more than spectrum sensing. Radio boards based on Texas Instruments CC2500 and CC1101 transceivers can also be used for packet transmission and reception and, perhaps somewhat surprisingly, as flexible signal generators. These can be used in experiments where you want, for instance, to introduce controlled interference into some system or to check whether your spectrum sensor is working correctly.

When I talk with people about what our hardware is capable of, the conversation often starts with amazement at how small the radio part is and then inevitably turns to the question of whether VESNA has a software defined radio. I have to answer no, it doesn't, and then there's usually an awkward silence, because that seems like a dead end for any serious research work these days.

CC1101 transceiver on SNE-ISMTV-868

True, these transceivers don't provide software access to the undemodulated baseband samples (like for instance USRP does). However they are still amazingly flexible. They offer plenty of reconfigurability and, after you get to know some of their quirks, can be adapted for a lot of weird use cases without hardware changes. If you skip the various high-level digital parts of the chip for proprietary packet format handling there's a flexible front-end in there that allows you to choose between a handful of frequency, amplitude and phase modulators and several options for channel filters and such.

The closest you can come to software defined radio is a transparent continuous transmit mode where you can feed the transceiver arbitrary binary data from the microcontroller and it will simply get modulated and fed into the antenna. There is a catch though, since the chips offer at most 2-bit quantization of the baseband signal before modulation (they were designed for simple digital transmissions after all). This means that you have to get creative if you want to approximate an analog modulation and be ready for plenty of quantization noise.

This is how it looks on a spectrum analyzer when you run a direct digital synthesis algorithm on the microcontroller that generates a baseband sawtooth frequency sweep and use the amplitude modulator to up-convert it to 2.4 GHz:

RF signal synthesis using VESNA

(Click to watch RF signal synthesis using VESNA video)
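The baseband generator behind that sweep is conceptually simple. Here is a minimal sketch of the phase accumulator idea with the output crudely quantized to 2 bits; the actual VESNA code is more involved:

/* A phase accumulator whose step slowly increases gives a sawtooth
 * frequency sweep; the output is folded into a triangle wave and
 * quantized to 2 bits for the modulator. Illustration only. */
#include <stdint.h>

static uint32_t phase = 0;
static uint32_t step = 1000;

/* Called at a fixed sample rate, e.g. from a timer interrupt. */
uint8_t dds_next_sample(void)
{
    phase += step;                 /* phase accumulator, wraps at 2^32 */
    step += 1;                     /* increasing step -> rising frequency */

    /* fold the phase into a triangle wave and keep only the top 2 bits */
    uint32_t tri = (phase < 0x80000000UL) ? phase : ~phase;
    return (uint8_t)(tri >> 29);   /* 0..3, the 2-bit baseband sample */
}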

Using delta-sigma modulation you can approximate arbitrary waveforms in this way. For instance, you can make a passable simulation of an (analog) wireless microphone transmission using a 4-FSK modulator in CC1101 tuned into the UHF band.

Of course, setting this up takes more work than popping a few blocks into GNU Radio Companion and part of my job is to make it more accessible to people using VESNA and our VESNA-based testbeds. If you're interested in such signal generation using Texas Instruments CC series, some platform-independent code capable of doing this should start hitting the vesna-spectrum-sensor repository on GitHub in the next few weeks.

Posted by Tomaž | Categories: Digital | Comments »

The lowly connector

11.04.2013 20:47

Here's a small addendum to my previous list of lessons regarding easy debugability of microcontroller boards.

It's not a bad idea to pay a bit of extra attention to the connectors that will be used when developing and debugging software. It's likely these will see much more use than any other connector on the board, especially if the board is also meant for teaching or research like VESNA.

The first concern is that it should be hard to connect it in a wrong way. IDC and similar pin header connectors are popular for JTAG. Use a male part that has a shroud so that it's impossible to connect it when displaced by a pin or turned 180 degrees. That might sound obvious, but when you're debugging that tricky race condition and switching the debugger between three systems on your table late in the evening, the last thing you need is a burned out board because of a misplaced connector.

The other thing worth considering is the lifetime of the connector itself. While the lifetime of the part on the board might not be problematic, the part that stays with the developer can be. For VESNA, one of the most common reasons to make a trip to the soldering station is a torn wire in the connector for the serial debug console. We use something similar to the Berg connector and it doesn't take many connect-disconnect cycles before either the wires get torn out or the little springs break and the metal part falls out of the plastic housing. It's not always obvious that this has happened, and again it's a pain to realize after a long debugging session that the reason a board is talking garbage is not the bug you're trying to catch but a broken ground line in your debug console.

Posted by Tomaž | Categories: Digital | Comments »

About Digi Connect ME

03.10.2012 23:32

Remember my recent story about Atmel modules? For the last two days I have had a very strong déjà-vu feeling while trying to debug an issue that popped up at the last minute and is preventing a somewhat time-critical experiment from being performed on one of our VESNA testbeds.

It turned out that while 7-bit ASCII strings were correctly transferred between VESNAs and a client calling a HTTP API, binary data sent down the same pipeline got corrupted somewhere on the way. Plus, to make things just a bit more interesting, it sometimes also made the whole network of VESNAs unreachable.

VESNA coordinator with Digi Connect ME module.

Now, problems like this are unfortunately quite a common occurrence. Before data from a sensor node reaches a client, it must pass through multiple hops in a (notoriously unreliable) ZigBee mesh network, a coordinator VESNA that serves as a proxy between the mesh and a TCP/IP tunnel, a Java application on the other end of the tunnel that translates between a home-grown HTTP-like protocol used between VESNAs and proper HTTP, and finally an Apache server acting as a reverse proxy for the public HTTP API. Leaky abstractions and encapsulations abound, and it's not unusual to have some code somewhere along the line assume that the data it is passing is ASCII-only, valid UTF-8 or something else entirely.

I won't bore you with too many details about the debugging. I opted for a top-down approach, with the Java layer being the main suspect based on previous experience. After that got a thorough check, I moved to sniffing the tunnel with Wireshark, which has the extra complication that it's SSL encrypted. Luckily, it doesn't seem to use Diffie–Hellman key exchange, meaning that the SSL dissector was quite effective once I figured out how to extract private keys from Java's keystore. That also seemed to look OK, so the next layer down was the SSL tunnel endpoint, which is a Digi Connect ME module.

This is basically a black box that takes a duplex RS-232 connection on one end and tunnels it through an encrypted TCP/IP connection. That's a deceptively simple description though. In fact the Digi Connect ME is a miniature computer with an integrated web server for configuration, scripting support and a ton of other features (I have close to 500 pages of documentation for it on my computer, and I only downloaded the documents I thought might be useful in debugging this issue).

VESNA coordinator hooked up to a logic analyzer.

Anyway, when I looked closely, the problem was quite apparent. On the RS-232 side the module was set to use XON/XOFF software flow control. This obviously won't work when sending arbitrary binary data. Not only will the module interpret the XON and XOFF bytes (0x11 and 0x13) as special and drop them from the pipeline, an XOFF that is not followed by an XON will also halt the transmission, leading to hangs and timeouts. The fix looked simple enough: switch to hardware flow control using the dedicated CTS and RTS lines.

As you might have guessed, it was not that easy. It turns out that when hardware flow control is enabled in Digi Connect ME, it will randomly duplicate characters when sending them down the serial line. Again, the suspicion first fell on our homegrown UART driver on the VESNA side, but a logic analyzer trace below confirmed that it's the actual Digi Connect module that is the culprit and not some bug on our side.

Logic analyzer trace from a Digi Connect ME module.

Now, at this point I'm seriously confused. The Digi Connect ME is quite popular and, at least judging from browsing the web, used in a lot of applications. But after the ZigBit module it's also the second piece of hardware that exhibits behavior so broken that I can't believe it has passed any thorough quality check. All of my experience says that it must be us doing something wrong, and in both cases I have tested just about every way in which VESNA could be at fault. Actually, I want it to be a mistake on our end, because that would mean I can fix it. But honestly, once you see wrong data being sent on a logic analyzer, I don't think there can be any more doubt. Must really everything turn out rotten once you look into it with enough detail?

Posted by Tomaž | Categories: Digital | Comments »

IguanaWorks USB IR transceiver

19.08.2012 22:16

I bookmarked this little gadget a while ago. Having recently solved my problems with scriptable switching of PulseAudio audio outputs, I thought it was time to finally order it and try to automate a few other home-theater related operations with it. Over the summer a few other infrared-communication related things also piled up on my desk, so having a universal IR transmitter and receiver within reach seemed like a good idea.

IguanaWorks USB IR transceiver

This is an IguanaWorks USB IR transceiver, the hybrid version. Hybrid meaning it has both an integrated IR LED and detector pair and a 3.5 mm jack for an external transmitter.

From the software side it comes with quite an elaborate framework, free and open source of course. Software also comes in the form of Debian binary and source packages, which is a nice plus. I did have a small problem compiling them though since the build process seems to depend on the iguanair user being present on the system. This user only gets created during installation which makes it kind of a catch-22 situation. Once compiled the packages did work fine on my Debian Squeeze system.

After everything is installed, you get:

  • igdaemon, a daemon that communicates with the actual USB dongle,
  • igclient, a client that exposes daemon functionality through a command-line interface,
  • a patched version of the lircd daemon that includes a driver that offloads communication to igdaemon.

lirc is the usual Linux framework for dealing with infrared remotes. It knows how to inject keypresses into the Linux input system when an IR command is received and comes with utilities that can send commands back through the IR transmitter to other devices. This is the first time I'm dealing with it and I'm still a bit confused about how it all fits together, but right now it appears some parts of the lirc ecosystem don't work with iguanair at all. For instance, the xmode2 utility that shows received IR signals in an oscilloscope-like display isn't supported.

As I'm currently mostly interested in using this from my scripts, using igclient directly seems to be the simplest option. There are also Python bindings for the client library, but they appear undocumented and I haven't yet taken a dive into the source code to figure them out.

The client reports the received signals in the form of space-pulse durations, like this:

$ igclient --receiver-on --sleep 10
received 1 signal(s):
  space: 95573
received 3 signal(s):
  space: 7616
  pulse: 64
  space: 65536

I'm not yet sure what the units for those numbers are. According to the documentation the transmit functionality expects a similarly formatted input, but I have yet to try it out. It seems that if I want to plot the signals on a time line I will have to write my own utility for that.

To be honest, I expected using this to be simpler from the computer side. In the end it basically has the same functionality as my 433 MHz receiver. One thing I also overlooked is that it's only capable of transmitting modulated, on-off keyed transmissions (25-125 kHz carrier), which makes it useless for devices that don't use such modulation, like shutter glasses. But given that I did basically zero research before ordering it, I can't really blame anyone but myself for that (and that bookmark must have been at least a year old). Just yesterday I also stumbled upon the IR Toy, which appears to be a similar device. It would be interesting to know how it compares with the IguanaWorks one.

Posted by Tomaž | Categories: Digital | Comments »

On Atmel SerialNet ZigBit modules

13.08.2012 22:27

Don't use Atmel BitCloud/SerialNet ZigBit modules.

With this important public service announcement out of the way, let me start at the beginning.

Atmel makes ZigBit modules that contain an IEEE 802.15.4-compatible integrated radio from their AT86RF2xx family and an AVR-based microcontroller on a small hybrid component. The CPU runs a proprietary mesh-networking stack (BitCloud) built on top of the ZigBee specification and exposes a high-level interface on a serial line they call SerialNet (think "send the following data to this network address"-style interface). The module can be used either as a very simple way of adding mesh networking to some host device or as a stand-alone microcontroller with a built-in radio (Atmel provides a proprietary BitCloud SDK, so you can build your own firmware for the AVR).

Atmel ZigBit module on a VESNA SNR-MOD board.

At SensorLab we built a sensor node radio board for VESNA using these modules (more specifically, ATZB 900 B0 for 868 MHz and ATZB 24 B0 for 2.4 GHz links) as they appeared to be simple to use and would provide a temporary solution for connecting VESNAs with a wireless mesh until we come up with a working and reliable 6LoWPAN implementation. So far we have deployed well over 50 of these in different VESNA installations.

I can now say that these modules have been nothing but trouble from the start. First there is the issue of documentation. Atmel's documentation has always been superb in my memory. Compare one of their ATmega datasheets with the vague hand-waving STMicroelectronics calls microcontroller documentation and you'll know why. Unfortunately, the SerialNet user guide is an exception to this rule. They leave many corner cases undefined and you are left to your own experimentation to find out how the module behaves. There is almost no timing information. How long can you expect to wait for a response to a command? How long will the module be unresponsive and ignore commands after I change this setting? Even the hardware reset procedure is not described anywhere beyond a "Reset input (active low)".

The problems with this product however go deeper than that. In my experience developers, myself included, tend to be too quick to blame problems on bugs in someone else's code. When colleagues complained about how buggy these modules are, I said that it was much more likely a problem in our code or hardware design. That is, until I started investigating the numerous networking problems myself: the modules would return responses they shouldn't have according to the specification, and they would claim to be connected to the network even though no other network node could communicate with them. Modules would even occasionally persistently corrupt themselves, requiring firmware reprogramming before they would start responding to commands again. Believe me, it's annoying to reach for a JTAG connector when the module in question is on a lamp post in some other part of the country.

For most of these bugs I can only offer anecdotal evidence. However I have been investigating one important issue for around two months now and I'm confident that there is something seriously wrong with these modules. I strongly suspect there is a race condition somewhere in Atmel's (proprietary and closed-source, of course) code that causes some kind of buffer corruption when a packet is received over the radio at the same time as the module receives a command over the serial line. This will cause the module to lose bytes on the serial line, making it impossible to reliably decode the communications protocol.

For instance, this is how the communications should look like over the serial line. Host in this case is VESNA and module is Atmel ATZB 900 B0:

→ AT+WNWK\x0d                                # host asks for network status
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a  # module asynchronously reports received data
← OK\x0d\x0a                                 # module answers that network is OK
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a  # module asynchronously reports received data

This is how it sometimes looks:

→ AT+WNWK\x0d
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a
← OK\x0d                                     # note missing \x0a
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a

And sometimes it gets as bad as this:

→ AT+WNWK\x0d
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a
← ODATA 0000,0,77:(77 bytes of data)\x0d\x0a # note only O from OK sent

An inviting explanation for these problems would be that we have a bad implementation of a UART on VESNA. Except that this happens even when the module is connected to a computer via a serial-to-USB converter, and I have traces from a big and expensive Tektronix logic analyzer (as well as Sigrok) to prove that the corrupted data is indeed present on the hardware serial line and not an artifact of some bug on the host side:

Missing Line Feed character from an Atmel ZigBit module.

A logic analyzer trace demonstrating a missing line feed character. Click to enlarge.

Data corruption on the serial line from an Atmel ZigBit module.

A logic analyzer trace demonstrating a jumbled-up OK and DATA response. Click to enlarge.

I have seen this happen in the lab under controlled conditions on 10 different modules and have good reasons to suspect the same thing is happening on the deployed 50-plus modules. Also, this bug is present in both BitCloud 1.14 and 1.13 and in both vanilla and security-enabled builds. All of this points to the fact that this problem is not due to some isolated fluke on our side.

For well over a month I have been on the line with Atmel technical support and while they have politely answered all of my mail, they have also failed to acknowledge the issue or provide any helpful information, even though I sent them a simple test case that reliably reproduces the problem in a few seconds. Of course, without their help there is exactly zero chance of getting to the bottom of this, and given all of the above I seriously doubt this is anything other than a bug in their firmware.

At this point I have mostly given up any hope that this issue will be resolved. During my investigation I did find out that decreasing the amount of chatter on the serial line decreases the probability of errors, so I did manage to work around this bug a bit by switching to non-verbose responses (ATV0) and using packets that are a few bytes shorter than the maximum (say 75 bytes for encrypted frames). This will hopefully improve the reliability of the already deployed hardware. For the future, we will be looking into alternatives, as unfortunately 6LoWPAN still seems to be somewhat out of our reach.

Posted by Tomaž | Categories: Digital | Comments »

Microcontroller board design tips

20.06.2012 17:52

I've been working with VESNA for the better part of a year now and over time I have come to know some oversights in the original hardware design that make development of software for VESNA unnecessarily complicated. Of course, mistakes like these are only obvious in hindsight, so I'm sharing a few tips that should prevent them in future designs.

Have a JTAG port always accessible. JTAG is what allows you to do any kind of debugging beyond simple printfs. Together with an on-chip debugger and the GNU debugger it makes it possible to debug firmware running on the microcontroller in much the same way as an executable on your PC. You can inspect parts of the CPU's address space, program new firmware right from the debugger and set break and watchpoints. However a JTAG port takes between 4 and 6 pins and many microcontrollers allow you to turn it off and remap other peripherals to those pins.

Relying on the functionality hidden behind JTAG pins is a bad idea and should only be used as a last resort. Doubly so on a board that is meant for multiple purposes and is going to see a lot of software development. VESNA has several peripherals on the core board that can only be used after JTAG is turned off. Needless to say, these are a pain to develop drivers for. This means that people avoid using those parts of the hardware as far as possible, and in the cases where their use is unavoidable, the software is badly tested and bugs tend to linger around for longer than in other parts of the system.

Straightforward physical accessibility is also important. On VESNA, JTAG pins are routed through the general-purpose expansion connector and we had several cases where accessing the connector was a simple geometric impossibility. It also means that all expansion boards have to be designed with the debugability of the underlying core board in mind. Having a separate connector would be a much better choice. When designing a board, think how it will be mounted and what kind of expansions will there be in the future. These should not make the debug port inaccessible.

Use a separate serial port for diagnostics and debug messages. The Cortex M3 microcontroller used on VESNA has several hardware UART peripherals, but only one is accessible on a convenient connector, which means that the same serial line is used for data transfer as well as random debug messages during development. This violates the rule of separation, which means that debug messages jumbled up with data are not a rare occurrence.

Advice regarding accessibility of the JTAG port applies to the diagnostic port as well. VESNA's serial line is available on the connector that is also used for external power. This means that in some configurations the serial line simply can't be connected, making debugging hard. A much better design choice would be to bundle JTAG and a diagnostic serial console on the same connector and leave the power supply separate.

Actually, I have come to learn that special care should be given to the accessibility of the circuit in general. It should be possible to run the basic core board in a way that makes it possible to probe most nets and microcontroller pins with an oscilloscope or a logic analyzer. The positioning of the JTAG on the expansion connector on VESNA means that in order to use a debugger you have to use a special break-out debug board which, until a later redesign, used to cover half of the core board, making a part of the circuit inaccessible to an oscilloscope probe without a soldering iron and some wires. When a bug hits that requires you to use a debugger and an oscilloscope, you really don't need any additional complications.

In short, design for debugability, both software and hardware. It will make people happier.

Posted by Tomaž | Categories: Digital | Comments »

Organic display, part 2

02.06.2012 21:06

Here is the follow-up to the organic LED display I reviewed back in February and my recently manufactured Arduino shield for it.

There is not much more to tell about the hardware. I basically went with the design I described in my first post. The display controller is powered from the 3.3 V line supplied by the Arduino, while the display itself uses an LM2703 step-up converter connected to the Arduino's 5 V supply. This micropower switcher is sufficient to power the OLED array, but the reference design proved to be a bit slow to react to fast changes in display current. For instance, with a checkerboard pattern on the display, the supply voltage drops by almost 1 V when the display scans the high-brightness areas, and this causes some visible shadowing. If you look closely at the photo below, you can see that the centers of the white squares are darker than the corners.

Arduino OLED shield showing a checkerboard pattern

The reference LM2703 design I followed uses a single ceramic 4.7 µF capacitor at the switcher output, so this problem could most likely be fixed by adding some larger output capacitors.

VDDH supply when displaying a checkerboard pattern

Voltage drop on the VDDH line when displaying the 45° checkerboard pattern.

From the software side, the SEPS525 controller turned out to be an interesting challenge. Densitron's datasheet warns that you need to correctly program the controller (things like pixel driving current and timings) before turning on the high-voltage supply, or risk damaging the display. This is somewhat of a gamble, since the SPI interface only allows one-way communication, and short of turning the display on you can't know whether you actually managed to set up SPI correctly. In my case this resulted in a few mishaps where the switcher's coil loudly complained about currents that were probably well beyond the display's maximum ratings. Luckily, the display appears to have survived all of them.

The SEPS525 datasheet is quite long and vague in places, but once you figure out the SPI protocol and set up the configuration registers correctly, the use is quite straightforward: you give it a drawing area (either the full screen or a part of it) and burst the pixel data in row-major order while the controller takes care of incrementing the framebuffer pointer and wrapping lines. The display supports either 6/6/6 or 5/6/5 RGB color formats, however I'm not aware of any microcontroller SPI peripheral that is actually capable of transmitting 18-bit words. Arduino's certainly isn't, which means you must either sacrifice 2 bits of color depth or bit-bang the SPI protocol, which is horribly slow.
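For illustration, packing a pixel into the 5/6/5 format and pushing it out through the hardware SPI looks roughly like the sketch below. The pin assignments are assumptions and the SEPS525 register setup (drawing area, memory write command) is assumed to have happened beforehand:

#include <SPI.h>

const int csPin = 10;  // chip select (assumed wiring)
const int rsPin = 9;   // register/data select (assumed wiring)

// Pack 8-bit R, G, B into a 5/6/5 pixel and send it as two 8-bit SPI transfers.
void writePixel565(uint8_t r, uint8_t g, uint8_t b)
{
    uint16_t px = ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3);

    digitalWrite(rsPin, HIGH);   // data (not register) access
    digitalWrite(csPin, LOW);
    SPI.transfer(px >> 8);       // high byte first
    SPI.transfer(px & 0xFF);     // low byte
    digitalWrite(csPin, HIGH);
}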

Talking about speed, bursting a whole screen's worth of data from the ATmega328 using hardware SPI through Arduino's library takes around 100 ms. If you want to do some calculations on the pixels (like the checkerboard pattern above), this might quickly become 1000 ms, so don't expect high frame rates. Another problem is that the framebuffer takes 40960 bytes, which is too much to store even a single full-screen image in the ATmega328's 32 kB flash. I cooked up a simple RLE compressor for the example you see below, which approximately halves the size of the stored images at the price of some extra CPU cycles (drawing this particular one takes 180 ms).

Arduino OLED shield showing Preening RD by MochaDelight

Preening RD by MochaDelight

Hardware design and software is available under CC-BY-SA 3.0 and GNU GPL 3 licenses respectively. The repository below contains a SEPS525 driver and an Arduino sketch that demonstrates its use with the three push-buttons that are also on the shield:

https://www.tablix.org/~avian/git/arduino-seps525-oled.git

I have a few of these boards extra. So if you're interested in getting a bare PCB or a kit, please let me know. Be warned though that soldering the OLED connector makes for a good exercise in precision SMD soldering.

Posted by Tomaž | Categories: Digital | Comments »