Raspberry Pi Compute Module eMMC benchmarks

03.07.2016 13:52

I have a Raspberry Pi Compute Module development kit on my desk at the moment. I'm doing some testing and prototyping because we're considering using it for a project at the Institute. The Compute Module is basically a small PCB with Broadcom's BCM2835 system-on-chip, 4 GB of eMMC flash and little else. Even providing the power supply at a number of different voltages is left as an exercise for the user.

Raspberry Pi Compute Module

I was wondering how the eMMC flash performs compared to the SD card on the more common Pis. I couldn't find any good benchmarks on the web. Wikipedia says that the latest eMMC standard rivals SATA speeds, but there's not much information around on what kind the Compute Module uses. I used Samsung's ARM Chromebook with eMMC flash a while ago and that felt pretty fast. On the other hand, watching package updates scroll by on the Compute Module gave me the feeling that it's quite sluggish.

To get a more objective comparison, I decided to benchmark the I/O performance against my Raspberry Pi Zero. The Zero uses the same BCM2835 SoC, so the results should be somewhat comparable. I used the SD card that originally came with the Zero, preloaded with the NOOBS distribution. It only has the raspberry logo printed on it, so I don't know the exact model or manufacturer. Both the Compute Module and the Zero were running the latest Raspbian Jessie.

One surprising discovery during this benchmark was that the CPU on the Zero runs between 700 MHz and 1 GHz while the Compute Module will only run at 700 MHz. These are the ranges detected at boot by bcm2835-cpufreq with the default /boot/config.txt that came with the Raspbian image (i.e. no special overclocking). Because of this I performed the benchmarks on the Zero at both 700 MHz and 1 GHz.

For comparison, I also ran the same benchmark on my Cubietruck, which has an Allwinner A20 system-on-chip with a SATA-connected Samsung 840 EVO SSD and runs vanilla Debian Jessie.

This is the benchmark script I used. For each test, I took the fastest result out of 5 runs:

#!/bin/sh
# Run as root. Each test is repeated N times; the fastest run is reported.

N=5

DEVICE=/dev/sda
#DEVICE=/dev/mmcblk0

# Raw sequential read speed of the block device.
I=0
while [ $I -lt $N ]; do
	hdparm -t $DEVICE
	I=$(($I+1))
done

# Cached read speed (mostly measures RAM and CPU, not storage).
I=0
while [ $I -lt $N ]; do
	hdparm -T $DEVICE
	I=$(($I+1))
done

# Write speed through the filesystem, flushed to the device.
I=0
while [ $I -lt $N ]; do
	dd if=/dev/zero of=tempfile bs=1M count=128 conv=fdatasync 2>&1
	I=$(($I+1))
done

# Read speed through the filesystem, with the block device cache dropped first.
I=0
while [ $I -lt $N ]; do
	echo 3 > /proc/sys/vm/drop_caches
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

# Same read again, this time served from the cache.
I=0
while [ $I -lt $N ]; do
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

Here is write performance, as measured by dd. I wonder if dd figures are affected by filesystem fragmentation since it writes an actual file that might not be contiguous. I've been using Zero for a while with this Raspbian image while the Compute Module has been freshly re-imaged. Fragmentation shouldn't be as significant as with spinning disks, but it probably still has some effect.

Comparison of write performance.

Read performance, as measured by hdparm as well as dd. To remove the effect of cache when measuring with dd, I explicitly dropped kernel block device caches before each run.

Comparison of read performance.

From this it seems the Compute Module's eMMC flash is slightly faster than the SD card, on both reads and writes, when compared to the Zero running at the same CPU clock frequency. It's interesting that the Zero's results change significantly with CPU frequency, which seems to suggest that some part of SD card I/O is CPU bound. That said, performance is roughly on the same order of magnitude. The Cubietruck is significantly faster than both. In light of this result, it's sad that newer versions of the Cubieboard (and cheap ARM SoCs in general) dropped the SATA interface.

Finally, I tested block device cache performance. This more or less shows only RAM and CPU performance and shouldn't depend on storage speed.

Comparison of cached read performance.

Interestingly, Zero seems to be somewhat faster than the Compute Module at 700 MHz here. /proc/cpuinfo shows a different revision, although it's not clear to me whether that marks board revision or SoC revision. It might be that processors in Zero and Compute Module are not identical pieces of silicon.

In the end, I should note that these results are not super accurate. Complexities of I/O benchmarking on Linux aside, there are several things that might have affected the results. I already mentioned the different filesystem state. A different SD card in the Zero might give very different results (I didn't have a second empty card at hand to try that). While the Raspberry Pis were idle during these tests, the Cubietruck was running my web server and various other little tidbits that tend to accumulate on such machines.

Posted by Tomaž | Categories: Digital | Comments »

Ultra-narrowband and BPSK on TI CC chips

15.06.2016 20:56

Ultra-narrowband is a fancy new name for an old thing. The idea is to use a phase modulated carrier to transmit data at a very low bitrate. This saves energy and improves spectral efficiency (bits per second of data throughput per hertz of radio bandwidth). This in turn makes it convenient for battery-powered sensors and 20-billion Internet-connected toasters of tomorrow. For similar reasons, amateur radio operators have been chatting over PSK31, which is essentially the same thing as ultra-narrowband, for almost two decades now.

Currently SIGFOX seems to be the main commercial operator that's pushing this technology. They don't publish protocol details, however they've written a 3GPP proposal for C-UNB standard, which is public. The benefit of ultra-narrowband is that the simple BPSK modulation can be implemented with existing cheap and well tested integrated transceivers. Compare with the original Weightless standard for instance, which required custom silicon for its much more advanced physical layer and seems mostly forgotten these days (although it's not a completely fair comparison, since SIGFOX operates in unlicensed spectrum and Weightless had to deal with complexities of TV whitespaces, but I digress).

CC1101 transceiver on SNE-ISMTV-868

The CC-series of transceivers from Texas Instruments (like CC1101 and CC1120) has a lot of software-configurable modulation blocks built-in, but a BPSK modulator is not among them. However, you can find some references to ultra-narrowband being implemented with these chips which suggests that people are using them for this purpose. The C-UNB proposal also mentions that it can be easily implemented with modified FSK modulation, but doesn't go into more detail. I wanted to implement ultra-narrowband on CC1101 for a project we're doing at the Institute, so I looked into this possibility.

As any introductory course in telecommunications is quick to point out, frequency and phase modulation are basically the same thing. If you take a frequency modulator and feed it a time-derivative of a signal the result is identical to a phase modulator fed with the unmodified signal. In practice however it's not that simple. BPSK requires that the phase changes ±180° for each symbol change. The frequency-shift keying block in CC chips does not have a well-defined relation between frequency deviation and symbol rate. This means that it's hard to define how much signal phase changes during each symbol.

CC1101 does have a minimum-shift keying mode. This is a special form of frequency modulation that has well-defined phase shifts between symbols. Wikipedia says that the carrier phase continuously shifts by ±90° each symbol period, which does not sound useful at first:

Minimum-shift keying illustration.

In this interpretation of phase shifts, the carrier frequency fc is in the middle between frequencies for the two symbols, f0 and f1. This is the usual interpretation for frequency modulation, where you have approximately equal numbers of both symbols in a typical transmission.

However, if you transmit mostly one symbol, say f0, the receiver will consider that to be the carrier f'c. In that case, each occurrence of symbol 1 rotates the phase of the signal compared to f'c by +180°. This is exactly what you need to implement BPSK.

Alternative interpretation of phase in MSK.

BPSK requires that phase shifts are fast compared to the symbol rate, so you want to encode each BPSK symbol with many MSK symbols. Ultra-narrowband uses symbol rates on the order of 100 symbols/s while the CC1101 supports up to around 1 Msymbol/s. This means that phase changes could be made very fast, but 10 MSK symbols per BPSK symbol seem to suffice.

In the end, bits encoded into MSK symbols look somewhat similar to the theoretical time-derivative I mentioned above. You have an impulse of a single f1 symbol each time you have a transition from bit 0 to 1 or vice versa:

Using multiple MSK symbols as one BPSK symbol.
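In code, the encoding comes down to roughly the following. This is a simplified sketch of the idea, not the actual implementation; MSK_PER_BIT and msk_send_symbol() are hypothetical names (on the real chip, the symbols would be clocked out through the synchronous serial interface):

#include <stdint.h>

#define MSK_PER_BIT 10	/* MSK symbols per BPSK symbol */

/* Emit one MSK symbol: 0 transmits f0, 1 transmits f1. */
extern void msk_send_symbol(int sym);

void bpsk_send_bits(const uint8_t *bits, int n)
{
	int prev = 0;

	for (int i = 0; i < n; i++) {
		int j = 0;

		/* A single f1 symbol flips the carrier phase by 180 degrees,
		 * so one is sent only when the BPSK bit changes. */
		if (bits[i] != prev) {
			msk_send_symbol(1);
			j = 1;
		}
		for (; j < MSK_PER_BIT; j++)
			msk_send_symbol(0);

		prev = bits[i];
	}
}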

So far, this has all been theoretical. How well does it work in practice? The most obvious problem is frequency stability. The local oscillator on the CC1101 is designed to be re-calibrated often, but you cannot calibrate it while you are transmitting. With such low bitrates, packet transmissions last for several seconds. During that time the frequency can drift quite a lot, especially compared to the very limited bandwidth of these transmissions. This is the usual problem with narrowband transmissions and the CC1101 has no mechanism for compensating for it on reception. That is why I doubt a CC1101-to-CC1101 link would work in this way, and I haven't tried it.

Transmission from a CC1101 to a specialized receiver, however, seems to work quite nicely in practice. You just have to use an SDR with a wide-enough channel for reception and compensate for frequency drifts in software. I have some lab measurements to share, but those will have to wait for another post.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times, part 2

27.04.2016 11:40

Last week I wrote about some typical interrupt response times you get from an Arduino and Raspberry Pi, if you follow basic examples from documentation or whatever comes up on Google. I got some quite unexpected results, like for instance a Python script that responds faster than a compiled C program. To check some of my guesses as to what caused those results, I did another set of measurements.

For Arduino, most response times were grouped around 9 microseconds, but there were a few outliers. I checked the Arduino library source and it indeed always enables the AVR timer/counter0 overflow interrupt. If the timer interrupt happens at the same time as the GPIO interrupt I was measuring, the GPIO interrupt can get delayed. Performing the measurement with the timer interrupt masked out indeed removes the outliers:

Effect of timer interrupt on Arduino response time.

With the timer off, all measured response times are between 8.9485 and 9.1986 μs, a 0.2501 μs long interval. This fits perfectly with theory: at a 16 MHz CPU clock and instruction lengths between 1 and 5 cycles, the uncertainty in interrupt latency is 0.25 μs.
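For reference, masking the timer interrupt amounts to something like the following on the ATmega328 (a minimal sketch; note that delay() and millis() stop working while the interrupt is masked):

#include <avr/io.h>

void timer0_mask(void)
{
	TIMSK0 &= ~_BV(TOIE0);	/* disable timer/counter0 overflow interrupt */
}

void timer0_unmask(void)
{
	TIMSK0 |= _BV(TOIE0);	/* enable it again */
}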

The second weird thing was the aforementioned discrepancy between Python and C on the Raspberry Pi. The default Python library uses an ugly hack to bypass the kernel GPIO driver and control GPIO lines directly from user space: it mmaps a range of physical memory containing the GPIO registers into its own process memory space using /dev/mem. This is similar to how X servers on Linux (used to?) access graphics hardware from user space. While this approach is very unportable, it's also much faster since you don't need to do context switches into the kernel for every operation.
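The core of the trick looks roughly like this. This is a minimal sketch and not the actual RPi.GPIO code; the 0x20200000 base address is for the BCM2835 (Pi 1 and Zero), the pin number is arbitrary, and configuring the pin as an output through the GPFSEL registers is omitted:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define GPIO_BASE 0x20200000UL	/* BCM2835 GPIO register block */
#define GPSET0 7		/* word offsets into the block */
#define GPCLR0 10

int main(void)
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open /dev/mem");
		return 1;
	}

	volatile uint32_t *gpio = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, GPIO_BASE);
	if (gpio == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Setting or clearing an output pin is now a single store,
	 * with no syscall and no context switch. */
	int pin = 23;
	gpio[GPSET0] = 1 << pin;	/* drive high */
	gpio[GPCLR0] = 1 << pin;	/* drive low */

	return 0;
}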

To check just how much faster the mmap method is on the Raspberry Pi, I copied the GPIO access code from the RPi.GPIO library into my test C program:

Response times using sysfs and mmap methods on Raspberry Pi.

As you can see, the native program is now faster than the interpreted Python script. This also demonstrates just how costly context switches are: the sysfs version is more than two times slower on average. It's also worth noting that both RPi.GPIO and my C program still use epoll() or select() on a sysfs file to wait for the interrupt. Only the output pin change can be done with direct memory accesses.

Finally, the Raspberry Pi was faster when the CPU was loaded, which seemed counterintuitive. I tracked this down to automatic CPU frequency scaling. By default, the Raspberry Pi Zero seems to be set to run between 700 MHz and 1000 MHz using the ondemand governor. If I switch to the performance governor, it keeps the CPU running at 1 GHz at all times. In that case, as expected, CPU load increases the average response time:

Effect of cpufreq governor on Raspberry Pi response time.

It's interesting to note that the Linux kernel comes with pluggable idle loop implementations (CONFIG_CPU_IDLE). The idle loop can be selected through /sys/devices/system/cpu/cpuidle in a similar way to the CPU frequency governor. The Raspbian Jessie release, however, has that disabled. It uses the default idle loop for ARMv6 processors. The assembly code has been patched, though: the ARM Wait For Interrupt (WFI) instruction in the vanilla kernel has been replaced with some mcreq (write to coprocessor?) instructions. I can't find any info on the JIRA ticket referenced in the comment and the change has been added among other BCM-specific changes in a single 6400-line commit. The idle loop implementation is interesting because, if it puts the CPU into a power saving mode, it can affect interrupt latency as well.

As before, the source code and raw data are on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times

18.04.2016 15:13

Embedded systems were traditionally the domain of microcontrollers. You programmed them in C on bare metal, directly poking values into registers and hooking into interrupt vectors. Only if it was really necessary would you include some kind of light-weight operating system. Times are changing though. These days it's becoming more and more common to see full Linux systems and high-level languages in this area. It's not surprising: if I can just pop open a shell, see what exceptions my Python script is throwing and fix them on the fly, I'm not going to bother with microcontrollers and the whole in-circuit debugger thing. Some even say it won't be long before we will all be just running web browsers on our devices.

It seems to be common knowledge that the traditional approach really excels at latency. If you're moderately careful with your code, you can get your system to react very quickly and consistently to events. Common embedded Linux systems don't have real-time features. They seem to address this deficiency with some combination of "don't care", "it's good enough" and throwing raw CPU power at the problem. Or as the author of RPi.GPIO library puts it:

If you are after true real-time performance and predictability, buy yourself an Arduino.

I was wondering what kind of performance you could expect from these modern systems. I tend to be very conservative in my work: I have a pile of embedded Linux-running boards, but they are mostly gathering dust while I stick to old-fashioned Cortex M3s and AVRs. So I thought it would be interesting to do some experiments and get some real data about these things.

Measuring interrupt response times on Arduino.

To test how fast a program can respond to an event, I chose a very simple task: raise an output digital line whenever a rising edge happens on an input digital line. This allowed me to measure response times very simply, in an automated fashion, using a USB-connected oscilloscope and a signal generator.

I tested two devices: an Arduino Uno using a 16 MHz ATmega328 microcontroller and a Raspberry Pi Zero using a 1 GHz ARM-based CPU running Raspbian Jessie. I tried several approaches to implementing the task. On the Arduino, I implemented it with an interrupt and with a polling loop. On the Raspberry Pi, I tried a kernel module, a native binary written in C and a Python program. You can see the exact source code on GitHub.

Measuring interrupt response times on Raspberry Pi.

For all of these, I chose the most obvious approach possible. My implementations were based as much as possible on the preferred libraries mentioned in the documentation or whatever came up on top of my web searches. This meant that for the Arduino, I was using the Arduino IDE and the library that comes with it. For the Raspberry Pi, I used the RPi.GPIO Python library, the GPIO sysfs interface for native code in user space and the GPIO consumer interface for the kernel module (based on examples from Stefan Wendler). Definitely many of these could be further hand-optimized, but I was mostly interested here in the out-of-the-box performance you could get on the first try.
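For illustration, the interrupt-driven Arduino variant amounts to roughly the following. This is a minimal sketch and not necessarily identical to the code in the repository; pin numbers are arbitrary:

const int inPin = 2;	/* external-interrupt capable pin on the Uno */
const int outPin = 13;

void respond(void)
{
	/* Raise the output line, then drop it again so the next
	 * measurement starts from a known state. */
	digitalWrite(outPin, HIGH);
	digitalWrite(outPin, LOW);
}

void setup(void)
{
	pinMode(inPin, INPUT);
	pinMode(outPin, OUTPUT);
	attachInterrupt(digitalPinToInterrupt(inPin), respond, RISING);
}

void loop(void)
{
}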

Here is a histogram of 500 measurements for the five implementations:

Histogram of response time measurements.

As expected, the Arduino and the Raspberry Pi kernel module were both significantly faster and more consistent than the two Raspberry Pi user space implementations. Somewhat shockingly though, the interpreted Python program was considerably faster than my C program compiled into native code.

If you check the source, the RPi.GPIO library maps the hardware registers directly into its process memory. This means that it does not need any syscalls for controlling the GPIO lines. On the other hand, my C implementation uses the kernel's sysfs interface. This is arguably a cleaner and safer way to do it, but it requires calls into the kernel to change GPIO states and these require expensive context switches. This difference is likely the reason why Python was faster.
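For reference, waiting for the interrupt through sysfs looks roughly like this. It's a sketch using poll() (the actual programs use epoll() or select(), which work the same way) and assumes the pin has already been exported and its edge file set to rising; N stands for the GPIO number:

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int wait_for_edge(void)
{
	char buf[8];
	int fd = open("/sys/class/gpio/gpioN/value", O_RDONLY);

	if (fd < 0)
		return -1;

	read(fd, buf, sizeof(buf));	/* clear any pending state */

	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	poll(&pfd, 1, -1);		/* blocks until the edge fires */

	lseek(fd, 0, SEEK_SET);
	read(fd, buf, sizeof(buf));	/* read the new value */
	close(fd);

	return buf[0] == '1';
}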

Histogram of response time measurements (zoomed)

Here is the zoomed-in left part of the histogram. The Raspberry Pi kernel module can be just as fast as the Arduino, but is less consistent. That's not surprising, since the kernel has many other interrupts to service, and not that impressive considering the 60-times faster CPU clock.

The Arduino itself is not that consistent out-of-the-box. While most interrupts are served in around 9 microseconds (so around 140 CPU cycles), occasionally they take as long as 15 microseconds. The Arduino library is probably to blame here, since it uses the timer interrupt for delay functions. This interrupt seems to be always enabled, even when a delay function is not running, and hence competes with the GPIO interrupt I am using.

Also, this again shows that polling on Arduino can sometimes be faster than interrupts.

Effect of CPU load on response time.

Another interesting result was the effect of CPU load on Raspberry Pi response times. Somewhat counterintuitively, response times are smaller on average when there is some other process consuming CPU cycles. This happens even with the kernel module, which makes me think it has something to do with power saving features. Perhaps this is due to CPU frequency scaling or maybe the kernel puts an idle CPU into some sleep mode from which it takes longer to wake up.

In conclusion, I was a bit impressed by how well Python scores on this test. While it's an order of magnitude slower than the Arduino, 200 microseconds on average is not bad. Of course, there's no hard upper limit on that. In my test, some responses took twice as long, and things really start falling apart if you increase the interrupt load (for instance, with a process that does something with the SD card or network adapter). Some of the results on the Raspberry Pi were quite surprising and they show once again that intuition can be pretty wrong when it comes to software performance.

I will likely be looking into more details regarding some of these results. If you would like to reproduce my measurements, I've put source code, raw data and a notebook with analysis on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Another hard drive failure

07.02.2015 21:41

Earlier today one of my hard drives died. It was a fairly old 750 GB "Caviar GP" drive from a Western Digital "My Book" external enclosure. All it does now is emit an impressively loud metallic clicking noise.

I should have seen this coming, of course. At this point I have a pile of failed drives stashed in a box somewhere. I remember that this particular one has been unusually slow to start and mount for the last couple of times I used it. Also, smartd has previously reported "2 Currently unreadable (pending) sectors". Both of which I ignored, because I assumed this was yet another problem with the power supply. I had a "My Book" 12V external power supply fail before with similar symptoms.

I only used this drive for backups recently, so except for some archival copies of machines I no longer own, probably nothing of value was lost. Having at least a listing of contents before it failed would be nice though.

Disassembled Western Digital "My Book" external drive.

Of course, I opened it up to see if there's anything obviously wrong with it. The "My Book" USB interface board and the power supply are not the cause, because the drive has the same problem even when it is connected directly to a SATA port. I can hear the platters spinning and the clicking noise can only be caused by the heads thrashing around, so those are not stuck either.

Corrosion of surface finish on the controller PCB.

The only thing that immediately looks wrong is the unusual amount of corrosion on the hard drive controller PCB. It's bad enough that on some exposed test points both the immersion gold and the copper layer are completely gone. I'm not quite sure what could have caused that. As far as I can remember, this drive was sitting somewhere around my desk the whole time, so it hasn't been exposed to any hostile environments. It might be a manufacturing defect of some sort - maybe the board was not rinsed well enough after processing.

Bottom side of the hard drive controller PCB.

I cleaned the pads where the motor and the head connect to the circuit board, but that didn't make any difference.

The copper below the green solder mask looks fine though. The bottom side of the PCB contains one large BGA chip. Maybe that one developed some bad connections, if the problem is indeed in the controller board. Just as an experiment, I also tried the disk-in-the-freezer trick, but that did not make the disk behave any differently.

Posted by Tomaž | Categories: Digital | Comments »

CubieTruck UDMA CRC errors

18.10.2014 20:07

Last year I bought a CubieTruck, a small, low-powered ARM computer, to host this web site and a few other things. Combined with a Samsung 840 EVO SSD on the SATA bus, it proved to be a relatively decent replacement for my aging Intel box.

One thing that has been bothering me right from the start though is that every once in a while, there were problems with the SATA bus. Occasionally, isolated error messages like these appeared in the kernel log:

kernel: ata1.00: exception Emask 0x10 SAct 0x2000000 SErr 0x400100 action 0x6 frozen
kernel: ata1.00: irq_stat 0x08000000, interface fatal error
kernel: ata1: SError: { UnrecovData Handshk }
kernel: ata1.00: failed command: WRITE FPDMA QUEUED
kernel: ata1.00: cmd 61/18:c8:68:0e:49/00:00:02:00:00/40 tag 25 ncq 12288 out
kernel:          res 40/00:c8:68:0e:49/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
kernel: ata1.00: status: { DRDY }
kernel: ata1: hard resetting link
kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: configured for UDMA/133
kernel: ata1: EH complete

At the same time, the SSD reported increased UDMA CRC error count through the SMART interface:

UDMA CRC weekly error count on CubieTruck.

These errors were mostly benign. Apart from the cruft in the log files they did not appear to have any adverse effects. Only once or twice in the last 10 months or so did they cause the kernel to remount filesystems on the SSD as read-only, which required some manual intervention to get the CubieTruck back on-line.

I've seen some forum discussions that suggested this might be caused by a bad power supply. However, checking the power lines with an oscilloscope did not show anything suspicious. On the other hand, I did notice during this test that the errors seemed to occur when I was touching the SATA cable. This made me think that the cable or the connectors on it might be the culprit - something that was also suggested in the forums.

Originally, CubieTruck comes with a custom SATA cable that combines both power and data lines for the hard drive and has special connectors (at least considering what you usually see in the context of the SATA cabling) on the motherboard side.

In the last few weeks it appeared that the errors were getting more common, so I decided to try replacing the cable. Instead of ordering a new CubieTruck SSD kit I improvised a bit: I didn't have proper connectors for CubieTruck's power lines at hand, so I just soldered the cables directly to the motherboard. On the SSD drive I used the standard 15-pin SATA power connector.

For the data connection, I used an ordinary SATA data cable. The shortest one I could find was about three times as long as necessary, so it looks a bit uglier now. The connector on the motherboard side also needed some work with a scalpel to fit into CubieTruck's socket. The original connector on the cable that came with CubieTruck is thinner than those on standard SATA cables I tried.

Replacement SATA cables for CubieTruck.

So far it seems this fixed the CRC errors. In the past few days since I replaced the cable I haven't seen any new errors pop up, but I guess it will take a month or so to be sure.

Posted by Tomaž | Categories: Digital | Comments »

GA 7VT600 lmsensors settings

02.05.2014 17:35

Recently I've put into use a relatively ancient Gigabyte GA 7VT600 1394 motherboard that's been gathering dust on the top shelf of my wardrobe. I used it to replace an even older MSI board which, while still working perfectly, was getting a bit slow.

After replacing the dead lithium battery for RTC and NVRAM, it seems to work just fine with stock Debian Wheezy and passes a few ad-hoc stress tests.

One thing I noticed though is that the sensors tool from the lm-sensors package isn't very useful by default.

it87-isa-0290
Adapter: ISA adapter
in0:          +1.70 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +1.33 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.25 V  (min =  +0.00 V, max =  +4.08 V)
in3:          +2.86 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +3.23 V  (min =  +0.00 V, max =  +4.08 V)
in5:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in7:          +3.01 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +0.00 V  
fan1:        3308 RPM  (min =    0 RPM, div = 8)
fan2:           0 RPM  (min =    0 RPM, div = 8)
temp1:        +35.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +31.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp3:        +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
intrusion0:  OK

The board obviously has an IT87-series chip that provides some hardware monitoring functionality (you need the it87 kernel module). Apart from the lack of useful labels, some voltages also seem to be passed through voltage dividers before being measured by the it87. I would expect at least the 5 V and 12 V lines there.

Figuring out which fan is which was trivial. For finding out other things, I compared the printout above with what the BIOS setup utility says. I picked out the most logical divider values for the voltages. For example, the in3 reading of 2.86 V multiplied by 1.679 gives about 4.80 V, which matches the 5 V line, and the in4 reading of 3.23 V multiplied by 3.973 gives about 12.8 V, which matches the 12 V line. Since these also seem to fit the order of the sensors, I'm relatively confident they are correct.

PC Health Status screen on GA 7VT600 motherboard.

in5 and in6 readings are very unstable and don't seem to be shown in the BIOS screen. I'm guessing they are not connected on this board. temp2 is also not shown, but seems to give reasonable values, so I'm guessing there is a temperature sensor connected there, but I don't know where it is.

So, for future reference, put this into /etc/sensors.d/ga-7vt600 to get nicely labeled and properly calculated values for this hardware:

chip "it87-isa-0290"
    label temp1 "Sys Temp"
    label temp2 "Aux Temp"
    label temp3 "CPU Temp"

    label fan1 "CPU Fan"
    label fan2 "Sys Fan"

    label in0 "Vcore"
    label in1 "DDR Vtt"
    label in2 "+3.3V"
    label in3 "+5V"
    label in4 "+12V"
    label in7 "5VSB"

    compute in3 @*1.679, @/1.679
    compute in4 @*3.973, @/3.973
    compute in7 @*1.679, @/1.679

Posted by Tomaž | Categories: Digital | Comments »

CubieTruck Perl performance

23.01.2014 22:57

Two months ago I bought a CubieTruck, one of the many cheap, bare-bones ARM-based computers that keep popping up everywhere these days. My idea was to replace the aging x86 server that is running this website with something more power-efficient. So I was looking for a reasonably powerful board with a proper SATA interface and a decent amount of RAM. Raspberry Pi was out of the question, but the latest incarnation of the CubieBoard with a dual-core 1 GHz ARM Cortex-A7, 2 GB of RAM, SATA 2.0 and Gigabit Ethernet seemed to fit the bill.

Unfortunately I could not find any reliable benchmarks I could use to estimate how ARM SoCs perform in comparison with my existing setup. So before I decided to migrate I took a while to do some performance tests and get to know this hardware.

CubieTruck

The software setup I'm interested in benchmarking is somewhat archaic in these days of Node.js and NoSQL. I'm using Perl 5 with HTML::Template doing most of the heavy lifting (at least according to Devel::NYTProf profiler). Most parts are statically generated and some are dynamic using a handful of SpeedyCGI Perl 5 scripts. These are combined into a consistent website you see here with a somewhat convoluted Apache configuration using the threaded worker.

In the following benchmarks I'm comparing:

  • An AMD Duron at 700 MHz, 1.2 GB RAM running stock x86 Debian Squeeze. Root filesystem is mounted from an IDE hard drive.
  • A CubieTruck A20 running armhf Debian Wheezy and the kernel supplied for the CubieTruck Ubuntu Server installation. Root filesystem is mounted from an SD card.

Both machines were connected through a 100 Mb/s Ethernet switch to a laptop which was running the remote end of the benchmarks.


First, to see how fast the static part of the web site is generated, I ran the full (single threaded) HTML rebuild. I measured the required user space CPU time with the time utility. This is the fastest run of three on each machine:

                                   AMD Duron   CubieTruck
CPU time to rebuild static pages   45.3 s      61.8 s

Then, to check if the network was operating at the bit rate I thought it was, I ran iperf to measure TCP throughput between the server and the laptop:

                        AMD Duron   CubieTruck
iperf throughput test   94.0 Mb/s   94.5 Mb/s

Finally, I ran a suite of tests using the Apache benchmarking tool. I measured how many requests per second a server can handle for different types of content and different numbers of concurrent requests. Numbers in parentheses show the size of the HTTP body (without headers).

CubieTruck requests per second for a static HTML page.

CubieTruck requests per second for a dynamic HTML page.

CubieTruck requests per second for an image.

CubieTruck requests per second for API call.

The site rebuild is, somewhat disappointingly, almost one-third slower than on a 10-year-old PC. However, the single-threaded Apache performance is on par with it. In the case of more concurrent users the CubieTruck of course has an advantage because of the additional CPU core. Actually, in both cases with static content the CubieTruck managed to saturate the line when there was more than one concurrent request.

I tried to make these tests in a way that the slow SD card in the CubieTruck would minimally affect their outcome. All of the data should fit into the buffer cache, which is why in the first test I only took into account the fastest run and only user space CPU time. However, I now suspect that the SD card still affected the numbers somehow (the rebuild operation is the heaviest of the tests regarding filesystem I/O). I don't know for sure how the kernel computes the time returned by the time utility.

These results are good enough that I can't dismiss the CubieTruck based on performance. If a proper SATA drive doesn't speed it up, I could probably parallelize the build process with not much work. That should cut down on time if it's really Perl performance on ARM that is slowing it down. On the other hand, I have some other concerns about using the CubieTruck as a personal server, so I'm not completely decided yet about putting it in my rack.

Posted by Tomaž | Categories: Digital | Comments »

Repairing the Happy Hacking Keyboard

29.09.2013 15:40

My trusty old Happy Hacking Keyboard has been working pretty reliably for the last four years. After fixing a botched plastic mold and strategically placing a piece of cardboard in its innards, that is. Regarding typing feel it is still my favorite keyboard that doesn't take a lot of space on a crowded desk, and I only switch to a regular-sized Logitech when I'm working with EDA programs where I need the function keys a lot.

So I was pretty disappointed when it stopped working a week ago. Checking the kernel log revealed all sorts of random USB bus errors:

usb 6-1.1: USB disconnect, device number 14
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device not accepting address 13, error -71
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device firmware changed
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: activate --> -19
usb 6-1: USB disconnect, device number 13
usb 6-1: new full-speed USB device number 15 using uhci_hcd
usb 6-1: string descriptor 0 read error: -71
usb 6-1: New USB device found, idVendor=04fe, idProduct=0008
usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 6-1: can't set config #1, error -71
hub 6-0:1.0: port 1 disabled by hub (EMI?), re-enabling...

This looked like something systemic. Either the controller was resetting continuously or there was something wrong with the USB wiring between the controller and the computer.

A new, identical HHKB model goes for more than $110 today so I opened it up to see if there's anything I can do. After checking the cable with an ohm-meter my suspicion fell on the power supply which seems to consist of a 3.3 V LDO regulator and some capacitors. I could see no obvious transients on the power rails when the controller switched on after the keyboard was plugged in. One interesting thing I did see was that if negotiation with USB host fails the controller switches itself off completely, including its quartz oscillator.

Happy Hacking Keyboard Lite2 USB controller

Since I had some problems with flaky USB cables before, I removed the original cable and soldered a new cable directly to the circuit board. This fixed the problem! After poking around some more, it turned out that after re-soldering the connector to the PCB the original cable worked as well.

I removed the membrane before poking around the controller board since a hot soldering iron and plastics don't mix well. Re-inserting the soft matrix tails into the (non-ZIF) connectors was somewhat tricky. I resorted to using pliers plus a bit of paper to protect the delicate wires.

Re-inserting flexible cables into connectors using pliers

I also noticed that the silver traces on the keyboard matrix itself seem to be developing a kind of dark oxide on the outer edges. I don't remember for sure whether they looked like this from the start though. If something is eating away at the traces, that definitely puts a limit on this keyboard's longevity.

Possible oxidation on the keyboard matrix

In conclusion, just because a connection checks out with a multimeter doesn't mean there isn't a bad solder joint somewhere on a high-speed bus. The wisdom of first checking for bad RoHS soldering on mechanically (and thermally) stressed components confirmed itself again. Also, did you know there are a couple of alternative, open source Happy Hacking Keyboard controllers out there?

Posted by Tomaž | Categories: Digital | Comments »

Some notes about CC chips

20.05.2013 12:31

Here are two unusual things I noticed while working with Texas Instruments (used to be Chipcon) CC2500 and CC1101 integrated transceivers.

It appears that the actual bit rate in continuous synchronous serial mode can differ significantly from what Texas Instruments' SmartRF Studio 7 calculates. See for example the following measurements that were taken on a CC2500 chip using a 27 MHz crystal oscillator as a reference clock.

MDMCFG4 value   RF Studio [baud]   measured [baud]
0x8a            50.0               38.5
0x8b            100.0              77.2
0x8c            200.0              154.0
0x8d            400.0              305.0

These bit rates were measured using an oscilloscope attached to the clock output of the transceiver, so I trust them to be correct. Bit rates I measured on a CC1101 agree with what SmartRF Studio predicts.

Update: I revisited this issue and the problem was a bug in my code that caused the MDMCFG3 register (which also affects the data rate) not to be properly programmed on the CC2500. Accounting for this bug, the data rates are within 1% of those calculated by SmartRF Studio 7 or from the formula given in the datasheet.
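For reference, this is my reading of the data rate formula from the CC1101/CC2500 datasheets, as a small helper for cross-checking register values (DRATE_E is the lower four bits of MDMCFG4 and DRATE_M is the MDMCFG3 value). Treat it as a sketch and double-check against the datasheet; the register values in main() are only placeholders:

#include <stdio.h>

/* R_DATA = (256 + DRATE_M) * 2^DRATE_E / 2^28 * f_xosc */
static double cc_data_rate(unsigned mdmcfg4, unsigned mdmcfg3, double fxosc)
{
	unsigned drate_e = mdmcfg4 & 0x0f;
	unsigned drate_m = mdmcfg3 & 0xff;

	return (256.0 + drate_m) * (double)(1UL << drate_e)
			/ 268435456.0 * fxosc;
}

int main(void)
{
	printf("%f baud\n", cc_data_rate(0x8c, 0x22, 27e6));
	return 0;
}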

The other issue I saw is symbol mapping for 4FSK modulation in CC1101. It looks like it depends on the configured bit rate. For example, with 200 baud, the symbol to frequency mapping appears to be as follows:

symbol   Δf
00       −0.33 fdev
01       −1.00 fdev
10       +0.33 fdev
11       +1.00 fdev

However, with 45 baud, the mapping is different, with symbol bit order apparently switched around:

symbol   Δf
00       −0.33 fdev
01       +0.33 fdev
10       −1.00 fdev
11       +1.00 fdev

Update: It's possible this difference has something to do with when exactly the radio samples the data line in relation to the clock. Either I don't understand exactly what is going on or the radio isn't sampling the data when it is supposed to. Also, the factors of fdev in the tables were wrong (symbol frequencies are equally spaced, with the maximum deviation from the central frequency equal to fdev).

Of course, this doesn't matter if you are using two identically configured CC1101 chips on both ends of the radio link. But it is important if you want to use it to communicate with some other hardware.

Posted by Tomaž | Categories: Digital | Comments »

VESNA and signal synthesis

26.04.2013 20:49

Experimental radio equipment on VESNA can be used for more than spectrum sensing. Radio boards based on Texas Instruments CC2500 and CC1101 transceivers can also be used for packet transmission and reception and, perhaps somewhat surprisingly, also as flexible signal generators. These can be used in experiments when you want for instance to introduce a controlled interference in some system or to check if your spectrum sensor is working correctly.

When I'm talking with people about what our hardware is capable of, the conversation usually starts with amazement at how small the radio part is and then inevitably turns to the question of whether VESNA has a software-defined radio. I have to answer no, it doesn't, and then there's usually an awkward silence because that seems like a dead-end for any serious research work these days.

CC1101 transceiver on SNE-ISMTV-868

True, these transceivers don't provide software access to the undemodulated baseband samples (like, for instance, the USRP does). However, they are still amazingly flexible. They offer plenty of reconfigurability and, after you get to know some of their quirks, can be adapted for a lot of weird use cases without hardware changes. If you skip the various high-level digital parts of the chip that handle the proprietary packet format, there's a flexible front-end in there that allows you to choose between a handful of frequency, amplitude and phase modulators and several options for channel filters and such.

The closest you can come to software defined radio is a transparent continuous transmit mode where you can feed the transceiver arbitrary binary data from the microcontroller and it will simply get modulated and fed into the antenna. There is a catch though, since the chips offer at most 2-bit quantization of the baseband signal before modulation (they were designed for simple digital transmissions after all). This means that you have to get creative if you want to approximate an analog modulation and be ready for plenty of quantization noise.

This is how it looks on a spectrum analyzer when you run a direct digital synthesis algorithm on the microcontroller that generates a baseband sawtooth frequency sweep and use the amplitude modulator to up-convert it to 2.4 GHz:

RF signal synthesis using VESNA

(Click to watch RF signal synthesis using VESNA video)

Using delta-sigma modulation you can approximate arbitrary waveforms in this way. For instance, you can make a passable simulation of an (analog) wireless microphone transmission using a 4-FSK modulator in CC1101 tuned into the UHF band.
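To illustrate the idea, here is a minimal first-order delta-sigma modulator sketch that turns samples of an arbitrary waveform (in the range -1 to 1) into a 1-bit stream whose running average tracks the input. Nothing here is specific to the CC chips; it just shows the general principle:

static double ds_integrator;
static double ds_feedback;

int delta_sigma_step(double x)
{
	int bit;

	ds_integrator += x - ds_feedback;	/* accumulate the error */
	bit = (ds_integrator >= 0.0);		/* 1-bit quantizer */
	ds_feedback = bit ? 1.0 : -1.0;		/* value that gets transmitted */

	return bit;
}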

Of course, setting this up takes more work than popping a few blocks into GNU Radio Companion and part of my job is to make it more accessible to people using VESNA and our VESNA-based testbeds. If you're interested in such signal generation using Texas Instruments CC series, some platform-independent code capable of doing this should start hitting the vesna-spectrum-sensor repository on GitHub in the next few weeks.

Posted by Tomaž | Categories: Digital | Comments »

The lowly connector

11.04.2013 20:47

Here's a small addendum to my previous list of lessons regarding easy debuggability of microcontroller boards.

It's not a bad idea to pay a bit of extra attention to the connectors that will be used when developing and debugging software. It's likely these will see much more use than any other connector on the board, especially if the board is also meant for teaching or research like VESNA.

The first concern is that it should be hard to connect it in a wrong way. IDC and similar pin header connectors are popular for JTAG. Use a male part that has a shroud so that it's impossible to connect it when displaced by a pin or turned 180 degrees. That might sound obvious, but when you're debugging that tricky race condition and switching the debugger between three systems on your table late in the evening, the last thing you need is a burned out board because of a misplaced connector.

The other thing worth considering is the lifetime of the connector itself. While the lifetime of the part on the board might not be problematic, the part that stays with the developer can be. For VESNA, one of the most common reasons to make a trip to the soldering station is a torn wire in the connector for the serial debug console. We use something similar to the Berg connector and that one doesn't really take many connect-disconnect cycles before either the wires get torn out or the little springs break and the metal part falls out of the plastic housing. It's not always obvious that this has happened, and again it's a pain to realize after a long debugging session that the reason a board is talking garbage is not the bug you're trying to catch but rather a broken ground line in your debug console.

Posted by Tomaž | Categories: Digital | Comments »

About Digi Connect ME

03.10.2012 23:32

Remember my recent story about Atmel modules? For the last two days I've had a very strong feeling of déjà vu while trying to debug an issue that popped up at the last minute and is preventing a somewhat time-critical experiment from being performed on one of our VESNA testbeds.

It turned out that while 7-bit ASCII strings were correctly transferred between VESNAs and a client calling an HTTP API, binary data sent down the same pipeline got corrupted somewhere on the way. Plus, to make things just a bit more interesting, it sometimes also made the whole network of VESNAs unreachable.

VESNA coordinator with Digi Connect ME module.

Now, problems like this are unfortunately quite a common occurrence. Before data from a sensor node reaches a client, it must pass through multiple hops in a (notoriously unreliable) ZigBee mesh network, a coordinator VESNA that serves as a proxy between the mesh and a TCP/IP tunnel, a Java application on the end of the tunnel that translates between a home-grown HTTP-like protocol used between VESNAs and proper HTTP, and finally an Apache server acting as a reverse proxy for the public HTTP API. Leaky abstractions and encapsulations are plentiful and it's not unusual to have some code somewhere along the line assume that the data it is passing is ASCII-only, valid UTF-8 or something else entirely.

I won't bore you with too many details about the debugging. I opted for a top-down approach, with the Java layer being the main suspect based on previous experience. After that got a thorough check, I moved to sniffing the tunnel with Wireshark, with the extra complication that the tunnel is SSL encrypted. Luckily, it doesn't seem to use Diffie–Hellman key exchange, meaning that the SSL dissector was quite effective once I figured out how to extract the private keys from Java's keystore. That also seemed to look OK, so the next layer down was the SSL tunnel endpoint, which is a Digi Connect ME module.

This is basically a black box that takes a duplex RS-232 connection on one end and tunnels it through an encrypted TCP/IP connection. That's a deceptively simple description though. In fact, the Digi Connect ME is a miniature computer with an integrated web server for configuration, scripting support and a ton of other features (I have close to 500 pages of documentation for it on my computer, and I only downloaded the documents I thought might be useful in debugging this issue).

VESNA coordinator hooked up to a logic analyzer.

Anyway, when I looked closely, the problem was quite apparent. On the RS-232 side the module was set to use XON/XOFF software flow control. This obviously won't work when sending arbitrary binary data. Not only will the module interpret the XON and XOFF bytes (0x11 and 0x13) as special and drop them from the pipeline, an XOFF that is not followed by an XON will also halt the transmission, leading to hangs and timeouts. The fix looked simple enough: switch to hardware flow control using the dedicated CTS and RTS lines.

As you might have guessed, it was not that easy. It turns out that when hardware flow control is enabled in Digi Connect ME, it will randomly duplicate characters when sending them down the serial line. Again, the suspicion first fell on our homegrown UART driver on the VESNA side, but a logic analyzer trace below confirmed that it's the actual Digi Connect module that is the culprit and not some bug on our side.

Logic analyzer trace from a Digi Connect ME module.

Now, at this point I'm seriously confused. The Digi Connect ME is quite popular and, at least judging from browsing the web, used in a lot of applications. But after the ZigBit module it's also the second piece of hardware that exhibits behavior so broken that I can't believe it has passed any thorough quality check. All of my experience says that it must be us who are doing something wrong, and in both cases I have tested just about every way VESNA could be at fault. Actually, I want it to be a mistake on our end because that means I can fix it. But honestly, once you see wrong data being sent on a logic analyzer, I don't think there can be any more doubt. Must everything really turn rotten once you look into it with enough detail?

Posted by Tomaž | Categories: Digital | Comments »

IguanaWorks USB IR transceiver

19.08.2012 22:16

I bookmarked this little gadget a while ago. Having recently solved my problems with scriptable switching of PulseAudio audio outputs, I thought it was time to finally order it and try to automate a few other home-theater related operations through it. Over the summer a few other infrared-communication related things also piled up on my desk, so having a universal IR transmitter and receiver within reach seemed like a good idea.

IguanaWorks USB IR transceiver

This is an IguanaWorks USB IR transceiver, the hybrid version. Hybrid meaning it has both an integrated IR LED and detector pair and a 3.5 mm jack for an external transmitter.

From the software side it comes with quite an elaborate framework, free and open source of course. The software also comes in the form of Debian binary and source packages, which is a nice plus. I did have a small problem compiling them though, since the build process seems to depend on the iguanair user being present on the system. This user only gets created during installation, which makes it a kind of catch-22 situation. Once compiled, the packages did work fine on my Debian Squeeze system.

After everything is installed, you get:

  • igdaemon, a daemon that communicates with the actual USB dongle,
  • igclient, a client that exposes daemon functionality through a command-line interface,
  • a patched version of lircd daemon that includes a driver that offloads communication to igdaemon.

lirc is the usual Linux framework for dealing with infrared remotes. It knows how to inject keypresses into the Linux input system when an IR command is received and comes with utilities that can send commands back through the IR transmitter to other devices. This is the first time I'm dealing with it and I'm still a bit confused about how it all fits together, but right now it appears some parts of the lirc ecosystem don't currently work with iguanair at all. For instance, the xmode2 utility, which shows received IR signals in an oscilloscope-like display, isn't supported.

As I'm currently mostly interested in using this from my scripts, using igclient directly seems to be the simplest option. There are also Python bindings for the client library, but they appear undocumented and I haven't yet taken a dive into the source code to figure them out.

The client reports the received signals in the form of space-pulse durations, like this:

$ igclient --receiver-on --sleep 10
received 1 signal(s):
  space: 95573
received 3 signal(s):
  space: 7616
  pulse: 64
  space: 65536

I'm not yet sure what the units for those numbers are. According to the documentation the transmit functionality expects a similarly formatted input, but I have yet to try it out. It seems that if I want to plot the signals on a time line I will have to write my own utility for that.
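If I do get around to it, such a utility doesn't need to be much more than the following sketch: it reads igclient's output on standard input and prints time/level pairs suitable for plotting, assuming the numbers are durations in some fixed time unit (which, as said, I haven't verified):

#include <stdio.h>

int main(void)
{
	char line[256];
	long t = 0;

	while (fgets(line, sizeof(line), stdin)) {
		long duration;
		int level;

		if (sscanf(line, " space: %ld", &duration) == 1)
			level = 0;
		else if (sscanf(line, " pulse: %ld", &duration) == 1)
			level = 1;
		else
			continue;

		/* Print the level at the start and the end of the interval,
		 * so the plot comes out as a square wave. */
		printf("%ld %d\n", t, level);
		t += duration;
		printf("%ld %d\n", t, level);
	}

	return 0;
}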

To be honest I expected using this to be simpler from the computer side. In the end it basically has the same functionality as my 433 MHz receiver. One thing I also overlooked is that it's only capable of transmitting modulated on-off keyed transmissions (25 - 125 kHz carrier), which makes it useless for devices that don't use that, like shutter glasses. But given that I did basically zero research before ordering it, I can't really blame anyone but myself for that (and that bookmark must have been at least a year old). Just yesterday I also stumbled upon the IR Toy, which appears to be a similar device. It would be interesting to know how it compares with the IguanaWorks one.

Posted by Tomaž | Categories: Digital | Comments »

On Atmel SerialNet ZigBit modules

13.08.2012 22:27

Don't use Atmel BitCloud/SerialNet ZigBit modules.

With this important public service announcement out of the way, let me start at the beginning.

Atmel makes ZigBit modules that contain an IEEE 802.15.4-compatible integrated radio from their AT86RF2xx family and an AVR-based microcontroller on a small hybrid component. The CPU runs a proprietary mesh-networking stack (BitCloud) built on top of the ZigBee specification and exposes a high-level interface on a serial line they call SerialNet (think "send the following data to this network address"-style interface). The module can be used either as a very simple way of adding mesh networking to some host device or as a stand-alone microcontroller with a built-in radio (Atmel provides a proprietary BitCloud SDK, so you can build your own firmware for the AVR).

Atmel ZigBit module on a VESNA SNR-MOD board.

At SensorLab we built a sensor node radio board for VESNA using these modules (more specifically, ATZB 900 B0 for 868 MHz and ATZB 24 B0 for 2.4 GHz links) as they appeared to be simple to use and would provide a temporary solution for connecting VESNAs with a wireless mesh until we come up with a working and reliable 6LoWPAN implementation. So far we have deployed well over 50 of these in different VESNA installations.

I can now say that these modules have been nothing but trouble from the start. First there is the issue of documentation. Atmel's documentation has always been superb in my memory. Compare one of their ATmega datasheets with the vague hand-waving STMicroelectronics calls microcontroller documentation and you'll know why. Unfortunately, the SerialNet user guide is an exception to this rule. It leaves many corner cases undefined and you are left to your own experimentation to find out how the module behaves. There is almost no timing information. How long can you expect to wait for a response to a command? How long will the module be unresponsive and ignore commands after changing a setting? Even the hardware reset procedure is not described anywhere beyond "Reset input (active low)".

The problems with this product however go deeper than this. In my experience developers, myself included, tend to be too quick to blame problems on bugs in someone else's code. When colleagues complained about how buggy these modules are, I said that it was much more likely a problem in our code or hardware design. That is, until I started investigating the numerous problems we had with networking myself: the modules would return responses they shouldn't have according to the specification, and they would say that they are connected to the network even though no other network node could communicate with them. Modules would even occasionally persistently corrupt themselves, requiring firmware reprogramming before they would start responding to commands again. Believe me, it's annoying to reach for a JTAG connector when the module in question is on a lamp post in some other part of the country.

For most of these bugs I can only offer anecdotal evidence. However I have been investigating one important issue for around two months now and I'm confident that there is something seriously wrong with these modules. I strongly suspect there is a race condition somewhere in Atmel's (proprietary and closed-source, of course) code that causes some kind of buffer corruption when a packet is received over the radio at the same time as the module receives a command over the serial line. This will cause the module to lose bytes on the serial line, making it impossible to reliably decode the communications protocol.

For instance, this is how the communication should look over the serial line. The host in this case is VESNA and the module is an Atmel ATZB 900 B0:

→ AT+WNWK\x0d                                # host asks for network status
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a  # module asynchronously reports received data
← OK\x0d\x0a                                 # module answers that network is OK
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a  # module asynchronously reports received data

This is how it sometimes looks:

→ AT+WNWK\x0d
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a
← OK\x0d                                     # note missing \x0a
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a

And sometimes it gets as bad as this:

→ AT+WNWK\x0d
← DATA 0000,0,77:(77 bytes of data)\x0d\x0a
← ODATA 0000,0,77:(77 bytes of data)\x0d\x0a # note only O from OK sent

An inviting explanation for these problems would be that we have a bad UART implementation on VESNA. Except that this happens even when the module is connected to a computer via a serial-to-USB converter, and I have traces from a big and expensive Tektronix logic analyzer (as well as Sigrok) to prove that the corrupted data is indeed present on the hardware serial line and not an artifact of some bug on the host side:

Missing Line Feed character from an Atmel ZigBit module.

A logic analyzer trace demonstrating a missing line feed character. Click to enlarge.

Data corruption on the serial line from an Atmel ZigBit module.

A logic analyzer trace demonstrating a jumbled-up OK and DATA response. Click to enlarge.

I have seen this happen in the lab under controlled conditions on 10 different modules and have good reasons to suspect the same thing is happening on the deployed 50-plus modules. Also, this bug is present in both BitCloud 1.14 and 1.13 and in both vanilla and security-enabled builds. All of this points to the fact that this problem is not due to some isolated fluke on our side.

For well over a month I have been on the line with Atmel technical support and while they have politely answered all of my mail, they have also failed to acknowledge the issue or provide any helpful information, even though I sent them a simple test case that reliably reproduces the problem in a few seconds. Of course, without their help there is exactly zero chance of getting to the bottom of this and, given all of the above, I seriously doubt this is anything other than a bug in their firmware.

At this point I have mostly given up any hope that this issue will be resolved. During my investigation I did find out that decreasing the amount of chatter on the serial line decreases the probability of errors, so I did manage to work around this bug a bit by switching to non-verbose responses (ATV0) and using packets that are a few bytes shorter than the maximum (say 75 bytes for encrypted frames). This will hopefully improve the reliability of the already deployed hardware. For the future, we will be looking into alternatives, as unfortunately 6LoWPAN still seems to be somewhat out of our reach.

Posted by Tomaž | Categories: Digital | Comments »