Closer look at the original Galaksija

09.05.2017 20:34

A few weeks ago I met with Mr. Vojislav Ivetić in Maribor. He entrusted me with an old Galaksija computer circuit board. Several years ago he obtained it from Janez Stergar at the Faculty of Electrical Engineering and Computer Science, University of Maribor. He told me that the historical computer was in an unknown condition, very likely not working, and was interested in restoring it back to usable state. This post is the result of my visual inspection of the circuit to estimate the extent of the restoration that would be necessary.

Galaksija is a small home microcomputer that was designed in Belgrade by Voja Antonić around the Z80 microprocessor. The designs were openly published in a magazine in 1984 with the intention that readers would build their own computers from scratch. Do-it-yourself kits could be ordered by mail and eventually also complete, factory made computers. Galaksija was often easier to obtain than similar foreign computers due to heavy import restrictions in the former Yugoslavia. It is generally considered the most successful of several attempts at a domestic home microcomputer.

At the first glance, Mr. Ivetić' Galaksija appears to be built from one of the kits. It has a white mechanical keyboard and a factory made single-layer printed circuit board with the green solder mask and white silk screen print on top. The integrated circuits and other components were most likely gathered from various sources and soldered manually (not all are in sockets). All original Galaksija computers I've seen looked very similar to this. Some had black keyboards, but they all shared the same PCB design.

Galaksija circuit board from Mr. Ivetić.

The circuit board has the basic Galaksija configuration. Only the 4 kB ROM A is installed. This ROM contains the BASIC interpreter, video driver and the rest of Galaksija's minimalistic operating system (here marked Master EPROM). The ROM B socket is empty.

The quartz windows on UV-erasable EPROMs are only covered with a white paper sticker. If the board was stored for a long time exposed to light, it might be that the EPROMs have lost their charge due to ambient UV light and will have to reprogrammed.

Iskra EMS6116 static RAM on a Galaksija computer.

There is a single 2 kB static RAM chip installed. Interestingly, the logo suggests this is an Iskra EMS6116, a domestic integrated circuit. I was not aware that RAM was produced by Iskra. In fact, the original magazine article that gives instructions for Galaksija builders suggests ordering RAM and other chips by mail from abroad (with suggested distributors that will ship to Yugoslavia and tips on getting the shipments through customs). Sockets for additional two 2 kB RAM chips are empty.

All other chips are foreign made. The Z80 CPU and EPROMs are all from SGS (former Italian semiconductor company, later merged into STMicroelectronics). These also have the most recent date codes among the identifiable components on the board: first week of 1986. Original Galaksija design was published in January 1984, so this board was built at least 2 years later. Other logic chips I could identify are from TI and SGS. The oldest chip is the 74LS38 from 1979.

Improvised circuit on shift/load line.

There is a small bundle of components wrapped in sticky tape hanging off the PCB on four wires. It looks like it contains an IC in a DIP package and some capacitors. The circuit sits in front of the shift/load input to the 74LS166 shift register that generates the video signal. It's also connected to the ground and the power supply. Since the extra circuit is not connected to any other digital lines, I'm guessing it is most likely a delay to fix some timing problem.

Location of the improvised circuit on the schematic.

Normally, the shift/load input is driven directly by a circuit that detects when the CPU is in the M1 (opcode fetch) cycle. See full schematic here. I know from my previous research that M1 detection circuit on the original Galaksija is unreliable, since it depends on signal timings that are not guaranteed by the design of the Z80 CPU. It's possible that this was an attempt to work around this issue.

Two potentiometers for setting sync pulse lengths.

There is no RF modulator installed. The circuit has been modified so that composite video signal is directly present on a pair of improvised screw terminals. I'm guessing this Galaksija was used with a monitor or a TV with composite input. Those were quite rare at the time, but it was not uncommon for people to modify their TV sets to add a composite input.

Two potentiometers are wired in series with R12 and R13. They have been glued down, but are now hanging loose on wires. Potentiometers seem to have been installed to adjust horizontal and vertical sync pulse widths. They are not part of the original design. They affect the time constants of 74LS123 monostable multivibrators that generate synchronization impulses in the composite video signal.

Missing space key on the Galaksija.

The space keycap is missing, but the key itself is present. I guess even if a suitable replacement can't be found, one could be drawn in a CAD program and 3D-printed.

Example of a lifted track on Galaksija PCB.

A look at the bottom side reveals that the condition of the copper laminate is quite bad. Many tracks and annular rings have broken or lifted off the substrate. The PCB shows signs of old repairs to some of the damaged tracks, so at least part of this damage is not due to age. Maybe soldering was done at a too high temperature or the quality of the laminate was not particularly good. This Galaksija shows no signs that it was ever mounted in a case, so the damage might also be due to mechanical stress. Many tracks around EPROM sockets are broken, suggesting that the stress of inserting and removing the EPROMs was at least partially responsible.

Ruined annular rings under a transistor on Galaksija.

I've counted around 40 points on the PCB that would need repair. Some are hairline breaks in traces that seem easy to reliably bridge with solder. Other parts would require replacements of copper areas using foil and epoxy glue to bring them back to original condition. Fortunately this PCB has relatively large features compared to modern SMD boards. However, this extent of repair still seems like a lot of delicate work. I'm also not certain that other areas of the laminate that look fine now would not start failing during repair.

If all else fails, another possibility would be to have a whole replacement PCB made and re-solder the keyboard and other original components. This would obviously decrease the historical authenticity. While the scans of original PCB masks are available on the web, those are not precise enough to make a usable replacement board. They would need to be redrawn before they can be sent to a fab.

In conclusion, all basic components are there and look fairly well preserved. At the moment I have no reason to believe that any chips are bad. However the PCB should be repaired before attempting to power up this board. The extent of damage and the amount of fine work with the copper foil would make this repair quite time consuming. It would be nice to somehow check the state of the most critical chips before proceeding on that path. Fixing the PCB would be a big waste of time if the CPU or RAM chip will eventually turn out to be bad. On the other hand, replacements for 74LSxx series logic still seem to be relatively easy to come by.

Posted by Tomaž | Categories: Digital | Comments »

ESP8266 humidity monitoring

27.02.2017 20:11

Last year my flat developed a bit of a mold problem, or maybe I just found out about it then. It's possible the fungus already lived a long, fulfilling life before being discovered. It wouldn't be surprising for a building from an era when thermal- and hydro-isolation were pretty far down on the priority list. In any case, it made me want to monitor the relative air humidity and dew point levels a bit more closely. I had the apartment pretty well covered with sensors already, but the room with the mold in particular lacked a hygrometer.

Wireless humidity monitor based on the ESP8266 module.

Not to re-invent too much hot water, I more or less replicated the nicely documented temperature and humidity web server project from Adafruit. It was doubly appealing because I still had a full bag of old ESP8266 modules that I bought for pennies back when they were the exciting new thing. The only problem was the fact that the humidity sensors supported out-of-the-box by that project were only available from their shop, which is relatively expensive for small items with oversea shipping. Since I was in a hurry, I bought a few DHT11 modules anyway (which turned out to be a mistake).

There's not much to say about the hardware. It's the minimalistic Adafruit's circuit soldered on a perforated board. For the power supply I used a small 3.3V switch-mode converter module I had left over from another project. I was nicely surprised by how easy ESP8266 support was to install into the Arduino IDE. Another pleasant discovery was that ESP8266 with the Arduino-based firmware seems to consume much less power than with the stock AT-command firmware.

The Arduino IDE got updated since Adafruit's tutorial was written, so I had to experiment a bit with the firmware upload settings. Following values seemed to work with my particular modules. Another thing I discovered was that the RST line on ESP8266 has to be left floating for the firmware upload to work reliably. On my previous ESP8266 project I tied it to VCC.

Arduino firmware upload settings for ESP8266 modules.

Unfortunately, the DHT11 modules are pretty bad as far as accuracy is concerned. I only discovered Robert's wonderfully in-depth comparison of hygrometer modules after the fact. I played a bit with power supply filtering, but that doesn't seem to be the source of the noise in the data. I ended up modifying Adafruit's firmware so that it reads the sensor every 5 seconds and returns the average of last 8 readings. This alleviates somewhat the problem, but I definitely recommend using some other sensor to anyone wanting to build this.

For comparison, here is the daily humidity graph recorded using DHT11 with averaging:

DHT11 example daily humidity record.

And here is humidity recorded at the same time (albeit in a different room) by a TEMPerHUM USB dongle with no extra averaging applied:

TEMPerHUM example daily humidity record.

After several months of running, the Arduino-based ESP8266 turned out to be pretty reliable. I haven't seen any big outages in the log of sensor readings. This is a nice improvement over the stock firmware that I used in my Munin display, which still regularly gets lost to the point that it requires a power cycle.

Posted by Tomaž | Categories: Digital | Comments »

Raspberry Pi Compute Module eMMC benchmarks

03.07.2016 13:52

I have a Raspberry Pi Compute Module development kit on my desk at the moment. I'm doing some testing and prototyping because we're considering using it for a project at the Institute. The Compute Module is basically a small PCB with the Broadcom's BCM2835 system-on-chip, 4 GB of flash ROM on an eMMC connection and little else. Even providing power supply at a number of different voltages is left as an exercise for the user.

Raspberry Pi Compute Module

I was wondering how the eMMC flash performs compared to the SD card on the more common Pies. I couldn't find any good benchmarks on the web. Wikipedia says that the latest eMMC standard rivals SATA speeds, but there's not much info around on what kind the Compute Module uses. I've used Samsung's ARM Chromebook with eMMC flash a while ago and that felt pretty fast. On the other hand, watching package updates scroll by on the Compute Module gave me a feeling that it's quite sluggish.

To get some more objective benchmark, I decided to compare the I/O performance with my Raspberry Pi Zero. Zero uses the same BCM2835 SoC, so the results should be somewhat comparable. I used the SD card that originally came with Zero preloaded with the Noobs distribution. It only has the raspberry logo printed on it, so I don't know the exact model or manufacturer. Both Compute Module and Zero were running the latest Raspbian Jessie.

One surprising discovery during this benchmark was that CPU on Zero runs between 700 MHz and 1 GHz while the Compute Module will only run at 700 MHz. These are the ranges detected at boot by bcm2835-cpufreq and default /boot/config.txt that came with the Raspbian image (i.e. no special overclocking). Because of this I performed the benchmarks on Zero at 700 MHz and 1 GHz.

For comparison, I also ran the same benchmark on my Cubietruck that has an Allwinner A20 system-on-chip with SATA-connected Samsung EVO 840 SSD and runs vanilla Debian Jessie.

This is the benchmark script I used. For each run, I chose the fastest result out of 5:



while [ $I -lt $N ]; do
	hdparm -t $DEVICE

while [ $I -lt $N ]; do
	hdparm -T $DEVICE

while [ $I -lt $N ]; do
	dd if=/dev/zero of=tempfile bs=1M count=128 conv=fdatasync 2>&1

while [ $I -lt $N ]; do
	echo 3 > /proc/sys/vm/drop_caches
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1

while [ $I -lt $N ]; do
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1

Here is write performance, as measured by dd. I wonder if dd figures are affected by filesystem fragmentation since it writes an actual file that might not be contiguous. I've been using Zero for a while with this Raspbian image while the Compute Module has been freshly re-imaged. Fragmentation shouldn't be as significant as with spinning disks, but it probably still has some effect.

Comparison of write performance.

Read performance, as measured by hdparm as well as dd. To remove the effect of cache when measuring with dd, I explicitly dropped kernel block device caches before each run.

Comparison of read performance.

From this it seems Compute Module's eMMC flash is slightly faster than the SD card, both on read and writes when comparing to Zero running at the same CPU clock frequency. It's interesting that Zero's results change significantly with CPU frequency, which seems to suggest that some part of SD card I/O is CPU bound. That said, performance seems to be somewhere roughly on the same order of magnitude. Cubietruck is significantly faster than both. In light of this result, it's sad that never versions of Cubieboard (and cheap ARM SoCs in general) dropped the SATA interface.

Finally, I tested block device cache performance. This more or less shows only RAM and CPU performance and shouldn't depend on storage speed.

Comparison of cached read performance.

Interestingly, Zero seems to be somewhat faster than the Compute Module at 700 MHz here. /proc/cpuinfo shows a different revision, although it's not clear to me whether that marks board revision or SoC revision. It might be that processors in Zero and Compute Module are not identical pieces of silicon.

In the end, I should note that these results are not super accurate. Complexities of I/O benchmarking on Linux aside, there are several things that might have affected the results. I already mentioned different filesystem state. A different SD card in Zero might give very different results (I didn't have a second empty card at hand to try that). While Raspberry Pies were idle during these tests, Cubietruck was running my web server and various other little tidbits that tend to accumulate on such machines.

Posted by Tomaž | Categories: Digital | Comments »

Ultra-narrowband and BPSK on TI CC chips

15.06.2016 20:56

Ultra-narrowband is a fancy new name for an old thing. The idea is to use a phase modulated carrier to transmit data at a very low bitrate. This saves energy and improves spectral efficiency (bits per second of data throughput per hertz of radio bandwidth). This in turn makes it convenient for battery-powered sensors and 20-billion Internet-connected toasters of tomorrow. For similar reasons, amateur radio operators have been chatting over PSK31, which is essentially the same thing as ultra-narrowband, for almost two decades now.

Currently SIGFOX seems to be the main commercial operator that's pushing this technology. They don't publish protocol details, however they've written a 3GPP proposal for C-UNB standard, which is public. The benefit of ultra-narrowband is that the simple BPSK modulation can be implemented with existing cheap and well tested integrated transceivers. Compare with the original Weightless standard for instance, which required custom silicon for its much more advanced physical layer and seems mostly forgotten these days (although it's not a completely fair comparison, since SIGFOX operates in unlicensed spectrum and Weightless had to deal with complexities of TV whitespaces, but I digress).

CC1101 transceiver on SNE-ISMTV-868

The CC-series of transceivers from Texas Instruments (like CC1101 and CC1120) has a lot of software-configurable modulation blocks built-in, but a BPSK modulator is not among them. However, you can find some references to ultra-narrowband being implemented with these chips which suggests that people are using them for this purpose. The C-UNB proposal also mentions that it can be easily implemented with modified FSK modulation, but doesn't go into more detail. I wanted to implement ultra-narrowband on CC1101 for a project we're doing at the Institute, so I looked into this possibility.

As any introductory course in telecommunications is quick to point out, frequency and phase modulation are basically the same thing. If you take a frequency modulator and feed it a time-derivative of a signal the result is identical to a phase modulator fed with the unmodified signal. In practice however it's not that simple. BPSK requires that the phase changes ±180° for each symbol change. The frequency-shift keying block in CC chips does not have a well-defined relation between frequency deviation and symbol rate. This means that it's hard to define how much signal phase changes during each symbol.

CC1101 does have a minimum-shift keying mode. This is a special form of frequency modulation that has well-defined phase shifts between symbols. Wikipedia says that the carrier phase continuously shifts by ±90° each symbol period, which does not sound useful at first:

Minimum-shift keying illustration.

In this interpretation of phase shifts, the carrier frequency fc is in the middle between frequencies for the two symbols, f0 and f1. This is the usual interpretation for frequency modulation, where you have approximately equal numbers of both symbols in a typical transmission.

However, if you transmit mostly one symbol, say f0, the receiver will consider that to be the carrier f'c. In that case, each occurrence of symbol 1 rotates the phase of the signal compared to f'c by +180°. This is exactly what you need to implement BPSK.

Alternative interpretation of phase in MSK.

BPSK requires that phase shifts are fast compared to symbol rate, so you want to encode each BPSK symbol with many MSK symbols. Ultra-narrowband uses symbol rates on the order of 100 symbols/s while CC1101 supports up to around 1 Msymbol/s. This means that you could have fast phase changes, but 10 MSK symbols per each BPSK symbol seems to suffice.

In the end, bits encoded into MSK symbols look somewhat similar to the theoretical time-derivative I mentioned above. You have an impulse of a single f1 symbol each time you have a transition from bit 0 to 1 or vice versa:

Using multiple MSK symbols as one BPSK symbol.

So far, this has been all theoretic. How well does it work in practice? The most obvious problem is frequency stability. The local oscillator on CC1101 is designed to be re-calibrated often, but you cannot calibrate it while you are transmitting. With such low bitrates, packet transmissions last for several seconds. During that time the frequency can drift quite a lot, especially compared to the very limited bandwidth of these transmissions. This is the usual problem with narrowband transmissions and CC1101 has no mechanism for compensating for it on reception. That is why I doubt a CC1101-to-CC1101 link would work in this way and I haven't tried it.

Transmission from a CC1101 to a specialized receiver however seems to work quite nicely in practice. You just have to use a SDR with a wide-enough channel for reception and compensate for frequency drifts in software. I have some lab measurements to share, but those will have to wait for another post.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times, part 2

27.04.2016 11:40

Last week I wrote about some typical interrupt response times you get from an Arduino and Raspberry Pi, if you follow basic examples from documentation or whatever comes up on Google. I got some quite unexpected results, like for instance a Python script that responds faster than a compiled C program. To check some of my guesses as to what caused those results, I did another set of measurements.

For Arduino, most response times were grouped around 9 microseconds, but there were a few outliers. I checked the Arduino library source and it indeed always enables AVR timer/counter0 overflow interrupt. If timer interrupt happens at the same time as the GPIO interrupt I was measuring, the GPIO interrupt can get delayed. Performing the measurement with the timer interrupt masked out indeed removes the outliers:

Effect of timer interrupt on Arduino response time.

With timer off, all measured response times are between 9.1986 to 8.9485 μs. This is a 0.2501 μs long interval. It fits perfectly with theory - at 16 MHz CPU clock and instruction length between 1 and 5 cycles, uncertainty for interrupt latency is 0.25 μs.

The second weird thing was the aforementioned discrepancy between Python and C on Raspberry Pi. The default Python library uses an ugly hack to bypass the kernel GPIO driver and control GPIO lines directly from user space: it mmaps a range of physical memory containing GPIO registers into its own process memory space using /dev/mem. This is similar to how X servers on Linux (used to?) access graphics hardware from user space. While this approach is very unportable, it's also much faster since you don't need to do context switches into kernel for every operation.

To check just how much faster mmap method is on Raspberry Pi, I copied the GPIO access code from the RPi.GPIO library into my test C program:

Response times using sysfs and mmap methods on Raspberry Pi.

As you can see, the native program is now faster than the interpreted Python script. This also demonstrates just how costly context switches are: the sysfs version is more than two times slower on average. It's also worth noting that both RPi.GPIO and my C program still use epoll() or select() on a sysfs file to wait for the interrupt. Just output pin change can be done with direct memory accesses.

Finally, Raspberry Pi was faster when the CPU was loaded which seemed counterintuitive. I tracked this down to automatic CPU frequency scaling. By default, Raspberry Pi Zero seems to be set to run between 700 MHz and 1000 MHz using ondemand governor. If I switch to performance governor, it keeps the CPU running at 1 GHz at all times. In that case, as expected, the CPU load increases the average response time:

Effect of cpufreq governor on Raspberry Pi response time.

It's interesting to note that Linux kernel comes with pluggable idle loop implementations (CONFIG_CPU_IDLE). The idle loop can be selected through /sys/devices/system/cpu/cpuidle in a similar way to the CPU frequency governor. The Raspbian Jessie release however has that disabled. It uses the default idle loop for ARMv6 processors. Assembly code has been patched though. The ARM Wait For Interrupt WFI instruction in the vanilla kernel has been replaced with some mcreq (write to coprocessor?) instructions. I can't find any info on the JIRA ticket referenced in the comment and the change has been added among other BCM-specific changes in a single 6400-line commit. Idle loop implementation is interesting because if it puts the CPU into a power saving mode, it can affect the interrupt latency as well.

As before, source code and raw data is on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times

18.04.2016 15:13

Embedded systems were traditionally the domain of microcontrollers. You programmed them in C on bare metal, directly poking values into registers and hooking into interrupt vectors. Only if it was really necessary you would include some kind of a light-weight operating system. Times are changing though. These days it's becoming more and more common to see full Linux systems and high-level languages in this area. It's not surprising: if I can just pop open a shell, see what exceptions my Python script is throwing and fix them on the fly, I'm not going to bother with microcontrollers and the whole in-circuit debugger thing. Some even say it won't be long before we will all be just running web browsers on our devices.

It seems to be common knowledge that the traditional approach really excels at latency. If you're moderately careful with your code, you can get your system to react very quickly and consistently to events. Common embedded Linux systems don't have real-time features. They seem to address this deficiency with some combination of "don't care", "it's good enough" and throwing raw CPU power at the problem. Or as the author of RPi.GPIO library puts it:

If you are after true real-time performance and predictability, buy yourself an Arduino.

I was wondering what kind of performance you could expect from these modern systems. I tend to be very conservative in my work: I have a pile of embedded Linux-running boards, but they are mostly gathering dust while I stick to old-fashioned Cortex M3s and AVRs. So I thought it would be interesting to do some experiments and get some real data about these things.

Measuring interrupt response times on Arduino.

To test how fast a program can respond to an event, I chose a very simple task: Raise an output digital line whenever a rising edge happens on an input digital line. This allowed me to very simply measure response times in an automated fashion using an USB-connected oscilloscope and a signal generator.

I tested two devices: An Arduino Uno using a 16 MHz ATmega328 microcontroller and an Raspberry Pi Zero using a 1 GHz ARM-based CPU running Raspbian Jessie. I tried several approaches to implementing the task. On Arduino, I implemented it with an interrupt and a polling loop. On Raspberry Pi, I tried a kernel module, a native binary written in C and a Python program. You can see exact source code on GitHub.

Measuring interrupt response times on Raspberry Pi.

For all of these, I chose the most obvious approach possible. My implementations were based as much as possible on the preferred libraries mentioned in the documentation or whatever came up on top of my web searches. This meant that for Arduino, I was using the Arduino IDE and the library that comes with it. For Raspberry Pi, I used the RPi.GPIO Python library, the GPIO sysfs interface for native code in user space and the GPIO consumer interface for the kernel module (based on examples from Stefan Wendler). Definitely many of these could be further hand-optimized, but I was mostly interested here in out-of-the-box performance you could get in the first try.

Here is a histogram of 500 measurements for the five implementations:

Histogram of response time measurements.

As expected, Arduino and the Raspberry Pi kernel module were both significantly faster and more consistent than the two Raspberry Pi user space implementations. Somewhat shocking though, the interpreted Python program was considerably faster than my C program compiled into native code.

If you check the source, RPi.GPIO library maps the hardware registers directly into its process memory. This means that it does not need any syscalls for controlling the GPIO lines. On the other hand, my C implementation uses the kernel's sysfs interface. This is arguably a cleaner and safer way to do it, but it requires calls into the kernel to change GPIO states and these require expensive context switches. This difference is likely the reason why Python was faster.

Histogram of response time measurements (zoomed)

Here is the zoomed-in left part of the histogram. Raspberry Pi kernel module can be just as fast as the Arduino, but is less consistent. Not surprising, since the kernel has many other interrupts to service and not that impressive considering 60 times faster CPU clock.

Arduino itself is not that consistent out-of-the-box. While most interrupts are served in around 9 microseconds (so around 140 CPU cycles), occasionally they take as long as 15 microseconds. Probably Arduino library is to blame here since it uses the timer interrupt for delay functions. This interrupt seems to be always enabled, even when a delay function is not running, and hence competes with the GPIO interrupt I am using.

Also, this again shows that polling on Arduino can sometimes be faster than interrupts.

Effect of CPU load on response time.

Another interesting result was the effect of CPU load on Raspberry Pi response times. Somewhat counter intuitively, response times are smaller on average when there is some other process consuming CPU cycles. This happens even with the kernel module, which makes me think it has something to do with power saving features. Perhaps this is due to CPU frequency scaling or maybe the kernel puts an idle CPU into some sleep mode from which it takes longer to wake up.

In conclusion, I was a bit impressed how well Python scores on this test. While it's an order of magnitude slower than Arduino, 200 microseconds on average is not bad. Of course, there's no hard upper limit on that. In my test, some responses took two times as much and things really start falling apart if you increase the interrupt load (like for instance, with a process that does something with the SD card or network adapter). Some of the results on Raspberry Pi were quite surprising and they show once again that intuition can be pretty wrong when it comes to software performance.

I will likely be looking into more details regarding some of these results. If you would like to reproduce my measurements, I've put source code, raw data and a notebook with analysis on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Another hard drive failure

07.02.2015 21:41

Earlier today one of my hard drives died. It was a fairly old 750 GB "Caviar GP" drive from a Western Digital "My Book" external enclosure. All it does now is emit an impressively loud metallic clicking noise.

I should have seen this coming, of course. At this point I have a pile of failed drives stashed in a box somewhere. I remember that this particular one has been unusually slow to start and mount for the last couple of times I used it. Also, smartd has previously reported "2 Currently unreadable (pending) sectors". Both of which I ignored, because I assumed this was yet another problem with the power supply. I had a "My Book" 12V external power supply fail before with similar symptoms.

I only used this drive for backups recently, so except for some archival copies of machines I no longer own, probably nothing of value was lost. Having at least a listing of contents before it failed would be nice though.

Disassembled Western Digital "My Book" external drive.

Of course, I opened it up to see if there's anything obvious wrong with it. The "My Book" USB interface board and the power supply are not the cause, because the drive has the same problem even when it is connected directly to a SATA port. I can hear the platters spinning and the clicking noise can only be caused by heads trashing around, so those are not stuck either.

Corrosion of surface finish on the controller PCB.

The only thing that immediately looks wrong is the unusual amount of corrosion on the hard drive controller PCB. It's bad enough that one some exposed test points both the immersion gold and the copper layer are completely gone. I'm not quite sure what could have caused that. As far as I can remember, this drive was sitting somewhere around my desk for the whole time, so it hasn't been exposed to any hostile environments. It might be a manufacturing defect of some sort - maybe the board was not rinsed well enough after processing.

Bottom side of the hard drive controller PCB.

I cleaned the pads where the motor and the head connect to the circuit board, but that didn't make any difference.

The copper below the green solder mask looks fine though. The bottom side of the PCB contains one large BGA chip. Maybe that one developed some bad connections, if the problem is indeed in the controller board. Just as an experiment, I also tried the disk-in-the-freezer trick, but that did not make the disk behave any differently.

Posted by Tomaž | Categories: Digital | Comments »

CubieTruck UDMA CRC errors

18.10.2014 20:07

Last year I bought a CubieTruck, a small, low-powered ARM computer, to host this web site and a few other things. Combined with a Samsung 840 EVO SSD on the SATA bus, it proved to be a relatively decent replacement for my aging Intel box.

One thing that has been bothering me right from the start though is that every once in a while, there were problems with the SATA bus. Occasionally, isolated error messages like these appeared in the kernel log:

kernel: ata1.00: exception Emask 0x10 SAct 0x2000000 SErr 0x400100 action 0x6 frozen
kernel: ata1.00: irq_stat 0x08000000, interface fatal error
kernel: ata1: SError: { UnrecovData Handshk }
kernel: ata1.00: failed command: WRITE FPDMA QUEUED
kernel: ata1.00: cmd 61/18:c8:68:0e:49/00:00:02:00:00/40 tag 25 ncq 12288 out
kernel:          res 40/00:c8:68:0e:49/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
kernel: ata1.00: status: { DRDY }
kernel: ata1: hard resetting link
kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: configured for UDMA/133
kernel: ata1: EH complete

At the same time, the SSD reported increased UDMA CRC error count through the SMART interface:

UDMA CRC weekly error count on CubieTruck.

These errors were mostly benign. Apart from the cruft in the log files they did not appear to have any adverse effects. Only once or twice in the last 10 months or so did they cause the kernel to remount filesystems on the SSD as read-only, which required some manual intervention to get the CubieTruck back on-line.

I've seen some forum discussions that suggested this might be caused by a bad power supply. However, checking the power lines with an oscilloscope did not show anything suspicious. On the other hand, I did notice during this test that the errors seemed to occur when I was touching the SATA cable. This made me think that the cable or the connectors on it might be the culprit - something that was also suggested in the forums.

Originally, CubieTruck comes with a custom SATA cable that combines both power and data lines for the hard drive and has special connectors (at least considering what you usually see in the context of the SATA cabling) on the motherboard side.

Last few weeks it appeared that the errors were getting increasingly more common, so I decided to try replacing the cable. Instead of ordering a new CubieTruck SSD kit I improvised a bit: I didn't have proper connectors for CubieTruck's power lines at hand, so I just soldered the cables directly to the motherboard. On the SSD drive I used the standard 15-pin SATA power connector.

For the data connection, I used an ordinary SATA data cable. The shortest one I could find was about three times as long as necessary, so it looks a bit uglier now. The connector on the motherboard side also needed some work with a scalpel to fit into CubieTruck's socket. The original connector on the cable that came with CubieTruck is thinner than those on standard SATA cables I tried.

Replacement SATA cables for CubieTruck.

So far it seems this fixed the CRC errors. In the past few days since I replaced the cable I haven't seen any new errors pop up, but I guess it will take a month or so to be sure.

Posted by Tomaž | Categories: Digital | Comments »

GA 7VT600 lmsensors settings

02.05.2014 17:35

Recently I've put into use a relatively ancient Gigabyte GA 7VT600 1394 motherboard that's been gathering dust on the top shelf of my wardrobe. I used it to replace an even older MSI board which, while still working perfectly, was getting a bit slow.

After replacing the dead lithium battery for RTC and NVRAM, it seems to work just fine with stock Debian Wheezy and passes a few ad-hoc stress tests.

One thing I noticed though is that sensors tool from the lm-sensors package isn't very useful by default.

Adapter: ISA adapter
in0:          +1.70 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +1.33 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.25 V  (min =  +0.00 V, max =  +4.08 V)
in3:          +2.86 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +3.23 V  (min =  +0.00 V, max =  +4.08 V)
in5:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in7:          +3.01 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +0.00 V  
fan1:        3308 RPM  (min =    0 RPM, div = 8)
fan2:           0 RPM  (min =    0 RPM, div = 8)
temp1:        +35.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +31.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp3:        +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
intrusion0:  OK

The board obviously has a IT87 series chip that provides some hardware monitoring functionality (you need the it87 kernel module). Apart from the lack of useful labels, some voltages also seem to be divided by voltage dividers before being measured by it87. I would expect at least 5 V and 12 V lines there.

Figuring out which fan is which was trivial. For finding out other things, I compared the printout above with what the BIOS setup utility says. I picked out the most logical divider values for voltages. Since these also seem to fit the order of sensors, I'm relatively confident they are correct.

PC Health Status screen on GA 7VT600 motherboard.

in5 and in6 readings are very unstable and don't seem to be shown in the BIOS screen. I'm guessing they are not connected on this board. temp2 is also not shown, but seems to give reasonable values, so I'm guessing there is a temperature sensor connected there, but I don't know where it is.

So, for future reference, put this into /etc/sensors.d/ga-7vt600 to get a nicely labeled and properly calculated values for this hardware:

chip "it87-isa-0290"
    label temp1 "Sys Temp"
    label temp2 "Aux Temp"
    label temp3 "CPU Temp"

    label fan1 "CPU Fan"
    label fan2 "Sys Fan"

    label in0 "Vcore"
    label in1 "DDR Vtt"
    label in2 "+3.3V"
    label in3 "+5V"
    label in4 "+12V"
    label in7 "5VSB"

    compute in3 @*1.679, @/1.679
    compute in4 @*3.973, @/3.973
    compute in7 @*1.679, @/1.679
Posted by Tomaž | Categories: Digital | Comments »

CubieTruck Perl performance

23.01.2014 22:57

Two months ago I bought a CubieTruck, one of the many cheap, bare-bone ARM-based computers that keep popping-up everywhere these days. My idea was to replace the aging x86 server that is running this website with something more power-efficient. So I was looking for a reasonably powerful board with a proper SATA interface and a decent amount of RAM. Raspberry Pi was out of the question, but the latest incarnation of CubieBoard with a dual-core 1 GHz ARM Cortex-A7, 2 GB of RAM, SATA 2.0 and Gigabit Ethernet seemed to fit the bill.

Unfortunately I could not find any reliable benchmarks I could use to estimate how ARM SoCs perform in comparison with my existing setup. So before I decided to migrate I took a while to do some performance tests and get to know this hardware.


The software setup I'm interested in benchmarking is somewhat archaic in these days of Node.js and NoSQL. I'm using Perl 5 with HTML::Template doing most of the heavy lifting (at least according to Devel::NYTProf profiler). Most parts are statically generated and some are dynamic using a handful of SpeedyCGI Perl 5 scripts. These are combined into a consistent website you see here with a somewhat convoluted Apache configuration using the threaded worker.

In the following benchmarks I'm comparing:

  • An AMD Duron at 700 MHz, 1.2 GB RAM running stock x86 Debian Squeeze. Root filesystem is mounted from an IDE hard drive.
  • A CubieTruck A20 running armhf Debian Wheezy and the kernel supplied for the CubieTruck Ubuntu Server installation. Root filesystem is mounted from an SD card.

Both machines were connected through a 100 Mb/s Ethernet switch to a laptop which was running the remote end of the benchmarks.

First, to see how fast the static part of the web site is generated, I ran the full (single threaded) HTML rebuild. I measured the required user space CPU time with the time utility. This is the fastest run of three on each machine:

AMD Duron CubieTruck
CPU time to rebuild static pages 45.3 s 61.8 s

Then, to check if network was operating at the bit rate I thought it was, I ran iperf to measure TCP throughput between the server and the laptop:

AMD Duron CubieTruck
iperf throughput test 94.0 Mb/s 94.5 Mb/s

Finally, I ran a suite of tests using the Apache benchmarking tool. I measured how many requests per minute a server can handle for different types of content and different number of concurrent requests. Numbers in parentheses show size of HTTP body (without headers).

CubieTruck requests per second for a static HTML page.

CubieTruck requests per second for a dynamic HTML page.

CubieTruck requests per second for an image.

CubieTruck requests per second for API call.

The site rebuild is somewhat disappointingly almost one-third slower than on a 10 year old PC. However the single threaded Apache performance is on par with it. In the case of more concurrent users the CubieTruck of course has an advantage because of an additional CPU core. Actually in both cases with static content CubieTruck managed to saturate the line when there was more than one concurrent request.

I tried to make these tests in a way that the slow SD card in the CubieTruck would minimally affect their outcome. All of data should fit into the buffer cache, which is why in the first test I only took into account the fastest run and only user space CPU time. However I now suspect that the SD card still affected the numbers somehow (the rebuild operation is the heaviest of the tests regarding filesystem I/O). I don't know for sure how kernel computes the time returned by the time utility.

These results are good enough that I can't dismiss CubieTruck based on performance. If a proper SATA drive wouldn't speed it up, I could probably parallelize the build process with not much work. That should cut down on time if it's really Perl performance on ARM that is slowing it down. On the other hand I'm having some other concerns about using CubieTruck as a personal server so I'm not completely decided yet about putting it on my rack.

Posted by Tomaž | Categories: Digital | Comments »

Repairing the Happy Hacking Keyboard

29.09.2013 15:40

My trusty old Happy Hacking Keyboard has been working pretty reliably for the last four years. After fixing a botched plastic mold and strategically placing a piece of cardboard in its innards that is. Regarding the typing feel it is still my favorite keyboard that doesn't take a lot of space on a crowded desk and I only switch to a regular-sized Logitech when I'm working with EDA programs where I need functions keys a lot.

So I was pretty disappointed when it stopped working a week ago. Checking the kernel log revealed all sorts of random USB bus errors:

usb 6-1.1: USB disconnect, device number 14
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device not accepting address 13, error -71
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device firmware changed
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: activate --> -19
usb 6-1: USB disconnect, device number 13
usb 6-1: new full-speed USB device number 15 using uhci_hcd
usb 6-1: string descriptor 0 read error: -71
usb 6-1: New USB device found, idVendor=04fe, idProduct=0008
usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 6-1: can't set config #1, error -71
hub 6-0:1.0: port 1 disabled by hub (EMI?), re-enabling...

This looked like something systemic. Either the controller was resetting continuously or there was something wrong with the USB wiring between the controller and the computer.

A new, identical HHKB model goes for more than $110 today so I opened it up to see if there's anything I can do. After checking the cable with an ohm-meter my suspicion fell on the power supply which seems to consist of a 3.3 V LDO regulator and some capacitors. I could see no obvious transients on the power rails when the controller switched on after the keyboard was plugged in. One interesting thing I did see was that if negotiation with USB host fails the controller switches itself off completely, including its quartz oscillator.

Happy Hacking Keyboard Lite2 USB controller

Since I had some problems with flaky USB cables before, I removed the original cable and soldered a new cable directly to the circuit board. This fixed the problem! After poking around some more, it turned out that after re-soldering the connector to the PCB the original cable worked as well.

I removed the membrane before poking around the controller board since a hot soldering iron and plastics don't mix well. Re-inserting the soft matrix tails into the (non-ZIF) connectors was somewhat tricky. I resorted to using pliers plus a bit of paper to protect the delicate wires.

Re-inserting flexible cables into connectors using pliers

I also noticed that silver wires on the keyboard matrix itself seem to be developing a kind of a dark oxide on the outer edges. I don't remember for sure whether they looked like this from the start though. If something is eating away at the wires that definitely puts a kind of a definitive limit to this keyboard's longevity.

Possible oxidation on the keyboard matrix

In conclusion, just checking with a multimeter doesn't mean there's not a bad solder joint somewhere on a high-speed bus. The wisdom of checking first for bad RoHS soldering on mechanically (and thermally) stressed components confirmed itself again. Also, did you know there are a couple of alternative, open source Happy Hacking Keyboard controllers out there?

Posted by Tomaž | Categories: Digital | Comments »

Some notes about CC chips

20.05.2013 12:31

Here are two unusual things I noticed while working with Texas Instruments (used to be Chipcon) CC2500 and CC1101 integrated transceivers.

It appears that the actual bit rate in continuous synchronous serial mode can differ significantly from what Texas Instrument's SmartRF Studio 7 calculates. See for example the following measurements that were taken on a CC2500 chip using a 27 MHz crystal oscillator as a reference clock.

MDMCFG4 valueRF studio [baud]measured [baud]

These bit rates were measured using an oscilloscope attached to the clock output of the transceiver, so I trust them to be correct. Bit rates I measured on a CC1101 agree with what SmartRF Studio predicts.

Update: I revisited this issue and the problem was a bug in my code that caused MDMCFG3 register (which also affects data rate) not to be properly programmed on CC2500. Accounting for this bug, the data rates are within 1% of those calculated by SmartRF Studio 7 or from the formula given in the datasheet.

The other issue I saw is symbol mapping for 4FSK modulation in CC1101. It looks like it depends on the configured bit rate. For example, with 200 baud, the symbol to frequency mapping appears to be as follows:

00−0.33 fdev
01−1.00 fdev
10+0.33 fdev
11+1.00 fdev

However, with 45 baud, the mapping is different, with symbol bit order apparently switched around:

00−0.33 fdev
01+0.33 fdev
10−1.00 fdev
11+1.00 fdev

Update: It's possible this difference has something to do with when exactly the radio samples the data line in relation to the clock. Either I don't understand exactly what is going on or the radio isn't sampling the data when it is supposed to. Also, the factors of fdev in tables were wrong (symbol frequencies are equally spaced, with maximum deviation from central frequency equal fdev).

Of course, this doesn't matter if you are using two identically configured CC1101 chips on both ends of the radio link. But it is important if you want to use it to communicate with some other hardware.

Posted by Tomaž | Categories: Digital | Comments »

VESNA and signal synthesis

26.04.2013 20:49

Experimental radio equipment on VESNA can be used for more than spectrum sensing. Radio boards based on Texas Instruments CC2500 and CC1101 transceivers can also be used for packet transmission and reception and, perhaps somewhat surprisingly, also as flexible signal generators. These can be used in experiments when you want for instance to introduce a controlled interference in some system or to check if your spectrum sensor is working correctly.

Usually when I'm talking with people about what our hardware is capable of the conversation often starts with amazement at how small the radio part is and then inevitably turns to the question whether VESNA has a software defined radio. I have to answer no, it doesn't, and then there's usually an awkward silence because that seems like a dead-end for any serious research work these days.

CC1101 transceiver on SNE-ISMTV-868

True, these transceivers don't provide software access to the undemodulated baseband samples (like for instance USRP does). However they are still amazingly flexible. They offer plenty of reconfigurability and, after you get to know some of their quirks, can be adapted for a lot of weird use cases without hardware changes. If you skip the various high-level digital parts of the chip for proprietary packet format handling there's a flexible front-end in there that allows you to choose between a handful of frequency, amplitude and phase modulators and several options for channel filters and such.

The closest you can come to software defined radio is a transparent continuous transmit mode where you can feed the transceiver arbitrary binary data from the microcontroller and it will simply get modulated and fed into the antenna. There is a catch though, since the chips offer at most 2-bit quantization of the baseband signal before modulation (they were designed for simple digital transmissions after all). This means that you have to get creative if you want to approximate an analog modulation and be ready for plenty of quantization noise.

This is how it looks like on a spectrum analyzer when you run a direct digital synthesis algorithm on the microcontroller that generates a baseband sawtooth frequency sweep and use the amplitude modulator to up-convert it to 2.4 GHz:

RF signal synthesis using VESNA

(Click to watch RF signal synthesis using VESNA video)

Using delta-sigma modulation you can approximate arbitrary waveforms in this way. For instance, you can make a passable simulation of an (analog) wireless microphone transmission using a 4-FSK modulator in CC1101 tuned into the UHF band.

Of course, setting this up takes more work than popping a few blocks into GNU Radio Companion and part of my job is to make it more accessible to people using VESNA and our VESNA-based testbeds. If you're interested in such signal generation using Texas Instruments CC series, some platform-independent code capable of doing this should start hitting the vesna-spectrum-sensor repository on GitHub in the next few weeks.

Posted by Tomaž | Categories: Digital | Comments »

The lowly connector

11.04.2013 20:47

Here's a small addendum to my previous list of lessons regarding easy debugability of microcontroller boards.

It's not a bad idea to pay a bit of extra attention to the connectors that will be used when developing and debugging software. It's likely these will see much more use than any other connector on the board, especially if the board is also meant for teaching or research like VESNA.

The first concern is that it should be hard to connect it in a wrong way. IDC and similar pin header connectors are popular for JTAG. Use a male part that has a shroud so that it's impossible to connect it when displaced by a pin or turned 180 degrees. That might sound obvious, but when you're debugging that tricky race condition and switching the debugger between three systems on your table late in the evening, the last thing you need is a burned out board because of a misplaced connector.

The other thing that is also worth considering is the life time of the connector itself. While the life time of the part on the board might not be problematic, the part that stays with the developer can be. For VESNA one of the most common reasons to make a trip to the soldering station is a torn wire in the connector for the serial debug console. We use something similar to the Berg connector and that one doesn't really take many connect-disconnect cycles before either wires get torn out or little springs break and the metal part falls out of the plastic housing. It's not always obvious that has happened and again it's a pain to realize after a long debugging session that the reason a board is talking garbage is not due to a bug you're trying to catch but rather due to a broken ground line in your debug console.

Posted by Tomaž | Categories: Digital | Comments »

About Digi Connect ME

03.10.2012 23:32

Remember my recent story about Atmel modules? For the last two days I had a very strong déjà-vu feeling when I was trying to debug an issue that popped up in the last minute and is preventing a somewhat time critical experiment from being performed on one of our VESNA testbeds.

It turned out that while 7-bit ASCII strings were correctly transferred between VESNAs and a client calling a HTTP API, binary data sent down the same pipeline got corrupted somewhere on the way. Plus, to make things just a bit more interesting, it sometimes also made the whole network of VESNAs unreachable.

VESNA coordinator with Digi Connect ME module.

Now, problems like this are unfortunately quite a common occurrence. Before data from a sensor node reaches a client, it must pass through multiple hops in a (notoriously unreliable) ZigBee mesh network, a coordinator VESNA that serves as a proxy between the mesh and a TCP/IP tunnel, a Java application on the end of the tunnel that translates between a home-grown HTTP-like protocol used between VESNAs and proper HTTP and finally an Apache server acting as a reverse-proxy for the public HTTP API. Leaky abstractions and encapsulations are a plenty and it's not unusual to have some code somewhere along the line assume that the data it is passing is ASCII-only, valid UTF-8 or something else entirely.

I won't bore you with too many details about debugging. I opted for a top-down approach, with the Java layer being the main suspect based on previous experience. After that got a thorough check, I moved to sniffing the tunnel with Wireshark, which has an extra complication that it's SSL encrypted. Luckily, it doesn't seem to use Diffie–Hellman key exchange, meaning that the SSL dissector was quite effective once I figured out how to extract private keys from Java's keystore. That also seemed to look OK, so the next layer down was the SSL tunnel endpoint, which is a Digi Connect ME module.

This is basically a black box that takes a duplex RS-232 connection on one end and tunnels it through an encrypted TCP/IP connection. It's an deceptively simple description though. In fact Digi Connect ME is a miniature computer with an integrated web server for configuration, scripting support and a ton of other features (I have close to 500 pages of documentation for it on my computer and I only downloaded documents I though might be useful in debugging this issue).

VESNA coordinator hooked up to a logic analyzer.

Anyway, when I looked closely, the problem was quite apparent. On the RS-232 side the module was set to use XON/XOFF software flow-control. This obviously won't work when sending arbitrary binary data. Not only will the module interpret XON and XOFF bytes as special and so drop them from the pipeline, an XOFF that is not followed by XON will also halt the transmission, leading to hangs and timeouts. The fix looked simple enough: switch to hardware flow control using dedicated CTS and RTS lines.

As you might have guessed, it was not that easy. It turns out that when hardware flow control is enabled in Digi Connect ME, it will randomly duplicate characters when sending them down the serial line. Again, the suspicion first fell on our homegrown UART driver on the VESNA side, but a logic analyzer trace below confirmed that it's the actual Digi Connect module that is the culprit and not some bug on our side.

Logic analyzer trace from a Digi Connect ME module.

Now, at this point I'm seriously confused. Digi Connect ME is quite popular and, at least judging from browsing the web, used in a lot of applications. But after the ZigBit module it's also the second piece of hardware that exhibits such broken behavior that I can't believe it has passed any thorough quality check. All of my experience speaks that it must be us that are doing something wrong and in both cases I have tested just about all possibilities where VESNA could be doing something wrong. Actually, I want it to be a mistake on our end because that means I can fix it. But honestly, once you see wrong data being sent on a logic analyzer, I don't think there can be any more doubt. Must really everything turn rotten once you look into it with enough detail?

Posted by Tomaž | Categories: Digital | Comments »