Testing Galaksija's memory

26.09.2017 20:13

Before attempting to restore the damaged laminate of Mr Ivetić' Galaksija I wanted to have some more confidence that the major components are still in working order. The fact that the NAND gate in the character generator patch was still working correctly gave me hope that the board was not connected to a wrong power supply. That could do serious damage to the semiconductors. Still, I wanted to test some of the bigger integrated circuits. Memory chips are relatively straightforward to check. They are also mounted on sockets on this board, so they were easy to remove and test on a breadboard.

Character generator ROM on the Galaksija circuit board.

This is another post in the series about restoration of an original Galaksija microcomputer. Galaksija is a small home microcomputer from former Yugoslavia that was built around the Z80 microprocessor. It used EPROMs, a predecessor to modern flash memory, to store its simple operating system. As was common at the time, Galaksija could not update its system software by itself. In fact, western home computers typically stored such software in mask ROMs. There, code and data were programmed by physically etching a pattern into the metal layer of the chip. Even though Yugoslavia had a semiconductor industry that was capable of making ROMs, producing a custom chip was not economical for Galaksija, which had relatively low production numbers.

EPROMs removed from the Galaksija circuit board.

Galaksija originally came with two EPROMs. The first one, called ROM A in the manual and marked Master EPROM here, contains 4 kB of Z80 CPU machine code and data for basic operations. It includes specialized functions related to the hardware: video driver for generation of the video signal, keyboard read-out as well as modulation and demodulation routines for saving data to an audio cassette. Some higher-level functions are also included. There's a simplistic terminal emulation with a command-line interface, a stack-based floating point calculator and a BASIC interpreter based on the TRS-80. My incomplete Galaksija disassembly contains more details.

The second EPROM is the character generator ROM. Galaksija's video output is designed fundamentally around text. The frame buffer contains only references to characters that are to be drawn on the screen. How these characters look, the actual pixels you see, are stored in the character ROM. This is similar to how text mode worked on old PCs and was done to limit the RAM use. In fact, a bitmapped image of the whole screen would not fit into the 2 kB of Galaksija's RAM. Of course, this means that only very limited graphics can be displayed. By sacrificing a lot of RAM and hacking the video driver, some limitations can be worked around.
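
The claim that a bitmapped screen would not fit into RAM can be checked with some quick arithmetic. This is only a sketch: the 32×16 character screen and the 8×13 pixel character cell are commonly quoted figures for Galaksija that are assumed here, they are not stated in this post.

```shell
#!/bin/sh
# Assumed display geometry (commonly quoted, not taken from the post).
COLS=32         # characters per row
ROWS=16         # character rows
CELL_W=8        # pixels per character, horizontally
CELL_H=13       # pixels per character, vertically
RAM_BYTES=2048  # base Galaksija RAM, 2 kB

# Text mode stores one character reference (one byte) per cell.
text_bytes=$((COLS * ROWS))

# A 1 bit-per-pixel bitmap covering the same screen area.
bitmap_bytes=$((COLS * CELL_W * ROWS * CELL_H / 8))

echo "text frame buffer:   $text_bytes bytes"
echo "bitmap frame buffer: $bitmap_bytes bytes"
echo "available RAM:       $RAM_BYTES bytes"
```

Under these assumptions the text frame buffer takes 512 bytes, while a bitmap of the same area would need 6656 bytes, more than three times the base RAM.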

Iskra EMS6116 static RAM on a Galaksija computer.

Galaksija uses static RAM for its working memory. Using costly static RAM was obsolete even in the early 1980s and is the main reason why Galaksija originally only had 2 kB of RAM (which could be upgraded to 6 kB by inserting up to two more identical 2 kB chips). The much larger dynamic RAM would require more complicated circuitry to interface with the CPU and was only added in the later Galaksija "Plus" upgrade. Interestingly, the Z80 CPU was originally meant to use dynamic RAM and includes functionality to perform the required refresh cycles. However in Galaksija this function was instead used for video generation. This board uses a rare EMS6116 RAM chip made by Iskra Semiconductors.

Galaksija's ROM A connected to Arduino Mega.

After carefully removing all three memory chips from the board I wired them up to an Arduino Mega using a breadboard and a rat's nest of jumper wires. I used a slightly modified Oddbloke's RomReader sketch for dumping the EPROM contents. Since the 6116 RAM has an electrical interface that is very similar to 27-series EPROMs I also used this sketch as a base of my RAM test. The RAM test sketch first wrote a test pattern (bytes 00, FF, AA and 55) to all RAM addresses and then read it out to check for any bad bits. Sources for both Arduino sketches are available here.
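
The checking step can be illustrated on the host side. The sketch below is not the Arduino sketch mentioned above. It assumes, hypothetically, that the RAM contents read back after writing one pattern were saved to a file, and it counts the bytes that differ from that pattern. The name count_bad_bytes and the demo file are made up for illustration.

```shell
#!/bin/sh
# count_bad_bytes PATTERN FILE
# PATTERN is a two-digit lowercase hex byte (00, ff, aa or 55).
# Prints the number of bytes in FILE that differ from PATTERN.
count_bad_bytes() {
    od -An -v -tx1 "$2" |
        awk -v want="$1" '
            # Compare every hex byte as a string against the pattern.
            { for (i = 1; i <= NF; i++) if (($i "") != (want "")) bad++ }
            END { print bad + 0 }'
}

# Demo with a stand-in readback file: three good aa bytes and one 55.
demo=$(mktemp)
printf '\252\252\125\252' > "$demo"
echo "bad bytes for pattern aa: $(count_bad_bytes aa "$demo")"
rm -f "$demo"
```

A good chip would give a count of zero for each of the four patterns.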

The first few runs of the EPROM dumper showed that ROM A didn't read out correctly. Its contents differed from what I had on record and consecutive reads yielded somewhat different results. After double-checking my setup, however, it turned out that my Arduino Mega board only puts out around 4.5 V on the +5 V supply. This is at the lower specified limit for these EPROMs, so it could explain occasional bad bits. After supplying a more stable voltage to the EPROM from a lab power supply, ROM A read correctly. Its contents were exactly the same as what I had on record (and what I use on my Galaksija replica).
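
Unstable reads like these are easy to spot by comparing several dumps of the same chip. A minimal sketch, assuming each read was saved to its own file; dumps_agree and the file names are hypothetical and not part of the sketches mentioned above.

```shell
#!/bin/sh
# dumps_agree FILE...
# Succeeds only if every dump is byte-for-byte identical to the first one.
dumps_agree() {
    first=$1
    shift
    for f in "$@"; do
        cmp -s "$first" "$f" || return 1
    done
    return 0
}

# Demo with stand-in files. A real run would use dumps captured from the reader.
tmp=$(mktemp -d)
printf 'ROM' > "$tmp/read1.bin"
printf 'ROM' > "$tmp/read2.bin"
printf 'RoM' > "$tmp/read3.bin"
if dumps_agree "$tmp/read1.bin" "$tmp/read2.bin" "$tmp/read3.bin"; then
    echo "reads are stable"
else
    echo "reads differ"
fi
rm -rf "$tmp"
```

Identical dumps only show that the read-out is repeatable. Whether the contents are correct still has to be checked against a known good image.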

Two variants of the Galaksija character set

Similarly, the RAM and the other EPROM also checked out fine. However, in contrast to ROM A, the character ROM contents differed from what I had expected. After a closer look at the binary dump (using chargendump tool from my Galaksija tools to visualize its contents) it turned out that the difference is in characters 0 and 39 (ASCII hex codes 40 and 27 respectively). These two characters are used to draw the two halves of the logo that is displayed before the distinctive Galaksija READY command prompt.

Galaksija screenshot

The character ROM I use on my replica contains an arrow-like logo of Elektronika inženjering. The ROM in this Galaksija's image contains the game-of-life glider logo of Mipro design. Both of these logos are etched into the copper on the solder side of the Galaksija circuit board:

"design mipro" logo on Galaksija PCB.

Elektronika inženjering logo on Galaksija PCB.

I don't know why there are two versions of the character ROM in existence and how old each of them is. As far as I know, both of these companies were involved in the manufacture of the original do-it-yourself kit parts (including the PCB and the keyboard). Wikipedia currently says that later factory-built computers were built by Elektronika inženjering, so it is possible that the arrow logo version is the more recent one. The blurry screenshot from the original Galaksija manual suggests that the glider logo was used when the screenshot was made. This seems to confirm that the glider logo is older.

Figure showing Galaksija's character set from the Galaksija manual.

In any case, both versions of the ROM seem to already float around the web, so this discovery isn't terribly exciting. The Galaksija Emulator for instance comes with the glider logo version. As far as I can remember, I originally obtained my arrow logo ROM images from the Wikipedia page. The article used to contain hex dumps, but they have since been deleted due to copyright and non-encyclopedic content concerns.

In conclusion, everything worked as expected, which is great news as far as the restoration of this Galaksija is concerned and a green light to proceed to fixing the PCB. It's also a testament to the reliability of old integrated circuits. I was pretty sure that at least the EPROMs had discharged. The datasheet mentions that normal office fluorescent lighting will discharge an unprotected die in around 3 years. Considering that the chips were most likely programmed more than 30 years ago, it is surprising that the content lasted this long (the bit errors at low supply voltage I've seen might be the first sign of the deterioration though). It's also surprising that the Iskra EMS6116 survived and passed all the tests I could throw at it. Domestic chips did not have the best of reputations as far as reliability was concerned, but at least this specimen seemed to survive the test of time just fine.

Posted by Tomaž | Categories: Digital | Comments »

BeagleCore Module eMMC and SD card benchmarks

12.08.2017 13:03

In my experience, slow filesystem I/O is one of the biggest disadvantages of cheap ARM-based single-board computers. It contributes a lot to the general feeling of sluggishness when you work interactively with such systems. Of course, for many applications you might not care much about the filesystem after booting. It all depends on what you want to use the computer for. But it's very rare to see one of these small ARMs that would not be several times slower than a 10-year-old Intel x86 box as far as I/O is concerned.

A while ago I did some benchmarking of the eMMC flash on the old Raspberry Pi Compute Module. I compared it with the SD card performance on Raspberry Pi Zero and the SATA drive on a CubieTruck. In the meantime, the project that brought the Compute Module to my desk back then has pivoted to a BeagleCore module. Since I now have a small working system with the BCM1 I thought I might do the same thing and compare its I/O performance with the other systems I tested earlier.

BeagleCore Module mounted on the SNA-LGTC board.

The BCM1 is a small Linux-running computer that comes in the form of a surface-mount hybrid module. It is built around a Texas Instruments AM335x Sitara system-on-chip with a single-core 1 GHz ARM Cortex-A8 CPU. The module comes with 512 MB RAM and 4 GB eMMC chip. It is supported by the software from the BeagleBoard ecosystem and in my case runs Debian Jessie with the 4.4.30-ti-r64 Linux kernel. Our board has a micro SD card socket, so I was also able to benchmark the SD card as well as the eMMC flash. I was using the Samsung EVO+ 32 GB card.

To perform the benchmark I used the same script on BCM1 as I used in my previous tests. I used hdparm and dd to estimate uncached and cached read and write throughputs. I ran each test 5 times and used the best result.

Comparison of write performance for ARM systems.

The write performance is better with the SD card than the eMMC flash on BCM1. SD card on BCM1 is also faster than the SD card on Raspberry Pi Zero, although this is probably not relevant. It's likely that the Zero performance was limited by the no-name SD card that came with it. eMMC on the BCM1 is slower than on Raspberry Pi CM.

The 16.2 MB/s result for the SD card here is somewhat suspect however. After several repeats, the first run of five was always the fastest, with the later runs only yielding around 12 MB/s. It is as if some caching was involved (even though fdatasync was specified with dd).

Comparison of read performance for ARM systems.

Interestingly, things turn around with read performance. BCM1's eMMC flash is better at reading data than the SD card. In fact, BCM1 eMMC flash reads faster than both Raspberry Pi setups I tested. It is still at least 3 times slower than a SATA drive on the CubieTruck.

Comparison of cached read performance for ARM systems.

Cached read performance is the least interesting of these tests. It's more or less the benchmark of the CPU memory access rather than anything related to the storage devices. Hence both BCM1 results are more or less identical. Interestingly, BCM1 with the 1 GHz CPU does not seem to be significantly better than the Compute Module with the 700 MHz CPU.

My results for the BCM1 eMMC flash are similar to those published here for the BeagleBone Black. This is expected, since BeagleBone Black has the same hardware as BCM1, and gives me some confidence that my results are at least somewhat correct.


The Galaksija character generator patch

05.07.2017 19:55

In my first overview of Mr Ivetić' Galaksija I mentioned a curious bundle of components hidden inside a yellowing cocoon of Sellotape. It was obviously not a part of the original kit and I speculated that it was likely a workaround for some timing issue connected with the character generator. In the late 1980s several articles were published in Računari and Moj Mikro magazines that attempted to help Galaksija owners fix various hardware problems. Unfortunately I couldn't find any suggested fixes that would match what I saw, so I decided to investigate this particular hardware patch a bit further.

Galaksija circuit board from Mr. Ivetić.

This is another post in the series about the possible restoration of an original Galaksija computer that I took custody of recently. Galaksija is a small home microcomputer from former Yugoslavia that was built around the Z80 microprocessor. The designs were openly published in a magazine in 1984 with the intention that readers would build their own computers from scratch. It is similar to the Sinclair ZX80 in that it uses the CPU to generate the video signal and is constructed solely from general-purpose logic chips. It is generally considered the most successful of several domestic alternatives to computers that were illegally imported from the west.

Components that were hidden under the tape.

After carefully unwrapping layers of disgusting, decaying sticky tape I found a 74LS10 triple 3-input NAND chip from National Semiconductor, a resistor and two capacitors. The circuit is connected to the rest of the computer with only four wires: a logic input, a logic output, +5V supply and ground. The green 150 nF capacitor on top of the chip is only used for decoupling the power supply. The first 3-input NAND is wired as a 2-input NAND, the second is wired as a NOT gate and the third is left unconnected. Together they form the following functionally equivalent logic circuit:

Schematic of the monostable multivibrator circuit.

This circuit acts as a monostable multivibrator. It will take an impulse of an arbitrary length on its input and always output an impulse of a fixed length that is defined by the time constant of the RC circuit.

When the input goes from high to low, the transition is immediately propagated over the NAND gate, the capacitor and NOT gate to the other NAND input. This latches the output low regardless of any later input changes. Over time, the resistor discharges the capacitor enough that the NOT gate input falls below the logic threshold and the output goes back high. This also unlatches the circuit, allowing another input impulse to trigger it again.

The theoretical output impulse length should be around 80 ns based on the capacitor and resistor values shown above.

Monostable multivibrator demonstration, short impulse.

Monostable multivibrator demonstration, long impulse.

I carefully unsoldered the circuit from Galaksija and connected it to a signal generator using the original lengths of wire. On the screenshots above, the yellow trace is the input and the blue trace is the output. As you can see, the circuit is still working. The output impulse length correctly stays the same regardless of the input impulse length. The circuit has a propagation delay of around 32 ns and the measured output impulse length is around 60 ns. The digital signal is distorted due to ground bounce and other effects of the wires that are quite long for signals this fast.

Location of the monostable on the full schematic.

The monostable is connected in front of the shift/load input to the 74LS166 shift register that generates the video signal. See full schematic here.

Normally the shift register shifts out individual bits on its serial output as the electron beam scans the TV screen. However, once per every 8 pixels it must load new data. To do this, the CPU reads out 8 new pixels from the character ROM. During this time, the shift/load signal must go low for exactly one transition of the 6.144 MHz pixel clock to load the register.
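
To put numbers on this, the sketch below works out the pixel period from the 6.144 MHz clock quoted above and the interval between two loads of the shift register, at one load per 8 pixels. pixel_timing is a hypothetical helper, written only for this illustration.

```shell
#!/bin/sh
# pixel_timing CLOCK_HZ PIXELS_PER_LOAD
# Prints the pixel clock period and the interval between register loads,
# both in nanoseconds, separated by a space.
pixel_timing() {
    LC_ALL=C awk -v f="$1" -v n="$2" 'BEGIN {
        period = 1000000000 / f
        printf "%.1f %.1f\n", period, period * n
    }'
}

pixel_timing 6144000 8
```

This gives a pixel period of about 163 ns and about 1.3 µs between loads, which is the scale against which the shift/load impulse has to be timed.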

Timing diagram for the CPU's M1 cycle with the "M1" detect signals added.

Loading the shift register in Galaksija is quite a tricky operation as several signals must be accurately synchronized. The situation is made even more complex by the original Galaksija design. To avoid using an extra chip, the circuit does not fully decode the required CPU bus states with combinatorial logic. Instead, it generates the shift/load impulse dynamically. A 74LS74 D flip-flop is cleverly wired to the CPU bus, as shown on the timing diagram above, to create the load impulse.

Normally digital circuits are designed to work even with ideal components with zero propagation time. However, this circuit depends on the fact that the pixel data will be loaded into the shift register before the CPU settles after the last clock of the M1 state. It's one of the two parts of Galaksija's circuit where signals race each other like this.

For a more in-depth explanation of the character generator, see my old blog post about the CMOS redesign and sections 3.1.5 and 4.1.2 in my diploma thesis (in Slovene - English machine translation).

Timing detail for the shift/load signal and the pixel clock.

So, why was the monostable circuit added to this Galaksija? The rough timing diagram above shows the shift/load signal in relation to the pixel clock. The specification for the SGS Z8400B (the Z80 variant used on this particular board) only gives a maximum of 100 ns for the settling after the low-to-high transition of the CPU clock. Hence, the time the shift/load signal spends low in the original circuit (without the monostable) is anywhere between 0 and 100 ns. If this time is too short, it will miss the low-to-high transition of the pixel clock and the shift register won't load. With the monostable added into the circuit, however, the shift/load will always be low for 60 ns and will always catch the pixel clock.

It was known that the original Galaksija design doesn't work with all Z80-compatible CPUs. As the CPU manufacturers improved their processes, the signal transition times were getting lower and eventually some chips were settling too fast for the unmodified character generator circuit to work correctly. The CPU on this board has a date code from 1986, around 10 years after the first Z80 CPU was introduced and 2 years after Galaksija was first published. It's not surprising that it caused timing problems.

This patch appears to be one way to make the circuit more resilient to the CPU variations. It is not perfect though. If the transition time is too short, the impulse might be too short to trigger the monostable. A better approach is to fully decode the CPU state. This is the solution I chose when designing my CMOS replica. Of course, this comes at a cost of more logic and would not be simple to add to an existing circuit board.


Closer look at the original Galaksija

09.05.2017 20:34

A few weeks ago I met with Mr. Vojislav Ivetić in Maribor. He entrusted me with an old Galaksija computer circuit board. Several years ago he obtained it from Janez Stergar at the Faculty of Electrical Engineering and Computer Science, University of Maribor. He told me that the historical computer was in an unknown condition, very likely not working, and was interested in restoring it back to usable state. This post is the result of my visual inspection of the circuit to estimate the extent of the restoration that would be necessary.

Galaksija is a small home microcomputer that was designed in Belgrade by Voja Antonić around the Z80 microprocessor. The designs were openly published in a magazine in 1984 with the intention that readers would build their own computers from scratch. Do-it-yourself kits could be ordered by mail and eventually also complete, factory made computers. Galaksija was often easier to obtain than similar foreign computers due to heavy import restrictions in the former Yugoslavia. It is generally considered the most successful of several attempts at a domestic home microcomputer.

At first glance, Mr. Ivetić' Galaksija appears to be built from one of the kits. It has a white mechanical keyboard and a factory made single-layer printed circuit board with the green solder mask and white silk screen print on top. The integrated circuits and other components were most likely gathered from various sources and soldered manually (not all are in sockets). All original Galaksija computers I've seen looked very similar to this. Some had black keyboards, but they all shared the same PCB design.

Galaksija circuit board from Mr. Ivetić.

The circuit board has the basic Galaksija configuration. Only the 4 kB ROM A is installed. This ROM contains the BASIC interpreter, video driver and the rest of Galaksija's minimalistic operating system (here marked Master EPROM). The ROM B socket is empty.

The quartz windows on UV-erasable EPROMs are only covered with a white paper sticker. If the board was stored for a long time exposed to light, it might be that the EPROMs have lost their charge due to ambient UV light and will have to be reprogrammed.

Iskra EMS6116 static RAM on a Galaksija computer.

There is a single 2 kB static RAM chip installed. Interestingly, the logo suggests this is an Iskra EMS6116, a domestic integrated circuit. I was not aware that RAM was produced by Iskra. In fact, the original magazine article that gives instructions for Galaksija builders suggests ordering RAM and other chips by mail from abroad (with suggested distributors that will ship to Yugoslavia and tips on getting the shipments through customs). Sockets for additional two 2 kB RAM chips are empty.

All other chips are foreign made. The Z80 CPU and EPROMs are all from SGS (former Italian semiconductor company, later merged into STMicroelectronics). These also have the most recent date codes among the identifiable components on the board: first week of 1986. Original Galaksija design was published in January 1984, so this board was built at least 2 years later. Other logic chips I could identify are from TI and SGS. The oldest chip is the 74LS38 from 1979.

Improvised circuit on shift/load line.

There is a small bundle of components wrapped in sticky tape hanging off the PCB on four wires. It looks like it contains an IC in a DIP package and some capacitors. The circuit sits in front of the shift/load input to the 74LS166 shift register that generates the video signal. It's also connected to the ground and the power supply. Since the extra circuit is not connected to any other digital lines, I'm guessing it is most likely a delay to fix some timing problem.

Location of the improvised circuit on the schematic.

Normally, the shift/load input is driven directly by a circuit that detects when the CPU is in the M1 (opcode fetch) cycle. See full schematic here. I know from my previous research that M1 detection circuit on the original Galaksija is unreliable, since it depends on signal timings that are not guaranteed by the design of the Z80 CPU. It's possible that this was an attempt to work around this issue.

Two potentiometers for setting sync pulse lengths.

There is no RF modulator installed. The circuit has been modified so that composite video signal is directly present on a pair of improvised screw terminals. I'm guessing this Galaksija was used with a monitor or a TV with composite input. Those were quite rare at the time, but it was not uncommon for people to modify their TV sets to add a composite input.

Two potentiometers are wired in series with R12 and R13. They have been glued down, but are now hanging loose on wires. Potentiometers seem to have been installed to adjust horizontal and vertical sync pulse widths. They are not part of the original design. They affect the time constants of 74LS123 monostable multivibrators that generate synchronization impulses in the composite video signal.

Missing space key on the Galaksija.

The space keycap is missing, but the key itself is present. I guess even if a suitable replacement can't be found, one could be drawn in a CAD program and 3D-printed.

Example of a lifted track on Galaksija PCB.

A look at the bottom side reveals that the condition of the copper laminate is quite bad. Many tracks and annular rings have broken or lifted off the substrate. The PCB shows signs of old repairs to some of the damaged tracks, so at least part of this damage is not due to age. Maybe soldering was done at too high a temperature or the quality of the laminate was not particularly good. This Galaksija shows no signs that it was ever mounted in a case, so the damage might also be due to mechanical stress. Many tracks around EPROM sockets are broken, suggesting that the stress of inserting and removing the EPROMs was at least partially responsible.

Ruined annular rings under a transistor on Galaksija.

I've counted around 40 points on the PCB that would need repair. Some are hairline breaks in traces that seem easy to reliably bridge with solder. Other parts would require replacements of copper areas using foil and epoxy glue to bring them back to original condition. Fortunately this PCB has relatively large features compared to modern SMD boards. However, this extent of repair still seems like a lot of delicate work. I'm also not certain that other areas of the laminate that look fine now would not start failing during repair.

If all else fails, another possibility would be to have a whole replacement PCB made and re-solder the keyboard and other original components. This would obviously decrease the historical authenticity. While the scans of original PCB masks are available on the web, those are not precise enough to make a usable replacement board. They would need to be redrawn before they can be sent to a fab.

In conclusion, all basic components are there and look fairly well preserved. At the moment I have no reason to believe that any chips are bad. However the PCB should be repaired before attempting to power up this board. The extent of damage and the amount of fine work with the copper foil would make this repair quite time consuming. It would be nice to somehow check the state of the most critical chips before proceeding on that path. Fixing the PCB would be a big waste of time if the CPU or RAM chip will eventually turn out to be bad. On the other hand, replacements for 74LSxx series logic still seem to be relatively easy to come by.


ESP8266 humidity monitoring

27.02.2017 20:11

Last year my flat developed a bit of a mold problem, or maybe I just found out about it then. It's possible the fungus already lived a long, fulfilling life before being discovered. It wouldn't be surprising for a building from an era when thermal- and hydro-isolation were pretty far down on the priority list. In any case, it made me want to monitor the relative air humidity and dew point levels a bit more closely. I had the apartment pretty well covered with sensors already, but the room with the mold in particular lacked a hygrometer.

Wireless humidity monitor based on the ESP8266 module.

Not to re-invent too much hot water, I more or less replicated the nicely documented temperature and humidity web server project from Adafruit. It was doubly appealing because I still had a full bag of old ESP8266 modules that I bought for pennies back when they were the exciting new thing. The only problem was the fact that the humidity sensors supported out-of-the-box by that project were only available from their shop, which is relatively expensive for small items with overseas shipping. Since I was in a hurry, I bought a few DHT11 modules anyway (which turned out to be a mistake).

There's not much to say about the hardware. It's Adafruit's minimalistic circuit soldered on a perforated board. For the power supply I used a small 3.3V switch-mode converter module I had left over from another project. I was nicely surprised by how easy ESP8266 support was to install into the Arduino IDE. Another pleasant discovery was that ESP8266 with the Arduino-based firmware seems to consume much less power than with the stock AT-command firmware.

The Arduino IDE got updated since Adafruit's tutorial was written, so I had to experiment a bit with the firmware upload settings. The following values seemed to work with my particular modules. Another thing I discovered was that the RST line on ESP8266 has to be left floating for the firmware upload to work reliably. On my previous ESP8266 project I tied it to VCC.

Arduino firmware upload settings for ESP8266 modules.

Unfortunately, the DHT11 modules are pretty bad as far as accuracy is concerned. I only discovered Robert's wonderfully in-depth comparison of hygrometer modules after the fact. I played a bit with power supply filtering, but that doesn't seem to be the source of the noise in the data. I ended up modifying Adafruit's firmware so that it reads the sensor every 5 seconds and returns the average of the last 8 readings. This alleviates the problem somewhat, but I definitely recommend using some other sensor to anyone wanting to build this.
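
The averaging idea can be sketched as follows. This is not the modified Adafruit firmware, only an illustration of a moving average over the last 8 readings, written as a filter that takes one reading per line. The name moving_average and the sample values are made up for illustration.

```shell
#!/bin/sh
# moving_average [WINDOW]
# Reads one numeric reading per line on stdin. For every reading it prints
# the average of the most recent WINDOW readings (default 8), or of all
# readings seen so far while fewer than WINDOW are available.
moving_average() {
    LC_ALL=C awk -v n="${1:-8}" '
        {
            slot = (NR - 1) % n               # ring buffer position
            if (NR > n) sum -= buf[slot]      # drop the oldest reading
            buf[slot] = $1
            sum += $1
            count = (NR < n) ? NR : n
            printf "%.1f\n", sum / count
        }'
}

# Demo with made-up humidity readings in percent.
printf '%s\n' 41 45 39 44 | moving_average
```

A window of 8 readings taken 5 seconds apart smooths over the last 40 seconds, which removes some of the jitter but cannot fix a sensor that is inaccurate to begin with.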

For comparison, here is the daily humidity graph recorded using DHT11 with averaging:

DHT11 example daily humidity record.

And here is humidity recorded at the same time (albeit in a different room) by a TEMPerHUM USB dongle with no extra averaging applied:

TEMPerHUM example daily humidity record.

After several months of running, the Arduino-based ESP8266 turned out to be pretty reliable. I haven't seen any big outages in the log of sensor readings. This is a nice improvement over the stock firmware that I used in my Munin display, which still regularly gets lost to the point that it requires a power cycle.


Raspberry Pi Compute Module eMMC benchmarks

03.07.2016 13:52

I have a Raspberry Pi Compute Module development kit on my desk at the moment. I'm doing some testing and prototyping because we're considering using it for a project at the Institute. The Compute Module is basically a small PCB with Broadcom's BCM2835 system-on-chip, 4 GB of flash ROM on an eMMC connection and little else. Even providing power supply at a number of different voltages is left as an exercise for the user.

Raspberry Pi Compute Module

I was wondering how the eMMC flash performs compared to the SD card on the more common Pies. I couldn't find any good benchmarks on the web. Wikipedia says that the latest eMMC standard rivals SATA speeds, but there's not much info around on what kind the Compute Module uses. I've used Samsung's ARM Chromebook with eMMC flash a while ago and that felt pretty fast. On the other hand, watching package updates scroll by on the Compute Module gave me a feeling that it's quite sluggish.

To get some more objective benchmark, I decided to compare the I/O performance with my Raspberry Pi Zero. Zero uses the same BCM2835 SoC, so the results should be somewhat comparable. I used the SD card that originally came with Zero preloaded with the Noobs distribution. It only has the raspberry logo printed on it, so I don't know the exact model or manufacturer. Both Compute Module and Zero were running the latest Raspbian Jessie.

One surprising discovery during this benchmark was that CPU on Zero runs between 700 MHz and 1 GHz while the Compute Module will only run at 700 MHz. These are the ranges detected at boot by bcm2835-cpufreq and default /boot/config.txt that came with the Raspbian image (i.e. no special overclocking). Because of this I performed the benchmarks on Zero at 700 MHz and 1 GHz.

For comparison, I also ran the same benchmark on my Cubietruck that has an Allwinner A20 system-on-chip with SATA-connected Samsung EVO 840 SSD and runs vanilla Debian Jessie.

This is the benchmark script I used. For each run, I chose the fastest result out of 5:

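# Number of repetitions of each test
N=5

# Block device under test (hdparm and drop_caches need root)
DEVICE=/dev/sda
#DEVICE=/dev/mmcblk0

# Uncached reads from the block device
I=0
while [ $I -lt $N ]; do
	hdparm -t $DEVICE
	I=$(($I+1))
done

# Cached reads
I=0
while [ $I -lt $N ]; do
	hdparm -T $DEVICE
	I=$(($I+1))
done

# Write a 128 MB file, flushing the data to storage before dd reports
I=0
while [ $I -lt $N ]; do
	dd if=/dev/zero of=tempfile bs=1M count=128 conv=fdatasync 2>&1
	I=$(($I+1))
done

# Read the file back, dropping kernel caches before each run
I=0
while [ $I -lt $N ]; do
	echo 3 > /proc/sys/vm/drop_caches
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

# Read the file back again, this time from the cache
I=0
while [ $I -lt $N ]; do
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done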
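
Picking the best of the runs from the dd output can also be scripted. A minimal sketch, assuming dd summary lines that end in a figure followed by MB/s, as printed by GNU dd. best_throughput is a hypothetical helper and the sample lines are illustrative, not measured results.

```shell
#!/bin/sh
# best_throughput
# Reads dd status output on stdin and prints the highest throughput figure.
# Only lines ending in "<number> MB/s" are considered. Other lines and
# other units (kB/s, GB/s, the MB/sec printed by hdparm) are ignored.
best_throughput() {
    LC_ALL=C awk '
        $NF == "MB/s" && $(NF - 1) + 0 > best { best = $(NF - 1) + 0 }
        END { print best + 0 }'
}

# Demo with illustrative lines in the dd summary format.
printf '%s\n' \
    '128+0 records in' \
    '128+0 records out' \
    '134217728 bytes (134 MB) copied, 10 s, 13.4 MB/s' \
    '134217728 bytes (134 MB) copied, 8 s, 16.8 MB/s' |
    best_throughput
```

With the script above this could be used as a filter on the output of one of the dd loops. Taking the maximum hides run-to-run variation, so it is worth looking at the individual figures as well.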

Here is write performance, as measured by dd. I wonder if dd figures are affected by filesystem fragmentation since it writes an actual file that might not be contiguous. I've been using Zero for a while with this Raspbian image while the Compute Module has been freshly re-imaged. Fragmentation shouldn't be as significant as with spinning disks, but it probably still has some effect.

Comparison of write performance.

Read performance, as measured by hdparm as well as dd. To remove the effect of cache when measuring with dd, I explicitly dropped kernel block device caches before each run.

Comparison of read performance.

From this it seems that the Compute Module's eMMC flash is slightly faster than the SD card, on both reads and writes, when compared to Zero running at the same CPU clock frequency. It's interesting that Zero's results change significantly with CPU frequency, which seems to suggest that some part of SD card I/O is CPU bound. That said, performance seems to be roughly on the same order of magnitude. Cubietruck is significantly faster than both. In light of this result, it's sad that newer versions of Cubieboard (and cheap ARM SoCs in general) dropped the SATA interface.

Finally, I tested block device cache performance. This more or less shows only RAM and CPU performance and shouldn't depend on storage speed.

Comparison of cached read performance.

Interestingly, Zero seems to be somewhat faster than the Compute Module at 700 MHz here. /proc/cpuinfo shows a different revision, although it's not clear to me whether that marks board revision or SoC revision. It might be that processors in Zero and Compute Module are not identical pieces of silicon.

In the end, I should note that these results are not super accurate. Complexities of I/O benchmarking on Linux aside, there are several things that might have affected the results. I already mentioned different filesystem state. A different SD card in Zero might give very different results (I didn't have a second empty card at hand to try that). While Raspberry Pies were idle during these tests, Cubietruck was running my web server and various other little tidbits that tend to accumulate on such machines.

Posted by Tomaž | Categories: Digital | Comments »

Ultra-narrowband and BPSK on TI CC chips

15.06.2016 20:56

Ultra-narrowband is a fancy new name for an old thing. The idea is to use a phase modulated carrier to transmit data at a very low bitrate. This saves energy and improves spectral efficiency (bits per second of data throughput per hertz of radio bandwidth). This in turn makes it convenient for battery-powered sensors and 20-billion Internet-connected toasters of tomorrow. For similar reasons, amateur radio operators have been chatting over PSK31, which is essentially the same thing as ultra-narrowband, for almost two decades now.

Currently SIGFOX seems to be the main commercial operator that's pushing this technology. They don't publish protocol details, however they've written a 3GPP proposal for C-UNB standard, which is public. The benefit of ultra-narrowband is that the simple BPSK modulation can be implemented with existing cheap and well tested integrated transceivers. Compare with the original Weightless standard for instance, which required custom silicon for its much more advanced physical layer and seems mostly forgotten these days (although it's not a completely fair comparison, since SIGFOX operates in unlicensed spectrum and Weightless had to deal with complexities of TV whitespaces, but I digress).

CC1101 transceiver on SNE-ISMTV-868

The CC-series of transceivers from Texas Instruments (like CC1101 and CC1120) has a lot of software-configurable modulation blocks built-in, but a BPSK modulator is not among them. However, you can find some references to ultra-narrowband being implemented with these chips which suggests that people are using them for this purpose. The C-UNB proposal also mentions that it can be easily implemented with modified FSK modulation, but doesn't go into more detail. I wanted to implement ultra-narrowband on CC1101 for a project we're doing at the Institute, so I looked into this possibility.

As any introductory course in telecommunications is quick to point out, frequency and phase modulation are basically the same thing. If you take a frequency modulator and feed it a time-derivative of a signal the result is identical to a phase modulator fed with the unmodified signal. In practice however it's not that simple. BPSK requires that the phase changes ±180° for each symbol change. The frequency-shift keying block in CC chips does not have a well-defined relation between frequency deviation and symbol rate. This means that it's hard to define how much signal phase changes during each symbol.
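
The equivalence between the two modulators is easy to check numerically. The following is a minimal sketch in discrete time, where the "time-derivative" becomes a first difference and the frequency modulator is just a phase accumulator. The carrier frequency, the test signal and all function names are arbitrary choices for this illustration.

```python
import math

def phase_modulate(phase, omega):
    """Carrier at omega radians/sample with the given phase offsets applied directly."""
    return [math.cos(omega * n + p) for n, p in enumerate(phase)]

def frequency_modulate(freq_dev, omega):
    """Carrier whose phase is the running sum of per-sample frequency deviations."""
    out = []
    theta = 0.0
    for n, d in enumerate(freq_dev):
        theta += d
        out.append(math.cos(omega * n + theta))
    return out

# An arbitrary smooth phase signal to transmit.
phase = [math.pi / 2 * math.sin(2 * math.pi * n / 200) for n in range(1000)]

# Its discrete time-derivative (first difference, starting from phase 0).
derivative = [phase[0]] + [phase[n] - phase[n - 1] for n in range(1, len(phase))]

pm = phase_modulate(phase, 0.3)
fm = frequency_modulate(derivative, 0.3)

# Largest difference between the two waveforms (only rounding error remains).
max_error = max(abs(a - b) for a, b in zip(pm, fm))
print(max_error)
```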

CC1101 does have a minimum-shift keying mode. This is a special form of frequency modulation that has well-defined phase shifts between symbols. Wikipedia says that the carrier phase continuously shifts by ±90° each symbol period, which does not sound useful at first:

Minimum-shift keying illustration.

In this interpretation of phase shifts, the carrier frequency fc is in the middle between frequencies for the two symbols, f0 and f1. This is the usual interpretation for frequency modulation, where you have approximately equal numbers of both symbols in a typical transmission.

However, if you transmit mostly one symbol, say f0, the receiver will consider that to be the carrier f'c. In that case, each occurrence of symbol 1 rotates the phase of the signal compared to f'c by +180°. This is exactly what you need to implement BPSK.

Alternative interpretation of phase in MSK.

BPSK requires that phase shifts are fast compared to symbol rate, so you want to encode each BPSK symbol with many MSK symbols. Ultra-narrowband uses symbol rates on the order of 100 symbols/s while CC1101 supports up to around 1 Msymbol/s. This means that you could have fast phase changes, but 10 MSK symbols per each BPSK symbol seems to suffice.

In the end, bits encoded into MSK symbols look somewhat similar to the theoretical time-derivative I mentioned above. You have an impulse of a single f1 symbol each time you have a transition from bit 0 to 1 or vice versa:

Using multiple MSK symbols as one BPSK symbol.
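
The encoding can be sketched in a few lines of Python. This assumes 10 MSK symbols per BPSK symbol, writes f0 as 0 and f1 as 1, and assumes the transmission starts in the phase state that represents bit 0. The function names are made up for this illustration and have nothing to do with the CC1101 register interface.

```python
def bpsk_to_msk(bits, msk_per_bit=10, initial_bit=0):
    """Encode BPSK bits as MSK symbols: one f1 symbol (1) on each bit transition,
    f0 symbols (0) everywhere else."""
    symbols = []
    previous = initial_bit
    for bit in bits:
        first = 1 if bit != previous else 0
        symbols.extend([first] + [0] * (msk_per_bit - 1))
        previous = bit
    return symbols

def phase_relative_to_f0(symbols):
    """Phase in degrees after each MSK symbol, as seen by a receiver that treats
    f0 as the carrier: f0 leaves the phase alone, f1 rotates it by 180 degrees."""
    phase = 0
    phases = []
    for symbol in symbols:
        if symbol == 1:
            phase = (phase + 180) % 360
        phases.append(phase)
    return phases

bits = [0, 1, 1, 0]
symbols = bpsk_to_msk(bits)
phases = phase_relative_to_f0(symbols)

# Phase at the end of each BPSK symbol.
print(phases[9::10])
```

With these assumptions the phase at the end of each BPSK symbol is 0° for bit 0 and 180° for bit 1, which is what a BPSK receiver expects.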

So far, this has all been theoretical. How well does it work in practice? The most obvious problem is frequency stability. The local oscillator on CC1101 is designed to be re-calibrated often, but you cannot calibrate it while you are transmitting. With such low bitrates, packet transmissions last for several seconds. During that time the frequency can drift quite a lot, especially compared to the very limited bandwidth of these transmissions. This is the usual problem with narrowband transmissions and CC1101 has no mechanism for compensating for it on reception. That is why I doubt a CC1101-to-CC1101 link would work in this way and I haven't tried it.
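
To get a feeling for the scale of the problem, here is a back-of-the-envelope calculation. The 868 MHz carrier and the drift values in parts per million are assumptions picked for illustration, not measurements of a CC1101. The only figure taken from above is the symbol rate on the order of 100 symbols/s.

```python
def drift_hz(carrier_hz, drift_ppm):
    """Absolute frequency error for a given relative drift in parts per million."""
    return carrier_hz * drift_ppm * 1e-6

CARRIER_HZ = 868e6      # assumed carrier frequency
SYMBOL_RATE = 100.0     # symbols/s, order of magnitude for ultra-narrowband

for ppm in (0.1, 1.0, 10.0):    # hypothetical drift during one packet
    drift = drift_hz(CARRIER_HZ, ppm)
    print("%5.1f ppm -> %7.1f Hz (%.1f times the symbol rate)"
          % (ppm, drift, drift / SYMBOL_RATE))
```

Under these assumptions even a drift of 1 ppm moves the signal by several times its own bandwidth.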

Transmission from a CC1101 to a specialized receiver however seems to work quite nicely in practice. You just have to use an SDR with a wide-enough channel for reception and compensate for frequency drifts in software. I have some lab measurements to share, but those will have to wait for another post.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times, part 2

27.04.2016 11:40

Last week I wrote about some typical interrupt response times you get from an Arduino and Raspberry Pi, if you follow basic examples from documentation or whatever comes up on Google. I got some quite unexpected results, like for instance a Python script that responds faster than a compiled C program. To check some of my guesses as to what caused those results, I did another set of measurements.

For Arduino, most response times were grouped around 9 microseconds, but there were a few outliers. I checked the Arduino library source and it indeed always enables AVR timer/counter0 overflow interrupt. If timer interrupt happens at the same time as the GPIO interrupt I was measuring, the GPIO interrupt can get delayed. Performing the measurement with the timer interrupt masked out indeed removes the outliers:

Effect of timer interrupt on Arduino response time.

With the timer off, all measured response times are between 8.9485 and 9.1986 μs. This is a 0.2501 μs long interval. It fits perfectly with theory - at 16 MHz CPU clock and instruction length between 1 and 5 cycles, the uncertainty for interrupt latency is 4 cycles, or 0.25 μs.
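
The arithmetic behind that comparison, spelled out as a short Python sketch. The variable names are mine; the numbers are the ones quoted above.

```python
F_CPU_HZ = 16e6                      # Arduino Uno CPU clock
CYCLE_US = 1e6 / F_CPU_HZ            # one CPU cycle in microseconds (0.0625)

MIN_CYCLES, MAX_CYCLES = 1, 5        # instruction length range used above

# Expected spread in latency: the difference between the longest and the
# shortest instruction that can delay the interrupt.
expected_window_us = (MAX_CYCLES - MIN_CYCLES) * CYCLE_US

# Measured spread: slowest minus fastest response with the timer interrupt off.
measured_window_us = 9.1986 - 8.9485

print("expected %.4f us, measured %.4f us" % (expected_window_us, measured_window_us))
```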

The second weird thing was the aforementioned discrepancy between Python and C on Raspberry Pi. The default Python library uses an ugly hack to bypass the kernel GPIO driver and control GPIO lines directly from user space: it mmaps a range of physical memory containing GPIO registers into its own process memory space using /dev/mem. This is similar to how X servers on Linux (used to?) access graphics hardware from user space. While this approach is very unportable, it's also much faster since you don't need to do context switches into kernel for every operation.
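
To illustrate the idea (this is not the RPi.GPIO code), here is a minimal Python sketch of such a mapping. It assumes the BCM2835 register layout, with the GPIO block at physical address 0x20200000 and the set and clear registers at offsets 0x1C and 0x28, and it assumes the pin has already been configured as an output. The function names are hypothetical. Also note that a slice assignment on a Python mmap object is not guaranteed to turn into a single 32-bit write, so take this as a demonstration of the principle only.

```python
import mmap
import os
import struct

GPIO_BASE = 0x20200000   # assumed physical address of the GPIO block on BCM2835
BLOCK_SIZE = 4096
GPSET0 = 0x1C            # writing a 1 bit here drives the pin high
GPCLR0 = 0x28            # writing a 1 bit here drives the pin low

def register_and_mask(pin, high):
    """Return (register offset, bit mask) that sets or clears the given pin."""
    base = GPSET0 if high else GPCLR0
    return base + 4 * (pin // 32), 1 << (pin % 32)

def map_gpio(path="/dev/mem"):
    """Map the GPIO registers into this process (needs root and real hardware)."""
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, BLOCK_SIZE, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE, offset=GPIO_BASE)
    finally:
        os.close(fd)

def write_pin(mem, pin, high):
    """Change an output pin with a plain memory write, without any syscall."""
    offset, mask = register_and_mask(pin, high)
    mem[offset:offset + 4] = struct.pack("<I", mask)
```

On a Raspberry Pi, write_pin(map_gpio(), 17, True) would raise GPIO 17 without entering the kernel once the mapping is set up.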

To check just how much faster mmap method is on Raspberry Pi, I copied the GPIO access code from the RPi.GPIO library into my test C program:

Response times using sysfs and mmap methods on Raspberry Pi.

As you can see, the native program is now faster than the interpreted Python script. This also demonstrates just how costly context switches are: the sysfs version is more than two times slower on average. It's also worth noting that both RPi.GPIO and my C program still use epoll() or select() on a sysfs file to wait for the interrupt. Only the output pin change is done with direct memory accesses.

Finally, Raspberry Pi was faster when the CPU was loaded which seemed counterintuitive. I tracked this down to automatic CPU frequency scaling. By default, Raspberry Pi Zero seems to be set to run between 700 MHz and 1000 MHz using ondemand governor. If I switch to performance governor, it keeps the CPU running at 1 GHz at all times. In that case, as expected, the CPU load increases the average response time:

Effect of cpufreq governor on Raspberry Pi response time.
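
For reference, the governor and the frequency range can be inspected and changed through the standard cpufreq files in sysfs. Below is a small Python sketch of that. The helper names are made up, and the base directory is a parameter so that the functions can be tried out on a directory of ordinary files. Writing the governor on a real system requires root.

```python
import os

CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"

def read_cpufreq(name, base=CPUFREQ_DIR):
    """Read one cpufreq attribute, for example scaling_governor."""
    with open(os.path.join(base, name)) as f:
        return f.read().strip()

def set_governor(governor, base=CPUFREQ_DIR):
    """Select a governor, for example 'ondemand' or 'performance'."""
    with open(os.path.join(base, "scaling_governor"), "w") as f:
        f.write(governor)

def describe(base=CPUFREQ_DIR):
    """Summarize the current governor and frequency limits (sysfs reports kHz)."""
    return {
        "governor": read_cpufreq("scaling_governor", base),
        "min_mhz": int(read_cpufreq("scaling_min_freq", base)) // 1000,
        "max_mhz": int(read_cpufreq("scaling_max_freq", base)) // 1000,
    }
```

Calling describe() before and after set_governor("performance") is a quick way to confirm which configuration a measurement was taken with.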

It's interesting to note that Linux kernel comes with pluggable idle loop implementations (CONFIG_CPU_IDLE). The idle loop can be selected through /sys/devices/system/cpu/cpuidle in a similar way to the CPU frequency governor. The Raspbian Jessie release however has that disabled. It uses the default idle loop for ARMv6 processors. Assembly code has been patched though. The ARM Wait For Interrupt WFI instruction in the vanilla kernel has been replaced with some mcreq (write to coprocessor?) instructions. I can't find any info on the JIRA ticket referenced in the comment and the change has been added among other BCM-specific changes in a single 6400-line commit. Idle loop implementation is interesting because if it puts the CPU into a power saving mode, it can affect the interrupt latency as well.

As before, source code and raw data are on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Measuring interrupt response times

18.04.2016 15:13

Embedded systems were traditionally the domain of microcontrollers. You programmed them in C on bare metal, directly poking values into registers and hooking into interrupt vectors. Only if it was really necessary would you include some kind of a light-weight operating system. Times are changing though. These days it's becoming more and more common to see full Linux systems and high-level languages in this area. It's not surprising: if I can just pop open a shell, see what exceptions my Python script is throwing and fix them on the fly, I'm not going to bother with microcontrollers and the whole in-circuit debugger thing. Some even say it won't be long before we will all be just running web browsers on our devices.

It seems to be common knowledge that the traditional approach really excels at latency. If you're moderately careful with your code, you can get your system to react very quickly and consistently to events. Common embedded Linux systems don't have real-time features. They seem to address this deficiency with some combination of "don't care", "it's good enough" and throwing raw CPU power at the problem. Or as the author of RPi.GPIO library puts it:

If you are after true real-time performance and predictability, buy yourself an Arduino.

I was wondering what kind of performance you could expect from these modern systems. I tend to be very conservative in my work: I have a pile of embedded Linux-running boards, but they are mostly gathering dust while I stick to old-fashioned Cortex M3s and AVRs. So I thought it would be interesting to do some experiments and get some real data about these things.

Measuring interrupt response times on Arduino.

To test how fast a program can respond to an event, I chose a very simple task: raise an output digital line whenever a rising edge happens on an input digital line. This allowed me to very simply measure response times in an automated fashion using a USB-connected oscilloscope and a signal generator.

I tested two devices: an Arduino Uno using a 16 MHz ATmega328 microcontroller and a Raspberry Pi Zero using a 1 GHz ARM-based CPU running Raspbian Jessie. I tried several approaches to implementing the task. On Arduino, I implemented it with an interrupt and a polling loop. On Raspberry Pi, I tried a kernel module, a native binary written in C and a Python program. You can see exact source code on GitHub.

Measuring interrupt response times on Raspberry Pi.

For all of these, I chose the most obvious approach possible. My implementations were based as much as possible on the preferred libraries mentioned in the documentation or whatever came up on top of my web searches. This meant that for Arduino, I was using the Arduino IDE and the library that comes with it. For Raspberry Pi, I used the RPi.GPIO Python library, the GPIO sysfs interface for native code in user space and the GPIO consumer interface for the kernel module (based on examples from Stefan Wendler). Many of these could definitely be further hand-optimized, but I was mostly interested here in the out-of-the-box performance you could get on the first try.

Here is a histogram of 500 measurements for the five implementations:

Histogram of response time measurements.

As expected, Arduino and the Raspberry Pi kernel module were both significantly faster and more consistent than the two Raspberry Pi user space implementations. Somewhat shocking though, the interpreted Python program was considerably faster than my C program compiled into native code.

If you check the source, RPi.GPIO library maps the hardware registers directly into its process memory. This means that it does not need any syscalls for controlling the GPIO lines. On the other hand, my C implementation uses the kernel's sysfs interface. This is arguably a cleaner and safer way to do it, but it requires calls into the kernel to change GPIO states and these require expensive context switches. This difference is likely the reason why Python was faster.
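
For comparison, this is roughly what the sysfs route looks like, written as a Python sketch rather than the C from my test program. It assumes the usual /sys/class/gpio/gpioN/value layout for a pin that has already been exported and configured. The helper names are hypothetical and the base directory is a parameter so the code can be exercised on ordinary files.

```python
import os

GPIO_DIR = "/sys/class/gpio"

def gpio_path(pin, name, base=GPIO_DIR):
    """Path of a sysfs attribute of an exported pin, for example gpio17/value."""
    return os.path.join(base, "gpio%d" % pin, name)

def gpio_write(pin, value, base=GPIO_DIR):
    """Set an output pin. Every call goes through open, write and close syscalls."""
    with open(gpio_path(pin, "value", base), "w") as f:
        f.write("1" if value else "0")

def gpio_read(pin, base=GPIO_DIR):
    """Read the current level of a pin as a boolean."""
    with open(gpio_path(pin, "value", base)) as f:
        return f.read().strip() == "1"
```

Each pin change here crosses into the kernel several times, which is the overhead the memory-mapped approach avoids.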

Histogram of response time measurements (zoomed)

Here is the zoomed-in left part of the histogram. The Raspberry Pi kernel module can be just as fast as the Arduino, but is less consistent. This is not surprising, since the kernel has many other interrupts to service, and not that impressive considering the 60 times faster CPU clock.

Arduino itself is not that consistent out-of-the-box. While most interrupts are served in around 9 microseconds (so around 140 CPU cycles), occasionally they take as long as 15 microseconds. Probably Arduino library is to blame here since it uses the timer interrupt for delay functions. This interrupt seems to be always enabled, even when a delay function is not running, and hence competes with the GPIO interrupt I am using.

Also, this again shows that polling on Arduino can sometimes be faster than interrupts.

Effect of CPU load on response time.

Another interesting result was the effect of CPU load on Raspberry Pi response times. Somewhat counterintuitively, response times are smaller on average when there is some other process consuming CPU cycles. This happens even with the kernel module, which makes me think it has something to do with power saving features. Perhaps this is due to CPU frequency scaling or maybe the kernel puts an idle CPU into some sleep mode from which it takes longer to wake up.

In conclusion, I was a bit impressed by how well Python scores on this test. While it's an order of magnitude slower than Arduino, 200 microseconds on average is not bad. Of course, there's no hard upper limit on that. In my test, some responses took twice as long and things really start falling apart if you increase the interrupt load (like for instance, with a process that does something with the SD card or network adapter). Some of the results on Raspberry Pi were quite surprising and they show once again that intuition can be pretty wrong when it comes to software performance.

I will likely be looking into more details regarding some of these results. If you would like to reproduce my measurements, I've put source code, raw data and a notebook with analysis on GitHub.

Posted by Tomaž | Categories: Digital | Comments »

Another hard drive failure

07.02.2015 21:41

Earlier today one of my hard drives died. It was a fairly old 750 GB "Caviar GP" drive from a Western Digital "My Book" external enclosure. All it does now is emit an impressively loud metallic clicking noise.

I should have seen this coming, of course. At this point I have a pile of failed drives stashed in a box somewhere. I remember that this particular one has been unusually slow to start and mount for the last couple of times I used it. Also, smartd has previously reported "2 Currently unreadable (pending) sectors". I ignored both symptoms, because I assumed this was yet another problem with the power supply. I had a "My Book" 12V external power supply fail before with similar symptoms.

I only used this drive for backups recently, so except for some archival copies of machines I no longer own, probably nothing of value was lost. Having at least a listing of contents before it failed would be nice though.

Disassembled Western Digital "My Book" external drive.

Of course, I opened it up to see if there's anything obviously wrong with it. The "My Book" USB interface board and the power supply are not the cause, because the drive has the same problem even when it is connected directly to a SATA port. I can hear the platters spinning and the clicking noise can only be caused by heads thrashing around, so those are not stuck either.

Corrosion of surface finish on the controller PCB.

The only thing that immediately looks wrong is the unusual amount of corrosion on the hard drive controller PCB. It's bad enough that on some exposed test points both the immersion gold and the copper layer are completely gone. I'm not quite sure what could have caused that. As far as I can remember, this drive was sitting somewhere around my desk for the whole time, so it hasn't been exposed to any hostile environments. It might be a manufacturing defect of some sort - maybe the board was not rinsed well enough after processing.

Bottom side of the hard drive controller PCB.

I cleaned the pads where the motor and the head connect to the circuit board, but that didn't make any difference.

The copper below the green solder mask looks fine though. The bottom side of the PCB contains one large BGA chip. Maybe that one developed some bad connections, if the problem is indeed in the controller board. Just as an experiment, I also tried the disk-in-the-freezer trick, but that did not make the disk behave any differently.

Posted by Tomaž | Categories: Digital | Comments »

CubieTruck UDMA CRC errors

18.10.2014 20:07

Last year I bought a CubieTruck, a small, low-powered ARM computer, to host this web site and a few other things. Combined with a Samsung 840 EVO SSD on the SATA bus, it proved to be a relatively decent replacement for my aging Intel box.

One thing that has been bothering me right from the start though is that every once in a while, there were problems with the SATA bus. Occasionally, isolated error messages like these appeared in the kernel log:

kernel: ata1.00: exception Emask 0x10 SAct 0x2000000 SErr 0x400100 action 0x6 frozen
kernel: ata1.00: irq_stat 0x08000000, interface fatal error
kernel: ata1: SError: { UnrecovData Handshk }
kernel: ata1.00: failed command: WRITE FPDMA QUEUED
kernel: ata1.00: cmd 61/18:c8:68:0e:49/00:00:02:00:00/40 tag 25 ncq 12288 out
kernel:          res 40/00:c8:68:0e:49/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
kernel: ata1.00: status: { DRDY }
kernel: ata1: hard resetting link
kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: supports DRM functions and may not be fully accessible
kernel: ata1.00: configured for UDMA/133
kernel: ata1: EH complete

At the same time, the SSD reported increased UDMA CRC error count through the SMART interface:

UDMA CRC weekly error count on CubieTruck.
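
A counter like this is easy to log from a cron job. The sketch below shows one way to pull the raw value of attribute 199 (the UDMA CRC error count) out of the attribute table printed by smartctl -A. It assumes the usual ten-column layout of that table. The function name and the sample line are made up for illustration; the sample is not an actual reading from my SSD.

```python
def smart_raw_value(smartctl_output, attribute_id):
    """Return the raw value of a SMART attribute as an int, or None if missing.

    Expects rows with the columns: ID, name, flag, value, worst, threshold,
    type, updated, when_failed, raw value.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attribute_id):
            return int(fields[9])
    return None

# Illustrative sample only, not real data.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       7300
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       27
"""

print(smart_raw_value(sample, 199))
```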

These errors were mostly benign. Apart from the cruft in the log files they did not appear to have any adverse effects. Only once or twice in the last 10 months or so did they cause the kernel to remount filesystems on the SSD as read-only, which required some manual intervention to get the CubieTruck back on-line.

I've seen some forum discussions that suggested this might be caused by a bad power supply. However, checking the power lines with an oscilloscope did not show anything suspicious. On the other hand, I did notice during this test that the errors seemed to occur when I was touching the SATA cable. This made me think that the cable or the connectors on it might be the culprit - something that was also suggested in the forums.

Originally, CubieTruck comes with a custom SATA cable that combines both power and data lines for the hard drive and has special connectors (at least considering what you usually see in the context of the SATA cabling) on the motherboard side.

In the last few weeks it appeared that the errors were getting increasingly common, so I decided to try replacing the cable. Instead of ordering a new CubieTruck SSD kit I improvised a bit: I didn't have proper connectors for CubieTruck's power lines at hand, so I just soldered the cables directly to the motherboard. On the SSD I used the standard 15-pin SATA power connector.

For the data connection, I used an ordinary SATA data cable. The shortest one I could find was about three times as long as necessary, so it looks a bit uglier now. The connector on the motherboard side also needed some work with a scalpel to fit into CubieTruck's socket. The original connector on the cable that came with CubieTruck is thinner than those on standard SATA cables I tried.

Replacement SATA cables for CubieTruck.

So far it seems this fixed the CRC errors. In the past few days since I replaced the cable I haven't seen any new errors pop up, but I guess it will take a month or so to be sure.

Posted by Tomaž | Categories: Digital | Comments »

GA 7VT600 lmsensors settings

02.05.2014 17:35

Recently I've put into use a relatively ancient Gigabyte GA 7VT600 1394 motherboard that's been gathering dust on the top shelf of my wardrobe. I used it to replace an even older MSI board which, while still working perfectly, was getting a bit slow.

After replacing the dead lithium battery for RTC and NVRAM, it seems to work just fine with stock Debian Wheezy and passes a few ad-hoc stress tests.

One thing I noticed though is that the sensors tool from the lm-sensors package isn't very useful by default.

it87-isa-0290
Adapter: ISA adapter
in0:          +1.70 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +1.33 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.25 V  (min =  +0.00 V, max =  +4.08 V)
in3:          +2.86 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +3.23 V  (min =  +0.00 V, max =  +4.08 V)
in5:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in7:          +3.01 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +0.00 V  
fan1:        3308 RPM  (min =    0 RPM, div = 8)
fan2:           0 RPM  (min =    0 RPM, div = 8)
temp1:        +35.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +31.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp3:        +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
intrusion0:  OK

The board obviously has an IT87 series chip that provides some hardware monitoring functionality (you need the it87 kernel module). Apart from the lack of useful labels, some voltages also seem to be divided by voltage dividers before being measured by it87. I would expect at least 5 V and 12 V lines there.

Figuring out which fan is which was trivial. For finding out other things, I compared the printout above with what the BIOS setup utility says. I picked out the most logical divider values for voltages. Since these also seem to fit the order of sensors, I'm relatively confident they are correct.

PC Health Status screen on GA 7VT600 motherboard.

in5 and in6 readings are very unstable and don't seem to be shown in the BIOS screen. I'm guessing they are not connected on this board. temp2 is also not shown, but seems to give reasonable values, so I'm guessing there is a temperature sensor connected there, but I don't know where it is.
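
As a sanity check on the divider values, here is the scaling applied to the raw readings from the printout above, as a short Python sketch. The factors are the ones I settled on for the configuration; the dictionary names are only for illustration.

```python
# Raw readings taken from the sensors printout above, in volts.
raw = {"in3": 2.86, "in4": 3.23, "in7": 3.01}

# Divider factors chosen by comparing against the BIOS screen.
scale = {"in3": 1.679, "in4": 3.973, "in7": 1.679}

# Nominal rail each input is believed to measure.
label = {"in3": "+5V", "in4": "+12V", "in7": "5VSB"}

scaled = {name: raw[name] * scale[name] for name in raw}

for name in sorted(scaled):
    print("%s (%s): %.2f V" % (name, label[name], scaled[name]))
```

The results land close to the nominal 5 V and 12 V rails, which is what makes these factors plausible.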

So, for future reference, put this into /etc/sensors.d/ga-7vt600 to get nicely labeled and properly calculated values for this hardware:

chip "it87-isa-0290"
    label temp1 "Sys Temp"
    label temp2 "Aux Temp"
    label temp3 "CPU Temp"

    label fan1 "CPU Fan"
    label fan2 "Sys Fan"

    label in0 "Vcore"
    label in1 "DDR Vtt"
    label in2 "+3.3V"
    label in3 "+5V"
    label in4 "+12V"
    label in7 "5VSB"

    compute in3 @*1.679, @/1.679
    compute in4 @*3.973, @/3.973
    compute in7 @*1.679, @/1.679

Posted by Tomaž | Categories: Digital | Comments »

CubieTruck Perl performance

23.01.2014 22:57

Two months ago I bought a CubieTruck, one of the many cheap, bare-bone ARM-based computers that keep popping up everywhere these days. My idea was to replace the aging x86 server that is running this website with something more power-efficient. So I was looking for a reasonably powerful board with a proper SATA interface and a decent amount of RAM. Raspberry Pi was out of the question, but the latest incarnation of CubieBoard with a dual-core 1 GHz ARM Cortex-A7, 2 GB of RAM, SATA 2.0 and Gigabit Ethernet seemed to fit the bill.

Unfortunately I could not find any reliable benchmarks I could use to estimate how ARM SoCs perform in comparison with my existing setup. So before I decided to migrate I took a while to do some performance tests and get to know this hardware.

CubieTruck

The software setup I'm interested in benchmarking is somewhat archaic in these days of Node.js and NoSQL. I'm using Perl 5 with HTML::Template doing most of the heavy lifting (at least according to Devel::NYTProf profiler). Most parts are statically generated and some are dynamic using a handful of SpeedyCGI Perl 5 scripts. These are combined into a consistent website you see here with a somewhat convoluted Apache configuration using the threaded worker.

In the following benchmarks I'm comparing:

  • An AMD Duron at 700 MHz, 1.2 GB RAM running stock x86 Debian Squeeze. Root filesystem is mounted from an IDE hard drive.
  • A CubieTruck A20 running armhf Debian Wheezy and the kernel supplied for the CubieTruck Ubuntu Server installation. Root filesystem is mounted from an SD card.

Both machines were connected through a 100 Mb/s Ethernet switch to a laptop which was running the remote end of the benchmarks.


First, to see how fast the static part of the web site is generated, I ran the full (single threaded) HTML rebuild. I measured the required user space CPU time with the time utility. This is the fastest run of three on each machine:

                                   AMD Duron   CubieTruck
CPU time to rebuild static pages   45.3 s      61.8 s

Then, to check if network was operating at the bit rate I thought it was, I ran iperf to measure TCP throughput between the server and the laptop:

                        AMD Duron   CubieTruck
iperf throughput test   94.0 Mb/s   94.5 Mb/s

Finally, I ran a suite of tests using the Apache benchmarking tool. I measured how many requests per second a server can handle for different types of content and different numbers of concurrent requests. Numbers in parentheses show the size of the HTTP body (without headers).

CubieTruck requests per second for a static HTML page.

CubieTruck requests per second for a dynamic HTML page.

CubieTruck requests per second for an image.

CubieTruck requests per second for API call.

The site rebuild is somewhat disappointingly almost one-third slower than on a 10-year-old PC. However the single threaded Apache performance is on par with it. In the case of more concurrent users the CubieTruck of course has an advantage because of an additional CPU core. Actually in both cases with static content CubieTruck managed to saturate the line when there was more than one concurrent request.
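
How much slower the rebuild is depends a bit on how you count. A quick Python calculation with the CPU times from the table above (the variable names are mine):

```python
duron_s = 45.3        # CPU time for the rebuild on the AMD Duron
cubietruck_s = 61.8   # CPU time for the rebuild on the CubieTruck

# Extra time the CubieTruck needs, relative to the Duron.
extra_time = cubietruck_s / duron_s - 1

# Drop in rebuild speed (rebuilds per second), relative to the Duron.
lower_speed = 1 - duron_s / cubietruck_s

print("%.1f%% more CPU time, %.1f%% lower speed"
      % (100 * extra_time, 100 * lower_speed))
```

So the CubieTruck needs about 36% more CPU time, which is the same thing as running the rebuild at about 27% lower speed.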

I tried to make these tests in a way that the slow SD card in the CubieTruck would minimally affect their outcome. All of the data should fit into the buffer cache, which is why in the first test I only took into account the fastest run and only user space CPU time. However I now suspect that the SD card still affected the numbers somehow (the rebuild operation is the heaviest of the tests regarding filesystem I/O). I don't know for sure how the kernel computes the time returned by the time utility.

These results are good enough that I can't dismiss CubieTruck based on performance. If a proper SATA drive wouldn't speed it up, I could probably parallelize the build process with not much work. That should cut down on time if it's really Perl performance on ARM that is slowing it down. On the other hand I'm having some other concerns about using CubieTruck as a personal server so I'm not completely decided yet about putting it on my rack.

Posted by Tomaž | Categories: Digital | Comments »

Repairing the Happy Hacking Keyboard

29.09.2013 15:40

My trusty old Happy Hacking Keyboard has been working pretty reliably for the last four years. After fixing a botched plastic mold and strategically placing a piece of cardboard in its innards that is. Regarding the typing feel it is still my favorite keyboard that doesn't take a lot of space on a crowded desk and I only switch to a regular-sized Logitech when I'm working with EDA programs where I need function keys a lot.

So I was pretty disappointed when it stopped working a week ago. Checking the kernel log revealed all sorts of random USB bus errors:

usb 6-1.1: USB disconnect, device number 14
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device not accepting address 13, error -71
usb 6-1: reset full-speed USB device number 13 using uhci_hcd
usb 6-1: device firmware changed
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: hub_port_status failed (err = -19)
hub 6-1:1.0: activate --> -19
usb 6-1: USB disconnect, device number 13
usb 6-1: new full-speed USB device number 15 using uhci_hcd
usb 6-1: string descriptor 0 read error: -71
usb 6-1: New USB device found, idVendor=04fe, idProduct=0008
usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 6-1: can't set config #1, error -71
hub 6-0:1.0: port 1 disabled by hub (EMI?), re-enabling...

This looked like something systemic. Either the controller was resetting continuously or there was something wrong with the USB wiring between the controller and the computer.
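
The negative numbers in these messages are Linux errno values. A tiny Python lookup for the two that appear in the log above (the table is written out by hand, since the numbering differs between operating systems; the function name is made up):

```python
# Linux errno values that appear in the kernel log above.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    71: ("EPROTO", "Protocol error"),
}

def decode(code):
    """Translate a kernel error such as -71 into its symbolic name and meaning."""
    name, text = LINUX_ERRNO.get(abs(code), ("unknown", "not in this table"))
    return "%s (%s)" % (name, text)

for code in (-71, -19):
    print(code, decode(code))
```

As far as I can tell from the kernel documentation, the USB stack reports EPROTO for low-level problems on the bus, such as bit-stuffing errors or a device that doesn't answer in time, which fits with a wiring problem.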

A new, identical HHKB model goes for more than $110 today so I opened it up to see if there's anything I can do. After checking the cable with an ohm-meter my suspicion fell on the power supply which seems to consist of a 3.3 V LDO regulator and some capacitors. I could see no obvious transients on the power rails when the controller switched on after the keyboard was plugged in. One interesting thing I did see was that if negotiation with USB host fails the controller switches itself off completely, including its quartz oscillator.

Happy Hacking Keyboard Lite2 USB controller

Since I have had problems with flaky USB cables before, I removed the original cable and soldered a new cable directly to the circuit board. This fixed the problem! After poking around some more, I found that the original cable worked as well once I had re-soldered the connector to the PCB.

I removed the membrane before poking around the controller board since a hot soldering iron and plastics don't mix well. Re-inserting the soft matrix tails into the (non-ZIF) connectors was somewhat tricky. I resorted to using pliers plus a bit of paper to protect the delicate wires.

Re-inserting flexible cables into connectors using pliers

I also noticed that the silver wires on the keyboard matrix itself seem to be developing a dark oxide on their outer edges. I don't remember for sure whether they looked like this from the start, though. If something is eating away at the wires, that puts a definite limit on this keyboard's longevity.

Possible oxidation on the keyboard matrix

In conclusion, a connection that checks out with a multimeter can still hide a bad solder joint somewhere on a high-speed bus. The wisdom of checking first for bad RoHS soldering on mechanically (and thermally) stressed components confirmed itself again. Also, did you know there are a couple of alternative, open source Happy Hacking Keyboard controllers out there?

Posted by Tomaž | Categories: Digital | Comments »

Some notes about CC chips

20.05.2013 12:31

Here are two unusual things I noticed while working with Texas Instruments (used to be Chipcon) CC2500 and CC1101 integrated transceivers.

It appears that the actual bit rate in continuous synchronous serial mode can differ significantly from what Texas Instruments' SmartRF Studio 7 calculates. See for example the following measurements that were taken on a CC2500 chip using a 27 MHz crystal oscillator as a reference clock.

MDMCFG4 value   RF studio [baud]   measured [baud]
0x8a            50.0               38.5
0x8b            100.0              77.2
0x8c            200.0              154.0
0x8d            400.0              305.0

These bit rates were measured using an oscilloscope attached to the clock output of the transceiver, so I trust them to be correct. Bit rates I measured on a CC1101 agree with what SmartRF Studio predicts.

Update: I revisited this issue and the problem was a bug in my code that caused the MDMCFG3 register (which also affects the data rate) not to be programmed properly on the CC2500. Accounting for this bug, the data rates are within 1% of those calculated by SmartRF Studio 7 or from the formula given in the datasheet.
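For reference, here is a minimal sketch of that datasheet formula, R = (256 + DRATE_M) · 2^DRATE_E · f_XOSC / 2^28, where DRATE_E is the low nibble of MDMCFG4 and DRATE_M is the whole of MDMCFG3. The function name and the MDMCFG3 value in the usage note are placeholders of mine, since the mantissa I actually used is not listed above.

```python
def cc_data_rate(mdmcfg4, mdmcfg3, f_xosc_hz):
    """Data rate from the MDMCFG4/MDMCFG3 register values.

    Implements R = (256 + DRATE_M) * 2**DRATE_E * f_xosc / 2**28.
    """
    drate_e = mdmcfg4 & 0x0F  # exponent: low nibble of MDMCFG4
    drate_m = mdmcfg3 & 0xFF  # mantissa: all eight bits of MDMCFG3
    return (256 + drate_m) * 2**drate_e * f_xosc_hz / 2**28


if __name__ == "__main__":
    # 0x22 is an arbitrary placeholder mantissa, not a value from the post.
    for mdmcfg4 in (0x8A, 0x8B, 0x8C, 0x8D):
        print(hex(mdmcfg4), cc_data_rate(mdmcfg4, 0x22, 27e6))
```

The absolute numbers depend on the MDMCFG3 value, so only the doubling with each step of the exponent is meant to be compared with the table. It also shows why a wrongly programmed MDMCFG3 scales all four measurements by the same factor.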

The other issue I saw is symbol mapping for 4FSK modulation in CC1101. It looks like it depends on the configured bit rate. For example, with 200 baud, the symbol to frequency mapping appears to be as follows:

symbol   Δf
00       −0.33 fdev
01       −1.00 fdev
10       +0.33 fdev
11       +1.00 fdev

However, with 45 baud, the mapping is different, with symbol bit order apparently switched around:

symbol   Δf
00       −0.33 fdev
01       +0.33 fdev
10       −1.00 fdev
11       +1.00 fdev

Update: It's possible this difference has something to do with when exactly the radio samples the data line in relation to the clock. Either I don't understand exactly what is going on or the radio isn't sampling the data when it is supposed to. Also, the factors of fdev in the tables were wrong (symbol frequencies are equally spaced, with the maximum deviation from the central frequency equal to fdev).

Of course, this doesn't matter if you are using two identically configured CC1101 chips on both ends of the radio link. But it is important if you want to use it to communicate with some other hardware.
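The claim that the bit order is switched can be checked mechanically, and the same swap could be applied to symbols before handing them to other hardware. This is a minimal sketch: the dictionaries simply copy the two tables above (in units of fdev), and the names `MAP_200_BAUD`, `MAP_45_BAUD`, `swap_bits` and `remap` are hypothetical.

```python
# Symbol to frequency offset (in units of fdev), copied from the tables above.
MAP_200_BAUD = {0b00: -0.33, 0b01: -1.00, 0b10: +0.33, 0b11: +1.00}
MAP_45_BAUD = {0b00: -0.33, 0b01: +0.33, 0b10: -1.00, 0b11: +1.00}


def swap_bits(symbol):
    """Exchange the two bits of a 2-bit symbol (01 <-> 10)."""
    return ((symbol & 0b01) << 1) | ((symbol >> 1) & 0b01)


def remap(symbols):
    """Convert a symbol stream from one observed mapping to the other."""
    return [swap_bits(s) for s in symbols]


if __name__ == "__main__":
    # True if the two tables differ only by the order of the symbol bits.
    print(all(MAP_45_BAUD[s] == MAP_200_BAUD[swap_bits(s)] for s in range(4)))
```

Because swapping the bits twice gives back the original symbol, the same `remap` works in both directions.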

Posted by Tomaž | Categories: Digital | Comments »