Newsletters and other beasts

23.07.2016 10:23

It used to be that whenever someone wrote something particularly witty on Slashdot, it was followed by a reply along the lines of "I find your ideas intriguing and would like to subscribe to your newsletter". Ten-odd years later, it seems like the web took that old Simpsons joke seriously. These days you can hardly follow a link on Hacker News without having a pop-up thrown in your face. Most articles now end with a plea for an e-mail address, and I've even been to real-life talks where the speakers continuously advertised their newsletter to the audience.

Recently I've been asked several times why I didn't support subscriptions by e-mail, like every other normal website. The short answer is that I keep this blog in a state that I wish other websites I visit would adopt. This means no annoying advertisements, respecting your privacy by not loading third-party Javascript or tracking cookies, HTTPS and IPv6 support, valid XHTML... and good support for the Atom standard. Following the death of Google Reader, the world turned against RSS and Atom feeds. However, I still find them vastly more usable than any alternative. It annoys me that I can't follow interesting people and projects on modern sites like Medium and Hackaday.io through this channel.

Twitter printer at 32C3

That said, you now can subscribe by e-mail to my blog, should you wish to do so (see also sidebar top-right). The thing that finally convinced me to implement this was hearing that some of you use RSS-to-email services that add their own advertisements to my blog posts. I did not make this decision lightly though. I used to host mailing lists and know what an incredible time sink they can be, fighting spam, addressing complaints and so on. I don't have that kind of time anymore, so using an external mass-mailing service was the only option. Running my own mail server in this era is lunacy enough.

Mailchimp seems to be friendly, so I'm using that at the moment. If it turns to the dark side, and depending on how popular the newsletter gets, I might move to some other service - or remove it altogether. For the time being I consider this an experiment. It's also worth mentioning that while there are no ads, Mailchimp does add mandatory tracking features (link redirects and tracking pixels). Of course, it also collects your e-mail address somewhere.

Since I'm on the topic of subscriptions, and because I don't like writing meta posts like this, I would also like to mention two ways of following my posts that are not particularly well known. If you are only interested in one particular topic I write about, you can search for it: the search results page has an attached Atom feed that only contains posts related to the query. If, on the other hand, you believe that Twitter is the new RSS, feel free to follow @aviansblog (at least until Twitter breaks the API again).

Posted by Tomaž | Categories: Life | Comments »

Visualizing frequency allocations in Slovenia

15.07.2016 17:34

If you go to a talk about dynamic spectrum access, cognitive radio or any other topic remotely connected with the way radio spectrum is used or regulated, chances are one of the slides in the introduction will contain the following chart. The multitude of little colorful boxes is supposed to impress on the audience that the spectrum is overcrowded with existing allocations and that any future technology will have problems finding vacant frequencies.

I admit I've used it myself in that capacity a couple of times. "The only space left unallocated is beyond the top-left and bottom-right edges, below 9 kHz and above 300 GHz", I would say, "and those frequencies are not very useful for new developments." After that I would feel free to advertise the latest crazy idea that will magically create more space, brushing away the fact that spectrum seems to be like IPv4 address space - when there's real need, the powers that be always seem to find more of it.

United States frequency allocations, October 2003.

Image by U.S. Department of Commerce

I was soon getting a bit annoyed by this chart. When you study it you realize it's showing the wrong thing. The crowdedness does fit with the general story people are trying to tell, but the ITU categories shown are not the problematic part of the 100-year legacy of radio spectrum regulations. I only came to realize that later though. My first thought was: "Why are people discussing future spectrum in Europe using a ten-year-old chart from the U.S. Department of Commerce, showing the situation on the other side of the Atlantic?"

Although there are a handful of similar charts for other countries on the web, I couldn't find one that I would be happy with. So two years back, with a bit of free time and encouraged by the local Open Data group, I set out to make my own. It would show the right thing, be up to date and describe the situation in my home country. I downloaded public PDFs with the Slovenian Electronic Communications Act, the National Table of Frequency Allocations and a few assorted files with individual frequency licenses. Then I started writing a parser that would turn it all into machine-readable JSON. And then, as you might imagine if you have ever encountered the words "PDF", "table" and "parsing" in the same sentence, I gave up after a few days of writing increasingly convoluted and frustrating Python code.

Tabela uporabe radijskih frekvenc

Fast forward two years and I was again going through some documents that discuss these matters. I remembered this old abandoned project. In the meantime, new laws had been passed, the frequency allocation table had been updated several times and the PDFs were structured a bit differently. This meant I had to throw away all my previous work, but on the other hand the new documents looked a bit easier to parse. I took up the challenge again and this time I managed to parse most of the basic NTFA into JSON after a day of work and about 350 lines of Python.

I won't dive deep into technicalities here. I started by converting the PDF to whitespace-formatted UTF-8 text using the pdftotext tool that comes with Poppler. Then I had a series of functions that successively turned text into structured data. This made it easy to inspect and debug each step. Some of the steps were "fix typos in document" (there are several, by the way, including inconsistent use of points and commas for decimal marks), "extract column widths", "extract header hierarchy", "normalize service names", etc. If there is interest, I might present the details in a talk at one of the future Open Data Meetups in Ljubljana.
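
To give a rough idea of the approach, here is a toy sketch of such a pipeline - nothing like the real 350 lines, since the actual table layout is much messier than the regular expression below assumes:

import json
import re
import subprocess

def pdf_to_text(path):
    # pdftotext ships with Poppler; -layout keeps the whitespace
    # formatting of tables intact.
    return subprocess.check_output(
        ['pdftotext', '-layout', path, '-']).decode('utf-8')

def fix_typos(text):
    # One of the real steps: normalize decimal commas to points.
    return re.sub(r'(\d),(\d)', r'\1.\2', text)

def parse_bands(text):
    # Grossly simplified: match lines that start with a frequency range
    # like "87.5-108 MHz" and treat the rest as a list of services.
    bands = []
    for line in text.splitlines():
        m = re.match(r'\s*([\d.]+)-([\d.]+)\s*(k|M|G)Hz\s+(.+)', line)
        if m:
            mult = {'k': 1e3, 'M': 1e6, 'G': 1e9}[m.group(3)]
            bands.append({
                'start': float(m.group(1)) * mult,
                'stop': float(m.group(2)) * mult,
                'services': [s.strip() for s in m.group(4).split(',')],
            })
    return bands

text = fix_typos(pdf_to_text('ntfa.pdf'))
with open('ntfa.json', 'w') as f:
    json.dump(parse_bands(text), f, indent=2)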

Once I had the data in JSON, drawing a visualization much like the U.S. one above took another 250 lines using matplotlib. Writing them was much more pleasant in comparison though. In hindsight, it would actually make sense to do the visualization part first, since it was much easier to spot parsing mistakes from the graphical representation than by looking at JSON.
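
The drawing part is simple enough to sketch as well. Assuming the JSON format from the sketch above, something like the following produces the row of colored boxes - the real thing of course needs a legend and more careful layout:

import json
import matplotlib.pyplot as plt

with open('ntfa.json') as f:
    bands = json.load(f)

# Give each primary service its own color, like the U.S. chart does.
services = sorted(set(b['services'][0] for b in bands))
cmap = plt.get_cmap('Paired')
colors = {s: cmap(float(i) / max(len(services) - 1, 1))
          for i, s in enumerate(services)}

fig, ax = plt.subplots(figsize=(12, 2))
for b in bands:
    # One colored box per allocation.
    ax.axvspan(b['start'], b['stop'], color=colors[b['services'][0]])

ax.set_xscale('log')
ax.set_xlim(9e3, 300e9)
ax.set_yticks([])
ax.set_xlabel('Frequency [Hz]')
fig.tight_layout()
plt.savefig('allocations.png')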

Uporaba radijskih frekvenc v Sloveniji glede na storitev

While it is both local and relatively up to date, the visualization as such isn't very good. It still only works for conveying that general idea of fragmentation, but not much else. There are way too many categories for an unaided eye to easily match the colors in the legend with the bands in the graph. It would be much better to have an interactive version where you could point to a band and the relevant information would pop up (or maybe it could even point you to the relevant paragraphs in the PDFs). Unfortunately this is beyond my meager knowledge of Javascript frameworks.

My chart also still just lists the ITU categories. Not only do they have very little to do with finding space for future allocations, they are also useless for spotting interesting parts of the spectrum. For example, the famous 2.4 GHz ISM band doesn't stand out in any way here - it's listed simply under "FIXED, MOBILE, AMATEUR and RADIOLOCATION" services. All the interesting details regarding licensing and technologies in individual bands are hidden in various regulations, scattered across a vast number of tables, appendices and different PDF documents. They are often in a textual form that currently seems impossible to extract in an automated way.

I'm still glad that I now have at least some of this data in computer-readable form. I'm sure it will come in handy in other projects. For instance, I might eventually use it to add some automatic labels to my real-time UHF and VHF spectrogram from the roof of the IJS campus.

I will not be publishing the JSON data and parsing code publicly at the moment. I have concerns about its correctness, and the code is so specialized for this specific document that I'm sure nobody will find it useful for anything else. However, if you have some legitimate use for the data, please send me an e-mail and I will be happy to share my work.

Posted by Tomaž | Categories: Life | Comments »

Raspberry Pi Compute Module eMMC benchmarks

03.07.2016 13:52

I have a Raspberry Pi Compute Module development kit on my desk at the moment. I'm doing some testing and prototyping because we're considering using it for a project at the Institute. The Compute Module is basically a small PCB with Broadcom's BCM2835 system-on-chip, 4 GB of flash on an eMMC connection and little else. Even providing the power supply at a number of different voltages is left as an exercise for the user.

Raspberry Pi Compute Module

I was wondering how the eMMC flash performs compared to the SD card on the more common Pies. I couldn't find any good benchmarks on the web. Wikipedia says that the latest eMMC standard rivals SATA speeds, but there's not much info around on what kind of eMMC device the Compute Module uses. I used Samsung's ARM Chromebook with eMMC flash a while ago and that felt pretty fast. On the other hand, watching package updates scroll by on the Compute Module gave me a feeling that it's quite sluggish.

To get a more objective benchmark, I decided to compare the I/O performance with my Raspberry Pi Zero. The Zero uses the same BCM2835 SoC, so the results should be somewhat comparable. I used the SD card that originally came with the Zero, preloaded with the Noobs distribution. It only has the raspberry logo printed on it, so I don't know the exact model or manufacturer. Both the Compute Module and the Zero were running the latest Raspbian Jessie.

One surprising discovery during this benchmark was that the CPU on the Zero runs between 700 MHz and 1 GHz, while the Compute Module only runs at 700 MHz. These are the ranges detected at boot by bcm2835-cpufreq with the default /boot/config.txt that came with the Raspbian image (i.e. no special overclocking). Because of this I performed the benchmarks on the Zero at both 700 MHz and 1 GHz.
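
If you want to check these limits on your own board, they are exposed through sysfs. A small Python snippet (cat on the same files works just as well):

def cpufreq_limits(cpu=0):
    # The kernel reports frequencies in kHz.
    base = '/sys/devices/system/cpu/cpu%d/cpufreq/' % cpu
    read = lambda name: int(open(base + name).read())
    return read('cpuinfo_min_freq'), read('cpuinfo_max_freq')

print('%d - %d kHz' % cpufreq_limits())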

For comparison, I also ran the same benchmark on my Cubietruck that has an Allwinner A20 system-on-chip with SATA-connected Samsung EVO 840 SSD and runs vanilla Debian Jessie.

This is the benchmark script I used. For each run, I chose the fastest result out of 5:

#!/bin/sh
# hdparm and writing to drop_caches both require root.
N=5

DEVICE=/dev/sda
#DEVICE=/dev/mmcblk0

# Sequential reads from the raw device, bypassing the page cache.
I=0
while [ $I -lt $N ]; do
	hdparm -t $DEVICE
	I=$(($I+1))
done

# Cached reads - this mostly measures RAM and CPU, not storage.
I=0
while [ $I -lt $N ]; do
	hdparm -T $DEVICE
	I=$(($I+1))
done

# Sequential write of a 128 MB file, flushed to disk before dd exits.
I=0
while [ $I -lt $N ]; do
	dd if=/dev/zero of=tempfile bs=1M count=128 conv=fdatasync 2>&1
	I=$(($I+1))
done

# Sequential read of the same file with block device caches dropped.
I=0
while [ $I -lt $N ]; do
	echo 3 > /proc/sys/vm/drop_caches
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

# The same read again, this time served from the cache.
I=0
while [ $I -lt $N ]; do
	dd if=tempfile of=/dev/null bs=1M count=128 2>&1
	I=$(($I+1))
done

Here is the write performance, as measured by dd. I wonder if the dd figures are affected by filesystem fragmentation, since it writes an actual file that might not be contiguous. I've been using the Zero for a while with this Raspbian image, while the Compute Module has been freshly re-imaged. Fragmentation shouldn't be as significant as with spinning disks, but it probably still has some effect.

Comparison of write performance.

Read performance, as measured by hdparm as well as dd. To remove the effect of cache when measuring with dd, I explicitly dropped kernel block device caches before each run.

Comparison of read performance.

From this it seems the Compute Module's eMMC flash is slightly faster than the SD card, on both reads and writes, when compared to the Zero running at the same CPU clock frequency. It's interesting that the Zero's results change significantly with CPU frequency, which seems to suggest that some part of SD card I/O is CPU bound. That said, performance is roughly on the same order of magnitude. The Cubietruck is significantly faster than both. In light of this result, it's sad that newer versions of the Cubieboard (and cheap ARM SoCs in general) dropped the SATA interface.

Finally, I tested block device cache performance. This more or less shows only RAM and CPU performance and shouldn't depend on storage speed.

Comparison of cached read performance.

Interestingly, the Zero seems to be somewhat faster than the Compute Module at 700 MHz here. /proc/cpuinfo shows a different revision, although it's not clear to me whether that marks the board revision or the SoC revision. It might be that the processors in the Zero and the Compute Module are not identical pieces of silicon.

In the end, I should note that these results are not super accurate. Complexities of I/O benchmarking on Linux aside, there are several things that might have affected the results. I already mentioned different filesystem state. A different SD card in Zero might give very different results (I didn't have a second empty card at hand to try that). While Raspberry Pies were idle during these tests, Cubietruck was running my web server and various other little tidbits that tend to accumulate on such machines.

Posted by Tomaž | Categories: Digital | Comments »

BPSK on TI CC chips, 2

18.06.2016 13:07

A few days ago I described how a Texas Instruments CC1101 chip can be used to transmit a low bitrate BPSK (binary phase-shift keying) signal using the minimum-shift keying (MSK) modulator block. I promised to share some practical measurements.

The following has been recorded using a USRP N200 with a sampling frequency of 1 MHz. Raw I/Q samples from the USRP were then passed to a custom BPSK demodulator written in Python and NumPy.

The transmission was done using a CC1101, which was connected to the USRP with a coaxial cable and an attenuator. The MSK modulator on the CC1101 was set up for a hardware data rate of 100 kbps. 1000 MSK symbols were used to encode one BPSK symbol, giving a BPSK bitrate of 100 bps. The packet sent was 57 bytes long, which resulted in a packet transmission time of around 4.5 seconds. The microcontroller firmware driving the CC1101 kept repeating the same packet with a small delay between transmissions.

Recorded signal power versus time.

This is one packet, shown as I/Q signal power versus time:

Signal power during a single captured packet.

In-phase (real) component of the recorded signal, zoomed in to reveal individual bits:

Zoomed-in in-phase signal component versus time.

Both the CC1101 and USRP were set to the same central frequency (868.2 MHz). Of course, due to tolerances in both devices their local oscillators had slightly different frequencies. This means that the carrier translated to baseband has a low, but non-zero frequency.

You can see 180° phase shifts nicely, as well as some ringing around the transitions. This has to be filtered out before carrier recovery.
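
A common way of doing carrier recovery for BPSK is the squaring method: squaring the signal removes the 180° phase modulation and leaves a strong spectral line at twice the carrier offset. A minimal NumPy sketch of the idea (not my actual demodulator code):

import numpy as np

def recover_carrier(iq, fs):
    # Squaring removes the 180 degree phase flips and produces a
    # spectral line at twice the residual carrier frequency.
    spectrum = np.abs(np.fft.fft(iq ** 2))
    freqs = np.fft.fftfreq(len(iq), d=1.0 / fs)
    return freqs[np.argmax(spectrum)] / 2.0

def mix_down(iq, fs, f0):
    # Multiply with the recovered carrier to shift it to 0 Hz.
    t = np.arange(len(iq)) / float(fs)
    return iq * np.exp(-2j * np.pi * f0 * t)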

After carrier recovery we can plot the carrier frequency during the time of transmission. Here it's plotted for all 4 packets that were recorded:

Recovered carrier frequency versus time for 4 packets.

You can see that the frequency shifts by around 20 Hz over the 4.5 seconds. This is around 20% of the 100 Hz channel occupied by the transmission. At the 868.2 MHz central frequency, a 20 Hz drift is a bit over 0.02 ppm, which is actually not that bad. For comparison, the quartz crystal I used with the CC1101 has a specified ±10 ppm stability over the -20°C to 70°C range (I'm not sure what the USRP uses, but it's probably in the same ballpark). However, I think the short-term drift seen here is not due to the quartz itself, but more likely due to changes in load capacitance. Perhaps the oscillator is heating up slightly during transmission. In fact, just waving my arm over the PCB with the CC1101 has a noticeable effect.

Finally, this is the phase after multiplying the signal with the recovered carrier. The only thing left is digital clock recovery, bit slicing and decoding the upper layers of the protocol:

Signal phase after multiplication with recovered carrier.
Posted by Tomaž | Categories: Analog | Comments »

Ultra-narrowband and BPSK on TI CC chips

15.06.2016 20:56

Ultra-narrowband is a fancy new name for an old thing. The idea is to use a phase modulated carrier to transmit data at a very low bitrate. This saves energy and improves spectral efficiency (bits per second of data throughput per hertz of radio bandwidth). This in turn makes it convenient for battery-powered sensors and 20-billion Internet-connected toasters of tomorrow. For similar reasons, amateur radio operators have been chatting over PSK31, which is essentially the same thing as ultra-narrowband, for almost two decades now.

Currently SIGFOX seems to be the main commercial operator pushing this technology. They don't publish protocol details, however they have written a 3GPP proposal for the C-UNB standard, which is public. The benefit of ultra-narrowband is that the simple BPSK modulation can be implemented with existing cheap and well-tested integrated transceivers. Compare this with the original Weightless standard for instance, which required custom silicon for its much more advanced physical layer and seems mostly forgotten these days (although it's not a completely fair comparison, since SIGFOX operates in unlicensed spectrum and Weightless had to deal with the complexities of TV whitespaces, but I digress).

CC1101 transceiver on SNE-ISMTV-868

The CC series of transceivers from Texas Instruments (like the CC1101 and CC1120) has a lot of software-configurable modulation blocks built in, but a BPSK modulator is not among them. However, you can find some references to ultra-narrowband being implemented with these chips, which suggests that people are using them for this purpose. The C-UNB proposal also mentions that it can be easily implemented with modified FSK modulation, but doesn't go into more detail. I wanted to implement ultra-narrowband on the CC1101 for a project we're doing at the Institute, so I looked into this possibility.

As any introductory course in telecommunications is quick to point out, frequency and phase modulation are basically the same thing. If you take a frequency modulator and feed it the time-derivative of a signal, the result is identical to a phase modulator fed with the unmodified signal. In practice, however, it's not that simple. BPSK requires that the phase changes by ±180° for each symbol change. The frequency-shift keying block in CC chips does not have a well-defined relation between frequency deviation and symbol rate. This means that it's hard to define how much the signal phase changes during each symbol.
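
To spell out the equivalence: a phase modulator transmits cos(2πfct + kp·m(t)), while a frequency modulator transmits cos(2πfct + 2πkf·∫m(τ)dτ). Feed the frequency modulator dm/dt instead of m(t) and the integral collapses back to m(t), leaving exactly the phase modulator's output, up to a constant factor.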

CC1101 does have a minimum-shift keying mode. This is a special form of frequency modulation that has well-defined phase shifts between symbols. Wikipedia says that the carrier phase continuously shifts by ±90° each symbol period, which does not sound useful at first:

Minimum-shift keying illustration.

In this interpretation of phase shifts, the carrier frequency fc is in the middle between frequencies for the two symbols, f0 and f1. This is the usual interpretation for frequency modulation, where you have approximately equal numbers of both symbols in a typical transmission.

However, if you transmit mostly one symbol, say f0, the receiver will consider that to be the carrier f'c. In that case, each occurrence of symbol 1 rotates the phase of the signal compared to f'c by +180°. This is exactly what you need to implement BPSK.

Alternative interpretation of phase in MSK.

BPSK requires that phase shifts are fast compared to the symbol rate, so you want to encode each BPSK symbol with many MSK symbols. Ultra-narrowband uses symbol rates on the order of 100 symbols/s while the CC1101 supports up to around 1 Msymbol/s. This means that you could have very fast phase changes, but 10 MSK symbols per BPSK symbol seem to suffice.

In the end, bits encoded into MSK symbols look somewhat similar to the theoretical time-derivative I mentioned above. You have an impulse of a single f1 symbol each time you have a transition from bit 0 to 1 or vice versa:

Using multiple MSK symbols as one BPSK symbol.
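
In code, the mapping from bits to MSK symbols might look something like this sketch, using the 10 MSK symbols per bit from above:

def bpsk_bits_to_msk_symbols(bits, n=10):
    # Emit n MSK symbols per BPSK bit: a single f1 symbol (1) at each
    # bit transition, f0 symbols (0) everywhere else.
    symbols = []
    prev = bits[0]
    for bit in bits:
        if bit != prev:
            symbols.extend([1] + [0] * (n - 1))
        else:
            symbols.extend([0] * n)
        prev = bit
    return symbols

# Bits 0,1,1,0 with n=4 give 0000 1000 0000 1000.
print(bpsk_bits_to_msk_symbols([0, 1, 1, 0], n=4))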

So far, this has all been theory. How well does it work in practice? The most obvious problem is frequency stability. The local oscillator in the CC1101 is designed to be re-calibrated often, but you cannot calibrate it while you are transmitting. With such low bitrates, packet transmissions last for several seconds. During that time the frequency can drift quite a lot, especially compared to the very limited bandwidth of these transmissions. This is the usual problem with narrowband transmissions and the CC1101 has no mechanism for compensating for it on reception. That is why I doubt a CC1101-to-CC1101 link would work in this way, and I haven't tried it.

Transmission from a CC1101 to a specialized receiver however seems to work quite nicely in practice. You just have to use an SDR with a wide-enough channel for reception and compensate for frequency drifts in software. I have some lab measurements to share, but those will have to wait for another post.

Posted by Tomaž | Categories: Digital | Comments »

On "ap_pass_brigade failed"

25.05.2016 20:37

Related to my recent rant regarding the broken Apache 2.4 in Debian Jessie, another curious thing was the appearance of the following in /var/log/apache2/error.log after the upgrade:

[fcgid:warn] [pid ...:tid ...] (32)Broken pipe: [client ...] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: ...

Each such error is also related to a 500 Internal Server Error HTTP response logged in the access log.

There's a lot of misinformation floating around about this on the web. Contrary to popular opinion, this is not caused by wrong values of the various Fcgid... options or the PHP_FCGI_MAX_REQUESTS variable. Actually, I don't know much about PHP (which seems to be the primary use case for FCGI), but I do know how to read the mod_fcgid source code, and this error seems to have a very simple cause: clients that close the connection without waiting for the server to respond.

The error is generated on line 407 of fcgid_bridge.c (mod_fcgid 2.3.9):

/* Now pass any remaining response body data to output filters */
if ((rv = ap_pass_brigade(r->output_filters,
                          brigade_stdout)) != APR_SUCCESS) {
    if (!APR_STATUS_IS_ECONNABORTED(rv)) {
        ap_log_rerror(APLOG_MARK, APLOG_WARNING, rv, r,
                      "mod_fcgid: ap_pass_brigade failed in "
                      "handle_request_ipc function");
    }

    return HTTP_INTERNAL_SERVER_ERROR;
}

The comment at the top already suggests the cause of the error message: failure to send the response generated by the FCGI script. The condition is easy to reproduce with a short Python script that sends a request and immediately closes the socket:

import socket, ssl

HOST="..."
# path to some document generated by an FCGI script
PATH="..."

ctx = ssl.create_default_context()
conn = ctx.wrap_socket(socket.socket(socket.AF_INET), server_hostname=HOST)
conn.connect((HOST, 443))
conn.sendall("GET " + PATH + " HTTP/1.0\r\nHost: " + HOST + "\r\n\r\n")
conn.close()

Actually, you can do the same with a browser by mashing refresh and stop buttons. The success somewhat depends on how long the script takes to generate the response - for very fast scripts it's hard to tear down the connection fast enough.

Probably at some point ap_pass_brigade() returned ECONNABORTED when the client broke the connection, hence the if statement in the code above. It appears that now EPIPE is returned and mod_fcgid was not properly updated. I was testing this on apache2 2.4.10-10+deb8u4.

In any case, this error message is benign. Fiddling with the FcgidOutputBufferSize might cause the response to be sent out earlier and reduce the chance that this will be triggered by buggy crawlers and such, but in the end there is nothing you can do about it on the server side. The 500 response in the log is also clearly an artifact in this case, since it's the client that caused the error, not the server, and no error page was actually delivered.

Posted by Tomaž | Categories: Code | Comments »

Jessie upgrade woes

23.05.2016 19:59

Debian 8 (Jessie) was officially released a bit over a year ago. In January last year I mentioned that I planned to upgrade my CubieTruck soon, which in this case meant 16 months. Doesn't time fly when you're not upgrading software? In any case, here are some assorted notes regarding the upgrade from Debian Wheezy. Most of them are not CubieTruck-specific, so I guess someone else might find them useful. Or entertaining.

Jessie armhf comes with kernel 3.16, which supports CubieTruck's Allwinner SoC and most of the peripherals I care about. However, it seems you can't use the built-in NAND flash for booting. It would be nice to get away from the sunxi 3.4 kernel and enjoy kernel updates through apt, but I don't want to get back to messing with SD cards. Daniel Andersen keeps the 3.4 branch reasonably up-to-date and Jessie doesn't seem to have problems with it, so I'll stick with that for the time being.

CubieTruck

The dreaded migration to systemd didn't cause any problems, apart from having to migrate a couple of custom init.d scripts. The most noticeable change is a significant increase in the number of mounted tmpfs filesystems, which makes df output somewhat unwieldy and, as a consequence, Munin's disk usage graphs a mess.

SpeedyCGI was a way of making dynamic web pages back in the olden days. In the best Perl tradition it tweaked some low-level parts of the language in order to avoid restarting the interpreter for each HTTP request - like automagically persisting global state and making exit() not actually exit. From a standpoint of a lazy web developer it was an incredibly convenient way to increase performance of plain old CGI scripts. But alas, it remained unmaintained for many years and was finally removed in Jessie.

FCGI and Apache's mod_fcgid (not to be confused with mod_fastcgi, its non-free and slightly more broken cousin) seemed like natural replacements. While FCGI makes persistence explicit, the programming model is more or less the same, and hence the migration required only some minor changes to my scripts - and working around various cases of FCGI's brain damage. Like, for instance, its intentional ignorance of Perl's built-in Unicode support. Or the fact that gracefully stopping worker processes is more or less unsupported. In fact, FCGI's process management seems to be broken on multiple levels, as mod_fcgid has problems maintaining a stand-by pool of workers.

Perl despair

In any case, the new Apache 2.4 is a barrel of fun by itself. It changes the syntax for access control in such a way that config files need to be updated manually (see the example below). It now also ignores all config files that don't end in .conf. Incidentally, Apache will serve files from /var/www/html if it has no VirtualHosts defined. This seems to be a hard-coded default, so you can't find out why it's doing that by grepping through /etc/apache2.
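
For example, where a 2.2-style config allowed access with:

Order allow,deny
Allow from all

Apache 2.4 wants the equivalent:

Require all granted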

The default config in Jessie frequently warns about deadlocks in various places:

(35)Resource deadlock avoided: [client ...] mod_fcgid: can't lock process table in pid ...
(35)Resource deadlock avoided: AH00273: apr_proc_mutex_lock failed. Attempting to shutdown process gracefully.
(35)Resource deadlock avoided: AH01948: Failed to acquire OCSP stapling lock

I'm currently using the following in apache2.conf, which so far seems to work around this problem:

# was: Mutex file:${APACHE_LOCK_DIR} default
Mutex sem default

Apache 2.4 in Jessie breaks the HTTP ETag caching mechanism. If you're using mod_deflate (it's used by default to compress text-based content like HTML, CSS and RSS), browsers won't be getting 304 Not Modified responses, which means longer load times and higher bandwidth use. The workaround I'm using is the following in mods-available/deflate.conf (you also need to enable mod_headers):

Header edit "Etag" '^"(.*)-gzip"$' '"$1"'

This differs somewhat from the solution proposed in Apache's Bugzilla, but as far as I can see restores the old and tested behavior of Apache 2.2, even if it's not exactly up to HTTP specification.

I wonder whether this state of affairs means that everyone has moved on to nginx, or whether these are just typical problems for a new major release. Anyway, to conclude on a more positive note, Apache now supports OCSP stapling, which is pretty simple to enable.

Finally, rsyslog is slightly broken in Jessie on headless machines that don't have an X server running. It spams the log with lines like:

rsyslogd-2007: action 'action 17' suspended, next retry is Sat May 21 18:12:53 2016 [try http://www.rsyslog.com/e/2007 ]

This can be worked around by commenting out the following lines in rsyslog.conf:

#daemon.*;mail.*;\
#       news.err;\
#       *.=debug;*.=info;\
#       *.=notice;*.=warn       |/dev/xconsole
Posted by Tomaž | Categories: Code | Comments »

Materialized Munin display

15.05.2016 21:25

Speaking of Munin, here's a thing that I've made recently: A small stand-alone display that cycles through a set of measurements from a Munin installation.

Munin display

(Click to watch Munin display video)

Back when the ESP8266 chip was the big new thing, I ordered a bag of them from eBay. Said bag then proceeded to gather dust in the corner of my desk for a year or so, as such things unfortunately tend to do these days. I also had a really nice white transflective display left over from another project (suffice it to say, it cost around 20 £, compared to the ones you can get for a tenth of the price with free shipping on DealExtreme). So something like this looked like a natural thing to make.

The hardware is not worth wasting too many words on: an ESP8266 module handles radio and the networking part. The display is a 2-line LCD panel using the common 16-pin interface. An Arduino Pro Mini acts as glue between the display and the ESP8266. There are also 3.3 V (for ESP8266) and 5 V (for LCD and Arduino) power supplies and a transistor level shifter for the serial line between ESP8266 and the Arduino.

ESP8266 runs the stock firmware that exposes a modem-like AT-command interface on a serial line. I could have omitted the Arduino and run the whole thing from the ESP8266 alone, however the lack of GPIO lines on the module I was using meant that I would have had to use some kind of GPIO extender or multiplexer to run the 16-pin LCD interface. An Arduino with the WeeESP8266 library just seemed less of a hassle.

Top side of the circuit in the Munin display.

From the software side, the device basically acts as a dumb display. The ESP8266 listens on a TCP socket and Arduino pushes everything that is received on that socket to the LCD. All the complexity is hidden in a Python daemon that runs on my CubieTruck. The daemon uses PyMunin to periodically query Munin nodes, renders the strings to be displayed and sends them to the display.
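
The daemon itself isn't much to look at. Here is a stripped-down sketch of the concept - note that it talks the plain munin-node protocol directly instead of going through PyMunin, and that the display's address is made up:

import socket
import time

DISPLAY = ('display.lan', 2323)  # hypothetical address of the LCD module
NODE = ('localhost', 4949)       # standard munin-node port

def fetch(plugin):
    # munin-node protocol: send "fetch <plugin>", read "field.value N"
    # lines until a lone "." terminates the response.
    s = socket.create_connection(NODE)
    f = s.makefile('rw')
    f.readline()  # discard the greeting banner
    f.write('fetch %s\n' % plugin)
    f.flush()
    values = {}
    for line in f:
        line = line.strip()
        if line == '.':
            break
        field, value = line.split(' ', 1)
        values[field.split('.')[0]] = value
    s.close()
    return values

while True:
    text = ('load %s' % fetch('load').get('load', '?')).ljust(16)[:16]
    d = socket.create_connection(DISPLAY)
    d.sendall(text.encode('ascii'))
    d.close()
    time.sleep(10)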

Speaking of ESP8266, my main complaint would be that there is basically zero official documentation about it. Just getting it to boot means reconciling conflicting information from different blog and forum posts (for me, both CH_PD and RST/GPIO16 needed to be pulled low). No one mentioned that RX pin has an internal pull-up. I also way underestimated the current consumption (it says 1 mA stand-by on the datasheet after all and the radio is mostly doing nothing in my case). It turns out that a linear regulator is out of the question and a 3.3 V switch-mode power supply is a must.

My module came with firmware that was very unreliable. Getting official firmware updates from a sticky forum post felt kind of shady and it took some time to get an image that worked with 512 kB flash on my module. That said, the module has been working without resets or hangs for a couple of weeks now which is nice and not something that all similar radio modules are capable of.

Inside the Munin display.

Finally, this is also my first 3D printed project and I learned several important lessons. It's better to leave too much clearance than too little between parts that are supposed to fit together. This box took about four hours of careful sanding and cutting before the top part could be inserted into the bottom since the 3D printer randomly decided to make some walls 1 mm thicker than planned. Also, self-tapping screws and automagically hollowed-out plastic parts don't play nice together.

With all the careful measuring and planning required to come up with a CAD drawing, I'm not sure 3D printing saved me any time compared to a simple plywood box which I could make and fit on the fly. Also, relying on the flexibility and precision of a 3D print made me kind of forget about the mechanical design of the circuit. I'm not particularly proud of the way things fit together and how it looks inside, but most of it is hidden away from view anyway and I guess it works well enough for a quick one-off project.

Posted by Tomaž | Categories: Life | Comments »

Power supply voltage shifts

02.05.2016 20:16

I'm a pretty heavy Munin user. In recent years I've developed a habit of adding a graph or two (or ten) for every service that I maintain. I also tend to monitor as many aspects of computer hardware as I can conveniently write a plugin for. At the latest count, my Munin master tracks a bit over 600 variables (not including a separate instance that monitors 50-odd VESNA sensor nodes deployed by IJS).

Monitoring everything and keeping a long history allows you to notice subtle changes that would otherwise be easy to miss. One of the things that I found interesting is the long-term behavior of power supplies. Pretty much every computer these days comes with software-accessible voltmeters on various power supply rails, so this is easy to do (using lm-sensors, for instance).
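
For instance, the voltages can be scraped from the raw lm-sensors output with a few lines of Python - roughly what a simple Munin plugin boils down to (the fetch part, at least):

import subprocess

def read_voltages():
    # "sensors -u" prints raw values one per line, e.g. "in0_input: 1.856".
    out = subprocess.check_output(['sensors', '-u']).decode('utf-8')
    voltages = {}
    for line in out.splitlines():
        line = line.strip()
        if line.startswith('in') and line.split(':')[0].endswith('_input'):
            name, value = line.split(':')
            voltages[name] = float(value)
    return voltages

# Munin plugins print values in "field.value N" format.
for name, value in sorted(read_voltages().items()):
    print('%s.value %f' % (name, value))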

Take for example voltage on the +5 V rail of an old 500 watt HKC USP5550 ATX power supply during the last months of its operation:

Voltage on ATX +5 V rail versus time.

From the start, this power supply seemed to have a slight downward trend of around -2 mV/month. Then for some reason the voltage jumped up by around 20 mV, was stable for a while and then sharply dropped and started drifting at around -20 mV/month. At that point I replaced it, fearing that it might soon endanger the machine it was powering.

The slow drift looks like aging of some sort - perhaps a voltage reference or a voltage divider before the error amplifier. Considering that it disappeared after the PSU was changed it seems that it was indeed caused by the PSU and not by a drifting ADC reference on the motherboard or some other artifact in the measurements. Abrupt shifts are harder to explain. As far as I can see, nothing important happened at those times. An application note from Linear mentions that leakage currents due to dirt and residues on the PCB can cause output voltage shifts.

It's also interesting that the +12 V rail on the same power supply showed a somewhat different pattern. The last voltage drop is not apparent there, so whatever caused the drop on the +5 V line seems to have happened after the point where the regulation circuit measures the voltage. The +12 V line isn't separately regulated in this device, so if the regulation circuit were involved, some change should have been apparent on +12 V as well.

Perhaps it was just a bad solder joint somewhere down the line or oxidation building up on connectors. At 10 A, a 50 mV step only corresponds to around 5 mΩ change in resistance.

Voltage on ATX +12 V rail versus time.

This sort of voltage jump seems to be quite common though. For instance, here is another one I recently recorded on the 5 V, 2.5 A external power supply that came with the CubieTruck. Again, as far as I can tell, there were no external reasons (for instance, the power supply current shows no similar change at that time).

Voltage on CubieTruck power supply versus time.

I have the offending HKC power supply opened up on my bench at the moment and nothing looks obviously out of place except copious amounts of dust. While it would be interesting to know what the exact reasons were behind these voltage changes, I don't think I'll bother looking any deeper into this.

Posted by Tomaž | Categories: Analog | Comments »

Measuring interrupt response times, part 2

27.04.2016 11:40

Last week I wrote about some typical interrupt response times you get from an Arduino and Raspberry Pi, if you follow basic examples from documentation or whatever comes up on Google. I got some quite unexpected results, like for instance a Python script that responds faster than a compiled C program. To check some of my guesses as to what caused those results, I did another set of measurements.

For the Arduino, most response times were grouped around 9 microseconds, but there were a few outliers. I checked the Arduino library source and it indeed always enables the AVR timer/counter0 overflow interrupt. If the timer interrupt happens at the same time as the GPIO interrupt I was measuring, the GPIO interrupt gets delayed. Performing the measurement with the timer interrupt masked out (clearing the TOIE0 bit in TIMSK0) indeed removes the outliers:

Effect of timer interrupt on Arduino response time.

With the timer off, all measured response times are between 8.9485 and 9.1986 μs. This is a 0.2501 μs long interval and it fits perfectly with theory: at a 16 MHz CPU clock and an instruction length between 1 and 5 cycles, the uncertainty in interrupt latency is 0.25 μs.

The second weird thing was the aforementioned discrepancy between Python and C on the Raspberry Pi. The default Python library uses an ugly hack to bypass the kernel GPIO driver and control GPIO lines directly from user space: it mmaps a range of physical memory containing the GPIO registers into its own process memory space using /dev/mem. This is similar to how X servers on Linux (used to?) access graphics hardware from user space. While this approach is very unportable, it's also much faster, since you don't need a context switch into the kernel for every operation.
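
The technique itself is easy to demonstrate. Here is a minimal Python sketch for the BCM2835 (the register addresses are for the Pi Zero; it needs to run as root, of course):

import mmap
import os
import struct

GPIO_BASE = 0x20200000  # physical address of the GPIO block on BCM2835
GPLEV0 = 0x34           # offset of the pin level register for GPIO 0-31

fd = os.open('/dev/mem', os.O_RDWR | os.O_SYNC)
regs = mmap.mmap(fd, 4096, offset=GPIO_BASE)

def read_pin(n):
    # Read the pin state straight from the memory-mapped register,
    # without a context switch into the kernel.
    level = struct.unpack_from('<I', regs, GPLEV0)[0]
    return (level >> n) & 1

print(read_pin(17))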

To check just how much faster mmap method is on Raspberry Pi, I copied the GPIO access code from the RPi.GPIO library into my test C program:

Response times using sysfs and mmap methods on Raspberry Pi.

As you can see, the native program is now faster than the interpreted Python script. This also demonstrates just how costly context switches are: the sysfs version is more than two times slower on average. It's worth noting that both RPi.GPIO and my C program still use epoll() or select() on a sysfs file to wait for the interrupt. Only the output pin change is done with direct memory accesses.

Finally, the Raspberry Pi was faster when the CPU was loaded, which seemed counterintuitive. I tracked this down to automatic CPU frequency scaling. By default, the Raspberry Pi Zero seems to be set to run between 700 MHz and 1000 MHz using the ondemand governor. If I switch to the performance governor, it keeps the CPU running at 1 GHz at all times. In that case, as expected, CPU load increases the average response time:

Effect of cpufreq governor on Raspberry Pi response time.

It's interesting to note that the Linux kernel comes with pluggable idle loop implementations (CONFIG_CPU_IDLE). The idle loop can be selected through /sys/devices/system/cpu/cpuidle in a similar way to the CPU frequency governor. The Raspbian Jessie release however has that disabled and uses the default idle loop for ARMv6 processors. The assembly code has been patched though: the ARM Wait For Interrupt (WFI) instruction in the vanilla kernel has been replaced with some mcreq (write to coprocessor?) instructions. I can't find any info on the JIRA ticket referenced in the comment, and the change was added among other BCM-specific changes in a single 6400-line commit. The idle loop implementation is interesting because if it puts the CPU into a power saving mode, it can affect interrupt latency as well.

As before, source code and raw data are on GitHub.

Posted by Tomaž | Categories: Digital | Comments »