Monitoring HomePlug AV devices

23.05.2018 18:51

Some time ago I wanted to monitor the performance of a network of Devolo dLAN devices. These are power-line Ethernet adapters. Each device looks like a power brick with a standard Ethernet RJ45 port. You can plug several of these into wall sockets around a building and theoretically they will together act as an Ethernet bridge, linking all their ports as if they were connected to a single network. The power-line network in question seemed to be having intermittent problems, but without some kind of a log it was hard to pinpoint exactly what the problem was.

I have very little experience with power-line networks and some quick web searches yielded conflicting information about how these things work and what protocols are at work behind the curtains. Purely from the user perspective, the experience seems to be similar to wireless LANs. While individual devices have flashy numbers written on them, such as 500 Mbps, these are just theoretical "up to" throughputs. In practice, the bandwidth of individual links in the network seems to be dynamically adjusted based on signal quality and is commonly quite a bit lower than advertised.

Devolo Cockpit screenshot.

Image by devolo.com

Devolo provides an application called Cockpit that allows you to configure the devices and visualize the power-line network. The feature I was most interested in was the real-time display of the physical layer bitrate for each individual link in the network. While the Cockpit is available for Linux, it is a user friendly point-and-click graphical application and chances were small that I would be able to integrate it into some kind of an automated monitoring process. The prospect of decoding the underlying protocol seemed easier. So I did a packet capture with Wireshark while the Cockpit was fetching the bitrate data:

HomePlug AV protocol decoded in Wireshark.

Wireshark immediately showed the captured packets as part of the HomePlug AV protocol and provided a nice decode. This finally gave me a good keyword I could base my further web searches on, which revealed a helpful white paper with some interesting background technical information. The HomePlug AV physical layer apparently uses frequencies in the range of 2 - 28 MHz, with OFDM and an adaptive number of bits per modulation symbol. The network management is centralized, using a coordinator and a mix of CSMA/CA and TDMA access.

More importantly, the fact that Wireshark decode showed bitrate information in plain text gave me confidence that replicating the process of querying the network would be relatively straightforward. Note how the 113 Mbit/sec in the decode directly corresponds to hex 0x71 in raw packet contents. It appeared that only two packets were involved, a Network Info Request and a Network Info Confirm:

HomePlug AV Network Info Confirmation packet decode.

However, before diving directly into writing code from scratch I came across the Faifa command-line tool on GitHub. The repository seems to be a source code dump from a now-defunct dev.open-plc.org web site. There is very little in terms of documentation or clues to its provenance. Last commit was in 2016. However a browse through its source code revealed that it is capable of sending the 0xa038 Network Info Request packet and receiving and decoding the corresponding 0xa039 Network Info Confirm reply. This was exactly what I was looking for.

Some tweaking and a compile later I was able to get the bitrate info from my terminal. Here I am querying one device in the power-line network (its Ethernet address is in the -a parameter). The queried device returns the current network coordinator and a list of stations it is currently connected to, together with the receive and transmit bitrates for each of those connections:

# faifa -i eth4 -a xx:xx:xx:xx:xx:xx -t a038
Faifa for HomePlug AV (GIT revision master-5563f5d)

Started receive thread
Frame: Network Info Request (Vendor-Specific) (0xA038)

Dump:
Frame: Network Info Confirm (Vendor-Specific) (A039), HomePlug-AV Version: 1.0
Network ID (NID): xx xx xx xx xx xx xx
Short Network ID (SNID): 0x05
STA TEI: 0x24
STA Role: Station
CCo MAC: 
	xx:xx:xx:xx:xx:xx
CCo TEI: 0xc2
Stations: 1
Station MAC       TEI  Bridge MAC        TX   RX  
----------------- ---- ----------------- ---- ----
xx:xx:xx:xx:xx:xx 0xc2 xx:xx:xx:xx:xx:xx 0x54 0x2b
Closing receive thread
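
Parsing this output for logging is straightforward. Below is a rough sketch of the kind of wrapper that can feed these numbers into a monitoring system. It assumes the station table format shown above and that the raw hex values translate directly to Mbit/s, as with the hex 0x71 to 113 Mbit/sec correspondence noted earlier; the interface name and device address are placeholders:

#!/usr/bin/python
# A sketch of a wrapper around faifa for logging PHY bitrates. Assumes the
# station table format shown above and that the raw hex TX/RX values map
# directly to Mbit/s. Interface and address are placeholders.
import re
import subprocess

def query_bitrates(iface, addr):
    out = subprocess.check_output(
            ["faifa", "-i", iface, "-a", addr, "-t", "a038"]
          ).decode("ascii", "replace")

    rates = []
    for line in out.splitlines():
        # Station rows look like:
        # xx:xx:xx:xx:xx:xx 0xc2 xx:xx:xx:xx:xx:xx 0x54 0x2b
        m = re.match(r"(\S\S:\S\S:\S\S:\S\S:\S\S:\S\S)\s+\S+\s+\S+\s+"
                     r"0x(\S+)\s+0x(\S+)\s*$", line)
        if m:
            rates.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return rates

if __name__ == "__main__":
    for mac, tx, rx in query_bitrates("eth4", "xx:xx:xx:xx:xx:xx"):
        print("%s TX %d Mbit/s RX %d Mbit/s" % (mac, tx, rx))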

The original tool had some annoying problems that I needed to work around before deploying it to my monitoring system. Most of all, it operated by sending the query with the Ethernet broadcast address as the source. It then put the local network interface into promiscuous mode to listen for broadcasted replies. This seemed like bad practice and created problems for me, not least of which was log spam with repeated kernel warnings about promiscuous mode enters and exits. It's possible that the use of broadcasts was a workaround for a hardware limitation on some devices, but the devices I tested (dLAN 200 and dLAN 550) seem to reply just fine to queries from non-broadcast addresses.

I also fixed a race condition in the original tool caused by the way it received replies. If multiple queries were running on the same network simultaneously, replies from different devices would sometimes get mixed up. Finally, I fixed some rough edges regarding libpcap usage that prevented the multi-threaded Faifa process from exiting cleanly once a reply was received. I added a -t command-line option for sending and receiving a single packet.

As usual, the improved Faifa tool is available in my fork on GitHub:

$ git clone https://github.com/avian2/faifa.git

To conclude, here is an example of bitrate data I recorded using this approach. It shows transmit bitrates reported by one device in the network to two other devices (here numbered "station 1" and "station 2"). The data was recorded over the course of 9 days and the network was operating normally during this time:

Recorded PHY bitrate for two stations.

Even this graph shows some interesting things. Some devices (like the "station 1" here) seem to enter a power saving mode. Such devices don't appear in the network info reports, which is why data is missing for some periods of time. Even out of power saving mode, devices don't seem to update their reported bitrates if there is no data being transferred on that specific link. I think this is why the "station 2" here seems to have long periods where the reported bitrate remains constant.

Posted by Tomaž | Categories: Code | Comments »

Switching window scaling in GNOME

01.05.2018 13:22

A while back I got a new work laptop: a 13" Dell XPS 9360. I was pleasantly surprised that installing the latest Debian Stretch with GNOME went smoothly and no special tweaks were needed to get everything up and running. The laptop works great and the battery life in Linux is a significant step up from my old HP EliteBook. The only real problem I noticed after a few months of use is weird behavior of the headphone jack, which often doesn't work for some unknown reason.

In any case, this is my first computer with a high-DPI screen. The 13-inch LCD panel has a resolution of 3200x1800, which means that text on a normal X window screen is almost unreadable without some kind of scaling. Thankfully, the GNOME version that ships with Stretch has relatively good support for window scaling. You can set a scaling factor in the GNOME Tweak Tool and all windows will have their content scaled by an integer factor, making text and icons intelligible again.

Window scaling setting in GNOME Tweak Tool.

This setting works fine for me, at least for terminal windows and Firefox, which is what I mostly use on this computer. I've only noticed some minor cosmetic issues when I change this at run-time. Some icons and buttons in GNOME Shell (like the bar on the top of the screen or the settings menu on the upper-right) will sometimes look weird until the next reboot.

A bigger annoyance was the fact that I often use this computer with a normal (non-high-DPI) external monitor. I had to open up the Tweak Tool each time I connected or disconnected the monitor. Navigating huge scaled UI on the low-DPI external monitor or tiny UI on the high-DPI laptop panel got old really quick. It was soon obvious that changing that setting should be a matter of a single key press.

Finding a way to set window scaling programmatically was surprisingly difficult (not unlike my older effort in switching audio output device). I tried a few different approaches, like setting some dconf keys, but none worked reliably. I ended up digging into the Tweak Tool source. This revealed that the Tweak Tool is built around a nice Python library that exposes the necessary settings as functions you can call from your own scripts. The rest was simple.

I ended up with the following Python script:

#!/usr/bin/python2.7
# Toggle the GNOME window scaling factor between 1 and 2 using the Python
# library that ships with the GNOME Tweak Tool.

from gtweak.utils import XSettingsOverrides

def main():
    xsettings = XSettingsOverrides()

    sf = xsettings.get_window_scaling_factor()

    if sf == 1:
        sf = 2
    else:
        sf = 1

    xsettings.set_window_scaling_factor(sf)

if __name__ == "__main__":
    main()

I have this script saved as toggle_hidpi and then a shortcut set in GNOME Keyboard Settings so that Super-F11 runs it. Note that using the laptop's built-in keyboard this means pressing the Windows logo, Fn, and F11 keys due to the weird modern practice of hiding normal function keys behind the Fn modifier. On an external USB keyboard, only Windows logo and F11 need to be pressed.

High DPI toggle shortcut in GNOME keyboard settings.

Posted by Tomaž | Categories: Code | Comments »

OpenCT on Debian Stretch

24.02.2018 10:03

I don't like replacing old technology that isn't broken, although I wonder sometimes whether that's just rationalizing the fear of change. I'm still using a bunch of old Schlumberger Cryptoflex (e-Gate) USB tokens for securely storing client-side SSL certificates. All of them are held together by black electrical tape at this point, since the plastic became brittle with age. However they still serve their purpose reasonably well, even if software support for them has been obsoleted a long time ago. So what follows is another installment of the series on keeping these hardware tokens working on the latest Debian Stable release.

Stretch upgrades the pcscd and libpcsclite1 packages (from the pcsc-lite project) to version 1.8.20. Unfortunately, this upgrade breaks the old openct driver, which is to my knowledge the only way to use these tokens on a modern system. This manifests itself as the following error when dumping the list of currently connected smart cards:

$ pkcs15-tool -D
Using reader with a card: Axalto/Schlumberger/Gemalo egate token 00 00
PKCS#15 binding failed: Unsupported card

Some trial and error and a git bisect led me to commit 8eb9ea1, which apparently caused this issue. It was committed between releases 1.8.13 (which was shipped in Jessie) and 1.8.14. This commit introduces some subtle changes in the way buffers of data are exchanged between pcscd and its drivers, which break openct 0.6.20.

There are two ways around that: you can keep using pcscd and libpcsclite1 from Jessie (the 1.8.13 source package from Jessie builds fine on Stretch), or you can patch openct. I've decided on the second option.

The openct driver is no longer developed upstream and has been removed from Debian in Jessie (last official release was in 2010, although there has been some effort to modernize it). I keep my own git repository and Debian packages based on the last package shipped in Wheezy. My patched version 0.6.20 includes changes required for systemd support, and now also the patch required to support modern pcscd version on Stretch. The latter has been helpfully pointed out to me by Ludovic Rousseau on the pcsc-lite mailing list.

My openct packages for Stretch on amd64 can be found here (version 0.6.20-1.2tomaz2). The updated source is also in a git repository (with a layout compatible with git-buildpackage), should you want to build it yourself:

$ git clone http://www.tablix.org/~avian/git/openct.git

Other smart card-related packages work for me as-shipped in Stretch (e.g. opensc and opensc-pkcs11 0.16.0-3). No changes were necessary in Firefox configuration for it to be able to pull client-side certificates from the hardware tokens. It is still required, however, to insert the token only when no instances of Firefox are running.

Posted by Tomaž | Categories: Code | Comments »

News from the Z80 land

05.01.2018 19:30

Here are some overdue news regarding modern development tools for the vintage Z80 architecture. I've been wanting to give this a bit of exposure, but alas other things in life interfered. I'm happy that there is still this much interest in old computers and software archaeology, even if I wasn't able to dedicate much of my time to it in the last months.


Back in 2008, Stefano Bodrato added support for Galaksija to Z88DK, a software development toolkit for Z80-based computers. Z88DK consists of a C compiler and a standard C library (with functions like printf and scanf) that interfaces with Galaksija's built-in terminal emulation routines. His work was based on my annotated ROM disassembly and development tools. The Z88DK distribution also includes a few cross-platform examples that can be compiled for Galaksija, provided they fit into the limited amount of RAM. When compiling for Galaksija, the C compiler produces a WAV audio file that can be directly loaded over an audio connection, emulating an audio cassette recording.

This November, Stefano improved support for the Galaksija compile target. The target now more fully supports the graphics functions from the Z88DK library, including things like putsprite for Galaksija's 2x3 pseudographics. He also added joystick emulation to the Z88DK cross-platform joystick library. The library emulates two joysticks via the keyboard. The first one uses the arrow keys on Galaksija's keyboard. The second one uses the 5-6-7-8 keys in the top row that should be familiar to users of the ZX Spectrum. He also added a clock function that uses the accurate clock ticks provided by the ROM's video interrupt.

Z88DK clock example on Galaksija

(Click to watch Z88DK clock example on Galaksija video)

This made it possible to build more games and demos for Galaksija. I've tried the two-player snakes game and the TV clock demo and (after some tweaking) they both worked on my Galaksija replica. To try them yourself, follow the Z88DK installation instructions and then compile the demos under examples/graphics by adapting the zcc command-lines at the top of the source files.

For instance, to compile clock.c into clock.wav I used the following:

$ zcc +gal -create-app -llib3d -o clock clock.c

The second piece of Z80-related news I wanted to share relates to the z80dasm 1.1.5 release I made back in August. z80dasm is a disassembler for Z80 machine code. The latest release significantly improves handling of undocumented instructions, a problem that was reported to me by Ast Moore. I've already written about that in more detail in a previous blog post.

Back in November I also updated the Debian packaging for z80dasm. So the new release is now also available as a Debian source or binary package that should cleanly build and install on Jessie, Stretch and Sid. Unfortunately the updated package has not yet been uploaded to the Debian archive. I have been unable to reach any of my previous sponsors (in the Debian project, a package upload must be sponsored by a Debian developer). If you're a Debian developer with an interest in vintage computing and would like to sponsor this upload, please contact me.

Update: z80dasm 1.1.5-1 package is now in Debian Unstable.

Posted by Tomaž | Categories: Code | Comments »

On piping Curl to apt-key

21.08.2017 16:52

Piping Curl to Bash is dangerous. Hopefully, this is well known at this point and many projects no longer advertise this installation method. Several popular posts by security researchers demonstrated all kinds of covert ways such instructions could be subverted by an attacker who managed to gain access to the server hosting the install script. Passing HTTP responses uninspected to other sensitive system commands is no safer, however.

This is how Docker documentation currently advertises adding their GPG key to the trusted keyring for Debian's APT:

Screenshot of instructions for adding Docker GPG key.

The APT keyring is used to authenticate packages that are installed by Debian's package manager. If an attacker can put their public key into this keyring they can for example provide compromised package updates to your system. Obviously you want to be extra sure no keys from questionable origins get added.

Docker documentation correctly prompts you to verify that the added key matches the correct fingerprint (in contrast to some other projects). However, the suggested method is flawed, since the fingerprint is only checked after an arbitrary number of keys have already been imported. What's more, the command shown specifically checks only the fingerprint of the Docker signing key, not of the key (or keys) that were actually imported.

Consider the case where download.docker.com starts serving an evil key file that contains both the official Docker signing key and the attacker's own key. In this case, the instructions above will not reveal anything suspicious. The displayed fingerprint matches the one shown in the documentation:

$ curl -fsSL https://www.tablix.org/~avian/docker/evil_gpg | sudo apt-key add -
OK
$ sudo apt-key fingerprint 0EBFCD88
pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid                  Docker Release (CE deb) <docker@docker.com>
sub   4096R/F273FCD8 2017-02-22

However, importing our key file also silently added another key to the keyring. Considering that apt-key list outputs several screenfuls of text on a typical system, this is not easy to spot after the fact without specifically looking for it:

$ sudo apt-key list|grep -C1 evil
pub   1024R/6537017F 2017-08-21
uid                  Dr. Evil <evil@example.com>
sub   1024R/F2F8E843 2017-08-21

The solution is obviously not to pipe the key file directly into apt-key without inspecting it first. Interestingly, this is not straightforward. The pgpdump tool is recommended by the first answer that comes up when asking Google about inspecting PGP key files. The version in Debian Jessie fails to find anything suspicious about our evil file:

$ pgpdump -v
pgpdump version 0.28, Copyright (C) 1998-2013 Kazu Yamamoto
$ curl -so evil_gpg https://www.tablix.org/~avian/docker/evil_gpg
$ pgpdump evil_gpg|grep -i evil

Debian developers suggest importing the key into a personal GnuPG keyring, inspecting the integrity and then exporting it into apt-key. That is more of a hassle, and not that useful for Docker's key that doesn't use the web of trust. In our case, inspecting the file with GnuPG directly is enough to show that it in fact contains two keys with different fingerprints:

$ gpg --version|head -n1
gpg (GnuPG) 1.4.18
$ gpg --with-fingerprint evil_gpg
pub  4096R/0EBFCD88 2017-02-22 Docker Release (CE deb) <docker@docker.com>
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
sub  4096R/F273FCD8 2017-02-22
pub  1024R/6537017F 2017-08-21 Dr. Evil <evil@example.com>
      Key fingerprint = 3097 2749 79E6 C35F A009  E25E 6CD6 1308 6537 017F
sub  1024R/F2F8E843 2017-08-21

The key file used in the example above was created by simply concatenating the two ASCII-armored public key blocks. It looks pretty suspicious in a text editor because it contains two ASCII headers (which is probably why pgpdump stops processing it before the end). However, two public key blocks could easily have also been exported into a single ASCII-armor block.
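
If you want to automate this check before anything reaches apt-key, gpg's machine-readable output makes it reasonably simple. The following is only a sketch of the idea, relying on the gpg 1.4 behavior shown above where a key file can be listed directly; the file name and expected fingerprint are illustrative. It refuses the file unless it contains exactly one public key whose fingerprint matches:

#!/usr/bin/python
# A sketch only: refuse a downloaded key file unless it contains exactly one
# public key and that key's fingerprint matches the expected one. Relies on
# the gpg 1.4 behavior shown above, where a key file can be listed directly.
# The expected fingerprint is the Docker one from this example.
import subprocess
import sys

EXPECTED = "9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

def check_key_file(path):
    out = subprocess.check_output(
            ["gpg", "--with-colons", "--with-fingerprint", path]
          ).decode("ascii", "replace")

    # "pub" records mark primary keys, "fpr" records carry the fingerprint
    # in the tenth colon-separated field.
    pubs = [l for l in out.splitlines() if l.startswith("pub:")]
    fprs = [l.split(":")[9] for l in out.splitlines() if l.startswith("fpr:")]

    return len(pubs) == 1 and EXPECTED in fprs

if __name__ == "__main__":
    if check_key_file(sys.argv[1]):
        print("key file matches the expected fingerprint")
    else:
        print("unexpected keys in the file, do not import it")
        sys.exit(1)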

Posted by Tomaž | Categories: Code | Comments »

z80dasm 1.1.5

06.08.2017 12:26

This is becoming a summer of retro computing blog posts, isn't it? In any case, today I'm pleased to announce the release of z80dasm 1.1.5, the latest version of my disassembler for the Z80 CPU machine code.

Ast Moore recently noticed that z80dasm doesn't recognize several well known undocumented instructions. When I made z80dasm based on Jan Panteltje's dz80 several years ago, my goal was to make a disassembler that would correctly produce an assembly listing for Galaksija's ROM. While I did add support for the sli instruction back then, it seems that Galaksija didn't use any of the other undocumented instructions.

z80dasm 1.1.4 did correctly include unknown instructions in the disassembly as raw hex values with defb directives; however, it guessed the length of some of them wrong, which desynced the instruction decoder and produced garbage output. The output would still assemble back into the same binary, but the point of a disassembler is to produce human-readable output, so that was not very useful.

Fixing this problem was a bit more involved than I expected, but not for the usual reasons. One of the guidelines I had with z80dasm was to keep z80dasm and z80asm a matched pair. Being able to reproduce the original binary from the disassembled source is very useful when hacking on old software as well as in testing z80dasm. Unfortunately, it turned out that z80asm is quite bad at supporting these undocumented instructions as well. For instance, inc and dec with ixl and ixh register operands are not recognized at all. Some other instructions, like add with these undocumented operands, produce wrong opcodes.

I guess this is not surprising. In contrast to the official instruction set there is no authoritative documentation for these instructions, so even the names differ in different sources. For example, both sli a and sll a mnemonics are commonly used for the cb 37 opcode.

I tried to fix some of the problems with z80asm I found, but it turned out to be quite time consuming. Both z80asm and z80dasm are written in, by today's standards, quite an archaic C style, with lots of pointer arithmetic and packing various unrelated things into a single int variable to save space. While I'm still fairly familiar with the z80dasm code, the modifications I did to z80asm took several sheets of paper to figure out.

$ hd example.bin
00000000  ed 70 dd 2c dd 23 23 3d  ed 71 dd 09              |.p.,.##=.q..|
0000000c
$ z80dasm example.bin
; z80dasm 1.1.5
; command line: z80dasm example.bin

	org	00100h

	defb 0edh,070h	;in f,(c)
	defb 0ddh,02ch	;inc ixl
	inc ix
	inc hl	
	dec a	
	defb 0edh,071h	;out (c),0
	add ix,bc

In the end, I gave up and solved the problem of z80asm compatibility by leaving z80dasm to decode undocumented instructions as defb directives. However, the instruction lengths should now be decoded correctly and also the mnemonic is included in a comment for better readability. There is now also a new --undoc option in 1.1.5. If it's specified on the command-line, z80dasm will place all undocumented instructions directly into the disassembly as proper mnemonics.

As before, you can get the source code for z80dasm in a release tarball or from the git repository. See the README file for install instructions and the included man page for explanations of all command-line options. z80dasm is also included in Debian, however 1.1.5 has not been uploaded yet.

Posted by Tomaž | Categories: Code | Comments »

IPv6 problems on TP-Link Archer C20

26.04.2017 12:44

Recently I replaced an old 2.4 GHz-only Linksys WRT54GL wireless router with a shiny new, dual band TP-Link Archer C20 (hardware version V1). Unfortunately, the new router brought some unusual problems. It turns out some devices are now unable to get a global IPv6 address when connected over Wi-Fi. For example, my Android 5.1 smartphone and my work laptop with Debian Jessie and Network Manager don't get IPv6 connectivity. They worked just fine when connected through the old router. At the same time, a different phone with Android 6.0 seems to have no problems with the new Archer C20 router.

First a brief note on the network setup: Archer C20 is used here as a wireless access point only. Some other host on the network acts as a gateway to the Internet. That host also provides a DHCP service for IPv4 and runs the router advertisement daemon (radvd) for IPv6 SLAAC. The setup has been quite well tested and works flawlessly on the wired Ethernet. The old WRT54GL has also been used in this way, which is why IPv6 connectivity on the Wi-Fi worked fine even though the old router's firmware had no explicit IPv6 support.

As the TP-Link FAQ entry explains, the WAN port on the C20 is unused, the network is connected to one of the LAN ports and the DHCP server on the C20 is disabled. IPv6 status tab in the configuration interface shows the following:

IPv6 status tab in Archer C20 configuration interface.

The IPv6 problem is somewhat frustrating to diagnose, since it only appears some time after the router has been restarted. For instance, I've usually seen that IPv6 stops working the next day after a reboot. Similarly, changing some unrelated settings, like the wireless SSID, also appears to temporarily fix the issue, only for it to reappear after a while.

Searching the web I can find some discussions about similar problems with TP-Link routers, with no clear conclusion. The firmware changelog does say in a vague way that the latest version fixes an IPv6 problem. However, I've tried the V1_160427 and V1_151120 firmwares and they both behave in the same (broken) way.

Modifications in Archer C20 firmware.

After much head scratching I found out that the root cause of my laptop not getting an IPv6 address over Wi-Fi is IPv6 duplicate address detection. This is apparent from the dadfailed flag on the link local address:

$ ip addr show dev wlan0
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a0:88:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.xx/24 brd 192.168.xx.255 scope global dynamic wlan0
       valid_lft 484sec preferred_lft 484sec
    inet6 fe80::a288:xxxx:xxxx:xxxx/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever

Also, when you know what to look for, this error appears in the logs:

$ dmesg|grep duplicate
IPv6: wlan0: IPv6 duplicate address fe80::a288:xxxx:xxxx:xxxx detected!

So it seems that my laptop thinks that there is another device on the network with the same link-local (and hence Ethernet MAC) address. This is of course not true. In fact, if I disable the duplicate address detection, IPv6 starts working properly:

# sysctl net.ipv6.conf.wlan0.accept_dad=0

Investigating things a bit further, Wireshark shows the following curious packet capture immediately after the laptop connects to the wireless network:

Packet capture during IPv6 neighbor solicitation.

This appears to be the progress of a normal attempt at IPv6 autoconfiguration from the laptop's side. The laptop (with the MAC address a0:88:...) sends some packets to an IPv6 multicast address (33:33:...). However, all these packets seem to be immediately reflected back to the laptop by the Archer C20. The incoming packets highlighted in yellow are byte-by-byte identical to the preceding outgoing packets, with only the destination address in the Ethernet header changed from the multicast to the laptop's MAC address. These incoming packets are not present when the laptop is connected to one of the wired LAN ports, or when using the old wireless router.

These reflected packets trigger the duplicate address detection in the laptop's network stack and the autoconfiguration is interrupted. It seems that at some point in the Archer C20 uptime, IPv6 multicast groups stop working correctly. In fact I don't understand why it even tries to do anything special with those. WRT54GL had no concept of IPv6 and it worked fine. I've experimented with other options that looked related to multicast (like IGMP settings), but with no success. So unfortunately at the moment I don't have a good network-side solution. Any suggestions would be most welcome. Changing the stock firmware might work, but support in OpenWRT for this hardware currently seems experimental at best.

The device-side work-around is to disable DAD like I show above, but this is somewhat ugly. There might be a way to disable it on a per-network basis with Network Manager (see dad-timeout), but I have not tried this yet. It is also still not clear why some devices appear to work. It might be that Android simply disabled duplicate address detection in 6.0.

Update: This Mikrotik manual suggests that such repeated multicast packets are expected on wireless access points that are aware of multicast addresses. So it still might be that the problem is somewhere on the client side. I did find out that enabling Client Isolation on the Archer C20 fixes this problem (with the obvious side effect that wireless clients can no longer talk to each other). dad-timeout NetworkManager option is not supported on Debian Jessie.

Update: Client Isolation doesn't actually help. After a few days I'm again getting the dadfailed flag.

Posted by Tomaž | Categories: Code | Comments »

GIMP onion layers plug-in

21.02.2017 20:48

Some time ago I was playing with animation making applications on a (non-pro) iPad. I found the whole ecosystem very closed and I had to jump through some hoops to get my drawings back onto a Linux computer. However the fact that you can draw directly on the screen does make some things easier compared to a standalone Wacom tablet, even if the accuracy is significantly worse.

One other thing in particular stood out compared to my old GIMP setup. These applications make it very easy to jump frame by frame through the animation. In one touch you can display the next frame and do some quick edits and then move back with another touch. You can browse up and down the stack as a quick way to preview the animation. They also do something they call onion layering which simply means that they overlay the next and previous frames with reduced opacity so that it's easier to see how things are moving around.

This is all obviously useful. I was doing similar things in GIMP, except that changing frames there took some more effort. GIMP as such doesn't have a concept of frames. Instead you use image layers (or layer groups) as frames. You have to click to select a layer and then a few more clicks to adjust the visibility and opacity for neighboring layers if you want to have the onion layer effect. This quickly amounts to a lot of clicking around if you work on more than a handful of frames.

GIMP does offer a Python plug-in interface however, so automating quick frame jumps is relatively simple. Relatively, because GIMP Python Documentation turns out to be somewhat rudimentary if you're not already familiar with GIMP internals. I found it best to learn from the Python-Fu samples and explore the interface using the built-in interactive console.
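
To give an idea what this looks like, the kind of per-layer operations the plug-in automates can be tried directly in the interactive console (Filters -> Python-Fu -> Console). This is just a rough sketch against the GIMP 2.8 Python API, not the plug-in's actual code:

# A rough sketch of the onion layer effect, as it can be typed into GIMP's
# interactive Python-Fu console. Not the plug-in's actual code.
image = gimp.image_list()[0]     # one of the currently open images
layers = image.layers            # frames are top-level layers

current = 1                      # index of the frame being worked on

for i, layer in enumerate(layers):
    if i == current:
        layer.visible = True
        layer.opacity = 100.0
    elif abs(i - current) == 1:
        # neighboring frames are shown with reduced opacity
        layer.visible = True
        layer.opacity = 30.0
    else:
        layer.visible = False

gimp.displays_flush()            # refresh the image window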

Screenshot of the GIMP onion layers plug-in

The end result of this exercise was the GIMP onion layers plug-in, which you can now find on GitHub together with installation and usage instructions. The plug-in doesn't have much in terms of a user interface - it merely registers a handful of python-fu-onion- actions for stepping to the previous or next frame, with or without the onion layer effect. The idea is that you then assign keyboard (or tablet button) shortcuts to these actions. You will have to define the shortcuts yourself though, since the plug-in can't define them for you. I like to use the dot and comma keys since they don't conflict with other GIMP shortcuts and match the typical frame step buttons on video players.

If you follow the layer structure suggested by the Export layers plug-in, this all works quite nicely, including handling of background layers. The only real problem I encountered was the fact that the layer visibility and opacity operations clutter the undo history. Unfortunately, that seems to be the limitation of the plug-in API. Other plug-ins work around this by doing operations on a duplicate of the image, but obviously I can't do that here.

I should note that I was using GIMP 2.8.14 from Debian Jessie, so the code might be somewhat outdated compared to latest GIMP 2.8.20. Feedback in that regard is welcome, as always.

Posted by Tomaž | Categories: Code | Comments »

Python applications in a .zip

08.02.2017 10:20

Poking around youtube-dl I found this interesting recipe on how to package a self-contained Python application. Youtube-dl ships as a single executable file you can run immediately, or put somewhere into your PATH. This makes it very convenient to use even when you don't want to do the usual pip install dance. Of course, it comes at the cost of not resolving any dependencies for you.

I was expecting the file to be a huge, monolithic Python source file, but in fact it's a ZIP with a prepended hash-bang and nicely structured Python package inside. Simplified a bit, here is the gist of the Makefile part that builds it:

hello:
	cd src && zip ../hello hello/*.py __main__.py
	echo '#!/usr/bin/python' > hello
	cat hello.zip >> hello
	rm hello.zip
	chmod a+x hello

Now, if src/__main__.py contains:

import hello
hello.greet()

And src/hello/__init__.py contains:

def greet():
	print("Hello, World!")

Building the executable and running hello from the command-line should result in the standard greeting:

$ make
cd src && zip ../hello hello/*.py __main__.py
  adding: hello/__init__.py (stored 0%)
  adding: __main__.py (stored 0%)
echo '#!/usr/bin/python' > hello
cat hello.zip >> hello
rm hello.zip
chmod a+x hello
$ ./hello
Hello, World!

How does this work? Apparently it's quite an old trick with some added refinement. Already since version 2.3 Python knows how to import modules directly from ZIP files in the same way as from the usual directories. Python also allows executing modules from the command-line.

It sounds very much like Java JARs, doesn't it? The only missing part is the #!... line that makes the Linux kernel use the Python interpreter when executing the file. Since the ZIP format ignores any junk that precedes the compressed data, the line can simply be prepended as if the whole file was a simple Bash script.
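
As an aside, Python 3.5 and later ship a zipapp module in the standard library that automates this same trick. A minimal sketch that should be roughly equivalent to the Makefile above, reusing the existing src/__main__.py as the entry point:

# A sketch using the standard library zipapp module (Python 3.5+). It zips
# up the src directory, prepends the interpreter line and sets the
# executable bit on the result.
import zipapp

zipapp.create_archive(
    "src",                           # directory containing __main__.py
    target="hello",                  # resulting executable file
    interpreter="/usr/bin/python3",  # becomes the "#!" line
)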

Posted by Tomaž | Categories: Code | Comments »

Moving Dovecot indexes and control files

23.12.2016 22:06

The Dovecot IMAP server can use a standard Maildir for storage of messages inside users' home directories. The default in that case is to store search indexes and control files in the same directory structure, alongside the mail files. That can be convenient, since no special setup is needed and everything is stored in the same place.

However, this doesn't work very well if you have disk quotas enabled on the filesystem that stores the Maildirs. In case a user reaches their quota, Dovecot will not be able to write to its own files, which can lead to problems. Hence, the documentation recommends that you configure a separate location for Dovecot's files in that case. This is done with the INDEX and CONTROL options to the mail_location specification.

For example, after setting up appropriate directory structure and permissions under /var/lib/dovecot:

mail_location = maildir:%h/Maildir:INDEX=/var/lib/dovecot/index/%u:CONTROL=/var/lib/dovecot/control/%u

You can just set this up and leave old index and control files in place. In that case, Dovecot will automatically regenerate them. However, this is not ideal. It can take significant time to regenerate indexes if you have a lot of mail. You also lose some IMAP-related metadata, like message flags and unique IDs, which will confuse IMAP clients. It would be better to move existing files to the new location, however the documentation doesn't say how to do that.

I found that the following script works with Dovecot 2.2.13 on Debian Jessie. As always, be careful when dealing with other people's mail and double check that the script does what you want. I had my share of problems when coming up with this. Make backups.

#!/bin/bash

set -ue

# Run as "migrate_dovecot.sh user path-to-users-maildir" for each user.

# Make sure that Dovecot isn't running or that this specific IMAP user isn't
# connected (and can't connect) while this script runs!
USERNAME=$1
MAILDIR=$2

DOVECOTDIR=/var/lib/dovecot

# Correct after double-checking that this script does what you want.
MV="echo mv -i"

cd $MAILDIR

# Index files like dovecot.index, dovecot.index.cache, etc. go under the 
# INDEX directory. The directory structure should be preserved. For example,
# ~/Maildir/.Foo/dovecot.index should go to index/.Foo/dovecot.index.

# Exception are index files in the root of Maildir. Those should go under 
# .INBOX
b=$DOVECOTDIR/index/$USERNAME/.INBOX
mkdir -p "$b"
$MV *index* "$b"

find . -name "*index*"|while read a; do
	b=$DOVECOTDIR/index/$USERNAME/`dirname "$a"`
	mkdir -p "$b"
	$MV "$a" "$b"
done

# dovecot-uidlist and dovecot-keywords files should go under CONTROL, in a
# similar way to indexes. There is the same exception for .INBOX.
b=$DOVECOTDIR/control/$USERNAME/.INBOX
mkdir -p "$b"
$MV dovecot-uidlist dovecot-keywords "$b"

find . -name "*dovecot*"|while read a; do
	b=$DOVECOTDIR/control/$USERNAME/`dirname "$a"`
	mkdir -p "$b"
	$MV "$a" "$b"
done

# subscriptions file should go to the root of the control directory.

# Note that commands above also move some dovecot-* files into the root of
# the control directory. This seems to be fine.
$MV "subscriptions" "$DOVECOTDIR/control/$USERNAME"
Posted by Tomaž | Categories: Code | Comments »

Blacklisting users for inbound mail in Exim

18.09.2016 12:00

You can prevent existing local users from receiving mail by redirecting them to :fail: in /etc/aliases. For example, to make SMTP delivery to list@... fail with 550 Unrouteable address:

list: :fail:Unrouteable address

See special items section in the Redirect router documentation.

By default, Exim in Debian will attempt to deliver mail for all user accounts, even non-human system users. System users (like list above) typically don't have a traditional home directory set in /etc/passwd. This means that mail for them will get stuck in the queue as Exim tries and fails to write to their mailbox. Because spam also gets sent to such addresses, the mail queue will grow and various things will start to complain. Traditionally, mail for system accounts is redirected to root in /etc/aliases, but some accounts just receive a ton of spam and it's better to simply reject mail sent to them.

Another thing worth pointing out is the Handling incoming mail for local accounts with low UID section in README.Debian.gz in case you want to reject mail sent to all system accounts.

This took way too much time to figure out. There's a ton of guides on how to blacklist individual users for outgoing mail, but none I could find for cases like this. I was half-way into writing a custom router before I stumbled upon this feature.

Posted by Tomaž | Categories: Code | Comments »

Script for setting up multiple terminals

16.09.2016 19:04

Sometimes I'm working on software that requires running a lot of different inter-dependent processes (are microservices still a thing?). Using systemd or some other init system for starting up such systems is fine for production. While debugging something on my laptop, however, it's useful to have each process running in its own X terminal. This allows me to inspect any debug output and to occasionally restart something. I used to have scripts that would run commands in individual GNU screen sessions, but that had a number of annoying problems.

I recently came up with the following:

#!/bin/bash

set -ue

if [ "$#" -ne 1 ]; then
	echo "USAGE: $0 path_to_file"
	echo "File contains one command per line to be started in a terminal window."
	exit 1
fi

cat "$1" | while read CMD; do
	if [ -z "$CMD" -o "${CMD:0:1}" = "#" ]; then
		continue
	fi

	RCFILE=`tempfile`

	cat > "$RCFILE" <<END
source ~/.bashrc
history -s $CMD
echo $CMD
$CMD
END

	gnome-terminal -e "/bin/bash --rcfile \"$RCFILE\""
	rm "$RCFILE"
done

This script reads a file that contains one command per line. Empty lines and lines starting with a hash sign are ignored. For each line it opens a new gnome-terminal (adjust as needed - most terminal emulators support the -e argument) and runs the command in a way that:

  • The terminal doesn't immediately close after the command exits. Instead it drops back to bash. This allows you to inspect any output that got printed right before the process died.
  • The command is printed on top of the terminal before the command runs. This allows you to identify the terminal running a particular process in case that is not obvious from the command output. For some reason, gnome-terminal's --title doesn't work.
  • The command is appended on top of bash's history list. This allows you to easily restart the process that died (or you killed with Ctrl-C) by simply pressing the up cursor and enter keys.

Posted by Tomaž | Categories: Code | Comments »

The NumPy min and max trap

30.08.2016 20:21

There's an interesting trap that I managed to fall into a few times when doing calculations with Python. NumPy provides several functions with the same name as functions built into Python. These replacements typically provide better integration with array types. Among them are min() and max(). In the vast majority of cases, NumPy versions are a drop-in replacement for the built-ins. In a few, however, they can cause some very hard-to-spot bugs. Consider the following:

import numpy as np

print(max(-1, 0))
print(np.max(-1, 0))

This prints (at least in NumPy 1.11.1 and earlier):

0
-1

Where is the catch? The built-in max() can be used in two distinct ways: you can either pass it an iterable as the single argument (in which case the largest element of the iterable will be returned), or you can pass multiple arguments (in which case the largest argument will be returned). In NumPy, max() is an alias for amax() and that only supports the former convention. The second argument in the example above is interpreted as the array axis along which to compute the maximum. It appears that NumPy thinks axis zero is a reasonable choice for a zero-dimensional input and doesn't complain.

Yes, recent versions of NumPy will complain if you have anything else than 0 or -1 in the axis argument. Having max(x, 0) in code is not that unusual though. I use it a lot as a shorthand when I need to clip negative values to 0. When moving code around between scripts that use NumPy, scripts that don't, and IPython Notebooks (which do "from numpy import *" by default), it's easy to mess things up.
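
For the clipping use case specifically, NumPy has unambiguous alternatives that avoid this trap altogether; a quick sketch:

import numpy as np

x = np.array([-1.0, 2.0, -3.0])

# np.maximum() is the element-wise maximum of its two arguments and also
# works for plain scalars, so there is no axis argument to misinterpret.
print(np.maximum(x, 0))       # [ 0.  2.  0.]
print(np.maximum(-1, 0))      # 0

# np.clip() states the intent even more directly.
print(np.clip(x, 0, np.inf))  # [ 0.  2.  0.]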

I guess both sides are to blame here. I find that flexible functions that interpret arguments in multiple ways are usually bad practice and I try to leave them out of interfaces I design. Yes, they are convenient, but they also often lead to bugs. On the other hand, I would also expect NumPy to complain about the nonsensical axis argument. Axis -1 makes sense for a zero-dimensional input, axis 0 doesn't. The alias from max() to amax() is dangerous (and as far as I can see undocumented). A possible way to prevent such mistakes would be to support only the named version of the axis argument.

Posted by Tomaž | Categories: Code | Comments »

Linux loader for DOS-like .com files

25.08.2016 18:09

Back in the olden days of DOS, there was a thing called a .com executable. They were obsolete even then, being basically inherited from CP/M and replaced by DOS MZ executables. Compared to modern binary formats like ELF, .com files were exceedingly simple. There was no file header, no metadata, no division between code and data sections. The operating system would load the entire file into a 16-bit memory segment at offset 0x100, set the stack and the program segment prefix and jump to the first instruction. It was more of a convention than a file format.

While this simplicity created many limitations, it also gave rise to many fun little hacks. You could write a binary executable directly in a text editor. The absence of headers meant that you could make extremely small programs. A minimal executable in Linux these days is somewhere in the vicinity of 10 kB. With some clever hacking you might get it down to a hundred bytes or so. A small .com executable can be on the order of bytes.

A comment on Hacker News recently caught my attention. One could write a .com loader for Linux pretty easily. How easy would that be? I had to try it out.

/* Map address space from 0x00000 to 0x10000. */

void *p = mmap(	(void*)0x00000, 0x10000, 
		PROT_EXEC|PROT_READ|PROT_WRITE,
		MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
		-1, 0);

/* Load file at address 0x100. */

(a boring loop here reading from a file to memory.)

/* Set stack pointer to end of allocated area, jump to 0x100. */

asm(
	"mov    $0x10000, %rsp\n"
	"jmp    0x100\n"
);

This is the gist of it for Linux on x86_64. First we have to map 64 kB of memory at the bottom of our virtual address space. This is where NULL pointers and other such beasts roam, so it's unallocated by default on Linux. In fact, as a security feature the kernel will actively prevent such calls to mmap(). We have to disable this protection first with:

$ sysctl vm.mmap_min_addr=0

Memory mappings have to be aligned with page boundaries, so we can't simply map the file at address 0x100. To work around this we use an anonymous map starting at address 0 and then fill it in manually with the contents of the .com file at the correct offset. Since we have the whole 64 bit linear address space to play with, choosing 0x100 is a bit silly (code section usually lives somewhere around 0x400000 on Linux), but then again, so is this entire exercise.

Once we have everything set up, we set the stack pointer to the top of our 64 kB and jump to the code using some in-line assembly. If we were being pedantic, we could set up the PSP as well. Most of it doesn't make much sense on Linux though (except for command-line parameters maybe). I didn't bother.

Now that we have the loader ready, how do we create a .com binary that will work with it? We have to turn to assembly:

    call   greet
    mov    rax, 60
    mov    rdi, 0
    syscall

# A function call, just to show off our working stack.
greet:
    mov     rax, 1
    mov     rdi, 1
    mov     rsi, msg
    mov     rdx, 14
    syscall
    ret

    msg db      "Hello, World!", 10

This will compile nicely into an x86_64 ELF object file with NASM. We then have to link it into an ELF executable using a custom linker script that tells the linker that all sections will be placed in a chunk of memory starting at 0x100:

MEMORY
{
	RAM (rwx) : ORIGIN = 0x0100, LENGTH = 0xff00
}

The linker will create an executable which contains our code with all the proper offsets, but still has all the ELF cruft around it (it will segfault nicely if you try to run it with the kernel's default ELF loader). As the final step, we must dump its contents into a bare binary file using objcopy:

$ objcopy -S --output-target=binary hello.elf hello.com

Finally, we can run our .com file with our loader:

$ ./loader hello.com
Hello, World!

As an extra convenience, we can register our new loader with the kernel, so that it will be invoked each time you try to execute a file with the .com extension (update-binfmts is part of the binfmt-support package on Debian):

$ update-binfmts --install com /path/to/loader --extension com
$ ./hello.com
Hello, World!

And there you have it, a nice 59 byte "Hello, World" binary for Linux. If you want to play with it yourself, see the complete example on GitHub that has a Makefile for your convenience. If you make something small and fun with it, please drop me a note.

One more thing. In case it's not clear at this point, this loader will not work with DOS executables (or CP/M ones, for that matter). Those expect to be run in 16-bit real mode and rely on DOS services. Needless to say, my loader makes no attempt to provide those. Code will run in the same environment as other Linux user space processes (albeit at a weird address) and must use the usual kernel syscalls. If you want to run old DOS stuff, use DOSBox or something similar.

Posted by Tomaž | Categories: Code | Comments »

Recent free software work

30.07.2016 9:25

I've done a bit of open source janitorial work recently. Here is a short recap.


jsonmerge is a Python module for merging a series of JSON documents using arbitrarily complicated rules inspired by JSON schema. I developed the gist of it together with Sarah Bird during EuroPython 2014. Since then I've been maintaining it, but not really doing any further development. The 1.2.1 release fixes a bug in the internal handling of JSON references, reported by chseeling. The bug caused a RefResolutionError to be raised when merging properties with slash or tilde characters in them.
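
For anyone unfamiliar with the module, typical usage looks roughly like this. The schema and documents below are made up purely for illustration; the mergeStrategy keyword controls how individual properties are merged:

# A rough illustration of jsonmerge usage; schema and documents are made up
# for this example.
from jsonmerge import Merger

schema = {
    "properties": {
        # keep appending new readings instead of overwriting the old ones
        "readings": {"mergeStrategy": "append"},
    }
}

merger = Merger(schema)

base = {"name": "sensor 1", "readings": [1, 2]}
head = {"name": "sensor 1", "readings": [3]}

result = merger.merge(base, head)
# result == {"name": "sensor 1", "readings": [1, 2, 3]}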

I believe jsonmerge is usable in its current form. The only larger problem that I know of is the fact that automatic schema generation for merged documents would need to be rethought and probably refactored. This would address incompatibility with jsonschema 2.5.0 and improve handling of some edge cases. get_schema() seems to be rarely used however. I don't have any plans to work on this issue at the moment as I'm not using jsonmerge myself. I would be happy to look into any pull requests or work on it myself if anyone would offer a suitable bounty.


aspell-sl is the Slovenian dictionary for GNU Aspell. Its Debian package was recently orphaned. As far as I know, this is currently the only Slovenian dictionary included in Debian. I took over as the maintainer of the Debian package and fixed several long-standing packaging bugs to prevent it from disappearing from the next Debian stable release. I haven't updated the dictionary however. The word list, while usable, remains as it was since the last update somewhere in 2002.

The situation with this dictionary seems complicated. The original word list appears to have been prepared in 2001 or 2002 by a diverse group of people from JSI, LUGOS, University of Ljubljana and private companies. I'm guessing they were funded by the now-defunct Ministry of Information Society which was financing localization of open source projects around that time. The upstream web page is long gone. In fact, aspell itself doesn't seem to be doing that well, although I'm still a regular user. The only free and up-to-date Slovenian dictionary I've found on-line was in the Slovenian Dictionary Pack for LibreOffice. It seems the word list from there would require relatively little work to be adapted for GNU Aspell (Hunspell dictionaries use very similar syntax). However, the upstream source of data in the pack is unclear to me and I hesitate to mess too much with things I know very little about.


z80dasm is a disassembler for the Zilog Z80 microprocessor. I forked the dz80 project by Jan Panteltje when it became apparent that no freely available disassembler was capable of correctly disassembling Galaksija's ROM. The 1.1.4 release adds options for better control of labels generated at the start and end of sections in the binaries. It also fixes a memory corruption bug that could sometimes lead to a wrong disassembly.

Actually, I committed these two changes to the public git repository three years ago. Unfortunately it seems that I forgot to package them into a new release at the time. Now I also took the opportunity to update and clean up the autotools setup. I'll work towards updating the z80dasm Debian package as well. z80dasm is pretty much feature complete at this point and, barring further bug reports, I don't plan any more development.

Posted by Tomaž | Categories: Code | Comments »