Specializing C++ templates on AVR pins

09.10.2019 19:56

AVR is a popular processor core for microcontrollers, probably best known from the ATmega chips on the classic Arduino boards. There's an excellent, free software C and C++ compiler toolchain available for them, which also usually sits behind the well-known Arduino IDE. AVR is a Harvard architecture, which means that code and data memory (i.e. flash and RAM) have separate address spaces. On these small, 8-bit chips there is very little RAM (as little as 128 bytes on some ATtiny devices). On the other hand, the amount of flash is usually much less restrictive.

Sometimes you want the microcontroller to drive multiple instances of the same external peripheral. Let's say for example this peripheral uses one GPIO pin and two instances differ only in the pin they are connected to. This means that the code in the microcontroller for peripheral 1 differs from code for peripheral 2 only by the address of the special function register (SFR) it uses and a bit mask. A naive way of implementing this in clean C++ would be something like the following:

#include <avr/io.h>

class DIO {
	public:
		PORT_t * const port;
		const int pin_bm;

		DIO(PORT_t* port_, int pin_bm_) 
			: port(port_), pin_bm(pin_bm_) {}

		void setup(void) const
		{
			port->DIR |= pin_bm;
		}

		// other methods would go here ...
};

DIO dio1(&PORTB, PIN4_bm);
DIO dio2(&PORTB, PIN5_bm);

int main(void)
{
	dio1.setup();
	dio2.setup();

	// other code would go here ...

	while (1);
}

Note that I'm using the nice PORT_t struct and the bit mask constants provided by avr-gcc headers instead of messing around with register addresses and such.

Unfortunately, this isn't optimal. Even though the class members are declared as const, the compiler stores them in RAM. This means that each instance of the DIO class costs 4 precious bytes of RAM that, barring a stray cosmic particle, will never change during the course of the program.

Compiling the code above for the ATtiny416 with avr-gcc 5.4.0 confirms this:

Program Memory Usage  : 204 bytes   5,0 % Full
Data Memory Usage     : 8 bytes   3,1 % Full

We could store member variables as data in program space. This is possible by using some special instructions and variable attributes (PROGMEM, pgm_read_byte() etc.), but it's kind of ugly and doesn't play nicely with the C language. Another way is to actually have copies of the binary code for each peripheral, but only change the constants in each copy.
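For a concrete picture of the first approach, here is a minimal sketch of what storing the constants in program space might look like (the DIOConfig struct and the setup_dio() function are hypothetical names, not from the actual program):

#include <avr/io.h>
#include <avr/pgmspace.h>

typedef struct {
	uint16_t port_addr;
	uint8_t pin_bm;
} DIOConfig;

// per-instance constants live in flash instead of RAM
const DIOConfig dio1_cfg PROGMEM = { (uint16_t) &PORTB, PIN4_bm };

void setup_dio(const DIOConfig* cfg)
{
	// every access needs an explicit pgm_read_*() call
	PORT_t* port = (PORT_t*) pgm_read_word(&cfg->port_addr);
	port->DIR |= pgm_read_byte(&cfg->pin_bm);
}

The explicit pgm_read_*() call at every access is what makes this approach so clumsy compared to plain member variables.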

Obviously we don't want to have multiple copies in the source as well. In C the usual approach is to do something rude with the C preprocessor and macros. Luckily, C++ offers a nicer way via templates. We can make a template class that takes non-type parameters and specialize that class on the register and bit mask it should use. Ideally, we would use something simple like this:

template <PORT_t* port, int pin_bm> class DIO {
	public:
		void setup(void) {
			port->DIR |= pin_bm;
		}
};

DIO<(&PORTB), PIN4_bm> dio1;
DIO<(&PORTB), PIN5_bm> dio2;

Sadly, this does not work. Behind the scenes, PORTB is a macro that casts a constant integer to the PORT_t struct via a pointer:

#define PORTB    (*(PORT_t *) 0x0620)

&PORTB evaluates back to an integral constant; however, the compiler isn't smart enough to realize that and balks at seeing a structure and a pointer dereference:

main.cpp(8,7): error: a cast to a type other than an integral or enumeration type cannot appear in a constant-expression
DIO<(&PORTB), PIN4_bm> dio1;
main.cpp(8,7): error: '*' cannot appear in a constant-expression
main.cpp(8,7): error: '&' cannot appear in a constant-expression

I found a thread on the AVR freaks forum which talks about this exact problem. The discussion meanders around the usefulness of software abstractions and the ability of hardware engineers to follow orders, but fails to find a solution that does not involve a soup of C preprocessor macros.

Fortunately, there does seem to exist an elegant C++ solution. The compiler's restriction can be worked around without resorting to preprocessor magic and while still using the convenient constants defined in the avr-gcc headers. We only need to evaluate the macro in another expression and do the cast to PORT_t inside the template:

template <int port_addr, int pin_bm> class DIO {
	public:
		void setup(void) {
			port()->DIR |= pin_bm;
		}

		static PORT_t* port()
		{
			return (PORT_t*) port_addr;
		}
};

static const int PORTB_ADDR = (int) &PORTB; 
DIO<PORTB_ADDR, PIN4_bm> dio1;
DIO<PORTB_ADDR, PIN5_bm> dio2;

This compiles and works fine with avr-gcc. As intended, it uses zero bytes of RAM. Interestingly enough, it also significantly reduces the code size, something I did not expect:

Program Memory Usage  : 108 bytes   2,6 % Full
Data Memory Usage     : 0 bytes   0,0 % Full

I'm not sure why the first version using member variables produces so much more code. A brief glance at the disassembly suggests that the extra code comes from the constructor. I would guess that using a code-free constructor with initializer lists should generate minimal extra instructions. Apparently that is not so here. In any case, I'm not complaining. I'm also sure that in a non-toy example the template version will use more flash space. After all, it has to duplicate the code for all methods.

You might also wonder how Arduino does this with the digitalWrite() function and friends. They store a number of data tables in program memory and then have preprocessor macros to read them out using pgm_read_byte(). As I mentioned above, this works, but isn't nicely structured and it's pretty far from clean, object-oriented C++ programming.

Posted by Tomaž | Categories: Code | Comments »

z80dasm 1.1.6

09.09.2019 20:32

z80dasm is a command-line disassembler for the Z80 CPU. I initially released it way back in 2007, when I was first exploring Galaksija's ROM and the other disassemblers I could find didn't do a good enough job for me. Since then it has accumulated a few more improvements and bug fixes as I received feedback and patches from various contributors.

Version 1.1.6 is a release that is long overdue. Most importantly, it fixes a segmentation fault bug that several people have reported. The patch for the bug has actually been committed to the git repository since June last year, but somehow I forgot to bump the version and roll up a new release.

The problem appeared when feeding a symbol file that was generated by z80dasm back as input to z80dasm (possibly after editing some symbol names). This is something that the man page explicitly mentions is supported. However, when this was done together with block definitions, it caused z80dasm to segfault with a NULL pointer dereference. Some code didn't expect that the symbol automatically generated to mark a block start could already be defined via the input symbol file. Thanks to J. B. Langston for first sending me the report and analysis of the crash.

I took this opportunity to review the rest of the symbol handling code and do some further clean ups. It has also led me to implement a feature that I have been asked for in the past. z80dasm now has the ability to sort the symbol table before writing it out to the symbol file.

More specifically, there is now a --sym-order command-line option that takes either a default or a frequency argument. Default leaves the ordering as it was in the previous versions - ordered by symbol value. Frequency sorts the symbol table by how frequently a symbol is used in the disassembly, so that the most commonly used symbols are written at the start of the symbol file. When first approaching an unknown binary, this might help you identify the most commonly used subroutines.
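For example, assuming the usual GNU-style long-option syntax and a made-up ROM file name, this would look something like:

$ z80dasm --sym-order=frequency rom.bin

combined with whatever options you already use to write out the symbol file.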

Anyway, the new release is available from the usual place. See the included README file for build and installation instructions. z80dasm is also included in Debian, however the new release is not in yet (if you're a Debian developer and would like to sponsor an upload, please get in touch).

Posted by Tomaž | Categories: Code | Comments »

Quick and ugly WFM data export for Rigol DS2072A

15.08.2019 14:48

At work I use a Rigol DS2072A oscilloscope. It's quite a featureful little two-channel digital scope that mostly does the job I need it for. It can be buggy at times though, and with experience I've learned to avoid some of its features - like the screenshot tool that sometimes, but not always, captures a perfectly plausible PNG that actually contains something different from what was displayed on the physical screen at the time. I'm not joking - I think there's some kind of a double-buffering issue there.

Recently I was using it to capture some waveforms that I wanted to process further on my computer. On most modern digital scopes that's a simple matter of exporting a trace to a CSV file on a USB stick. The DS2072A indeed has this feature; however, I soon found out that it is unbearably slow. Exporting 1.4 Msamples took nearly 6 minutes. I'm guessing exporting a full 14 Msample capture would take an hour - I've never had the patience to actually wait for one to finish, and the progress indicator remained pegged at 0% until I reset the scope in frustration. I planned to do many captures, so that approach was clearly unusable.

Rigol DS2072A oscilloscope.

Luckily, there's also an option for a binary export that creates WFM files. Exporting to those is much faster than to the text-based CSV format, but on the other hand it creates binary blobs that apparently only the scope itself can read. I found the open source pyRigolWFM tool for reading WFM files, but unfortunately it only seems to support the DS1000 series and doesn't work with files produced by DS2072A. There's also Rigol's WFM converter, but again it only works with DS4000 and DS6000 series, so I had no luck with that either.

I noticed that the sizes of the WFM files in bytes were similar to the number of samples they were supposed to contain, so I guessed that extracting raw data from them wouldn't be that complicated - they can't be compressed and there are only so many ways you can shuffle bytes around. The only weird thing was that files containing the same number of samples were all of slightly different sizes. A comment on the pyRigolWFM issue tracker mentioned that WFM files are more or less a dump of the scope's memory, which gave me hope that their format isn't very complicated.

After some messing around in a Jupyter Notebook I came up with the following code that extracted the data I needed from WFM files into a Numpy array:

import numpy as np
import struct

def load_wfm(path):
    with open(path, 'rb') as f:
        header = f.read(0x200) 
        
    magic = header[0x000:0x002]
    assert magic == b'\xa5\xa5'
        
    offset_1 = struct.unpack('<i', header[0x044:0x048])[0]
    offset_2 = struct.unpack('<i', header[0x048:0x04c])[0]
    n_samples = struct.unpack('<i', header[0x05c:0x060])[0]
    sample_rate = struct.unpack('<i', header[0x17c:0x180])[0]
    
    assert n_samples % 2 == 0
    
    pagesize = n_samples//2
        
    data = np.fromfile(path, dtype=np.uint8)
    
    t = np.arange(n_samples)/sample_rate
    x0 = np.empty(n_samples)
    
    # Samples are interleaved on two (?) pages
    x0[0::2] = data[offset_1:offset_1+pagesize]
    x0[1::2] = data[offset_2:offset_2+pagesize]
    
    # These will depend on attenuator settings. I'm not sure
    # how to read them from the file, but it's easy to guess 
    # them manually when comparing file contents to what is
    # shown on the screen.
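    # One way to derive them: take two reference points (raw_a, v_a)
    # and (raw_b, v_b) off the screen and solve v = raw*k + n:
    #   k = (v_a - v_b)/(raw_a - raw_b)
    #   n = v_a - raw_a*k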
    n = -0.4
    k = 0.2
    
    x = x0*k + n
    
    return t, x
    
t, x = load_wfm("Newfile1.wfm")

Basically, the file consists of a header and a sample buffer. The header contains metadata about the capture, like the sample rate and the number of captured samples. It also contains pointers into the buffer. Each sample in a trace is represented by one byte. I'm guessing it is a raw, unsigned 8-bit value from the ADC. That value needs to be scaled according to the attenuator and probe settings to get the measured voltage level. I didn't manage to figure out how the attenuator settings are stored in the header, so I calculated the scaling constants manually, by comparing the raw values with what was displayed on the scope's screen. Since I was doing all captures at the same settings, that worked for me.

I also didn't bother to completely understand the layout of the file. The code above worked for exports containing only channel 1. In all my files the samples were interleaved in two memory pages: even samples were in one page, odd samples in another. I'm not sure if that's always the case and the code obviously does not attempt to cover any other layout.

Here is a plot of the data I extracted for the trace that is shown on the photograph above:

Plot of the sample data extracted from the WFM file.

I compared the trace data I extracted from the WFM file with the data from the CSV file that is generated by the oscilloscope's own slow CSV export function. The differences between the two are on the order of 10⁻¹⁵, which is most likely due to floating point precision. For all practical purposes, the values from both exports are identical:

Difference between data from the WFM file and the CSV export.

Anyway, I hope this is useful for anyone else that needs to extract data from these scopes. Just please be aware that this is only a minimally viable solution for what I needed to do - the code will need some further hacking if you apply it to your own files.

Posted by Tomaž | Categories: Code | Comments »

When the terminal is not enough

19.04.2019 9:26

Sometimes I'm surprised by utilities that run fine in a terminal window on a graphical desktop but fail to do so in a remote SSH connection. Unfortunately, this seems to be a side-effect of software on desktop Linux getting more tightly integrated. These days, it's more and more common to see a command-line tool pop up a graphical dialog. It looks fancy, and there might be security benefits in the case of passphrase entries, but it also means that often the remote use case with no access to the local desktop gets overlooked.

GNOME keyring management is the offender I bump into the most. It tries to handle all entry of sensitive credentials, like passwords and PINs, on a system and integrates with the SSH and GPG agents. I remember that it used to interfere with private SSH key passphrase entry when jumping from one SSH connection to another, but that seems to be fixed in Debian Stretch.

On the other hand, running GPG in an SSH session by default still doesn't work (this might also pop up when, for example, signing git tags):

$ gpg -s gpg_test
gpg: using "0A822E7A" as default secret key for signing
gpg: signing failed: Operation cancelled
gpg: signing failed: Operation cancelled

This happens when that same user is also logged in to the graphical desktop, but the graphical session is locked. I'm not sure exactly what happens in the background, but something somewhere seems to cancel the passphrase entry request.

The solution is to set the GPG agent to use the text-based pin entry tool. Install the pinentry-tty package and put the following into ~/.gnupg/gpg-agent.conf:

pinentry-program /usr/bin/pinentry-tty

After this, the passphrase can be entered through the SSH session:

$ gpg -s gpg_test
gpg: using "0A822E7A" as default secret key for signing
Please enter the passphrase to unlock the OpenPGP secret key:
"Tomaž Šolc (Avian) <tomaz.solc@tablix.org>"
4096-bit RSA key, ID 059A0D2C0A822E7A,
created 2013-01-13.

Passphrase:

Update: Note however that with this change in place graphical programs that run GPG without a terminal, such as Thunderbird's Enigmail extension, will not work.

Other offenders are PulseAudio- and systemd-related tools. For example, inspecting the state of the sound system over SSH fails with:

$ pactl list
Connection failure: Connection refused
pa_context_connect() failed: Connection refused

Here, the error message is a bit misleading. The problem is that the XDG environment variables aren't set up properly. Specifically for PulseAudio, the XDG_RUNTIME_DIR should be set to something like /run/user/1000.
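For a quick test in the SSH session itself, it should be enough to set the variable manually (assuming the usual layout where the directory is named after your numeric user ID):

$ export XDG_RUNTIME_DIR=/run/user/$(id -u)
$ pactl list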

This environment is usually set up by the pam_systemd.so PAM module. However, some overzealous system administrators disable the use of PAM for SSH connections, so it might not be set in an SSH session. To have these variables set up automatically, you should have the following in your /etc/ssh/sshd_config:

UsePAM yes
Posted by Tomaž | Categories: Code | Comments »

Booting Raspbian from Barebox

01.03.2019 18:28

Barebox is an open source bootloader intended for embedded systems. In the past few weeks I've been spending a lot of time at work polishing its support for Raspberry Pi boards and fixing some rough edges. Some of my patches have already been merged upstream, and I'm hoping the remainder will end up in one of the upcoming releases. I thought I might as well write a short tutorial on how to get it running on a Raspberry Pi 3 with the stock Raspbian Linux distribution.

The Broadcom BCM28xx SoC on Raspberry Pi has a peculiar boot process. First a proprietary binary blob is started on the VideoCore processor. This firmware reads some config files and does some hardware initialization. Finally, it brings up the ARM core, loads a kernel image (or a bootloader, as we'll see later) and jumps to it. When starting the kernel it also passes to it some hardware information, such as a device tree and boot arguments. Raspbian Linux ships with its own patched Linux kernel that tends to be relatively tightly coupled with the VideoCore firmware and the information it provides.

On the other hand, upstream Linux kernels tend to ignore the VideoCore as much as possible. This makes them much easier to use with a bootloader. In fact, a bootloader is required since they can't use the device tree provided by the VideoCore firmware. Unfortunately, upstream kernels also historically have worse support for various peripherals found on Raspberry Pi boards, so sometimes you can't use them. In the rest of this text, I'll be only talking about Raspbian kernels as most of my work focused on making the VideoCore interoperation work correctly for them.

Raspberry Pi 3 connected to a USB-to-serial adapter.

To get started, you'll need a Raspberry Pi 3 board and an SD card with Raspbian Stretch Lite. You'll also need a USB-to-serial adapter that uses 3.3V levels (I'm using this one) and another computer to connect it to. Connect ground, pin 14 and pin 15 on the Raspberry Pi header to ground, RXD and TXD on the serial adapter respectively. We will use this connection to access the Barebox serial console. For the parts where we will interact with the Linux system on the Raspberry Pi you can also connect it to an HDMI monitor and a keyboard. I prefer to work over ssh.

Run the following on the computer to bring up the serial terminal at 115200 baud (adjust the device path as necessary to access your serial adapter):

$ screen /dev/ttyUSB0 115200

It's possible to cross-compile Barebox for ARM on an Intel computer. However, for simplicity we'll just compile Barebox on the Raspberry Pi itself since all the tools are already available there. If you intend to experiment, it's worth setting up either the cross-compiler or two Raspberry Pis: one for compilation and one for testing.

Now log in to your Raspberry Pi and open a terminal, either through ssh or the monitor and keyboard. Fetch the Barebox source. At the time of writing, not all of my patches (1, 2) are in the upstream repository yet. You can get a patched tree based on the current next branch from my GitHub fork:

$ git clone https://github.com/avian2/barebox.git
$ cd barebox

Update: Barebox v2019.04.0 release should contain everything shown in this blog post.

In the top of the Barebox source distribution, run the following to configure the build for Raspberry Pi:

$ export ARCH=arm
$ make rpi_defconfig

At this point you can also run make menuconfig and poke around. Barebox uses the same configuration and build system as the Linux kernel, so if you've ever compiled Linux the interface should be very familiar. When you are ready to compile, run:

$ make -j4

This should take around a minute on a Raspberry Pi 3. If compilation fails, check if you are missing any tools and install them. You might need to install flex and bison with apt-get.

After a successful compilation the bootloader binaries end up in the images directory. If you didn't modify the default config you should get several builds for different versions of the Raspberry Pi board. Since we're using Raspberry Pi 3, we need barebox-raspberry-pi-3.img. Copy this to the /boot partition:

$ sudo cp images/barebox-raspberry-pi-3.img /boot/barebox.img

Now put the following into /boot/config.txt. What this does is tell the VideoCore firmware to load the Barebox image instead of the Linux kernel and to bring up the serial interface hardware for us:

kernel=barebox.img
enable_uart=1

At this point you should be ready to reboot the Raspberry Pi. Do a graceful shutdown with the reboot command. After a few moments you should see the following text appear in the serial terminal on your computer:

barebox 2019.02.0-00345-g0741a17e3 #183 Fri Mar 1 09:44:11 GMT 2019


Board: RaspberryPi Model 3B
bcm2835-gpio 3f200000.gpio@7e200000.of: probed gpiochip-1 with base 0
bcm2835_mci 3f300000.sdhci@7e300000.of: registered as mci0
malloc space: 0x0fe7e5c0 -> 0x1fcfcb7f (size 254.5 MiB)
mci0: detected SD card version 2.0
mci0: registered disk0
environment load /boot/barebox.env: No such file or directory
Maybe you have to create the partition.
running /env/bin/init...

Hit any key to stop autoboot:    3

Press a key to drop into the Hush shell. You'll get a prompt that is very similar to Bash. You can use cd to move around and ls to inspect the filesystem. Barebox uses the concept of filesystem mounts just like Linux. By default, the boot partition on the SD card gets mounted under /boot. The root is a virtual RAM-only filesystem that is not persistent.

The shell supports simple scripting with local and global variables. Barebox calls the shell scripts that get executed on boot the bootloader environment. You can find those under /env. The default environment on Raspberry Pi currently doesn't do much, so if you intend to use Barebox in some application, you will need to do the scripting yourself.

To finally boot the Raspbian system, enter the following commands:

barebox@RaspberryPi Model 3B:/ global linux.bootargs.vc="$global.vc.bootargs"
barebox@RaspberryPi Model 3B:/ bootm -o /vc.dtb /boot/kernel7.img

Loading ARM Linux zImage '/boot/kernel7.img'
Loading devicetree from '/vc.dtb'
commandline: console=ttyAMA0,115200 console=ttyS1,115200n8   8250.nr_uarts=1 bcm2708_fb.fbwidth=656 bcm2708_fb.fbheight=416 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  dwc_otg.lpm_enable=0 console=tty1 root=PARTUUID=1c5963cc-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait
Starting kernel in hyp mode

The global command adds the kernel command-line from VideoCore to the final kernel boot arguments. Barebox saves the command-line that was passed to it from the firmware into the vc.bootargs variable. When booting the kernel, Barebox constructs the final boot arguments by concatenating all global variables that start with linux.bootargs.
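Any other global variable with the linux.bootargs. prefix gets appended in the same way, so adding a custom kernel argument is a one-liner (loglevel=7 is just an illustrative example):

barebox@RaspberryPi Model 3B:/ global linux.bootargs.extra="loglevel=7"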

The bootm command actually boots the ARMv7 kernel (/boot/kernel7.img) from the SD card using the device tree from the VideoCore. Each time it is started, Barebox saves the VideoCore device tree to its own root filesystem at /vc.dtb. A few moments after issuing the bootm command, the Raspbian system should come up and you should be able to log into it. Depending on whether the serial console has been enabled in the Linux kernel, you might also see kernel boot messages and a log-in prompt in the serial terminal.
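If you want this to happen without manual intervention on every boot, the same two commands can be collected into a script in the bootloader environment, along the lines of the /env/bin/init script that is already run at startup. A sketch, with a made-up script name:

# /env/bin/boot-raspbian
global linux.bootargs.vc="$global.vc.bootargs"
bootm -o /vc.dtb /boot/kernel7.img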

You can see that the system was booted from Barebox by inspecting the device tree:

$ cat /sys/firmware/devicetree/base/chosen/barebox-version
barebox-2019.02.0-00345-g0741a17e3

Since we passed both the VideoCore device tree contents and the command-line to the kernel, the system should work as if no bootloader was actually used. This means that settings in cmdline.txt and config.txt, such as device tree overlays and HDMI configuration, should be correctly honored.

In the end, why would you want to use such a bootloader? Increasing reliability and resistance to bad system updates is one reason. For example, Barebox can enable the BCM28xx hardware watchdog early in the boot process. This means that booting a bad kernel is less likely to result in a system that needs a manual reset. A related Barebox feature is the bootchooser, which can be used to implement an A/B root partition scheme, so that a failed boot from one root automatically results in the bootloader falling back to the other root. Note however that one thing still missing in Barebox is support for reading the reset source on Raspberry Pi, so you can't yet distinguish between a power-on reset and a watchdog reset.

As was pointed out on the Barebox mailing list, this is not perfect. Discussion on the design of the hardware watchdog aside, the bootloader also can't help with VideoCore firmware updates, since the firmware gets started before the bootloader. Also, since the device tree and kernel in Raspbian are coupled with the VideoCore firmware, kernel updates in Raspbian can be problematic as well, even with a bootchooser setup. However, a step towards a more reliable system, even if not perfect, can still solve some real-life problems.

Posted by Tomaž | Categories: Code | Comments »

The case of disappearing PulseAudio daemon

03.11.2018 19:39

Recently I was debugging an unusual problem with the PulseAudio daemon. PulseAudio handles high-level details of audio recording, playback and processing. These days it is used by default in many desktop Linux distributions. The problem I was investigating was on an embedded device using Linux kernel 4.9. I relatively quickly found a way to reproduce it. However, finding the actual cause was surprisingly difficult and led me into the kernel source and learning about the real-time scheduler. I thought I might share this story in case someone else finds themselves in a similar situation.

The basic issue was that the PulseAudio daemon occasionally restarted. This reset some run-time configuration that should not be lost while the device was operating and also broke connections to clients that were talking to the daemon. Restarts were seemingly connected to the daemon or the CPU being under load. For example, the daemon would restart if many audio files were played simultaneously. I could reproduce the problem on PulseAudio 12.0 using a shell script similar to the following:

n=0
while [ $n -lt 100 ]; do
	pacmd play-file foo.wav 0
	n=$(($n+1))
done

This triggers 100 playbacks of the foo.wav file at almost the same instant and would reliably make the daemon restart on the device. However I was sometimes seeing restarts with less than ten simultaneous audio plays. A similar script with 1000 plays would sometimes also cause a restart on my Intel-based laptop using the same PulseAudio version. This made it easier to investigate the problem since I didn't need to do the debugging remotely on the device.

syslog held some clues about what was happening. The PulseAudio process was apparently being sent the SIGKILL signal. systemd detected that and restarted the service shortly after:

systemd[1]: pulseaudio.service: Main process exited, code=killed, status=9/KILL
systemd[1]: pulseaudio.service: Unit entered failed state.
systemd[1]: pulseaudio.service: Failed with result 'signal'.

However, there were no other clues as to what sent the SIGKILL or why. The kernel log from dmesg had nothing whatsoever related to this. Increasing logging verbosity in systemd and PulseAudio showed nothing relevant. Attaching a debugger and strace to the PulseAudio process showed that the signal was received at seemingly random points in the execution of the daemon. This suggested that the problem was not directly related to any specific line of code, but otherwise it led me nowhere.

When I searched the web, all suggestions seemed to point to the kernel killing the process due to an out-of-memory condition. The kernel being the cause of the signal seemed reasonable, however OOM condition is usually clearly logged. Another guess was that systemd itself was killing the process after I learned about its resource control feature. This turned out to be a dead end as well.

The first really useful clue came when Gašper gave me a crash course on using perf. After setting up debugging symbols for the kernel and the PulseAudio binary, I used the following:

$ perf_4.9 record -g -a -e 'signal:*'
(trigger the restart here)
$ perf_4.9 script

This printed out a nice backtrace of the code that generated the problematic SIGKILL signal (among noise about uninteresting other signals being sent between processes on the system):

alsa-sink-ALC32 12510 [001] 24535.773432: signal:signal_generate: sig=9 errno=0 code=128 comm=alsa-sink-ALC32 pid=12510 grp=1 res=0
            7fffbae8982d __send_signal+0x80004520223d ([kernel.kallsyms])
            7fffbae8982d __send_signal+0x80004520223d ([kernel.kallsyms])
            7fffbaef0652 run_posix_cpu_timers+0x800045202522 ([kernel.kallsyms])
            7fffbaea7aa9 scheduler_tick+0x800045202079 ([kernel.kallsyms])
            7fffbaefaa80 tick_sched_timer+0x800045202000 ([kernel.kallsyms])
            7fffbaefa480 tick_sched_handle.isra.12+0x800045202020 ([kernel.kallsyms])
            7fffbaefaab8 tick_sched_timer+0x800045202038 ([kernel.kallsyms])
            7fffbaeebbfe __hrtimer_run_queues+0x8000452020de ([kernel.kallsyms])
            7fffbaeec2dc hrtimer_interrupt+0x80004520209c ([kernel.kallsyms])
            7fffbb41b1c7 smp_apic_timer_interrupt+0x800045202047 ([kernel.kallsyms])
            7fffbb419a66 __irqentry_text_start+0x800045202096 ([kernel.kallsyms])

alsa-sink-ALC32 is the thread in the PulseAudio daemon that handles the interface between the daemon and the ALSA driver for the audio hardware. The stack trace shows that the signal was generated in the context of that thread; however, the originating code was called from a timer interrupt, not a syscall. Specifically, the run_posix_cpu_timers function seemed to be the culprit. This was consistent with the random debugger results I saw before, since interrupts are not in sync with the code running in the thread.

Some digging later I found the following code that is reached from run_posix_cpu_timers via some static functions. Intermediate static functions probably got optimized away by the compiler and don't appear in the perf stack trace:

if (hard != RLIM_INFINITY &&
    tsk->rt.timeout > DIV_ROUND_UP(hard, USEC_PER_SEC/HZ)) {
	/*
	 * At the hard limit, we just die.
	 * No need to calculate anything else now.
	 */
	__group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk);
	return;
}

Now things started to make sense. Linux kernel implements some limits on how much time a thread with real-time scheduling priority can use before cooperatively yielding the CPU to other threads and processes (via a blocking syscall for instance). If a thread hits this time limit it is silently sent the SIGKILL signal by the kernel. Kernel resource limits are documented in the setrlimit man page (the relevant limit here is RLIMIT_RTTIME). The PulseAudio daemon was setting the ALSA thread to real-time priority and it was getting killed under load.

Using real-time scheduling seems to be the default in PulseAudio 12.0 and the time limit set for the process is 200 ms. The limit for a running daemon can be inspected from shell using prlimit:

$ prlimit --pid PID | grep RTTIME
RTTIME     timeout for real-time tasks           200000    200000 microsecs
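prlimit can also modify the limits of a running process, which is handy for quick experiments (assuming a reasonably recent util-linux; the argument is a soft:hard pair in microseconds):

$ sudo prlimit --pid PID --rttime=400000:400000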

Details of real-time scheduling can be adjusted in /etc/pulse/daemon.conf, for instance with:

realtime-scheduling = yes
rlimit-rttime = 400000

Just to make sure that an RTTIME over-run indeed produces such symptoms, I made the following C program that intentionally triggers it. Running the program confirmed that the cause of the SIGKILL isn't logged anywhere and that it produces a similar perf backtrace:

#include <stdio.h>
#include <sched.h>
#include <sys/resource.h>

int main(int argc, char** argv)
{
	struct sched_param sched_param;
	if (sched_getparam(0, &sched_param) < 0) {
		printf("sched_getparam() failed\n");
		return 1;
	}
	sched_param.sched_priority = sched_get_priority_max(SCHED_RR);
	if (sched_setscheduler(0, SCHED_RR, &sched_param)) {
		printf("sched_setscheduler() failed\n");
		return 1;
	}
	printf("Scheduler set to Round Robin with priority %d...\n",
			sched_param.sched_priority);
	fflush(stdout);

	struct rlimit rlimit;
	rlimit.rlim_cur = 500000;
	rlimit.rlim_max = rlimit.rlim_cur;

	if (setrlimit(RLIMIT_RTTIME, &rlimit)) {
		printf("setrlimit() failed\n");
		return 1;
	}
	printf("RTTIME limit set to %ld us...\n",
			(long) rlimit.rlim_max);

	printf("Hogging the CPU...\n");
	while (1);
}
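For contrast, a variant of the loop that makes a blocking syscall every now and then survives indefinitely, since, as the setrlimit man page explains, the count of consumed CPU time is reset to zero upon each blocking system call. Replacing the final loop with something like the following (and adding #include <unistd.h> for usleep()) avoids the SIGKILL:

	printf("Hogging the CPU, but blocking periodically...\n");
	while (1) {
		volatile long i;
		// spin for a while, staying well below the 500 ms limit ...
		for (i = 0; i < 1000000; i++);
		// ... then block briefly to reset the RTTIME accounting
		usleep(1000);
	}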

It would be really nice if the kernel logged the reason for this signal somewhere, like it does with OOM kills. It might be that in this particular case the developers wanted to avoid calling possibly expensive logging functions from interrupt context. On the other hand, it seems that the kernel by default doesn't log any process kills due to resource limit over-runs at all. Failed syscalls can be logged using auditd, but that wouldn't help here since no syscalls actually failed.

As far as this particular PulseAudio application was concerned, there weren't really any perfect solutions. This didn't look like a bug in PulseAudio but rather a fact of life in a constrained environment with real-time tasks. The PulseAudio man page discusses some trade-offs of real-time scheduling (which is nice in hindsight, but you first have to know where to look). In my specific case, there were more or less only three possibilities of how to proceed:

  1. Disable RTTIME limit and accept that PulseAudio might freeze other processes on the device for an arbitrary amount of time,
  2. disable real-time scheduling and accept occasional skips in the audio due to other processes taking the CPU from PulseAudio for too long, or
  3. accept the fact that PulseAudio will restart occasionally and make other software on the device recover from this case as best as possible.

After considering the implications for the functionality of the device, I went with the last option in the end. I also slightly increased the default RTTIME limit so that restarts would be less common, while still keeping an acceptable maximum response time for other processes.

Posted by Tomaž | Categories: Code | Comments »

Getting ALSA sound levels from a command-line

01.09.2018 14:33

Sometimes I need to get an idea of the signal levels coming from an ALSA capture device, like the line-in on a sound card, and I only have an ssh session available. For example, I've recently been exploring a case of radio-frequency interference causing audible noise in the audio circuit of an embedded device. It's straightforward to load raw samples into a Jupyter notebook or Audacity and run some quick analysis (or simply listen to the recording on headphones). But what I've really been missing is a simple command-line tool that would show me some RMS numbers. This way I wouldn't need to transfer potentially large wav files around quite so often.

I felt like a command-line VU meter was something that should already exist, but finding anything like that on Google turned out to be elusive. Over time I've ended up with a bunch of custom, half-finished Python scripts. However, Python is slow, especially on low-powered ARM devices, so I was considering writing a more optimized version in C. Luckily, I've recently come across this recipe for sox, which does exactly what I want. Sox is easily apt-gettable on all Debian flavors I commonly care about, and even on slow CPUs the following doesn't take noticeably longer than it takes to record the data:

$ sox -q -t alsa -d -n stats trim 0 5
             Overall     Left      Right
DC offset  -0.000059 -0.000059 -0.000046
Min level  -0.333710 -0.268066 -0.333710
Max level   0.273834  0.271820  0.273834
Pk lev dB      -9.53    -11.31     -9.53
RMS lev dB    -25.87    -26.02    -25.73
RMS Pk dB     -20.77    -20.77    -21.09
RMS Tr dB     -32.44    -32.28    -32.44
Crest factor       -      5.44      6.45
Flat factor     0.00      0.00      0.00
Pk count           2         2         2
Bit-depth      15/16     15/16     15/16
Num samples     242k
Length s       5.035
Scale max   1.000000
Window s       0.050

This captures 5 seconds of audio (trim 0 5) from the default ALSA device (-t alsa -d), runs it through the stats filter and discards the samples without saving them to a file (-n). The stats filter calculates some useful statistics and dumps them to standard error. A different capture device can be selected through the AUDIODEV environment variable.

The displayed values are pretty self-explanatory: min and max levels show extremes in the observed samples, scaled to ±1 (so they are independent of the number of bits in ADC samples). Root-mean-square (RMS) statistics are scaled so that 0 dB is the full-scale of the ADC. In addition to overall mean RMS level you also get peak and trough values measured over a short sliding window (length of this window is configurable, and is shown on the last line of the output). This gives you some idea of the dynamics in the signal as well as the overall volume. Description of other fields can be found in the sox(1) man page.
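If you do end up needing a custom tool after all, the core calculation is small. Here is a minimal sketch in C++ of the overall RMS level computation, assuming raw signed 16-bit native-endian samples on standard input (it mimics sox's "RMS lev dB" figure, with 0 dB at ADC full-scale):

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
	std::vector<int16_t> buf(4096);
	double sum = 0.0;
	long n = 0;

	size_t r;
	while ((r = fread(buf.data(), sizeof(buf[0]), buf.size(), stdin)) > 0) {
		for (size_t i = 0; i < r; i++) {
			// scale samples to ±1 so the result is independent
			// of the ADC bit depth, like sox does
			double x = buf[i] / 32768.0;
			sum += x * x;
			n++;
		}
	}

	if (n > 0)
		printf("RMS lev dB    %.2f\n", 20.0 * std::log10(std::sqrt(sum / n)));

	return 0;
}

Raw samples can be piped into it with something like arecord -t raw -f S16_LE -d 5 | ./rms.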

Posted by Tomaž | Categories: Code | Comments »

Monitoring HomePlug AV devices

23.05.2018 18:51

Some time ago I wanted to monitor the performance of a network of Devolo dLAN devices. These are power-line Ethernet adapters. Each device looks like a power brick with a standard Ethernet RJ45 port. You can plug several of these into wall sockets around a building and theoretically they will together act as an Ethernet bridge, linking all their ports as if they were connected to a single network. The power-line network in question seemed to be having intermittent problems, but without some kind of a log it was hard to pinpoint exactly what the problem was.

I have very little experience with power-line networks and some quick web searches yielded conflicting information about how these things work and what protocols are at work behind the curtains. Purely from the user perspective, the experience seems to be similar to wireless LANs. While individual devices have flashy numbers written on them, such as 500 Mbps, these are just theoretical "up to" throughputs. In practice, bandwidth of individual links in the network seems to be dynamically adjusted based on signal quality and is commonly quite a bit lower than advertised.

Devolo Cockpit screenshot.

Image by devolo.com

Devolo provides an application called Cockpit that allows you to configure the devices and visualize the power-line network. The feature I was most interested in was the real-time display of the physical layer bitrate for each individual link in the network. While the Cockpit is available for Linux, it is a user friendly point-and-click graphical application and chances were small that I would be able to integrate it into some kind of an automated monitoring process. The prospect of decoding the underlying protocol seemed easier. So I did a packet capture with Wireshark while the Cockpit was fetching the bitrate data:

HomePlug AV protocol decoded in Wireshark.

Wireshark immediately showed the captured packets as part of the HomePlug AV protocol and provided a nice decode. This finally gave me a good keyword I could base my further web searches on, which revealed a helpful white paper with some interesting background technical information. HomePlug AV physical layer apparently uses frequencies in the range of 2 - 28 MHz using OFDM with adaptive number of bits per modulation symbol. The network management is centralized, using a coordinator and a mix of CSMA/CA and TDMA access.

More importantly, the fact that the Wireshark decode showed bitrate information in plain text gave me confidence that replicating the process of querying the network would be relatively straightforward. Note how the 113 Mbit/s in the decode directly corresponds to hex 0x71 in the raw packet contents. It appeared that only two packets were involved, a Network Info Request and a Network Info Confirm:

HomePlug AV Network Info Confirmation packet decode.

However, before diving directly into writing code from scratch I came across the Faifa command-line tool on GitHub. The repository seems to be a source code dump from the now-defunct dev.open-plc.org web site. There is very little in terms of documentation or clues to its provenance. The last commit was in 2016. However, a browse through its source code revealed that it is capable of sending the 0xa038 Network Info Request packet and receiving and decoding the corresponding 0xa039 Network Info Confirm reply. This was exactly what I was looking for.

Some tweaking and a compile later I was able to get the bitrate info from my terminal. Here I am querying one device in the power-line network (its Ethernet address is in the -a parameter). The queried device returns the current network coordinator and a list of stations it is currently connected to, together with the receive and transmit bitrates for each of those connections:

# faifa -i eth4 -a xx:xx:xx:xx:xx:xx -t a038
Faifa for HomePlug AV (GIT revision master-5563f5d)

Started receive thread
Frame: Network Info Request (Vendor-Specific) (0xA038)

Dump:
Frame: Network Info Confirm (Vendor-Specific) (A039), HomePlug-AV Version: 1.0
Network ID (NID): xx xx xx xx xx xx xx
Short Network ID (SNID): 0x05
STA TEI: 0x24
STA Role: Station
CCo MAC: 
	xx:xx:xx:xx:xx:xx
CCo TEI: 0xc2
Stations: 1
Station MAC       TEI  Bridge MAC        TX   RX  
----------------- ---- ----------------- ---- ----
xx:xx:xx:xx:xx:xx 0xc2 xx:xx:xx:xx:xx:xx 0x54 0x2b
Closing receive thread

The original tool had some annoying problems that I needed to work around before deploying it to my monitoring system. Most of all, it operated by sending the query with the Ethernet broadcast address as the source. It then put the local network interface into promiscuous mode to listen for broadcasted replies. This seemed like bad practice and created problems for me, not least of which was log spam with repeated kernel warnings about promiscuous mode enters and exits. It's possible that the use of broadcasts was a workaround for a hardware limitation on some devices, but the devices I tested (dLAN 200 and dLAN 550) seem to reply just fine to queries from non-broadcast addresses.

I also fixed a race condition that was in the original tool due to the way it received the replies. If multiple queries were running on the same network simultaneously sometimes replies from different devices became confused. Finally, I fixed some rough corners regarding libpcap usage that prevented the multi-threaded Faifa process from exiting cleanly once a reply was received. I added a -t command-line option for sending and receiving a single packet.

As usual, the improved Faifa tool is available in my fork on GitHub:

$ git clone https://github.com/avian2/faifa.git

To conclude, here is an example of bitrate data I recorded using this approach. It shows transmit bitrates reported by one device in the network to two other devices (here numbered "station 1" and "station 2"). The data was recorded over the course of 9 days and the network was operating normally during this time:

Recorded PHY bitrate for two stations.

Even this graph shows some interesting things. Some devices (like the "station 1" here) seem to enter a power saving mode. Such devices don't appear in the network info reports, which is why data is missing for some periods of time. Even out of power saving mode, devices don't seem to update their reported bitrates if there is no data being transferred on that specific link. I think this is why the "station 2" here seems to have long periods where the reported bitrate remains constant.

Posted by Tomaž | Categories: Code | Comments »

Switching window scaling in GNOME

01.05.2018 13:22

A while back I got a new work laptop: a 13" Dell XPS 9360. I was pleasantly surprised that installing the latest Debian Stretch with GNOME went smoothly and no special tweaks were needed to get everything up and running. The laptop works great and the battery life in Linux is a significant step up from my old HP EliteBook. The only real problem I noticed after a few months of use is weird behavior of the headphone jack, which often doesn't work for some unknown reason.

In any case, this is my first computer with a high-DPI screen. The 13-inch LCD panel has a resolution of 3200x1800, which means that text on a normal X window screen is almost unreadable without some kind of scaling. Thankfully, the GNOME that ships with Stretch has relatively good support for window scaling. You can set a scaling factor in the GNOME Tweak Tool and all windows will have their content scaled by an integer factor, making text and icons intelligible again.

Window scaling setting in GNOME Tweak Tool.

This setting works fine for me, at least for terminal windows and Firefox, which is what I mostly use on this computer. I've only noticed some minor cosmetic issues when I change this at run-time. Some icons and buttons in GNOME Shell (like the bar on the top of the screen or the settings menu on the upper-right) will sometimes look weird until the next reboot.

A bigger annoyance was the fact that I often use this computer with a normal (non-high-DPI) external monitor. I had to open up the Tweak Tool each time I connected or disconnected the monitor. Navigating huge scaled UI on the low-DPI external monitor or tiny UI on the high-DPI laptop panel got old really quick. It was soon obvious that changing that setting should be a matter of a single key press.

Finding a way to set window scaling programmatically was surprisingly difficult (not unlike my older effort in switching the audio output device). I tried a few different approaches, like setting some dconf keys, but none worked reliably. I ended up digging into the Tweak Tool source. This revealed that the Tweak Tool is built around a nice Python library that exposes the necessary settings as functions you can call from your own scripts. The rest was simple.

I ended up with the following Python script:

#!/usr/bin/python2.7

from gtweak.utils import XSettingsOverrides

def main():
    xsettings = XSettingsOverrides()

    sf = xsettings.get_window_scaling_factor()

    if sf == 1:
        sf = 2
    else:
        sf = 1

    xsettings.set_window_scaling_factor(sf)

if __name__ == "__main__":
    main()

I have this script saved as toggle_hidpi and then a shortcut set in GNOME Keyboard Settings so that Super-F11 runs it. Note that using the laptop's built-in keyboard this means pressing the Windows logo, Fn, and F11 keys due to the weird modern practice of hiding normal function keys behind the Fn modifier. On an external USB keyboard, only Windows logo and F11 need to be pressed.

High DPI toggle shortcut in GNOME keyboard settings.

Posted by Tomaž | Categories: Code | Comments »

OpenCT on Debian Stretch

24.02.2018 10:03

I don't like replacing old technology that isn't broken, although I wonder sometimes whether that's just rationalizing the fear of change. I'm still using a bunch of old Schlumberger Cryptoflex (e-Gate) USB tokens for securely storing client-side SSL certificates. All of them are held together by black electrical tape at this point, since the plastic became brittle with age. However they still serve their purpose reasonably well, even if software support for them has been obsoleted a long time ago. So what follows is another installment of the series on keeping these hardware tokens working on the latest Debian Stable release.

Stretch upgrades the pcscd and libpcsclite1 packages (from the pcsc-lite project) to version 1.8.20. Unfortunately, this upgrade breaks the old openct driver, which is to my knowledge the only way to use these tokens on a modern system. This manifests itself as the following error when dumping the list of currently connected smart cards:

$ pkcs15-tool -D
Using reader with a card: Axalto/Schlumberger/Gemalo egate token 00 00
PKCS#15 binding failed: Unsupported card

Some trial and error and git bisect led me to commit 8eb9ea1 which apparently caused this issue. It was committed between releases 1.8.13 (which was shipped in Jessie) and 1.8.14. This commit introduces some subtle changes in the way buffers of data are exchanged between pcscd and its drivers, which break openct 0.6.20.

There are two ways around that: you can keep using pcscd and libpcsclite1 from Jessie (the 1.8.13 source package from Jessie builds fine on Stretch), or you can patch openct. I've decided on the second option.

The openct driver is no longer developed upstream and has been removed from Debian in Jessie (last official release was in 2010, although there has been some effort to modernize it). I keep my own git repository and Debian packages based on the last package shipped in Wheezy. My patched version 0.6.20 includes changes required for systemd support, and now also the patch required to support modern pcscd version on Stretch. The latter has been helpfully pointed out to me by Ludovic Rousseau on the pcsc-lite mailing list.

My openct packages for Stretch on amd64 can be found here (version 0.6.20-1.2tomaz2). The updated source is also in a git repository (with a layout compatible with git-buildpackage), should you want to build it yourself:

$ git clone http://www.tablix.org/~avian/git/openct.git

Other smart card-related packages work for me as shipped in Stretch (e.g. opensc and opensc-pkcs11 0.16.0-3). No changes were necessary in the Firefox configuration for it to be able to pull client-side certificates from the hardware tokens. It is still required, however, to insert the token only when no instances of Firefox are running.

Posted by Tomaž | Categories: Code | Comments »

News from the Z80 land

05.01.2018 19:30

Here are some overdue news regarding modern development tools for the vintage Z80 architecture. I've been wanting to give this a bit of exposure, but alas other things in life interfered. I'm happy that there is still this much interest in old computers and software archaeology, even if I wasn't able to dedicate much of my time to it in the last months.


Back in 2008, Stefano Bodrato added support for Galaksija to Z88DK, a software development toolkit for Z80-based computers. Z88DK consists of a C compiler and a standard C library (with functions like printf and scanf) that interfaces with Galaksija's built-in terminal emulation routines. His work was based on my annotated ROM disassembly and development tools. The Z88DK distribution also includes a few cross-platform examples that can be compiled for Galaksija, provided they fit into the limited amount of RAM. When compiling for Galaksija, the C compiler produces a WAV audio file that can be directly loaded over an audio connection, emulating an audio cassette recording.

This November, Stefano improved support for the Galaksija compile target. The target now more fully supports the graphics functions from the Z88DK library, including things like putsprite for Galaksija's 2x3 pseudographics. He also added joystick emulation to the Z88DK cross-platform joystick library. The library emulates two joysticks via the keyboard. The first one uses the arrow keys on Galaksija's keyboard. The second one uses the 5-6-7-8 keys in the top row that should be familiar to users of the ZX Spectrum. He also added a clock function that uses the accurate clock ticks provided by the ROM's video interrupt.

Z88DK clock example on Galaksija

(Click to watch Z88DK clock example on Galaksija video)

This made it possible to build more games and demos for Galaksija. I've tried the two-player snakes game and the TV clock demo and (after some tweaking) they both worked on my Galaksija replica. To try them yourself, follow the Z88DK installation instructions and then compile the demos under examples/graphics by adapting the zcc command-lines at the top of the source files.

For instance, to compile clock.c into clock.wav I used the following:

$ zcc +gal -create-app -llib3d -o clock clock.c

The second piece of Z80-related news I wanted to share relates to the z80dasm 1.1.5 release I made back in August. z80dasm is a disassembler for Z80 machine code. The latest release significantly improves the handling of undocumented instructions, a problem that was reported to me by Ast Moore. I've already written about that in more detail in a previous blog post.

Back in November I also updated the Debian packaging for z80dasm, so the new release is now also available as a Debian source or binary package that should cleanly build and install on Jessie, Stretch and Sid. Unfortunately the updated package has not yet been uploaded to the Debian archive. I have been unable to reach any of my previous sponsors (in the Debian project, a package upload must be sponsored by a Debian developer). If you're a Debian developer with an interest in vintage computing and would like to sponsor this upload, please contact me.

Update: z80dasm 1.1.5-1 package is now in Debian Unstable.

Posted by Tomaž | Categories: Code | Comments »

On piping Curl to apt-key

21.08.2017 16:52

Piping Curl to Bash is dangerous. Hopefully, this is well known at this point and many projects no longer advertise this installation method. Several popular posts by security researchers demonstrated all kinds of covert ways such instructions could be subverted by an attacker who managed to gain access to the server hosting the install script. Passing HTTP responses uninspected to other sensitive system commands is no safer, however.

This is how Docker documentation currently advertises adding their GPG key to the trusted keyring for Debian's APT:

Screenshot of instructions for adding Docker GPG key.

The APT keyring is used to authenticate packages that are installed by Debian's package manager. If an attacker can put their public key into this keyring they can for example provide compromised package updates to your system. Obviously you want to be extra sure no keys from questionable origins get added.

Docker documentation correctly prompts you to verify that the added key matches the correct fingerprint (in contrast to some other projects). However, the suggested method is flawed, since the fingerprint is only checked after an arbitrary number of keys has already been imported. What's more, the command shown specifically checks only the fingerprint of the Docker signing key, not of the key (or keys) that were actually imported.

Consider the case where download.docker.com starts serving an evil key file that contains both the official Docker signing key and the attacker's own key. In this case, the instructions above will not reveal anything suspicious. The displayed fingerprint matches the one shown in the documentation:

$ curl -fsSL https://www.tablix.org/~avian/docker/evil_gpg | sudo apt-key add -
OK
$ sudo apt-key fingerprint 0EBFCD88
pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid                  Docker Release (CE deb) <docker@docker.com>
sub   4096R/F273FCD8 2017-02-22

However, importing our key file also silently added another key to the keyring. Considering that apt-key list outputs several screen-fulls of text on a typical system, this is not easy to spot after the fact without specifically looking for it:

$ sudo apt-key list|grep -C1 evil
pub   1024R/6537017F 2017-08-21
uid                  Dr. Evil <evil@example.com>
sub   1024R/F2F8E843 2017-08-21

The solution is obviously not to pipe the key file directly into apt-key without inspecting it first. Interestingly, this is not straightforward. The pgpdump tool is recommended by the first answer that comes up when asking Google about inspecting PGP key files. The version in Debian Jessie fails to find anything suspicious about our evil file:

$ pgpdump -v
pgpdump version 0.28, Copyright (C) 1998-2013 Kazu Yamamoto
$ curl -so evil_gpg https://www.tablix.org/~avian/docker/evil_gpg
$ pgpdump evil_gpg|grep -i evil

Debian developers suggest importing the key into a personal GnuPG keyring, inspecting the integrity and then exporting it into apt-key. That is more of a hassle, and not that useful for Docker's key that doesn't use the web of trust. In our case, inspecting the file with GnuPG directly is enough to show that it in fact contains two keys with different fingerprints:

$ gpg --version|head -n1
gpg (GnuPG) 1.4.18
$ gpg --with-fingerprint evil_gpg
pub  4096R/0EBFCD88 2017-02-22 Docker Release (CE deb) <docker@docker.com>
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
sub  4096R/F273FCD8 2017-02-22
pub  1024R/6537017F 2017-08-21 Dr. Evil <evil@example.com>
      Key fingerprint = 3097 2749 79E6 C35F A009  E25E 6CD6 1308 6537 017F
sub  1024R/F2F8E843 2017-08-21
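
For completeness, the keyring-based workflow suggested by Debian developers would look roughly like this (a sketch; the throwaway GNUPGHOME keeps the test import out of your personal keyring):

$ export GNUPGHOME=$(mktemp -d)
$ gpg --import evil_gpg
$ gpg --list-keys --with-fingerprint
$ gpg --export 0EBFCD88 | sudo apt-key add -

This way only the key with the verified fingerprint gets exported into the APT keyring, while anything extra that came with the download stays behind in the scratch keyring.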

The key file used in the example above was created by simply concatenating the two ASCII-armored public key blocks. It looks pretty suspicious in a text editor, because it contains two ASCII armor headers (which is probably why pgpdump stops processing it before the end). However, the two public keys could just as easily have been exported into a single ASCII armor block.
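
For illustration, assuming both keys are present in a local GnuPG keyring, the concatenated variant takes nothing more than two exports (key IDs as in the example above):

$ gpg --armor --export 0EBFCD88 > evil_gpg
$ gpg --armor --export 6537017F >> evil_gpg

Passing both key IDs to a single gpg --armor --export invocation would instead produce one armor block, with no second header to give it away.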

Posted by Tomaž | Categories: Code | Comments »

z80dasm 1.1.5

06.08.2017 12:26

This is becoming a summer of retro computing blog posts, isn't it? In any case, today I'm pleased to announce the release of z80dasm 1.1.5, the latest version of my disassembler for the Z80 CPU machine code.

Ast Moore recently noticed that z80dasm doesn't recognize several well-known undocumented instructions. When I based z80dasm on Jan Panteltje's dz80 several years ago, my goal was to make a disassembler that could correctly produce an assembly listing for Galaksija's ROM. While I did add support for the sli instruction back then, it seems that Galaksija didn't use any of the other undocumented instructions.

z80dasm 1.1.4 did correctly include unknown instructions in the disassembly as raw hex values with defb directives; however, it guessed the lengths of some of them wrong, which desynchronized the instruction decoder and produced garbage output. The output would still assemble back into the same binary, but the point of a disassembler is to produce human-readable output, so that was not very useful.

Fixing this problem was a bit more involved than I expected, but not for the usual reasons. One of the guidelines I had with z80dasm was to keep z80dasm and z80asm a matched pair. Being able to reproduce the original binary from the disassembled source is very useful when hacking on old software, as well as in testing z80dasm. Unfortunately, it turned out that z80asm's support for these undocumented instructions is quite bad as well. For instance, inc and dec with the ixl and ixh register operands are not recognized at all. Some other instructions, like add with these undocumented operands, assemble into wrong opcodes.

I guess this is not surprising. In contrast to the official instruction set there is no authoritative documentation for these instructions, so even the names differ in different sources. For example, both sli a and sll a mnemonics are commonly used for the cb 37 opcode.

I tried to fix some of the problems I found in z80asm, but it turned out to be quite time consuming. Both z80asm and z80dasm are written in, by today's standards, quite an archaic C style, with lots of pointer arithmetic and packing of various unrelated things into a single int variable to save space. While I'm still fairly familiar with the z80dasm code, the modifications I made to z80asm took several sheets of paper to figure out. For example, here is a short binary containing several undocumented instructions and how z80dasm 1.1.5 now disassembles it:

$ hd example.bin
00000000  ed 70 dd 2c dd 23 23 3d  ed 71 dd 09              |.p.,.##=.q..|
0000000c
$ z80dasm example.bin
; z80dasm 1.1.5
; command line: z80dasm example.bin

	org	00100h

	defb 0edh,070h	;in f,(c)
	defb 0ddh,02ch	;inc ixl
	inc ix
	inc hl	
	dec a	
	defb 0edh,071h	;out (c),0
	add ix,bc

In the end, I gave up and solved the z80asm compatibility problem by having z80dasm decode undocumented instructions as defb directives by default. However, the instruction lengths should now be decoded correctly, and the mnemonic is included in a comment for better readability, as in the listing above. There is also a new --undoc option in 1.1.5: if it's specified on the command line, z80dasm will place all instructions directly into the disassembly.
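
With --undoc, the binary from the example above should disassemble into bare mnemonics. Roughly like this (reconstructed from the comments in the previous listing, so the exact formatting may differ):

$ z80dasm --undoc example.bin
; z80dasm 1.1.5
; command line: z80dasm --undoc example.bin

	org	00100h

	in f,(c)
	inc ixl
	inc ix
	inc hl
	dec a
	out (c),0
	add ix,bc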

As before, you can get the source code for z80dasm in a release tarball or from the git repository. See the README file for install instructions and the included man page for explanations of all command-line options. z80dasm is also included in Debian, however 1.1.5 has not been uploaded yet.
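
Building from the release tarball should be the usual routine (a sketch; see the README in case the steps differ):

$ tar xzf z80dasm-1.1.5.tar.gz
$ cd z80dasm-1.1.5
$ ./configure
$ make
$ sudo make install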

Posted by Tomaž | Categories: Code | Comments »

IPv6 problems on TP-Link Archer C20

26.04.2017 12:44

Recently I replaced an old 2.4 GHz-only Linksys WRT54GL wireless router with a shiny new, dual band TP-Link Archer C20 (hardware version V1). Unfortunately, the new router brought some unusual problems. It turns out some devices are now unable to get a global IPv6 address when connected over Wi-Fi. For example, my Android 5.1 smartphone and my work laptop with Debian Jessie and Network Manager don't get IPv6 connectivity. They worked just fine when connected through the old router. At the same time, a different phone with Android 6.0 seems to have no problems with the new Archer C20 router.

First a brief note on the network setup: the Archer C20 is used here only as a wireless access point. Another host on the network acts as a gateway to the Internet; it also provides a DHCP service for IPv4 and runs the router advertisement daemon (radvd) for IPv6 SLAAC. The setup is quite well tested and works flawlessly on wired Ethernet. The old WRT54GL was used in the same way, which is why IPv6 connectivity over Wi-Fi worked fine even though the old router's firmware had no explicit IPv6 support.
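
For reference, the radvd side of such a setup needs very little configuration. A minimal radvd.conf sketch, with a placeholder interface name and a documentation prefix standing in for the real one:

interface eth0
{
	AdvSendAdvert on;
	prefix 2001:db8::/64
	{
	};
};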

As the TP-Link FAQ entry explains, the WAN port on the C20 is unused, the network is connected to one of the LAN ports and the DHCP server on the C20 is disabled. IPv6 status tab in the configuration interface shows the following:

IPv6 status tab in Archer C20 configuration interface.

The IPv6 problem is somewhat frustrating to diagnose, since it only appears some time after the router has been restarted. For instance, I've usually seen that IPv6 stops working the next day after a reboot. Similarly, changing some unrelated settings, like the wireless SSID, also appears to temporarily fix the issue, only for it to reappear after a while.

Searching the web I can find some discussions about similar problems with TP-Link routers, with no clear conclusion. The firmware changelog does say in a vague way that the latest version fixes an IPv6 problem. However, I've tried the V1_160427 and V1_151120 firmwares and they both behave in the same (broken) way.

Modifications in Archer C20 firmware.

After much head scratching I found that the root cause of my laptop not getting an IPv6 address over Wi-Fi is IPv6 duplicate address detection. This is apparent from the dadfailed flag on the link-local address:

$ ip addr show dev wlan0
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a0:88:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.xx/24 brd 192.168.xx.255 scope global dynamic wlan0
       valid_lft 484sec preferred_lft 484sec
    inet6 fe80::a288:xxxx:xxxx:xxxx/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever

Also, when you know what to look for, this error appears in the logs:

$ dmesg|grep duplicate
IPv6: wlan0: IPv6 duplicate address fe80::a288:xxxx:xxxx:xxxx detected!

So it seems that my laptop thinks that there is another device on the network with the same link-local (and hence Ethernet MAC) address. This is of course not true. In fact, if I disable the duplicate address detection, IPv6 starts working properly:

# sysctl net.ipv6.conf.wlan0.accept_dad=0
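
To make this survive a reboot, the same setting can also go into the sysctl configuration (the exact file layout may differ between distributions):

# echo "net.ipv6.conf.wlan0.accept_dad = 0" >> /etc/sysctl.conf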

Investigating things a bit further, Wireshark shows the following curious packet capture immediately after the laptop connects to the wireless network:

Packet capture during IPv6 neighbor solicitation.

This appears to be a normal attempt at IPv6 autoconfiguration from the laptop's side. The laptop (with the MAC address a0:88:...) sends some packets to an IPv6 multicast address (33:33:...). However, all of these packets are immediately reflected back to the laptop by the Archer C20. The incoming packets highlighted in yellow are byte-for-byte identical to the preceding outgoing packets, with only the destination address in the Ethernet header changed from the multicast address to the laptop's MAC address. These incoming packets are not present when the laptop is connected to one of the wired LAN ports, or when using the old wireless router.
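
Incidentally, the reflected packets can also be seen without Wireshark. A tcpdump sketch that captures neighbor solicitations together with their Ethernet headers (this assumes no IPv6 extension headers, so that the ICMPv6 type byte sits at offset 40; type 135 is a neighbor solicitation):

# tcpdump -e -i wlan0 'icmp6 and ip6[40] == 135'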

These reflected packets trigger the duplicate address detection in the laptop's network stack and interrupt the autoconfiguration. It seems that at some point in the Archer C20's uptime, IPv6 multicast groups stop working correctly. In fact, I don't understand why it tries to do anything special with those packets at all; the WRT54GL had no concept of IPv6 and worked fine. I've experimented with other settings that looked related to multicast (like the IGMP options), but with no success. So unfortunately I don't have a good network-side solution at the moment. Any suggestions would be most welcome. Replacing the stock firmware might work, but OpenWRT support for this hardware currently seems experimental at best.

The device-side workaround is to disable DAD as I showed above, but that is somewhat ugly. It is also still not clear why some devices appear to work; it might be that Android simply disabled duplicate address detection in 6.0. Finally, there might be a way to disable DAD on a per-network basis with NetworkManager (see the dad-timeout setting), but I have not tried this yet.
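
With a NetworkManager recent enough to support it, the per-connection setting might look something like this (untested on my side; "wifi-home" is a placeholder connection name, and a value of 0 should skip DAD entirely):

$ nmcli connection modify wifi-home ipv6.dad-timeout 0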

Update: This Mikrotik manual suggests that such repeated multicast packets are expected on wireless access points that are aware of multicast addresses, so the problem might still be somewhere on the client side. I did find that enabling Client Isolation on the Archer C20 fixes the problem (with the obvious side effect that wireless clients can no longer talk to each other). The dad-timeout NetworkManager option is not supported on Debian Jessie.

Update: Client Isolation doesn't actually help. After a few days I'm again getting the dadfailed flag.

Posted by Tomaž | Categories: Code | Comments »

GIMP onion layers plug-in

21.02.2017 20:48

Some time ago I was playing with animation making applications on a (non-pro) iPad. I found the whole ecosystem very closed and I had to jump through some hoops to get my drawings back onto a Linux computer. However the fact that you can draw directly on the screen does make some things easier compared to a standalone Wacom tablet, even if the accuracy is significantly worse.

One other thing in particular stood out compared to my old GIMP setup: these applications make it very easy to jump frame by frame through the animation. With one touch you can display the next frame, do some quick edits and then move back with another touch. You can browse up and down the stack as a quick way to preview the animation. They also do something they call onion layering, which simply means that they overlay the next and previous frames with reduced opacity so that it's easier to see how things are moving around.

This is all obviously useful. I was doing similar things in GIMP, except that changing frames there took more effort. GIMP as such doesn't have a concept of frames; instead you use image layers (or layer groups) as frames. You have to click to select a layer, and then click a few more times to adjust the visibility and opacity of the neighboring layers if you want the onion layer effect. This quickly amounts to a lot of clicking around if you work on more than a handful of frames.

GIMP does offer a Python plug-in interface, however, so automating quick frame jumps is relatively simple. Relatively, because the GIMP Python documentation turns out to be somewhat rudimentary if you're not already familiar with GIMP internals. I found it best to learn from the Python-Fu samples and to explore the interface using the built-in interactive console.

Screenshot of the GIMP onion layers plug-in

The end result of this exercise was the GIMP onion layers plug-in, which you can now find on GitHub together with installation and usage instructions. The plug-in doesn't have much in terms of a user interface; it merely registers a handful of python-fu-onion- actions for stepping to the previous or next frame, with or without the onion layer effect. The idea is that you then assign keyboard (or tablet button) shortcuts to these actions. You will have to define the shortcuts yourself though, since the plug-in can't define them for you. I like to use the dot and comma keys, since they don't conflict with other GIMP shortcuts and match the typical frame-step buttons on video players.

If you follow the layer structure suggested by the Export layers plug-in, this all works quite nicely, including the handling of background layers. The only real problem I encountered is that the layer visibility and opacity operations clutter the undo history. Unfortunately, that seems to be a limitation of the plug-in API. Other plug-ins work around this by operating on a duplicate of the image, but obviously I can't do that here.

I should note that I was using GIMP 2.8.14 from Debian Jessie, so the code might be somewhat outdated compared to latest GIMP 2.8.20. Feedback in that regard is welcome, as always.

Posted by Tomaž | Categories: Code | Comments »