z80dasm 1.2.0

02.03.2023 18:48

Today I've released z80dasm 1.2.0, a new version of the disassembler for the Zilog Z80 and compatible microprocessors. My thanks for this release go to two kind contributors who sent in patches adding new functionality.

Back in March 2021 Ben Hildred submitted a patch through the Debian bug tracking system adding partial support for the Zilog Z180 instruction set. The Zilog Z180 is a lesser-known successor to the Z80, released by Zilog around 10 years after the popular Z80. I don't have any hands-on experience with it, but I thought Ben's patch would be a useful addition to z80dasm's functionality. Since the Z180 largely shares its instruction set with the Z80, not much new code was needed. Using the Z180 datasheet I added support for the instructions that Ben missed, so barring any undocumented instructions or mistakes on my part, the Z180 support in z80dasm should be complete. It can be enabled using the new --z180 command-line option.

More recently, Michael Riviera sent me a patch by email that added the possibility of copying comments from the symbol file to the disassembly. To produce a more readable output, z80dasm allows you to define symbol names in a separate symbol file. The symbol names are used to define labels in the disassembly so that, for example, function names instead of memory addresses are used in calls. If you also have some comments above the definitions in the symbol file, Michael's new code can now copy them to the output, above the label definition. This can be used to get function documentation in the disassembly. The feature can be enabled using the new --sym-comments command-line option.
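As an illustration, a commented symbol file could look something like this (the label names and addresses here are made up for the example; see the man page for the exact symbol file syntax):

```
; Waits for the transmitter to become ready and
; sends the character in register A to the serial port.
send_char:      equ 0x0ab3

; Main entry point after reset.
start:          equ 0x0100
```

With --sym-comments enabled, the two comment lines above send_char would be copied into the disassembly just above the send_char label.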

Finally, Michael also submitted a patch that improved the alignment of columns in the disassembly. z80dasm's output consists of several columns, depending on the settings. The old code attempted to keep the columns somewhat aligned, but did a bad job at it with a mix of spaces and tabs. Especially if you defined symbols with longer names the output required manual editing if you wanted to have a nice looking disassembly.

I didn't keep Michael's implementation, but the discussion about it did prompt me to clean up and refactor the code that deals with the output formatting. The new z80dasm should now do a much better job at aligning the columns. The new code is also configurable at run time, to allow for different personal preferences regarding column widths and tab sizes. See the new --tab-width and --tab-stops command-line options in the man page.

The new source package is available in the same place as before. See the enclosed README and the man page for build and usage instructions.

I will work on getting the updated package into Debian as my time permits. It will definitely be after Debian 12/Bookworm is released, since there is already a freeze in place.

Posted by Tomaž | Categories: Code | Comments »


23.02.2023 8:52

I needed to make a LED blink on network activity on an embedded Linux device. In modern Linux kernels, drivers can control LEDs directly through the LED trigger mechanism. Unfortunately many network drivers don't implement this, or only implement link up/down triggers, not link activity. On embedded systems you're often also locked into some magical patched kernel provided by the system vendor, so it might be tricky to install a kernel-space solution like OpenWRT's ledtrig-netdev. Hence the proper way of implementing this was out of reach for me. To get this functionality I created net-led-blinker, a daemon that does a similar thing in user-space, at the cost of a tiny bit of extra CPU time.

When I was looking for existing solutions, I found PiLEDlights, a collection of daemons by Ragnar Jensen. These daemons are made for the Raspberry Pi and, among other things, can blink a LED on network activity. Since Jensen's code uses a Raspberry Pi specific way of accessing a LED it was not directly usable for me, but I took the source as a base to create my own daemon.

I modified the code so that it accesses the LED through the user-space LED trigger mechanism. This means that the new daemon is pretty much platform-independent, but has two requirements:

  • The LED on the system must be accessible through the /sys/class/leds filesystem. You can't just use any GPIO, although a GPIO on the system can usually be defined as a LED through the device tree without recompiling the kernel, if you have gpio-leds available.

  • The kernel must be compiled with the one-shot LED trigger. This frees a few cycles in the user-space daemon, since it only needs to wake once per period (not twice - you turn the LED on and the kernel turns it off by itself). It also provides a nice mechanism for setting on and off time through the /sys filesystem for creating different blinking patterns.
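As a sketch of how the one-shot trigger is driven from user space (led0 is a placeholder LED name, and the snippet deliberately does nothing except print a message if no such LED exists on the system):

```shell
led="/sys/class/leds/led0"    # placeholder LED name

if [ -w "$led/trigger" ]; then
	echo oneshot > "$led/trigger"    # attach the one-shot trigger
	echo 50 > "$led/delay_on"        # on time in ms
	echo 50 > "$led/delay_off"       # off time in ms
	echo 1 > "$led/shot"             # fire a single blink
	msg="blinked $led"
else
	msg="no writable LED at $led"
fi
echo "$msg"
```

After the setup writes, the daemon only needs to repeat the write to shot for every detected burst of activity; the kernel takes care of turning the LED off again.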

Ragnar's code detects network activity on all interfaces on the system. A single LED is used to show network activity on any of them. This is another benefit of net-led-blinker over the existing kernel solutions I'm aware of. Triggering a LED from a network driver or ledtrig-netdev means that you need a separate LED for each network interface.
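The underlying idea can be sketched in a few lines of shell (this illustrates the polling approach, it is not the daemon's actual code): sum the byte counters of all interfaces and fire a blink whenever the total changes between polls.

```shell
# Sum rx and tx byte counters over all network interfaces.
# A daemon polls this total periodically and fires a blink
# through the LED trigger whenever the value has changed.
total=0
for f in /sys/class/net/*/statistics/rx_bytes \
         /sys/class/net/*/statistics/tx_bytes; do
	[ -r "$f" ] && total=$((total + $(cat "$f")))
done
echo "total bytes: $total"
```

Because the counters are summed over every interface, a single LED naturally reflects activity on any of them.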

I packaged the net-led-blinker daemon in a Debian package for easy installation. The package includes systemd configuration that runs net-led-blinker on boot and reads configuration from /etc/default.

net-led-blinker is available from my git repository with the following command:

git clone https://www.tablix.org/~avian/git/net-led-blinker.git

Consult the included README for building and usage instructions.

I retained the original Unlicense that was used by Ragnar in their project, so the code is pretty much unencumbered with licensing terms.

Posted by Tomaž | Categories: Code | Comments »

Ignoring pending interrupts in the Linux kernel

29.12.2022 15:34

Recently I was writing a Linux kernel driver for some hardware I made that interacts with a serial bus. The hardware consists of a transmitter and a receiver. The bus is half-duplex and shares the same physical line for sending and receiving data. For simplicity there is no hardware way to unhook the receiver from the bus. Because of this, anything my transmitter sends on the bus will be simultaneously received by the receiver. I don't want my own outgoing data to end up in the software receive pipeline. Hence the function for sending data on the bus must somehow ignore everything that is received while the transmission is ongoing.

The code for sending data looks roughly like this:

static int foo_send(struct device *dev, const char *buf, size_t count)
{
	struct foo_drvdata *ddata = dev_get_drvdata(dev);

	disable_irq(ddata->recv_irq);

	(code to feed data to the transmitter and wait until the data is sent...)

	enable_irq(ddata->recv_irq);

	return 0;
}

Here recv_irq is a hardware interrupt line that is triggered by my receiver when it has received some data from the line. It is set up to call a handler function (interrupt service routine) that fetches the received data from the receiver and puts it into the receive FIFO. It's hooked to a GPIO line with some code in the driver's initialization function that looks roughly like this:

static int foo_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct gpio_desc *desc = ...

	ddata->recv_irq = gpiod_to_irq(desc);
	devm_request_any_context_irq(dev, ddata->recv_irq, foo_recv_isr,
			IRQF_TRIGGER_FALLING, "foo-recv", ddata);

	(a bunch of other initialization code here...)

	return 0;
}

This is simplified quite a bit to hide unimportant details. For example, the send function itself is split up into parts called from various interrupts to avoid hogging the CPU while the transmission is happening and so on.

The gist of it however is that I'm trying to ignore the received data by disabling the receive interrupt while the transmission is happening. I know that during that time any data I receive will just be my own transmission reflected back to me. So I just want to ignore all interrupts from the receiver during that time. As soon as the transmission is completed I need to start processing the interrupts again so that I can capture any reply.

For the sake of completeness, I was developing this on the i.MX35 platform and using kernel 4.14.78, but the code should be reasonably platform independent.

Of course, things are not this simple and this doesn't work. When sending data, I would receive part of my own data back through the interrupt handler despite disabling the interrupt. After a transmit, I would receive zero, one or two receiver interrupts immediately after calling enable_irq(). The number of interrupts depended on the amount of data I sent.

I knew this had something to do with interrupts being held in pending state while they are disabled and then being acted upon once they are enabled again. The fact that I received up to two interrupts made me suspect that interrupts were being held in pending state at two different layers. Figuring how this all works in the kernel and how to actually ignore interrupts on purpose took quite some time.

I found some related questions on Stack Overflow, but they weren't particularly helpful. One comment suggested that this is an unreasonable thing to do (it's not) and another suggested recompiling the kernel without CONFIG_HARDIRQS_SW_RESEND. This would affect things outside of my driver and, even if it worked, wasn't something I wanted to do. I also found a paragraph in the GPIO Driver Interface documentation that talks about a fringe use case of enabling and disabling interrupts in the CEC driver that sounded like what I wanted to do, even though I'm not using the same GPIO lines for input and output. This too turned out to be a dead end, since I did not want to modify the i.MX35 GPIO driver.

The first clue I got was this comment above the irq_disable() declaration in kernel/irq/chip.c:

 * If the chip does not implement the irq_disable callback, we
 * use a lazy disable approach. That means we mark the interrupt
 * disabled, but leave the hardware unmasked. That's an
 * optimization because we avoid the hardware access for the
 * common case where no interrupt happens after we marked it
 * disabled. If an interrupt happens, then the interrupt flow
 * handler masks the line at the hardware level and marks it
 * pending.
 * If the interrupt chip does not implement the irq_disable callback,
 * a driver can disable the lazy approach for a particular irq line by
 * calling 'irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY)'. This can
 * be used for devices which cannot disable the interrupt at the
 * device level under certain circumstances and have to use
 * disable_irq[_nosync] instead.

The imx35-gpio driver (gpio-mxc.c) that implements the native GPIO functions on i.MX35 indeed does not implement the irq_disable() callback in struct irq_chip. As the comment explains, in this case the kernel only marks the interrupt as disabled in its own data structures when the disable_irq() function is called. It does not touch the hardware at all, since the expectation is that the interrupt will only be disabled for a short amount of time and in most cases will not happen at all. Apparently hardware register access is slow and the developers wanted to avoid it if possible.

If the interrupt does happen, the kernel does not run the disabled handler, but flags the interrupt as pending in the internal kernel data structures. Only then does the kernel actually disable the interrupt in hardware. The pending handler will be later called by the kernel from enable_irq(). This was one way I was getting a delayed call to my receiver interrupt handler!

Helpfully, the comment also hints that this delayed mechanism of touching the actual hardware can be disabled by setting a flag after we request the interrupt line:

irq_set_status_flags(ddata->recv_irq, IRQ_DISABLE_UNLAZY);

Doesn't the double negation in "disable unlazy" resolve to "enable lazy"? That seems counterintuitive, since setting the flag disables the lazy behavior. But I digress. Setting IRQ_DISABLE_UNLAZY in my code after requesting the interrupt line fixed one spurious call of the interrupt handler. However I was still getting one call even after this fix.

The answer to that remaining call was hidden in careful reading of that comment above disable_irq(). The hardware interrupt is masked by the kernel, not disabled.

On i.MX35 the interrupt hardware has an interrupt mask register (IMR) and an interrupt status register (ISR). When an interrupt source is triggered, the corresponding bit in the ISR is set. Setting the ISR bit interrupts the processor and runs the interrupt handler, unless the same bit is set in the IMR. As long as the interrupt is masked by the IMR bit, the processor will keep running and the interrupt will be deferred. The moment the IMR bit is cleared, the processor will run the pending interrupt. The ISR bit is later cleared by the kernel as part of the interrupt handling mechanism. The kernel knows about all this through the struct irq_chip set up by gpio-mxc.c.

This is the second layer, at the hardware level, where a pending interrupt caused my receive handler to be called. On the hardware level the interrupt was still happening, but it was not acted upon because disable_irq() masked it in the IMR. As soon as the bit was unmasked in enable_irq() the interrupt triggered and my handler was called.

The best way to fix this would be to prevent the interrupt from happening in the first place: to actually disable the interrupt on the hardware level in the GPIO subsystem, not just masking it with disable_irq(). Unfortunately, I could not find a clean way to do this through the GPIO driver layer and poking the i.MX35 GPIO registers directly from my otherwise platform-independent driver seemed like an obviously bad idea. I'm guessing the best way would be if gpio-mxc.c would actually implement that irq_disable() callback mentioned in kernel/irq/chip.c.

In the end I settled on clearing the ISR before calling enable_irq(). What I believe is a reasonably cross-platform way of doing that is to call the interrupt acknowledgment callback like this:

struct irq_desc *desc = irq_to_desc(ddata->recv_irq);

if (desc->irq_data.chip->irq_ack) {
	desc->irq_data.chip->irq_ack(&desc->irq_data);
}
Adding this to the send function finally removed the last spurious receiver interrupt handler call in my driver.

What is the takeaway from all this? When you disable an interrupt in the Linux kernel, the interrupt typically isn't disabled on the hardware level. What disable_irq() actually does is disable the call to the interrupt handler. From the perspective of kernel developers it is primarily a software synchronization mechanism. For example, the enable_irq() / disable_irq() pair is used to protect critical parts of the code to prevent race conditions when the main kernel thread and the interrupt handler are accessing the same data structure. The kernel tries hard to serve all interrupts, even those that happen while the handler cannot run immediately.

Actively trying to throw away interrupts as they happen is discouraged and seen as an unusual use of the interrupt system. The kernel requires you to jump through some hoops to work around all the layers where pending interrupts are held. The preferred way is to not make the interrupts happen in the first place. However, this may not be possible: the hardware may not support disabling the reason the interrupts happen and some GPIO drivers lack the hooks that could be used to disable and enable GPIO interrupt lines on the fly.

Posted by Tomaž | Categories: Code | Comments »

ESP8266 web server is slow to close connections

14.08.2022 16:48

A side note to my previous post about using delays to save power on an ESP8266 web server. When I was using Apache Bench to measure the server response times, I noticed that the total time for a request to the ESP8266 always seemed to be about 2 seconds:

$ ab -n10 http://esp8266.lan/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>


Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   16  36.6      4     120
Processing:  2006 2008   4.0   2007    2017
Waiting:        5    7   3.9      6      16
Total:       2009 2024  39.7   2010    2137

This looked odd. When opened in a browser, there was no noticeable 2 second delay and the page from the ESP8266 web server displayed immediately.

The way Apache Bench displays the Connection Times numbers is a bit odd to begin with. This post explains how to interpret the Connect, Processing and Waiting rows. In short, Processing is the time from the start to the end of the TCP connection. Waiting is the time from when the request was sent to when the first byte of the response was received.

Since Waiting time was short, this led me to believe that the whole response is sent quickly, but the connection is not closed until after a 2 second delay. A quick tcpdump confirmed this:

(client opens connection)
17:43:27.892032 IP client.lan:55422 > esp8266.lan:80: Flags [S], seq 3077448494, win 64240, options [mss 1460,sackOK,TS val 3187045141 ecr 0,nop,wscale 7], length 0
17:43:27.923811 IP esp8266.lan:80 > client.lan:55422: Flags [S.], seq 6555, ack 3077448495, win 2144, options [mss 536], length 0
17:43:27.923855 IP client.lan:55422 > esp8266.lan:80: Flags [.], ack 1, win 64240, length 0
(client sends request)
17:43:27.923968 IP client.lan:55422 > esp8266.lan:80: Flags [P.], seq 1:83, ack 1, win 64240, length 82: HTTP: GET / HTTP/1.0
(esp8266 sends response)
17:43:27.929828 IP esp8266.lan:80 > client.lan:55422: Flags [P.], seq 1:85, ack 83, win 2062, length 84: HTTP: HTTP/1.0 200 OK
17:43:27.929861 IP client.lan:55422 > esp8266.lan:80: Flags [.], ack 85, win 64156, length 0
17:43:27.931314 IP esp8266.lan:80 > client.lan:55422: Flags [P.], seq 85:97, ack 83, win 2062, length 12: HTTP
17:43:27.931337 IP client.lan:55422 > esp8266.lan:80: Flags [.], ack 97, win 64144, length 0
(esp8266 closes connection - note 2 s delay in timestamps)
17:43:29.934077 IP esp8266.lan:80 > client.lan:55422: Flags [F.], seq 97, ack 83, win 2062, length 0
17:43:29.934227 IP client.lan:55422 > esp8266.lan:80: Flags [F.], seq 83, ack 98, win 64143, length 0
17:43:29.936098 IP esp8266.lan:80 > client.lan:55422: Flags [.], ack 84, win 2061, length 0

In HTTP/1.0, the server should close the connection as soon as the complete response has been sent. It seems that ESP8266WebServer does this differently for some reason. Instead of the server closing the connection, it waits for the client to close it. There's also a timeout that closes the connection from the server side if it's not closed by the client. The timeout is controlled by the HTTP_MAX_CLOSE_WAIT constant, which is set to 2 seconds.

This is weird, because ESP8266 even sends a Connection: close header in the response, explicitly signaling to the client that the server will close the connection.

I think this is due to a benign bug in the ESP8266WebServer's state machine. After spending some time looking at the source, it seems like the current code intends to properly implement both keep-alive and non-keep-alive connections, but for some reason still uses the client timeout in the latter case.

I wanted to dig a bit deeper into this and perhaps send a pull request, but instead I spent a bunch of time trying to get the current git version of ESP8266 core for Arduino to work with my ESP8266 modules. This turned out to be a whole new can of worms.

Posted by Tomaž | Categories: Code | Comments »

Waveform display bug in the LA104

29.03.2022 18:44

I recently picked up a Miniware LA104, a small 4-channel logic analyzer. I thought a stand-alone, battery-powered logic analyzer might be useful in some cases, especially since it comes with built-in protocol decoding for I2C. I have a few random USB-connected logic analyzers in a drawer somewhere, but I found that I rarely use them. When I do need a logic analyzer, I tend not to have a PC at hand with all the necessary software set up, I don't want to risk the laptop by connecting it to a possibly malfunctioning device, or there's a suitable oscilloscope nearby that works well enough for the purpose.

The first test I did with the LA104 fresh out of the box was to connect all of its 4 channels to a 100 kHz square-wave signal and run a capture. I was disappointed to see that the displayed digital waveform had occasional random glitches that were certainly not there in the real signal. The 100 kHz frequency of the input signal was well below the 100 MHz sampling frequency of the LA104, so no undersampling should have occurred either:

Miniware LA104 displaying waveform with glitches.

Interestingly, when exporting the captured waveform to a comma-separated value (CSV) file using the function built into the stock firmware and displaying it in PulseView, the glitches were not visible anymore:

The captured 100 kHz waveform displayed in PulseView.

This suggested that the waveform capture was working correctly and that the glitches were somehow only introduced when drawing the waveform onto the LA104's LCD screen. Considering that this device was released back in 2018 it's unlikely there will be another firmware update from the manufacturer. However since the source for the firmware is publicly available, I thought I might look into this and see if there's a simple fix for this bug.

For the sake of completeness, I was using firmware version 1.03, which came preloaded on my device and is also the latest publicly available version from Miniware at this time. I'm aware that there also exists an alternative open source firmware for the LA104 by Gabonator on GitHub. However, that project seems to be focused on various other, non-logic-analyzer uses for the hardware. A quick diff of the logic analyzer app in Gabonator's firmware against Miniware's source release showed that the logic analyzer functionality hasn't received any significant changes or bug fixes.

Looking at the Save_Csv() function in the firmware source, it appears that the CSV export directly dumps the internal representation of the waveform into a file. The exported file for the 100 kHz waveform looked something like the following:

Time(nS), CH1, CH2, CH3, CH4,
4990, 0, 0, 0, 0,
4990, 1, 1, 1, 1,
4990, 0, 0, 0, 0,
4990, 1, 1, 1, 1,

The first column is the time interval in nanoseconds, while the rest of the columns show the logic state of the four input channels. It's interesting that the first column is not a timestamp. The CSV also does not contain all input samples; rather, it only lists the detected changes in input states. The first column contains the time interval for which the channels were held in the listed state after the last change. In other words, the LA104 internally compresses the captured samples using run-length encoding.

In the example above, it means that all four channels were in the logic 0 state for 4990 ns, followed by all four channels in the logic 1 state for 4990 ns, and so on. This timing checks out since it corresponds roughly to the 100 kHz frequency of the input signal. Note also that all intervals are multiples of 10 ns - this makes sense since the 100 MHz sampling rate makes for a sampling period of 10 ns. The export function simply multiplies the sample run length by 10 to get the time interval in ns.

Inspecting the part of the CSV export where the LA104 displays the glitches gives a hint to where the problem might be:


The highlighted lines show that the logic analyzer detected that some channels changed state one sample time before others. That is fine and not very surprising. I'm not expecting all channels to change state perfectly in sync down to the nanosecond scale. This can be due to slightly different wire lengths, stray capacitances and so on.

It seems however that the waveform drawing routines on the LA104 do not handle this situation well. At the scale of the display, the 10 ns delay is much less than one display pixel, so it should not even be visible. However for some reason LA104 chooses to display this short transitional state and then ignores the next, much longer stable state altogether. This leads to the waveform seemingly skipping cycles on the display: in the first instance channels 3 and 4 stay in logic 1 state for a cycle when they should go to logic 0; in the second instance channels 1 and 3 stay in logic 0 state for a cycle when they should go to logic 1.

Waveform on the LA104 with highlighted glitches.

In fact, the glitch disappears if I zoom enough into the part of the waveform where the missing edge should be. This suggests that this is indeed some kind of a failed attempt at skipping over signal transitions that are shorter than one display pixel:

Glitch disappears when zoomed into the missing edge.

Looking deeper into what might be causing this led me to the ShowWaveToLCD() function, which seems to be called whenever the waveform on the screen needs to be refreshed. Unfortunately, this function only seems to send some instructions to the FPGA and not much else.

The LA104 consists of an STM32F103-series ARM microcontroller and an FPGA. The FPGA is used for waveform capture while the microcontroller handles the user interface. According to the public schematic, both the FPGA and the ARM are connected to the LCD via a shared bus. Hence it seems feasible that the waveform drawing is done directly from the FPGA. This was still a bit surprising to me, since everything else on the screen seems to be drawn by the code running on the ARM. Probably drawing the waveform directly from the FPGA was faster and led to a more responsive user interface.

Unfortunately, this means that fixing the waveform display bug would require modifying the FPGA bitstream. Its source is not public as far as I know. Even if it were, apparently building the bitstream requires a bunch of proprietary software. In any case, there's not much hope in getting this fixed in a simple way. It should be possible to write a new function that does the waveform drawing from ARM, but that would require more effort than I'm prepared to sink into fixing this issue. In the end it might turn out to be too slow anyway.

I'm afraid LA104 will end up gathering dust in a drawer with my other logic analyzers. A logic analyzer for me is primarily a debugging tool, but a tool that misleads me by showing problems that are not there is worse than having no tool at all. I might find some use for it for capturing signal traces for later off-line analysis on a PC, since so far that seems to work reliably. Other than that, sadly I don't see myself using it much in the future.

Posted by Tomaž | Categories: Code | Comments »

hackrf_tcp, a rtl_tcp for HackRF

13.03.2021 19:03

rtl_tcp is a small utility that exposes the functionality of an rtl-sdr receiver over a TCP socket. It can be interfaced with a simple Python script. I find it convenient to use when I need to grab some IQ samples from the radio and I don't want to go into the complexity of interfacing with GNU Radio or the C library. Using it is also much faster compared to repeatedly running the rtl-sdr command-line utility to grab samples into temporary files at different frequencies or gain settings.

I wanted to use something similar for the HackRF. Unfortunately the stock software that comes with it doesn't include anything comparable. After some searching however I did find an old fork of the HackRF software repository by Zefie on GitHub that included a file named hackrf_tcp.c. Upon closer inspection it seemed to be a direct port of rtl_tcp from librtlsdr to libhackrf. It didn't compile out of the box and merging the fork with the latest upstream produced some conflicts, but it did look promising.

I resolved the merge conflicts and fixed the code so that it now compiles cleanly with the latest libhackrf. I also added a few small improvements.

Just like with rtl_tcp, the protocol on the socket is very simple. Upon receiving the client's connection the server initializes the radio and starts sending raw IQ samples. The client optionally sends commands back to the server in the form of a simple structure:

struct command {
	unsigned char cmd;
	unsigned int param;
} __attribute__((packed));

cmd is the command id and param is a parameter for the command. Commands are things like SET_FREQUENCY = 0x01 for setting frequency, SET_SAMPLERATE = 0x02 for setting ADC sample rate and so on.

The original code attempted to keep some backwards compatibility by mapping HackRF's functionality to existing rtl-sdr commands. This included things like using the frequency correction command to enable or disable the HackRF's RF preamplifier. I dropped most of that. Obviously I kept the things that were direct equivalents, like center frequency and sample rate setting. The rest seemed like a dangerous hack. Enabling HackRF's preamp by mistake can damage it if there's a strong signal present on the antenna input.

Instead, there is now a new set of commands starting at 0xb0 that is HackRF exclusive. Unsupported rtl-sdr commands are ignored.

Even with these hacks, backwards compatibility wasn't that good in the first place and I wasn't interested in keeping it. HackRF produces IQ samples as signed 8-bit values while rtl-sdr uses unsigned. The code makes no attempt to do the conversion. There is also the problem that the 32-bit unsigned parameter to the SET_FREQUENCY = 0x01 command can only express frequencies up to around 4.3 GHz, which is less than what the HackRF can do. To work around that limitation I added a new command, SET_FREQUENCY_HI = 0xb4, that sets the center frequency to the parameter value plus 0x100000000.

My updated version of hackrf_tcp is in my hackrf fork on GitHub. It seems reasonably stable, but I've seen it hang occasionally when a client disconnects; in that case it usually requires a kill -9 to stop it. I haven't looked into this yet. In hindsight, separating hackrf_tcp out into its own repository instead of keeping it with the rest of the upstream tools might have been a better idea.

As it is right now, you need to compile the whole libhackrf and the rest of the host tools to get hackrf_tcp. The basic instructions in the README still apply. After installation you can just run hackrf_tcp from a shell without any arguments:

$ hackrf_tcp
Using HackRF HackRF One with firmware 2018.01.1
Tuned to 100000000 Hz.

You can also specify some initial radio settings and socket settings on the command-line. See what's listed with --help.

Posted by Tomaž | Categories: Code | Comments »

Reading RAID stride and stripe_width with dumpe2fs

20.02.2021 20:08

Just a quick note, because I found this confusing today. stride and stripe_width are extended options for ext filesystems that can be used to tune their performance on RAID devices. Many sources on the Internet claim that the values for these settings on existing filesystems can be read out using tune2fs or dumpe2fs.

However it is possible that the output of these commands will simply contain no information that looks related to RAID settings. For example:

$ tune2fs -l /dev/... | grep -i 'raid\|stripe\|stride'
$ dumpe2fs -h /dev/... | grep -i 'raid\|stripe\|stride'
dumpe2fs 1.44.5 (15-Dec-2018)

It turns out that the absence of any lines relating to RAID means that these extended options are simply not defined for the filesystem in question: the filesystem is not tuned to any specific RAID layout and was probably created without the -E stride=...,stripe_width=... option to mke2fs.
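For reference, when the options are defined, the values follow from the RAID geometry: stride is the RAID chunk size divided by the filesystem block size, and stripe_width is stride multiplied by the number of data-bearing disks. With a hypothetical 64 KiB chunk, 4 KiB blocks and two data disks:

```shell
chunk_kib=64	# RAID chunk size in KiB
block_kib=4	# filesystem block size in KiB
data_disks=2	# disks carrying data, excluding parity/mirrors

stride=$((chunk_kib / block_kib))
stripe_width=$((stride * data_disks))

echo "stride=$stride stripe_width=$stripe_width"
```

This prints stride=16 stripe_width=32, the kind of values that then show up in the dumpe2fs output.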

However I've also seen some filesystems that were created without this option still display a default value of 1. I'm guessing this depends on the version of mke2fs that was used to create the filesystem:

$ dumpe2fs -h /dev/... |grep -i 'raid\|stripe\|stride'
dumpe2fs 1.44.5 (15-Dec-2018)
RAID stride:              1

For comparison, here is what the output looks like when these settings have actually been defined:

$ dumpe2fs -h /dev/md/orion\:home |grep -i 'raid\|stripe\|stride'
dumpe2fs 1.44.5 (15-Dec-2018)
RAID stride:              16
RAID stripe width:        32
Posted by Tomaž | Categories: Code | Comments »

Showing printf calls in AtmelStudio debugger window

11.02.2021 16:26

Writing debugging information to a serial port is common practice in embedded development. One problem however is that sometimes you can't connect to the serial port. Either the design lacks a spare GPIO pin or you can't physically access it. In those cases it can be useful to emulate such a character-based output stream over the in-circuit debugger connection.

A few years back I wrote about how to monitor the serial console on the ARM-based VESNA system over JTAG. Back then I used a small GNU debugger script to intercept strings that were intended for the system's UART and copy them to the gdb console. This time I found myself with a similar problem on an AVR-based system, using the AtmelStudio 7 IDE for development. I wanted the debugger window to display the output of various printf statements strewn around the code. I only had a single-wire UPDI connection to the AVR microcontroller using an mEDBG debugger. Following is the recipe I came up with. Note that, in contrast to my earlier instructions for ARM, these steps require preparing the source code in advance and making a debug build.

Define a function that wraps around the printf function that is built into avr-libc. It should render the format string and any arguments into a temporary memory buffer and then discard it. Something similar to the following should work. Adjust buf_size depending on the length of lines you need to print out and the amount of spare RAM you have available.

int tp_printf_P(const char *__fmt, ...)
{
	const int buf_size = 32;
	char buf[buf_size];

	va_list args;

	va_start(args, __fmt);
	vsnprintf_P(buf, buf_size, __fmt, args);
	va_end(args);

	// <-- put a tracepoint here
	return 0;
}

We will now define a tracepoint in the IDE that will be triggered whenever tp_printf_P is called. The tracepoint will read out the contents of the temporary memory buffer and display it in the debugger window. The wrapper is necessary because the built-in printf function in avr-libc outputs strings character-by-character. As far as I know there is no existing buffer where we could find the entire rendered string like this.

The tracepoint is set up by right-clicking on the marked source line, selecting Breakpoint and Insert Tracepoint in the context menu. This should open Breakpoint settings in the source code view. You should set it up like in the following screenshot and click Close:

Setting up a tracepoint to print out the temporary buffer.

The ,s after the variable name is important. It makes the debugger print out the contents of the buffer as a string instead of just giving you a useless pointer value. This took me a while to figure out. AtmelStudio is just a customized and rebranded version of Microsoft Visual Studio. The section of the manual about tracepoints doesn't mention it, but it turns out that the same format specifiers that can be used in the watch list can also be used in tracepoint messages.

Another thing worth noting is that compiler optimizations may make it impossible to set the tracepoint at this specific point. I haven't seen this happen with the exact code I showed above. It seems my compiler will not optimize out the code even though the temporary buffer isn't used anywhere. However I've encountered this problem elsewhere. If the tracepoint icon on the left of the source code line is an outlined diamond instead of a filled diamond, and you get the "The breakpoint will not currently be hit" message when you hover the mouse over it, this will not work. You will either have to disable some optimization options or modify the code somehow.

Example of a tracepoint that will not work.

To integrate the tp_printf_P function into the rest of the code, I suggest defining a macro like the one below. My kprintf can be switched at build time between the true serial output (or whatever else is hooked into avr-libc to act as stdout), the tracepoint output, or it can be turned off for non-debug builds:

/* Note: DEBUG_SERIAL is an illustrative name; only DEBUG_TRACEPOINT is
   referenced below. */
#if defined(DEBUG_SERIAL)
#  define kprintf(fmt, ...) printf_P(PSTR(fmt), ##__VA_ARGS__);
#else
#  if defined(DEBUG_TRACEPOINT)
#    define kprintf(fmt, ...) tp_printf_P(PSTR(fmt), ##__VA_ARGS__);
#  else
#    define kprintf(fmt, ...)
#  endif
#endif

With the DEBUG_TRACEPOINT preprocessor macro defined during the build and the tracepoint set up as described above, a print statement like the following:

kprintf("Hello, world!\n");

...will result in the string appearing in the Output window of the debugger like this:

"Hello, world!" string appearing in the debug output window.

Unfortunately the extra double quotes and a newline seem to be mandatory. The Visual Studio documentation suggests that using a ,sb format specifier should print out just the bare string. However this doesn't seem to work in my version of AtmelStudio.

It's certainly better than nothing, but if possible I would still recommend using a true serial port instead of this solution. Apart from the extra RAM required for the string buffer, the tracepoints are quite slow. Each print stops the execution for a few hundred milliseconds in my case. I find that I can usually get away with prints over a 9600 baud UART in most code that is not particularly time sensitive. However with prints over tracepoints I have to be much more careful not to trigger various timeouts or watchdogs.

I also found this StackExchange question about the same topic. The answer suggests just replacing prints with tracepoints. Indeed "print debugging" has kind of a bad reputation, and certainly using tracepoints to monitor specific variables has its place when debugging an issue. However I find that well-instrumented code with print statements in strategic places is hard to beat when you need to understand the big picture of what the code is doing. Prints can often point out problems in places where you wouldn't otherwise think of putting a tracepoint. They also have the benefit of being stored with the code instead of being just an ephemeral setting in the IDE.

Posted by Tomaž | Categories: Code | Comments »

My experience with Firefox containers

25.07.2020 19:17

For the past few months I've been using the Firefox Multi-Account Containers (MAC) extension. This extension makes it possible to maintain several isolated browser states at a time. By a browser state I mean everything that websites store in the browser: cookies, local storage and so on. In practical terms that means logins on websites with "remember me" functionality, shopping baskets, advertisement network IDs and other similar things. You can set up the extension so that certain websites always open in a certain container. The Temporary Containers (TC) extension further builds upon MAC by dynamically creating and deleting containers as you browse, in an attempt to keep the browser from accumulating long-term cookies.

Screenshot of the Firefox Multi-Account Containers extension.

I have a few reasons to use such a setup. First is that I commonly use company and personal accounts on websites. I want to keep these logins completely separate for convenience (no need to constantly log out and log in to change accounts). I've also once had an instance where a web shop silently merged accounts based on some hidden browser state. Even though that was most certainly unacceptable behavior on the web shop's end, I would like to avoid another case where my personal projects end up in the corporate order history.

The second reason is privacy and security. I think it is harder to exploit weaknesses in the browser's cross-site isolation, or do phishing attacks, if the default browser instance I'm using doesn't store authentication cookies for any important accounts. The fact that most websites see a fresh browser state without cookies also slightly raises the bar for tracking my browsing habits between different websites.

Cookies and Site Data preferences in Firefox.

I used to have Firefox set so that it cleared all cookies after every session. This took care of accumulating cookies, but meant that I needed to continuously re-login to every website. This wasn't such an inconvenience on my end. However recently more and more websites have started treating new logins from a cookie-less browser as a security breach. At best I would constantly get mails about it, at worst I would get accounts blocked or be thrown into a captcha-hell for unusual behavior.

A typical "sign-in from a new device detected" mail.

I would still prefer to have this setting enabled for default browsing, combined with a few permanent containers for websites that do this sort of unusual behavior detection. However MAC doesn't allow you to set this independently for each container. In theory, using TC fixes that problem. Opening up a website in a fresh temporary container that is used once and then deleted after closing the browser tab has the same effect as clearing cookies.

My foremost problem with containers is that in practical use they don't really contain the state between websites. It's trivial to make a mistake and use the wrong container. If I click a link and open it in a new tab, that tab will inherit the container of the original tab. The same also happens if you enter a new URL manually into the address bar. It's very easy, for example, to follow a link a coworker shared in a web chat and then spend an hour researching related websites. If I forget to explicitly open the link in "a new Temporary Container" that browsing will all happen in the permanent container that I would prefer to only be used for the web chat service. The tab titles get a colored underline that shows what container they are using, but it's easy to overlook that.

All it takes is one such mistake and the container is permanently polluted with cookies and logins from unrelated websites that I would not like to have in there. These will persist, since to retain the web chat login I have to set the browser to retain cookies indefinitely. Over time I found that all permanent containers tend to accumulate cookies and persisting logins for websites I frequent which defeats most of the benefits of using them.

There is the "Always Open This Site in..." option, but it works the opposite way from what I would want. You can define a list of websites that need to be opened in a certain container, but you can't say that a website outside this list needs to be opened outside of a container (or in a Temporary Container). "Always Open This Site in..." also has the additional problem that it's annoying if I want to use a website in two containers (e.g. work and personal ones). In that case I have to constantly click through warnings such as this:

Firefox warning about assigned containers.

Again, the Temporary Containers extension attempts to address this. There is a group of preferences called "Isolation". In theory you can set this up so that a new temporary container is automatically opened up when you navigate away to a different website. It also takes effect when using the permanent containers.

It's possible that I don't understand exactly how this works, but I found any setting other than "Never" to not be useful in practice. The problem is with websites that use third-party logins (e.g. a website where you log in with your Google account, but is not hosted under google.com, I'm guessing through a mechanism like OpenID). Isolation completely breaks this authentication flow since the log-in form then opens up in a new container and whatever authentication tokens it sets aren't visible in the original one.

Isolation preferences of the Temporary Containers extension.

Finally, both extensions are somewhat buggy in my experience. For example, I'm occasionally seeing tabs opened in no container at all even though I have Automatic Mode turned on in TC, which should automatically re-open any tab that's not in a container in a temporary one. I can't find a reliable way to reproduce this, but it seems to only happen with tabs that I open from a different application and might be related to this issue.

Cleaning up old temporary containers often doesn't work reliably. Again, it's hard to say what exactly isn't working and if that's one of the known bugs or not. I often find that the browser has accumulated tens of temporary containers that I then need to clean up by hand. This is especially annoying since a recent update to MAC made container deletion unnecessarily tedious. It takes 4 mouse clicks to delete one container, so deleting a few tens is quite a chore.

There is also a very annoying couple of bugs related to Firefox sync. MAC allows the container settings to be synchronized between all Firefox installations linked to the same Firefox account using Mozilla's servers. This is an incredibly useful feature to me since I commonly use multiple computers and often also multiple operating systems on each one. Needless to say, synchronizing browser settings manually is annoying.

Unfortunately, running MAC and TC with sync enabled runs the risk of permanently breaking the Firefox account. Because of the bugs I linked above it's very easy to accumulate so many temporary containers that you exceed the storage quota on the sync server. It seems that once that happens the sync server will not even let you delete the offending data that's stored there before erroring out. The result is that add-on sync will no longer work on that account and even cleaning up your local setup afterwards will not fix it.

Errors about insufficient storage in Firefox sync logs.

In conclusion, I'm not very happy with this setup (or the modern web in general, but I digress). Multi-Account Containers is certainly an improvement over the setup with multiple separate Firefox profiles that I was using previously. It does work well enough for keeping work and personal accounts separate. On the other hand, it doesn't seem to be very effective in isolating cookies and other state between browsing sessions. I'm not sure what exactly a better solution would be. I feel like I'm leaning more and more towards a setup where I would just use two completely separate browsers. One for heavy web apps that require a login, persistent local storage and JavaScript, and another one that's more aggressive at cleaning up after itself for everything else.

Posted by Tomaž | Categories: Code | Comments »

On missing IPv6 router advertisements

03.05.2020 16:58

I've been having problems with Internet connectivity for the past week or so. Connections would randomly time out and some things would work very slowly or not at all. In the end it turned out to be a problem with IPv6 routing. It seems my Internet service provider is having problems with sending out periodic Router Advertisements and the default route on my router often times out. I've temporarily worked around it by manually adding a route.

I'm running a simple, dual-stack network setup. There's a router serving a LAN. The router is connected over an optical link to the ISP that's doing Prefix Delegation. The problems were intermittent. A lot of software seems to gracefully fall back onto IPv4 if IPv6 stops working, but there's usually a more or less annoying delay before it does that. On the other hand some programs don't, and seem to assume that there's global connectivity as long as a host has a globally-routable IPv6 address.

The most apparent and reproducible symptom was that IPv6 pings to hosts outside of LAN often weren't working. At the same time, hosts on the LAN had valid, globally-routable IPv6 addresses, and pings inside the LAN would work fine:

$ ping -6 -n3 host-on-the-internet
connect: Network is unreachable
$ ping -6 -n3 host-on-the-LAN
PING ...(... (2a01:...)) 56 data bytes
64 bytes from ... (2a01:...): icmp_seq=1 ttl=64 time=0.404 ms
64 bytes from ... (2a01:...): icmp_seq=2 ttl=64 time=0.353 ms
64 bytes from ... (2a01:...): icmp_seq=3 ttl=64 time=0.355 ms

--- ... ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2026ms
rtt min/avg/max/mdev = 0.353/0.370/0.404/0.032 ms

Rebooting my router seemed to help for a while, but then the problem would reappear. After some debugging I found out that the immediate cause of the problems was that the default route on my router would disappear approximately 30 minutes after it had been rebooted. It would then randomly re-appear and disappear a few times a day.

On my router, the following command would return empty most of the time:

$ ip -6 route | grep default

But immediately after a reboot, or if I got lucky, I would get a route. I'm not sure why there are two nearly identical entries here; the only difference is the from field:

$ ip -6 route | grep default
default from 2a01::... via fe80::... dev eth0 proto static metric 512 pref medium
default from 2a01::... via fe80::... dev eth0 proto static metric 512 pref medium

The following graph shows the number of entries returned by the command above over time. You can see that for most of the day the router didn't have a default route:

Number of valid routes obtained from RA over time.

The thing that was confusing me the most was the fact that the mechanism for getting the default IPv6 route is distinct from the way the prefix delegation is done. This means that every device in the LAN can get a perfectly valid, globally-routable IPv6 address, but at the same time there can be no configured route for packets going outside of the LAN.

The route is automatically configured via Router Advertisement (RA) packets, which are part of the Neighbor Discovery Protocol. When my router first connects to the ISP, it sends out a Router Solicitation (RS). In response to the RS, the ISP sends back an RA. The RA contains the link-local address to which traffic intended for the Internet should be directed, as well as a Router Lifetime. The Router Lifetime sets a time interval for which this route is valid. This lifetime appears to be 30 minutes in my case, which is why rebooting the router seemed to fix the problems for a short while.

The trick is that the ISP should later periodically re-send the RA by itself, refreshing the information and the lifetime, hence pushing back the deadline at which the route times out. Normally, a new RA should arrive well before the lifetime of the first one runs out. However in my case, it seemed that for some reason the ISP suddenly started sending out RAs only sporadically. Hence the route would time out in most cases, and my router wouldn't know where to send the packets that were going outside of my LAN.
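The failure mode is easy to see in a little simulation. The numbers below are assumptions based on my observations: a 30-minute Router Lifetime, RAs every 10 minutes on a healthy link, and only two sporadic RAs per day on the broken one:

```python
def route_valid_at(ra_times, lifetime, t):
    """A route learned from an RA at time T stays valid until T + lifetime.
    The default route exists at time t only if some RA still covers it."""
    return any(T <= t < T + lifetime for T in ra_times)

lifetime = 1800  # 30 minute Router Lifetime, in seconds

# Healthy link: an RA arrives every 10 minutes, so the route never expires.
healthy = range(0, 86400, 600)
# Broken link: just two sporadic RAs in a day.
sporadic = [0, 40000]

print(route_valid_at(healthy, lifetime, 12345))   # True
print(route_valid_at(sporadic, lifetime, 12345))  # False
```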

To monitor RA packets on the router using tcpdump:

$ tcpdump -v -n -i eth0 "icmp6 && ip6[40] == 134"

This should show packets like the following arriving in intervals that are much shorter than the advertised router lifetime. On a different, correctly working network, I've seen packets arriving roughly once every 10 minutes with a lifetime of 30 minutes:

18:52:01.080280 IP6 (flowlabel 0xb42b9, hlim 255, next-header ICMPv6 (58) payload length: 176)
fe80::... > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 176
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
19:00:51.599538 IP6 (flowlabel 0xb42b9, hlim 255, next-header ICMPv6 (58) payload length: 176) 
fe80::... > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 176
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms

However in this case this wasn't happening. Similarly to what the graph above shows, these packets only arrived sporadically. As far as I know, this is an indication that something is wrong on the ISP side. Sending an RA in response to an RS seems to work, but periodic RA sending doesn't. Strictly speaking there's nothing that can be done to fix this on my end. My understanding of RFC 4861 is that a downstream host should only send out an RS once, after connecting to the link.

Once the host sends a Router Solicitation, and receives a valid Router Advertisement with a non-zero Router Lifetime, the host MUST desist from sending additional solicitations on that interface, until the next time one of the above events occurs.

Indeed, as far as I can see, Linux doesn't have any provisions for re-sending RS in case all routes from previously received RAs time out. This answer argues that it should, but I can find no references that would confirm this. On the other hand, this answer agrees with me that RS should only be sent when connecting to a link. On that note, I've also found a discussion that mentions blocking multicast packets as a cause of similar problems. I don't believe that is the case here.

In the end I've used an ugly workaround so that things kept working. I've manually added a permanent route that is identical to what is randomly advertised in RA packets:

$ ip -6 route add default via fe80::... dev eth0

Compared to entries originating from RA, this manual entry in the routing table won't time out - at least not until my router gets rebooted. It also doesn't hurt anything if additional, identical routes get occasionally added via RA. Of course, it still goes completely against the IPv6 neighbor discovery mechanism. If anything changes on the ISP side, for example if the link-local address of their router changes, the entry won't get updated and the network will break again. However it does seem to fix my issues at the moment. The fact that it's working also seems to confirm my suspicion that something is only wrong with RA transmissions on the ISP side, and that actual routing on their end works correctly. I've reported my findings to the ISP and hopefully things will get fixed on their end, but in the meantime, this will have to do.

Posted by Tomaž | Categories: Code | Comments »

Printing .lto_priv symbols in GDB

14.02.2020 16:08

Here's a stupid little GNU debugger detail I've learned recently - you have to quote the names of some variables. When debugging a binary that was compiled with link time optimization, it sometimes appears like you can't inspect certain global variables.

GNU gdb (Debian 7.12-6)
(gdb) print usable_arenas
No symbol "usable_arenas" in current context.

The general internet wisdom seems to be that if a variable is subject to link time optimization it can't be inspected in the debugger. I guess this comes from the similar problem of inspecting private variables that are subject to compiler optimization. In some cases private variables get assigned to a register and don't appear in memory at all.

However, if it's a global variable, accessed from various places in the code, then its value must be stored somewhere, regardless of what tricks the linker does with its location. It's unlikely it would get assigned to a register, even if it's theoretically possible. So after some mucking about in the disassembly to find the address of the usable_arenas variable I was interested in, I was surprised to find out that gdb does indeed know about it:

(gdb) x 0x5617171d2b80
0x5617171d2b80 <usable_arenas.lto_priv.2074>:	0x17215410
(gdb) info symbol 0x5617171d2b80
usable_arenas.lto_priv in section .bss of /usr/bin/python3.5

This suggests that the name has a .lto_priv or a .lto_priv.2074 suffix (perhaps meaning LTO private variable? It is declared as a static variable in C). However I still can't print it:

(gdb) print usable_arenas.lto_priv
No symbol "usable_arenas" in current context.
(gdb) print usable_arenas.lto_priv.2074
No symbol "usable_arenas" in current context.

The trick is not that this is some kind of a special variable or anything. It just has a tricky name. You have to put it in quotes so that gdb doesn't try to interpret the dot as an operator:

(gdb) print 'usable_arenas.lto_priv.2074'
$3 = (struct arena_object *) 0x561717215410

TAB completion also works against you here, since it happily completes the name without the quotes and without the .2074 at the end, giving the impression that it should work that way. It doesn't. If you use completion, you have to add the quotes and the number suffix manually around the completed name (or only press TAB after inputting the leading quote, which works correctly).

Finally, I don't know what the '2074' means, but it seems you need to find that number in order to use the symbol name in gdb. Every LTO-affected variable seems to get a different number assigned. You can find the one you're interested in via a regexp search through the symbol table like this:

(gdb) info variables usable_arenas
All variables matching regular expression "usable_arenas":

File ../Objects/obmalloc.c:
struct arena_object *usable_arenas.lto_priv.2074;
Posted by Tomaž | Categories: Code | Comments »

Checking Webmention adoption rate

25.01.2020 14:42

Webmention is a standard that attempts to give plain old web pages some of the attractions of big, centralized social media. The idea is that web servers can automatically inform each other about related content and actions. In this way a post on a self-hosted blog, like this one, can display backlinks to a post on another server that mentions it. It also makes it possible to implement gimmicks such as a like counter. Webmention is kind of a successor to pingbacks, which were popularized some time ago by Wordpress. Work on standardizing Webmention seems to date back to at least 2014 and it was first published as a working draft by the W3C in 2016.

I first read about Webmention on jlelse's blog. I was wondering what the adoption of this standard is nowadays. Some searching revealed conflicting amounts of enthusiasm for it, but not much recent information. Glenn Dixon wrote in 2017 about giving up on it due to lack of adoption. On the other hand, Ryan Barrett celebrated 1 million sent Webmentions in 2018.

To get a better feel of what the state is in my local web bubble, I've extracted all external links from my blog posts in the last two years (January 2018 to January 2020). That yielded 271 unique URLs on 145 domains from 44 blog posts. I've then used Web::Mention to discover any Webmention endpoints for these URLs. Endpoint discovery is a first step in sending a notification to a remote server about related content. If that fails it likely means that the host doesn't implement the protocol.
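For the curious, the discovery step boils down to fetching the URL and looking for rel="webmention" in an HTTP Link header or in an HTML <link> or <a> element. A rough sketch of the HTML part in Python (Web::Mention handles this plus the Link header properly; this is only an illustration):

```python
from html.parser import HTMLParser

class EndpointFinder(HTMLParser):
    """Find the first <link> or <a> element with rel="webmention"."""
    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        d = dict(attrs)
        # rel is a space-separated list of link relations.
        rels = (d.get("rel") or "").split()
        if "webmention" in rels and d.get("href"):
            self.endpoint = d["href"]

def discover(html):
    parser = EndpointFinder()
    parser.feed(html)
    return parser.endpoint

page = '<html><head><link rel="webmention" href="https://example.com/wm"/></head></html>'
print(discover(page))  # https://example.com/wm
```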

The results weren't encouraging. None of the URLs had discoverable endpoints. That means that even if I had implemented the sending part of the Webmention protocol on my blog, I wouldn't have sent any mentions in the last two years.

Another thing I wanted to check was whether anyone was doing the same in the other direction. Were there any failed incoming attempts to discover an endpoint on my end? Unfortunately there is no good way of determining that from the logs I keep. In theory endpoint discovery can look just like a normal HTTP request. However, many Webmention implementations seem to have "webmention" in their User-Agent header. According to this heuristic I likely received at least 3 distinct requests for endpoint discovery in the last year. It's likely there were more (for example, I know that my log aggregates don't include requests from Wordpress plug-ins due to some filter regexps).
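The heuristic itself is nothing more than a substring match on the User-Agent field. A sketch, assuming logs in the common combined format where the User-Agent is the last quoted field (the log line below is made up):

```python
import re

def is_webmention_ua(logline):
    """True if the last quoted field (the User-Agent in combined log
    format) contains the string "webmention", case-insensitively."""
    m = re.search(r'"([^"]*)"\s*$', logline)
    return bool(m and "webmention" in m.group(1).lower())

line = ('203.0.113.5 - - [25/Jan/2020:14:42:00 +0000] '
        '"GET / HTTP/1.1" 200 1234 "-" "Webmention-Tester/1.0"')
print(is_webmention_ua(line))  # True
```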

So implementing this protocol doesn't look particularly inviting from the network effect standpoint. I also wonder if Webmentions would become the spam magnet that pingbacks were back in the day if they reached any kind of widespread use. The standard does include a provision for endpoints to verify that the source page indeed links to the destination URL the Webmention request says it does. However to me that protection seems trivial to circumvent and only creates a little more work for someone wanting to send out millions of spammy mentions across the web.

Posted by Tomaž | Categories: Code | Comments »

Radeon performance problem after suspend

08.01.2020 20:07

This is a problem I've encountered on my old desktop box that's still running Debian Stretch. Its most noticeable effect is that large GNOME terminal windows get very laggy and editing files in GVim is almost unusable due to the slow refresh rate. I'm not sure when this first started happening. I suspect it was after I upgraded the kernel to get support for the Wacom Cintiq. However I've only started noticing it much later, so it's possible that some other package upgrade triggered it. Apart from the kernel I can't find anything else (like recent Intel microcode updates) affecting this issue though. On the other hand the hardware here is almost a decade old at this point and way past due for an upgrade, so I'm not completely ruling out that something physical broke.

The ATI Radeon graphic card and the kernel that I'm using:

$ lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV710 [Radeon HD 4350/4550]
$ cat /proc/version
Linux version 4.19.0-0.bpo.6-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Debian 4.19.67-2+deb10u2~bpo9+1 (2019-11-12)

The Radeon-related kernel parameters:

radeon.audio=1 radeon.hard_reset=1

I think I've added hard_reset because of some occasional hangs I was seeing a while ago. I'm not sure if it's still needed with this kernel version and I don't remember having problems with X hanging in recent times. I've also seen this exact performance problem with kernel 4.19.12 from Stretch backports. I can't reproduce the problem on stock Stretch kernel 4.9.189. Other than the kernel from backports I'm using a stock Stretch install of GNOME 3 and X.Org (xserver-xorg-video-radeon version 1:7.8.0-1+b1).

To reproduce the problem, open a GNOME terminal and resize it, say to 132x64 or something similar. The exact size isn't important. Fill the scrollback, for example by catting a large file or running yes for a second. After that, scroll the terminal contents by holding enter on the shell prompt. If everything is working correctly, the scrolling will be smooth. If this bug manifests itself, terminal contents will scroll in large, random increments, seemingly refreshing around once per second.

The second way is to open a largish (say 1000 line) text file in GVim and try to edit it or scroll through it. Again, the cursor will lag significantly after the keyboard input. Interestingly, some applications aren't affected. For example, scrolling in Firefox or Thunderbird will remain smooth. GIMP doesn't seem to be affected much either.

On the affected kernels, I can reliably reproduce this by putting the computer to sleep (suspend to RAM - alt-click on the power button in the GNOME menu) and waking it up. After a fresh reboot, things will run normally. After a suspend, the performance problems described above manifest themselves. There is no indication in dmesg or syslog that anything went wrong at wake up.

I've tracked this down to a problem with Radeon's dynamic power saving feature. It seems that after sleep it gets stuck in its lowest performance setting and doesn't automatically adjust when some application starts actively using the GPU. I can verify that by running the following in a new terminal:

# watch cat /sys/kernel/debug/dri/0/radeon_pm_info

On an idle computer, this should display something like:

uvd    vclk: 0 dclk: 0
power level 0    sclk: 11000 mclk: 25000 vddc: 1100

After a fresh reboot, when scrolling the terminal or contents of a GVim buffer, the numbers normally jump up:

uvd    vclk: 0 dclk: 0
power level 2    sclk: 60000 mclk: 40000 vddc: 1100

However after waking the computer from sleep, the numbers in radeon_pm_info stay constant, regardless of any activity in the terminal window or GVim.

I've found a workaround to get the power management working again. The following script forces the DPM into the high profile and then resets it to whatever it was before (it's auto on my system). This seems to fix the problem and it can be verified through the radeon_pm_info method I described above. Most importantly, this indeed seems to restore the automatic adjustment. According to radeon_pm_info the card doesn't just get stuck again at the highest setting.

$ cat /usr/local/bin/radeon_dpm_workaround.sh

#!/bin/sh
set -eu

# Sysfs path to the forced DPM performance level; the card number
# (card0 here) may differ on other systems.
DPM_FORCE=/sys/class/drm/card0/device/power_dpm_force_performance_level

CUR=`cat "$DPM_FORCE"`
echo high > "$DPM_FORCE"
sleep 1
echo "$CUR" > "$DPM_FORCE"

To get this to automatically run each time the computer wakes from sleep, I've used the following systemd service file:

$ cat /etc/systemd/system/radeon_dpm_workaround.service

[Unit]
Description=Workaround for Radeon DPM getting stuck after resume
After=suspend.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/radeon_dpm_workaround.sh

[Install]
WantedBy=suspend.target

It needs to be enabled via:

# systemctl enable radeon_dpm_workaround.service

As I said in the introduction, this is a fairly old setup (it even still has a 3.5" floppy drive!). However, after many years it's still doing its job reasonably well, and hence I never seem to find the motivation to upgrade it. This series of Radeon cards does seem to have somewhat buggy support in the open source drivers. I've always had some degree of problems with it. For a long time HDMI audio was very unreliable, and another problem that I still see sometimes is that shutting down X hangs the system for several minutes.

Posted by Tomaž | Categories: Code | Comments »

Dropping the "publicsuffix" Python package

02.12.2019 10:50

I have just released version 1.1.1 of the publicsuffix Python package. Barring any major bugs that would affect some popular software package using it, this will be the last release. I've released v1.1.1 because I received a report that a bug in the publicsuffix package was preventing the installation of GNU Mailman.

In the grand scheme of things, it's not a big deal. It's a small library with a modest number of users. I haven't done any work on it, short of answering mail about it, since 2015. Drop-in alternatives exist. People that care strongly about the issues I cover below have most likely already switched to one of the forks and rewrites that popped up over the years. For those that don't care, nothing will change. The code still works and the library is still normally installable from PyPi. The Debian package continues to exist. The purpose of this post is more to give some closure and to sum up a few mail threads that started back in 2015 and never reached a conclusion.

Screenshot of the publicsuffix package page on PyPi.

I first released the publicsuffix library back in 2011, two employers and a life ago. Back then there was no easily accessible Python implementation of Mozilla's Public Suffix List. Since I needed one for my work, I picked up a source file from an abandoned open source project on Google Code (a service which itself was being abandoned by Google around that time). I did some minor work on it to make it usable as a standalone library and published it on PyPi.

I've not used publicsuffix myself for years. Looking back, most of my open source projects that I still maintain seem to be like that. Even though I don't use them, I feel some obligation to do basic maintenance on them and answer support mail. If not for other reasons, then out of a sense that I should give back to the body of free software that I depend so much on in my professional career. Some technical problems are also simply fun to work on and most of the time there's not much pressure.

However, one thing that was a source of long discussions about publicsuffix is the way the PSL data is distributed. I've written previously about the issue. In summary, you either distribute stale data with the code or fetch an up-to-date copy via the network, which is a privacy problem. These two are the only options possible, and going with one or the other or both was always going to be a problem for someone. I hate software that phones home (well, phones Mozilla in this case) as much as anyone, but it's a problem that I, as a mere maintainer of a Python library, had no hope of solving, even if I got CC'd in all the threads discussing it.
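
The dilemma can be sketched in a few lines of Python. This is just a toy illustration of the two options, not the publicsuffix API; only the URL of the list itself is real:

```python
# Two ways a library can get at the Public Suffix List: ship a bundled
# copy with the code, or fetch a fresh one over the network at run time.
import urllib.request

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"

def load_psl(bundled_path, fetch=False):
    if fetch:
        # Always up to date, but every user "phones Mozilla".
        with urllib.request.urlopen(PSL_URL) as f:
            return f.read().decode("utf-8")
    # Private, but only as fresh as the last release of the package.
    with open(bundled_path, encoding="utf-8") as f:
        return f.read()
```

Whichever branch a library takes by default, some of its users will consider it the wrong one.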

The Public Suffix List is a funny thing. Ideally, software either should not care about the semantic meaning of domain names, or this meaning should be embedded in the basic infrastructure of the Internet (e.g. DNS or something). But alas we don't live in either of those worlds, and hence we have a magic text file that lives on an HTTP server somewhere, and some software needs to have access to it if it wants to do its thing. No amount of worrying on my part was going to change that.

Screenshot of publicsuffix forks on GitHub.

The other issue that sparked at least one fork of publicsuffix was the fact that I refused to publish the source on GitHub. Even though there are usually several copies of the publicsuffix code on GitHub at any given time, none of them are mine. I was instead hosting my own git repo and accepting bug reports and other comments only over email.

Some time ago, GitHub became synonymous with open source. People simply expect a PyPi package to have a GitHub (or GitLab, or BitBucket) point-and-click interface somewhere on the web. The practical problem I have with that is that it hugely increases the amount of effort I have to spend on a project (subjectively speaking - keep in mind this is something I do in my free time). Yes, it makes it trivial for someone to contribute a patch. However, in practice I find that it does not result in a greater quantity of meaningful patches or bug reports. What it does do is create more work for me dealing with low-effort contributions I must reject.

I'm talking about a daunting asymmetry in communication. Writing two sentences in a hurry in a GitHub issue or pushing a bunch of untested code my way in a pull request can take all of a minute for the submitter. On the other hand, I don't want to discourage people from contributing to free software and I know how frustrating it can be to contribute to open source projects (see my post about drive-by contributions). So I try to take some time to study the pull request and write an intelligible and useful answer. However, this is simply not sustainable. Looking back, I also seem to often fail at keeping my frustration from showing through in my answers. Hence I feel like requiring contributors to at least know how to use git format-patch and write an email forms a useful barrier to entry. It prevents frustration at both ends, and I believe that for a well thought-out contribution the overhead of opening a mail client should be negligible.

Of course, if the project is not officially present on GitHub, you get the current situation, where multiple public copies of the project still exist on GitHub, made by random people for their own use. These copies often keep my contact details and don't obviously state that the code has been modified and/or is not related to the PyPi releases. This causes confusion, since the code on GitHub is not the same as the code on PyPi. People also sometimes reuse version numbers for their own private use that conflict with version numbers on PyPi, and so on. It really is a damned-if-you-do, damned-if-you-don't situation.

How can I sum this up? I've maintained this software for around 8 years, well after I left the company for which it was originally developed. During that time people have forked and rewritten it for various, largely non-technical reasons. That's fine. It's how free software is supposed to work, and my own package was based on another one that got abandoned. I might still be happy to work on technical issues, but the part that turned out much more exhausting than working on the code was dealing with the social and ideological issues people had with it. It's probably my failing that I've spent so much thought on those. In the end, my own interests have changed as well during that time, and finally letting it go does feel like a stone off my shoulders.

Posted by Tomaž | Categories: Code | Comments »

ZX81 LPRINT bug and software archaeology

04.11.2019 19:07

By some coincidence I happened to stumble upon a week-old, unanswered question posted to Hacker News regarding a bug in Sinclair BASIC on the Timex Sinclair 1000 microcomputer. While I never owned a TS1000, the post attracted my interest. I studied the ZX81, an almost identical microcomputer, extensively when I was doing my research on Galaksija. It also reminded me of a now almost forgotten idea to write a post on some obscure BASIC bugs in Galaksija's ROM that I found mentioned in contemporary literature.

ZX81 exhibited at the Frisk festival.

The question on Hacker News is about the cause of a bug where the computer, when attached to a printer, would print out certain floating point numbers incorrectly. The most famous example, mentioned in the Wikipedia article on Timex Sinclair 1000, is the printout of 0.00001. The BASIC statement:

LPRINT 0.00001

unexpectedly types out the following on paper:


This bug occurs on both the Timex Sinclair 1000 and the Sinclair ZX81, since both computers share the same ROM code. Only the first zero after the decimal point is printed correctly, while the subsequent zeros seem to be replaced with random alphanumeric characters. The non-zero digit at the end is again printed correctly. Interestingly, this only happens when using the LPRINT (line-printer print) statement that makes a hard copy of the output on paper using a printer. The similar PRINT statement that displays the output on the TV screen works correctly (you can try it out in JtyOne's Online Emulator).

The cause of the bug lies in the code that takes a numerical value in the internal format of the BASIC's floating point calculator and prints out individual characters. One particular part of the code determines the number of zeros after the decimal point and uses a loop to print them out:

L16B2:  NEG                     ; Prepare number of zeros
        LD      B,A             ; to print in B.

        LD      A,$1B           ; Print out character '.'
        RST     10H             ; 

        LD      A,$1C           ; Prepare character '0' 
				; to print out in A.

L16BA:  RST     10H             ; Call "print character" routine
        DJNZ    L16BA           ; and loop back B times.

(This assembly listing is taken from Geoff Wearmouth's disassembly. Comments are mine.)

The restart 10h takes a character code in register A and either prints it out on the screen or sends it to the printer. Restarts are a bit like simple system calls - they are an efficient way to call an often-used routine on the Z80 CPU. The problem lies in the fact that this restart doesn't preserve the contents of the A register. It does preserve the contents of register B and the other main registers through the use of the EXX instruction and the shadow registers, however the original contents of A are lost after the call returns.

Since the code above doesn't reset the contents of the A register on each iteration, only the first zero after the decimal point is printed correctly. Subsequent zeros are replaced with whatever junk was left in the A register by the 10h restart code. The solution is to simply adjust the DJNZ instruction to loop back two bytes earlier, to the LD instruction, so that the character code is stored to A in each iteration. You can see this fix in Geoff's customized ZX81 ROM, or in the Timex Sinclair 1500 ROM (see line 3835 in this diff between TS1500 and TS1000).
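
The mechanism is easy to model in a few lines of Python. This is only a toy model of the loop above, not the actual ROM code; the character codes match the ZX81 charset, but the junk value is arbitrary:

```python
# rst10() stands in for the 10h restart when printing to the printer:
# it "prints" the character code held in "register" A and, like the
# real routine, clobbers A before returning.

ZERO = 0x1C  # ZX81 character code for '0'
JUNK = 0x3F  # arbitrary junk value left in A by the routine

def rst10(a, output):
    output.append(a)
    return JUNK          # A is not preserved on return

def print_zeros_buggy(n):
    out = []
    a = ZERO             # LD A,$1C happens only once...
    for _ in range(n):   # ...so the DJNZ loop reuses a clobbered A
        a = rst10(a, out)
    return out

def print_zeros_fixed(n):
    out = []
    for _ in range(n):   # looping back to the LD A,$1C instead
        a = ZERO         # reloads '0' on every iteration
        rst10(a, out)
    return out

print(print_zeros_buggy(3))  # only the first zero is correct
print(print_zeros_fixed(3))  # all three zeros are correct
```

The buggy variant emits the '0' code once and junk thereafter, which is exactly the pattern seen on paper.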

This exact same code is also used when displaying numbers on the TV screen, however in that case it works correctly. The reason is that when set to print to the screen, printing the character '0' via the 10h restart actually preserves the contents of register A. Looking at the disassembly, I suspect that was simply a lucky coincidence and not a conscious decision by the programmer. Any code calling 10h doesn't know whether the printer or the screen is in use, and hence must assume that A isn't preserved anyway.

Of course, I'm far from being the first person to write about this particular Sinclair bug. Why then does the post on Hacker News say that there's little information to be found about it? The Wikipedia article doesn't cite a reference for this bug either.

It turns out that during my search for the answer, the three most useful pages were no longer on-line. Paul Farrow's ZX resource centre, S. C. Agate's ZX81 ROMs page and Geoff Wearmouth's Sinclair ROM disassemblies are wonderful historical resources that must have taken a lot of love and effort to put together. Sadly, they are now only accessible through the snapshots on the Internet Archive's Wayback Machine. If I hadn't known about them beforehand, I probably wouldn't have found them now. For the last one you even need to know which particular time range to look at on Archive.org, since the domain was taken over by squatters and recent snapshots only show ads (incidentally, this is also the reason why I'm re-hosting some of its former content).

I feel like we can still learn a lot from these early home computers and I'm happy that questions about them still pop up in various forums. This LPRINT bug seems to be a case of faulty generalization. It's a well-known type of mistake where the programmer wrongly generalizes an assumption (10h preserves A) that is in fact only true in a special case (displaying a character on the screen). History tends to repeat itself, and I believe that many of the blunders in modern software wouldn't happen if software developers were more aware of the history of their trade.

It's sad that these old devices are disappearing and that primary literature sources about them are hard to find, but I find it even more concerning that these secondary sources now also seem to be slowly fading from general accessibility on the web.

Posted by Tomaž | Categories: Code | Comments »