Double pendulum simulation

16.05.2019 21:05

One evening a few years ago I was sitting behind a desk with a new, fairly powerful computer at my fingertips. The kind of a system where you run top and the list of CPUs doesn't fit in the default terminal window. Whatever serious business I was using it for at the time didn't parallelize very well and I felt most of its potential remained unused. I was curious how well the hardware would truly perform if I could throw at it some computing problem that would be better suited for a massively parallel machine.

Somewhere around that time I also stumbled upon a gallery of some nice videos of chaotic pendulums. These were done in Mathematica and simulated a group of double-pendulums with slightly different initial conditions. I really liked the visualization. Each pendulum is given a different color. They first move in sync, but after a while their movements deviate and the line they trace falls apart into a rainbow.

Simulation of 42 double pendulums.

Image by aWakeOfBuzzards

The simulations published by aWakeOfBuzzards included only 42 pendulums. I guess it's a reference to the Hitchhiker's Guide, but I thought, why not do better than that? Would it be possible to eliminate visual gaps between the traces? Since each simulation of a pendulum is independent, this should be a really nice, embarrassingly parallel problem I was looking for.

I didn't want to spend a lot of time writing code. This was just another crazy idea and I could only rationalize avoiding more important tasks for so long. Since I couldn't run Mathematica on that machine, I couldn't re-use aWakeOfBuzzards's code and rewriting it to Numpy seemed non-trivial. Nonetheless, I still managed to shamelessly copy most of the code from various other sources on the web. For a start, I found a usable piece of physics simulation code in a Matplotlib example.

aWakeOfBuzzards' simulations simply draw the pendulum traces opaquely on top of each other. It appears that the code draws the red trace last, since when all the pendulums move closely together, all other traces get covered and the trail appears red. I wanted to do better. I had CPU cycles to spare after all.

Instead of rendering animation frames in standard red-green-blue color planes, I instead worked with wavelengths of visible light. I assigned each pendulum a specific wavelength and added that emission line to the spectrum for each pixel it occupied. Only when I had a complete spectrum for each pixel I converted that to a RGB tuple. This meant that when all the pendulums were on top of each other, they would be seen as white, since white light is a sum of all wavelengths. When they diverged, the white line would naturally break into a rainbow.

Frames from an early attempt at the pendulum simulation.

For parallelization, I simply used a process pool from Python's multiprocessing package with N - 1 worker processes, where N was the number of processors in the system. The worker processes solved the Runge-Kutta and returned a list of vertex positions. The master process then rendered the pendulums and wavelength data to an RGB framebuffer by abusing the ImageDraw.line from the Pillow library. Since drawing traces behind the pendulums meant that animation frames were not independent of each other, I dropped that idea and instead only rendered the pendulums themselves.

For 30 seconds of simulation this resulted in an approximately 10 GB binary .npy file with raw framebuffer data. I then used another, non-parallel step that used Pillow and FFmpeg to compress it to a more reasonably sized MPEG video file.

Double pendulum Monte Carlo

(Click to watch Double pendulum Monte Carlo video)

Of course, it took several attempts to fine-tune various simulation parameters to get a nice looking result you can find above. This final video is rendered from 200.000 individual pendulum simulations. Initial conditions only differed in the angular velocity of the second pendulum segment, which was chosen from a uniform distribution.

200.000 is not an insanely high number. It manages to blur most of the gaps between the pendulums, but you can still see the cloud occasionally fall apart into individual lines. Unfortunately I didn't seem to note down at the time what bottleneck exactly caused me not to go higher than that. Looking at the code now, it was most likely the non-parallel rendering of the final frames. I was also beyond the point of diminishing returns and probably something like interpolation between the individual pendulum solutions would yield better results than just increasing the number of solutions.

I was recently reminded of this old hack I did and I thought I might share it. It was a reminder of a different time and a trip down the memories to piece the story back together. The project that funded that machine is long concluded and I now spend evenings behind a different desk. I guess using symmetric multiprocessing was getting out of fashion even back then. I would like to imagine that these days someone else is sitting in that lab and wondering similar things next to a GPU cluster.

Posted by Tomaž | Categories: Life | Comments »

Google is eating our mail

25.04.2019 20:06

I've been running a small SMTP and IMAP mail server for many years, hosting a handful of individual mailboxes. It's hard to say when exactly I started. whois says I registered the tablix.org domain in 2005 and I remember hosting a mailing list for my colleagues at the university a bit before that, so I think it's safe to say it's been around 15 years.

Although I don't jump right away on every email-related novelty, I've tried to keep the server up-to-date with well accepted standards over the years. Some of these came for free with Debian updates. Others needed some manual work. For example, I have SPF records and DKIM message signing setup on the domains I use. The server is hosted on commercial static IP space (with the very same IP it first went on-line) and I've made sure with the ISP that correct reverse DNS records are in place.

Homing pigeon

Image by Andreas Trepte CC BY-SA 2.5

From the beginning I've been worrying that my server would be used for sending spam. So I always made sure I did not have an open relay and put in place throughput restrictions and monitoring that would alert me about unusual traffic. In any case, the amount of outgoing mail has stayed pretty minimal over the years. Since I'm hosting just a few personal accounts these days, there have been less than 1000 messages sent to remote servers over SMTP in the last 12 months. I've given up on hosting mailing lists many years ago.

All of this effort paid off and, as far as I'm aware, my server was never listed on any of the public spam black lists.

So why am I writing all of this? Unfortunately, email is starting to become synonymous with Google's mail, and Google's machines have decided that mail from my server is simply not worth receiving. Being a good administrator and a well-behaved player on the network is no longer enough:

550-5.7.1 [...] Our system has detected that this
550-5.7.1 message is likely unsolicited mail. To reduce the amount of spam sent
550-5.7.1 to Gmail, this message has been blocked. Please visit
550-5.7.1  https://support.google.com/mail/?p=UnsolicitedMessageError
550 5.7.1  for more information. ... - gsmtp

Since mid-December last year, I'm regularly seeing SMTP errors like these. Sometimes the same message re-sent right away will not bounce again. Sometimes rephrasing the subject will fix it. Sometimes all mail from all accounts gets blocked for weeks on end until some lucky bit flips somewhere and mail mysteriously gets through again. Since many organizations use Gmail for mail hosting this doesn't happen just for ...@gmail.com addresses. Now every time I write a mail I wonder whether Google's AI will let it through or not. Only when something like this happens you realize just how impossible it is to talk to someone on the modern internet without having Google somewhere in the middle.

Of course, the 550 SMTP error helpfully links to a wholly unhelpful troubleshooting page. It vaguely refers to suspicious looking text and IP history. It points to Bulk Sender Guidelines, but I have trouble seeing myself as a bulk sender with 10 messages sent last week in total. It points to the Postmaster Tools which, after letting me jump through some hoops to authenticate, tells me I'm too small a fish and has no actual data to show.

Screenshot of Google Postmaster Tools.

So far Google has blocked personal messages to friends and family in multiple languages, as well as business mail. I stopped guessing what text their algorithms deem suspicious. What kind of intelligence sees a reply, with the original message referenced in the In-Reply-To header and part quoted, and considers it unsolicited? I don't discount the possibility that there is something misconfigured at my end, but since Google gives no hint and various third-party tools I've tried don't report anything suspicious I've ran out of ideas where else to look.

My server isn't alone with this problem. At work we use Google's mail hosting and I've seen this trigger happy filter from the other end. Just recently I've overlooked an important mail because it ended up in the spam folder. I guess it was pure luck it didn't get rejected at the SMTP layer. With my work email address I'm subscribed to several mailing lists of open source software projects and regularly Google will decide to block this traffic. I know since Mailman sends me a notification that my address caused excessive bounces. What system decides, after months of watching me read these messages and not once seeing me mark one as spam, that I suddenly don't want to receive them ever again?

Screenshot of the mailing list probe message.

I wonder. Google as a company is famously focused on machine learning through automated analytics and bare minimum of human contact. What kind of a signal can they possibly use to train these SMTP rejects? Mail gets rejected at the SMTP level without user's knowledge. There is no way for a recipient to mark it as not-spam since they don't know the message ever existed. In contrast to merely classifying mail into spam/non-spam folders, it's impossible for an unprivileged human to tell the machine it has made a mistake. Only the sender knows the mail got rejected and they don't have any way to report it either. One half of the feedback loop appears to be missing.

I'm sure there is no malicious intent behind this and that there are some very smart people working on spam prevention at Google. However for a metric driven company where a majority of messages are only passed with-in the walled garden, I can see how there's little motivation to work well with mail coming from outside. If all training data is people marking external mail as spam and there's much less data about false positives, I guess it's easy to arrive to a prior that all external mail is spam even with best intentions.

This is my second rant about Google in a short while. I'm mostly indifferent to their search index policies, however this mail problem is much more frustrating. I can switch search engines, but I can't tell other people to go off Gmail. Email used to work, from its 7-bit days onward. It was one standard thing that you could rely on in the ever changing mess of messaging web apps and proprietary lock-ins. And now it's increasingly broken. I hope people realize that if they don't get a reply, perhaps it's because some machine somewhere decided for them that they don't need to know about it.

Posted by Tomaž | Categories: Life | Comments »

Wacom Cintiq 16 on Debian Stretch

15.02.2019 18:47

Wacom Cintiq 16 (DTK-1660/K0-BX) is a drawing tablet that was announced late last year. At around 600€ I believe it's now the cheapest tablet with a display you can buy from Wacom. For a while I've been curious about these things, but the professional Wacoms were just way too expensive and I wasn't convinced by the cheaper Chinese alternatives. I've been using a small Intuos tablet for several years now and Linux support has been flawless from the start. So when I recently saw the Cintiq 16 on sale at a local reseller and heard there are chances that it will work similarly well on Linux I couldn't resist buying one.

Wacom Cintiq 16 with GIMP running on it.

Even though it's a very recent device, the Linux kernel already includes a compatible driver. Regardless, I was prepared for some hacking before it was usable on Debian. I was surprised how little was actually required. As before, the Linux Wacom Project is doing an amazing job, even though Linux support is not officially acknowledged by Wacom as far as I know.

The tablet connects to the computer using two cables: HDMI and USB. HDMI connection behaves just like a normal 1920x1080 60 Hz flat panel monitor and the display works even if support for everything else is missing on the computer side. The HDMI cable also caries an I2C connection that can be used to adjust settings that you would otherwise expect in a menu accessible by buttons on the side of a monitor (aside from a power button, the Cintiq itself doesn't have any buttons).

After loading the i2c-dev kernel module, the ddcutil version 0.9.4 correctly recognized the display. The i2c-2 bus interface in the example below goes through the HDMI cable and was in my case provided by the radeon driver:

$ ddcutil detect
Display 1
   I2C bus:             /dev/i2c-2
   EDID synopsis:
      Mfg id:           WAC
      Model:            Cintiq 16
      Serial number:    ...
      Manufacture year: 2018
      EDID version:     1.3
   VCP version:         2.2
$ ddcutil --bus=2 capabilities
MCCS version: 2.2
Commands:
   Command: 01 (VCP Request)
   Command: 02 (VCP Response)
   Command: 03 (VCP Set)
   Command: 06 (Timing Reply )
   Command: 07 (Timing Request)
   Command: e3 (Capabilities Reply)
   Command: f3 (Capabilities Request)
VCP Features:
   Feature: 02 (New control value)
   Feature: 04 (Restore factory defaults)
   Feature: 05 (Restore factory brightness/contrast defaults)
   Feature: 08 (Restore color defaults)
   Feature: 12 (Contrast)
   Feature: 13 (Backlight control)
   Feature: 14 (Select color preset)
      Values:
         04: 5000 K
         05: 6500 K
         08: 9300 K
         0b: User 1
   Feature: 16 (Video gain: Red)
   Feature: 18 (Video gain: Green)
   Feature: 1A (Video gain: Blue)
   Feature: AC (Horizontal frequency)
   Feature: AE (Vertical frequency)
   Feature: B2 (Flat panel sub-pixel layout)
   Feature: B6 (Display technology type)
   Feature: C8 (Display controller type)
   Feature: C9 (Display firmware level)
   Feature: CC (OSD Language)
      Values:
         01: Chinese (traditional, Hantai)
         02: English
         03: French
         04: German
         05: Italian
         06: Japanese
         07: Korean
         08: Portuguese (Portugal)
         09: Russian
         0a: Spanish
         0d: Chinese (simplified / Kantai)
         14: Dutch
         1e: Polish
         26: Unrecognized value
   Feature: D6 (Power mode)
      Values:
         01: DPM: On,  DPMS: Off
         04: DPM: Off, DPMS: Off
   Feature: DF (VCP Version)
   Feature: E1 (manufacturer specific feature)
      Values: 00 01 02 (interpretation unavailable)
   Feature: E2 (manufacturer specific feature)
      Values: 01 02 (interpretation unavailable)
   Feature: EF (manufacturer specific feature)
      Values: 00 01 02 03 (interpretation unavailable)
   Feature: F2 (manufacturer specific feature)

I didn't play much with these settings, since I found the factory setup sufficient. I did try out setting white balance and contrast and the display responded as expected. For example, to set the white balance to 6500K:

$ ddcutil --bus=2 setvcp 14 5

Behind the USB connection is a hub and two devices connected to it:

$ lsusb -s 1:
Bus 001 Device 017: ID 0403:6014 Future Technology Devices International, Ltd FT232H Single HS USB-UART/FIFO IC
Bus 001 Device 019: ID 056a:0390 Wacom Co., Ltd 
Bus 001 Device 016: ID 056a:0395 Wacom Co., Ltd 
$ dmesg
...
usb 1-6: new high-speed USB device number 20 using ehci-pci
usb 1-6: New USB device found, idVendor=056a, idProduct=0395, bcdDevice= 1.00
usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-6: Product: Cintiq 16 HUB
usb 1-6: Manufacturer: Wacom Co., Ltd.
hub 1-6:1.0: USB hub found
hub 1-6:1.0: 2 ports detected
usb 1-6.2: new high-speed USB device number 21 using ehci-pci
usb 1-6.2: New USB device found, idVendor=0403, idProduct=6014, bcdDevice= 9.00
usb 1-6.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-6.2: Product: Single RS232-HS
usb 1-6.2: Manufacturer: FTDI
ftdi_sio 1-6.2:1.0: FTDI USB Serial Device converter detected
usb 1-6.2: Detected FT232H
usb 1-6.2: FTDI USB Serial Device converter now attached to ttyUSB0
usb 1-6.1: new full-speed USB device number 22 using ehci-pci
usb 1-6.1: New USB device found, idVendor=056a, idProduct=0390, bcdDevice= 1.01
usb 1-6.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-6.1: Product: Cintiq 16
usb 1-6.1: Manufacturer: Wacom Co.,Ltd.
usb 1-6.1: SerialNumber: ...
input: Wacom Cintiq 16 Pen as /devices/pci0000:00/0000:00:1a.7/usb1/1-6/1-6.1/1-6.1:1.0/0003:056A:0390.000D/input/input62
on usb-0000:00:1a.7-6.1/input0

056a:0395 is the hub and 056a:0390 is the Human Interface Device class device that provides the actual pen input. When the tablet is off but connected to power, the HID disconnects but other two USB devices are still present on the bus. I'm not sure what the UART is for. This thread on GitHub suggests that it offers an alternative way of interfacing with the I2C bus for adjusting the display settings.

On the stock 4.9.130 kernel that comes with Stretch the pen input wasn't working. However, the 4.19.12 kernel from stretch-backports correctly recognizes the devices and as far as I can see works perfectly.

$ cat /proc/version
Linux version 4.19.0-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Debian 4.19.12-1~bpo9+1 (2018-12-30)

I'm using the stock GNOME 3.22 desktop that comes with Stretch. After upgrading the kernel, the tablet correctly showed up in the Wacom Tablet panel in gnome-control-center. Pen settings there also work and I was able to calibrate the input using the Calibrate button. Calibration procedure instructs you to press the pen onto targets shown on the display and looks exactly like when using official software on Mac OS.

Update: It seems that GNOME sometimes messes up the tablet-to-screen mapping. If the stylus suddenly starts to move the cursor on some other monitor instead of displaying it underneath the stylus on the Wacom display, check the key /org/gnome/desktop/peripherals/tablets/056a:0390/display in dconf-editor. It should contain something like ['WAC', 'Cintiq 16', '...']. If it doesn't, delete they key and restart gnome-settings-daemon.

Wacom Tablet panel in the GNOME Control Center.

In GIMP I had to enable the new input devices under Edit, Input Devices and set them to Screen mode, same as with other tablets. You get two devices: one for the pen tip and another one for when you turn the pen around and use the other eraser-like end. These two behave like independent devices in GIMP, so each remembers its own tool settings.

Configure Input Devices dialog in GIMP.

As far as I can see, pen pressure, tilt angle and the two buttons work correctly in GIMP. The only problem I had is that it's impossible to do a right-click on the canvas to open a menu. This was already unreliable on the Intuos and I suspect it has to do with the fact that the pen always moves slightly when you press the button (so GIMP registers click-and-drag rather than a click). With the higher resolution of the Cintiq it makes sense that it's even harder to hold the pen still enough.

Otherwise, GIMP works mostly fine. I found GNOME to be a bit stubborn about where it wants to place new dialog windows. If I have a normal monitor connected alongside the tablet, file and color chooser dialogs often end up on the monitor instead of the tablet. Since I can't use the pen to click on something on the monitor, it forces me to reach for the mouse, which can be annoying.

I've noticed that GIMP sometimes lags behind the pen, especially when dragging the canvas with the middle-click. I didn't notice this before, but I suspect it has to do with the higher pen resolution or update rate of the Cintiq. The display also has a higher resolution than my monitor, so there are more pixels to push around. In any case, my i5-750 desktop computer will soon be 10 years old and is way overdue for an upgrade.


In conclusion, I'm very happy with it even if it was a quite an expensive gadget to buy for an afternoon hobby. After a few weeks I'm still getting used to drawing onto the screen and tweaking tool dynamics in GIMP. The range of pressures the pen registers feels much wider than on the Intuos, although that is probably very subjective. To my untrained eyes the display looks just amazing. The screen is actually bigger than I thought and since it's not easily disconnectable it is forcing me to rethink how to organize my desk. In the end however, my only worry is that my drawing skills are often not on par with owning such a powerful tool.

Posted by Tomaž | Categories: Life | Comments »

Google index coverage

08.02.2019 11:57

It recently came to my attention that Google has a new Search Console where you can see the status of your web site in Google's search index. I checked out what it says for this blog and I was a bit surprised.

Some things I expected, like the number of pages I've blocked in the robots.txt file to prevent crawling (however I didn't know that blocking an URL there means that it can still appear in search results). Other things were weirder, like this old post being soft recognized as a 404 Not Found response. My web server is properly configured and quite capable of sending correct HTTP response codes, so ignoring standards in that regard is just craziness on Google's part. But the thing that caught my eye the most was the number of Excluded pages on the Index Coverage pane:

Screenshot of the Index Coverage pane.

Considering that I have less than a thousand published blog posts this number seemed high. Diving into the details, it turned out that most of the excluded pages were redirects to canonical URLs and Atom feeds for post comments. However at least 160 URL were permalink addresses of actual blog posts (there may be more, because the CSV export only contains the first 1000 URLs).

Index coverage of blog posts versus year of publication.

All of these were in the "crawled, not indexed" category. In their usual hand-waving way, Google describes this as:

The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

I read this as "we know this page exists, there's no technical problem, but we don't consider it useful to show in search results". The older the blog post, the more likely that it was excluded. Google's index apparently contains only around 60% of my content from 2006, but 100% of that published in the last couple of years. I've tried searching for some of these excluded blog posts and indeed they don't show in the results.

I have no intention to complain about my early writings not being shown to Google's users. As long as my web site complies with generally accepted technical standards I'm happy. I write about things that I find personally interesting and what I earnestly believe might be useful information in general. I don't feel entitled to be shown in Google's search results and what they include in their index or not is their own business.

That said, it did made me think. I'm using Google Search almost exclusively to find information on the web. I suspected that they heavily prioritize new over old, but I've never seriously considered that Google might be intentionally excluding parts of the web from their index altogether. I often hear the sentiment how the old web is disappearing. That the long tail of small websites is as good as gone. Some old one-person web sites may indeed be gone for good, but as this anecdote shows, some such content might just not be discoverable through Google.

All this made me switch my default search engine in Firefox to DuckDuckGo. Granted I don't know what they include or exclude from their search either. I have yet to see how well it works, but maybe it isn't such a bad idea to go back to the time where trying several search engines for a query was a standard practice.

Posted by Tomaž | Categories: Life | Comments »

Jahresrückblick

19.01.2019 20:07

And so another year rushed by. In the past twelve months I've published 19 blog posts, written around 600 notebook pages and read 13 books.

Perhaps the largest change last year was that I left my position at the Department of communication systems at the Jožef Stefan Institute. After 7 years behind the same desk I really needed a change in the environment and I already switched to working only part-time the previous fall. This year I only planned to handle closing work on an EU project that was spinning down. I found it hard to do meaningful research work while working on other things most of the week anyway.

"Kein Jammern" poster.

Even though I led my project work to official (and successful) completion I feel like I left a lot of things undone there. There are interesting results left unpublished, hardware forgotten, and not the least my PhD work which got kind of derailed over the course of the last project and was left in some deep limbo with no clear way of getting out. I thought that by stepping away of it all for a few months I will get a clearer perspective on what exactly I want to do with all of this. I miss research work, I don't miss the politics and I have yet to come to any conclusion if and how to proceed.

As a kind of ironic twist, I also unexpectedly got first authorship of a scientific paper last year. According to git log, it took almost exactly 4 years and around 350 commits and it tells a story quite unlike what I initially had in my mind. After countless rejections from various journals I basically gave up on the last submission going through. It was accepted for publication pending more experimental work, which caused a crazy month of hunting down all the same equipment from years ago, spending weekends and nights in the lab and writing up the new results.

A history of the number of journal pages written per month.

I spent most of the rest of my work days at Klevio wearing an electrical engineer's hat. Going from a well equipped institute back to a growing start-up brought new challenges. I was doing some programming and a lot of electronics design and design for manufacture, a field I barely touched with my electronics work at the Institute. In contrast to my previous RF work, here I brushed up on my analog audio knowledge and acoustics. I discovered the joy of meeting endless electromagnetic compatibility requirements for consumer devices.

Not surprisingly, after this I was not doing a lot of electronics in my spare time. I have several hardware projects in a half-finished state still left over from a year ago. I wish to work more on them this year and hopefully also write up some interesting blog posts. Similarly, I was not doing a lot of open source work, short of some low-effort maintenance of my old projects. Giving my talk about developing GIMP plug-ins was an absolute pleasure and definitely the most fun presentation to prepare last year.

Dapper

Drawing has remained my favorite pass-time and a way to fight anxiety, although I sometimes feel conflicted about it. I did countless sketches and looking back I'm happy to see my drawing has improved. I made my first half-way presentable animated short. It was nice to do such a semi-long project from start to completion, although it sometimes started to feel too much like a yet another afternoon job. I have some more ideas and with everything I learned last year I think it would be fun to try my hand at animating something more original, if only I could manage a more relaxed schedule for it.

All in all, looking back at my notes suggests it wasn't such a bad year. Except maybe December, which tends to be the most depressing month for me anyway. As last time, I'm not making any big plans for this year. I'm sure it will be again too short to clear out my personal backlog of interesting things to do and everything else that will want to happen before 2020. I only hope to waste less of it on various time-sinks like Hacker News and other addictive brain candy web sites. These seem to be starting to eat up my days despite my trying to keep my distance.

Posted by Tomaž | Categories: Life | Comments »

Going to 35c3

12.12.2018 19:03

Just a quick note that I've managed to get a ticket for this year's Chaos Communications Congress. So I'll be in Leipzig at the end of the month, in case anyone wants to have a chat. I don't have anything specific planned yet. I haven't been at the congress since 2015 and I don't know how much has changed in the past years since it moved from Hamburg. I've selected some interesting talks on Halfnarp, but I have a tentative plan to spend more time talking with people than camping in the lecture halls. Drop me an email or possibly tweet to @avian2 if you want to meet.

35c3 refreshing memories logo.

Posted by Tomaž | Categories: Life | Comments »

Notes on using GIMP and Kdenlive for animation

08.12.2018 12:01

This summer I made a 60-second hand-drawn animation using GIMP and Kdenlive. I thought I might write down some notes about the process I used and the software problems I had to work around. Perhaps someone else considering a similar endeavor will find them useful.

For sake of completion, the hardware I used for image and video editing was a Debian GNU/Linux system with 8 GB of RAM and an old quad-core i5 750 CPU. For drawing I used a 7" Wacom Intuos tablet. For some rough sketches I also used an old iPad Air 2 with Animation Creator HD and a cheap rubber finger-type stylus. Animation Creator was the only proprietary software I used.

Screenshot of the GIMP Onion Layers plug-in.

I drew all the final animation cels in GIMP 2.8.18, as packaged in Debian Stretch. I've used my Onion Layers plug-in extensively and added several enhancements to it as work progressed: It now has the option to slightly color tint next and previous frames to emphasize in which direction outlines are moving. I also added new shortcuts for adding a layer to all frames and adding a new frame. I found that GIMP will quite happily work with images with several hundreds of layers in 1080p resolution and in general I didn't encounter any big problems with it. The only thing I missed was an easier way to preview the animation with proper timings, although flipping through frames by hand using keyboard shortcuts worked quite well.

For some of the trickier sequences I used the iPad to draw initial sketches. The rubber stylus was much too imprecise to do final outlines with it, but I found drawing directly on the screen more convenient for experimenting. I later imported those sketches into GIMP using my Animation Creator import plug-in and then drew over them.

Comparison of different methods of coloring cels in GIMP.

A cel naively colored using Bucket Fill (left), manually using the Paintbrush Tool (middle) and using the Color Layer plug-in (right).

I've experimented quite a bit with how to efficiently color-in the finished outlines. Normal Bucket Fill leaves transparent pixels if used on soft anti-aliased outlines. The advice I got was to color the outlines manually with a Paintbrush Tool, but I found that much too time consuming. In the end I made a quick-and-ugly plug-in that helped automate the process somewhat. It used thresholding to create a sharp outline on a layer beneath the original outline. I then used Bucket Fill on that layer. This preserved somewhat the the quality of the initial outline.

I mostly used one GIMP XCF file per scene for cels and one file for the backgrounds (some scenes had multiple planes of backgrounds for parallax scrolling). I exported individual cels using the Export Layers into transparent-background PNG files. For the scene where I had two characters I later wished that I had one XCF file per character since that would make it easier to adjust timing.

I imported the background and cels into Kdenlive. Backgrounds were simple static Image Clips. For cels I mostly used Slideshow Clips with a small frame duration (e.g. 3 frame duration for 8 frames per second). For some scenes I instead imported individual cels separately as images and dragged them manually to the timeline if I wanted to adjust timing of individual cels. That was quite time consuming. The cels and backgrounds were composited using a large number of Composite & Transform transitions.

Kdenlive timeline for a single scene.

I was first using Kdenlive 16.12.2 as packaged by Debian but later found it too unstable. I switched to using the 18.04.1 AppImage release from the Kdenlive website. Switching versions was painful, since the project file didn't import properly in the newer version. Most transitions were wrong, so I had to redo much of the editing process.

I initially wanted to do everything in one monolithic Kdenlive project. However this proved to be increasingly inconvenient as work progressed. Even 18.04.1 was getting unstable with a huge number of clips and tracks on the timeline. I also was having trouble properly doing nice-looking dissolves between scenes that involved multiple tracks. Sometimes seemingly independent Composite & Transforms were affecting each other in unpredictable ways. So in the end I mostly ended up with one Kdenlive project per scene. I rendered each such scene to a lossless H.264 and then imported the rendered scenes into a master Kdenlive project for final editing.

Regarding Kdenlive stability, I had the feeling that it doesn't like Color Clips for some reason, or transitions to blank tracks. I'm not sure if there's really some bug or it was just my imagination (I couldn't reliably reproduce any crash that I encountered), but it seemed that the frequency of crashes went down significantly when I put one track with an all-black PNG at the bottom of my projects.

In general, the largest problem I had with Kdenlive was an issue with scaling. My Kdenlive project was 720p, but all my cels and backgrounds were in 1080p. It appeared that sometimes Composite & Transform would use 720p screen coordinates and sometimes 1080p coordinates in unpredictable ways. I think that the renderer implicitly scales down the bottom-most track to the project size, if at some point in time it sees a frame larger than the project size. In the end I couldn't figure the exact logic behind it and I had to resort to randomly experimenting until it worked. Having separate, simpler project files for each scene helped this significantly.

Comparison of Kdenlive scaling quality.

Frame scaled explicitly from 1080p to 720p using Composite & Transform (left) and scaled implicitly during rendering (right).

Another thing I noticed was the the implicit scaling of video tracks seemed to use lower-quality scaling algorithm than the Composite & Transform, resulting in annoying visible changes in image quality. In the end, I forcibly scaled all tracks to 720p using an extra Composite & Transform, even when one was not explicitly necessary.

Comparison of luminosity on transition to black using Composite & Transform and Fade to black.

Comparison of luminosity during a one second transition from a dark scene to black between Fade to black effect and Composite & Transform transition. Fade to black reaches black faster than it should, but luminosity does not jump up and down.

I was initially doing transitions between scenes using the Composite & Transform because I found adjusting the alpha values through keyframes more convenient than setting lengths of Fade to/from black effects. However I noticed that the Composite & Transform seems to have some kind of a rounding issue and transitions using it were showing a lot of bands and flickering in the final render. In the end I switched to using Dissolves and Fade to/from black which looked better.

Finally, some minor details about video encoding I learned. The H.264 quality setting in Kdenlive (CRF) is inverted. Lower values mean higher quality. H.264 uses YUV color space, while my drawings were in RGB color space. After rendering the final video some shades of gray got a bit of a green tint, which was due to the color space conversion in Kdenlive. As far as I know there is no way to help with that. GIMP and PNG only support RGB. In any case, that was quite minor and I was assured that I only saw it since I was starting at these drawings for so long.

"Home where we are" title card.

How to sum this up? Almost every artist I talked with recommended using proprietary animation software and was surprised when I told them what I'm using. I think in general that is reasonable advice (although the prices of such software seem anything but reasonable for a summer project). I was happy to spend some evenings writing code and learning GIMP internals instead of drawing, but I'm pretty sure that's more an exception than the rule.

There were certainly some annoyances that made me doubt my choice of software. Redoing all editing after switching Kdenlive versions was one of those things. Other things were perhaps just me being too focused on minor technical details. In any case, I remember doing some projects with Cinelerra many years ago and I think Kdenlive is a quite a significant improvement over it in terms of user interface and stability. Of course, neither GIMP nor Kdenlive were specifically designed for this kind of work (but I've heard Krita got native support for animations, so it might be worth checking out). If anything, the fact that it was possible for me to do this project shows how flexible open source tools can be, even when they are used in ways they were not meant for.

Posted by Tomaž | Categories: Life | Comments »

A summer animation project

30.11.2018 22:19

In December last year I went to an evening workshop on animation that was part of the Animateka festival in Ljubljana. It was fascinating to hear how a small group of local artists made a short animated film, from writing the script and making a storyboard to the final video. Having previously experimented with drawing a few seconds worth of animation in GIMP I was tempted to try at least once to go through this entire process and try to animate some kind of story. Not that I hoped to make anything remotely on the level that was presented there. They were professionals winning awards, I just wanted to do something for fun.

Unfortunately I was unable to attend the following workshops to learn more. At the time I was working for the Institute, wrapping up final months of an academic project, and at the same time just settling into an industry job. Between two jobs it was unrealistic to find space for another thing on my schedule.

I played with a few ideas here and there and thought I might prepare something to discuss at this year's GalaCon. During the Easter holidays I took a free week and being somewhat burned out I didn't have anything planned. So I used the spare time to step away from the life of an electrical engineer and come up with a short script based on a song I happened to randomly stumble upon on YouTube some weeks before. I tried to be realistic and stick as much as possible to the things I felt confident I could actually draw. By the end of the week I had a very rough 60-second animatic that I nervously shared around.

One frame of the animatic.

At the time I doubted I would actually do the final animation. I was encouraged by the responses I got to the script, but I didn't previously take notes on how much time it took me to do one frame of animation. So I wasn't sure how long it would take to complete a project like that. It just looked like an enormous pile of cels to draw. And then I found in my inbox a mail saying my scientific paper that I had unsuccessfully tried to publish for nearly 4 years got accepted pending a major revision and everything else was put on hold. It was another mad dash to repeat the experiments, process the measurements and catch the re-submission deadline.

By June my work at the Institute ended and I felt very much tired and disappointed and wanted to do something different. With a part of my week freed up I decided to spend the summer evenings working on this animation project I came up with during Easter. I went through some on-line tutorials to refresh my knowledge and, via a recommendation, bought the Animator's Survival Kit book, which naturally flew completely over my head. By August I was able to bother some con-goers in Ludwigsburg with a more detailed animatic and one fully animated scene. I was very grateful for the feedback I got there and found it encouraging that people were seeing some sense in all of it.

Wall full of animation scraps.

At the end of the summer I had the wall above my desk full of scraps and of course I was nowhere near finished. I underestimated how much work was needed and I was too optimistic in thinking that I will be able to dedicate more than a few hours per week on the project. I scaled back my expectations a bit and I made a new, more elaborate production spreadsheet. The spreadsheet said I'll finish in November, which in the end actually turned out to be a rather good estimate. The final render landed on my hard disk on the evening of 30 October.

Production spreadsheet for my animation project.

So here it is, a 60 second video that is the result of about 6 months of on-and-off evening work by a complete amateur. Looking at it one month later, it has unoriginal character design obviously inspired by the My Little Pony franchise. I guess not enough to appeal to the fans of that show and enough to repel everybody else. There's a cheesy semblance of a story. But it means surprisingly much to me and I probably would never finish it if I went with something more ambitious and original.

I've learned quite a lot about animation and video editing. I also wrote quite a bit of code for making this kind of animation possible using a completely free software stack and when time permits I plan to write another post about the technical details. Perhaps some part of my process can be reused by someone with a bit more artistic talent. In the end it was a fun, if eventually somewhat tiring, way to blow off steam after work and reflect on past decisions in life.

Home where we are

(Click to watch Home where we are video)

Posted by Tomaž | Categories: Life | Comments »

Analyzing PIN numbers

13.10.2018 12:17

Since I already had a dump from haveibeenpwned.com on my drive from my earlier password check, I thought I could use this opportunity to do some more analysis on it. Six years ago DataGenetics blog posted a detailed analysis of 4-digit numbers that were found in password lists from various data breaches. I thought it would be interesting to try to reproduce some of their work and see if their findings still hold after a few years and with a significantly larger dataset.

DataGenetics didn't specify the source of their data, except that it contained 3.4 million four-digit combinations. Guessing from the URL, their analysis was published in September 2012. I've done my analysis on the pwned-passwords-ordered-by-hash.txt file downloaded from haveibeenpwned.com on 6 October (inside the 7-Zip archive the file had a timestamp of 11 July 2018, 02:37:47). The file contains 517.238.891 SHA1 hashes with associated frequencies. By searching for SHA1 hashes that correspond to 4-digit numbers from 0000 to 9999, I found that all of them were present in the file. Total sum of their frequencies was 14.479.676 (see my previous post for the method I used to search the file). Hence my dataset was roughly 4 times the size of DataGenetics'.

Here are the top 20 most common numbers appearing in the dump, compared to the rank on the top 20 list from DataGenetics:

nnew nold PIN frequency
1 1 1234 8.6%
2 2 1111 1.7%
3 1342 1.1%
4 3 0000 1.0%
5 4 1212 0.5%
6 8 4444 0.4%
7 1986 0.4%
8 5 7777 0.4%
9 10 6969 0.4%
10 1989 0.4%
11 9 2222 0.3%
12 13 5555 0.3%
13 2004 0.3%
14 1984 0.2%
15 1987 0.2%
16 1985 0.2%
17 16 1313 0.2%
18 11 9999 0.2%
19 17 8888 0.2%
20 14 6666 0.2%

This list looks similar to the results published DataGenetics. The first two PINs are the same, but the distribution is a bit less skewed. In their results, first four most popular PINs accounted for 20% of all PINs, while here they only make up 12%. It seems also that numbers that look like years (1986, 1989, 2004, ...) have become more popular. In their list the only two in the top 20 list were 2000 and 2001.

Cumulative frequency of PINs

DataGenetics found that number 2580 ranked highly in position 22. They concluded that this is an indicator that a lot of these PINs were originally devised on devices with numerical keyboards such as ATMs and phones (on those keyboards, 2580 is straight down the middle column of keys), even though the source of their data were compromised websites where users would more commonly use a 104-key keyboard. In the haveibeenpwned.com dataset, 2580 ranks at position 65, so slightly lower. It is still in top quarter by cumulative frequency.

Here are 20 least common numbers appearing in the dump, again compared to their rank on the bottom 20 list from DataGenetics:

nnew nold PIN frequency
9981 0743 0.00150%
9982 0847 0.00148%
9983 0894 0.00147%
9984 0756 0.00146%
9986 0934 0.00146%
9985 0638 0.00146%
9987 0967 0.00145%
9988 0761 0.00144%
9989 0840 0.00142%
9991 0835 0.00141%
9990 0736 0.00141%
9993 0742 0.00139%
9992 0639 0.00139%
9994 0939 0.00132%
9995 0739 0.00129%
9996 0849 0.00126%
9997 0938 0.00125%
9998 0837 0.00119%
9999 9995 0738 0.00108%
10000 0839 0.00077%

Not surprisingly, most numbers don't appear in both lists. Since these have the lowest frequencies it also means that the smallest changes will significantly alter the ordering. The least common number 8068 in DataGenetics' dump is here in place 9302, so still pretty much at the bottom. I guess not many people choose their PINs after the iconic Intel CPU.

Here is a grid plot of the distribution, drawn in the same way as in the DataGenetics' post. Vertical axis depicts the right two digits while the horizontal axis depicts the left two digits. The color shows the relative frequency in log scale (blue - least frequent, yellow - most frequent).

Grid plot of the distribution of PINs

Many of the same patterns discussed in the DataGenetics' post are also visible here:

  • The diagonal line shows popularity of PINs where left two and right two digits repeat (pattern like ABAB), with further symmetries superimposed on it (e.g. AAAA).

  • The area in lower left corner shows numbers that can be interpreted as dates (vertically MMDD and horizontally DDMM). The resolution is good enough that you can actually see which months have 28, 30 or 31 days.

  • The strong vertical line at 19 and 20 shows numbers that can be interpreted as years. The 2000s are more common in this dump. Not surprising, since we're further into the 21st century than when DataGenetics' analysis was done.

  • Interestingly, there is a significant shortage of numbers that begin with 0, which can be seen as a dark vertical stripe on the left. A similar pattern can be seen in DataGenetics' dump although they don't comment on it. One possible explanation would be if some proportion of the dump had gone through a step that stripped leading zeros (such as a conversion from string to integer and back, maybe even an Excel table?).

In conclusion, the findings from DataGenetics' post still mostly seem to hold. Don't use 1234 for your PIN. Don't choose numbers that have symmetries in them or have years or dates in them. These all significantly increase the chances that someone will be able to guess them. And of course, don't re-use your SIM card or ATM PIN as a password on websites.

Another thing to note is that DataGenetics concluded that their analysis was possible because of leaks of clear-text passwords. However, PINs provide a very small search space of only 10000 possible combinations. It was trivial for me to perform this analysis even though haveibeenpwned.com dump only provides SHA-1 hashes, and not clear-text. With a warmed-up disk cache, the binary search only took around 30 seconds for all 10000 combinations.

Posted by Tomaž | Categories: Life | Comments »

Checking passwords against haveibeenpwned.com

07.10.2018 11:34

haveibeenpwned.com is a useful website that aggregates database leaks with user names and passwords. They have an on-line form where you can check whether your email has been seen in publicly available database dumps. The form also tells you in which dumps they have seen your email. This gives you a clue which of your passwords has been leaked and should be changed immediately.

For a while now haveibeenpwned.com reported that my email appears in a number of combined password and spam lists. I didn't find that surprising. My email address is publicly displayed on this website and is routinely scraped by spammers. If there was any password associated with it, I thought it has come from the 2012 LinkedIn breach. I knew my old LinkedIn password has been leaked, since scam mails I get are commonly mentioning it.

Screenshot of a scam email.

However, it came to my attention that some email addresses are in the LinkedIn leak, but not in these combo lists I appear in. This seemed to suggest that my appearance in those lists might not only be due to the old LinkedIn breach and that some of my other passwords could have been compromised. I thought it might be wise to double-check.

haveibeenpwned.com also provides an on-line form where they can directly check your password against their database. This seems a really bad practice to encourage, regardless of their assurances of security. I was not going to start sending off my passwords to a third-party. Luckily the same page also provides a dump of SHA-1 hashed passwords you can download and check locally. (Update: as multiple readers have pointed out, haveibeenpwned.com also offers an on-line search by partial hash, which seems a reasonable alternative if you don't want to download the whole database).

I used Transmission to download the dump over BitTorrent. After uncompressing the 7-Zip file I ended up with a 22 GB text file with one SHA-1 hash per line:

$ head -n5 pwned-passwords-ordered-by-hash.txt
000000005AD76BD555C1D6D771DE417A4B87E4B4:4
00000000A8DAE4228F821FB418F59826079BF368:2
00000000DD7F2A1C68A35673713783CA390C9E93:630
00000001E225B908BAC31C56DB04D892E47536E0:5
00000006BAB7FC3113AA73DE3589630FC08218E7:2

I chose the ordered by hash version of the file, so hashes are alphabetically ordered. The number after the colon seems to be the number of occurrences of this hash in their database. More popular passwords will have a higher number.

The alphabetical order of the file makes it convenient to do an efficient binary search on it as-is. I found the hibp-pwlookup tool tool for searching the password database, but that requires you to import the data into PostgreSQL, so it seems that the author was not aware of this convenience. In fact, there's already a BSD command-line tool that knows how to do binary search on such text files: look (it's in the bsdmainutils package on Debian).

Unfortunately the current look binary in Debian is somewhat broken and bails out with File too large error on files larger than 2 GB. It needs recompiling with a simple patch to its Makefile to work on this huge password dump. After I fixed this, I was able to quickly look up the hashes:

$ ./look -b 5BAA61E4 pwned-passwords-ordered-by-hash.txt
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3533661

By the way, Stephane on the Debian bug tracker also mentions a method where you can search the file without uncompressing it first. Since I already had it uncompressed on the disk I didn't bother. Anyway, now I had to automate the process. I used a Python script similar to the following. The check() function returns True if the password in its argument is present in the database:

import subprocess
import hashlib

def check(passwd):
	s = passwd.encode('ascii')
	h = hashlib.sha1(s).hexdigest().upper()
	p = "pwned-passwords-ordered-by-hash.txt"

	try:
		o = subprocess.check_output([
			"./look",
			"-b",
			"%s:" % (h,),
			p])
	except subprocess.CalledProcessError as exc:
		if exc.returncode == 1:
			return False
		else:
			raise

	l = o.split(b':')[1].strip()

	print("%s: %d" % (
		passwd,
		int(l.decode('ascii'))))

	return True

def main():
	check('password')

main()

Before I was able to check those passwords I keep stored in Firefox, I stumbled upon another hurdle. Recent versions do not provide any facility for exporting passwords in the password manager. There are some third-party tools for that, but I found it hard to trust them. I also had my doubts on how complete they are: Firefox has switched many mechanisms for storing passwords over time and re-implementing the retrieval for all of them seems to be a non-trivial task.

In the end, I opted to get them from Browser Console using the following Javascript, written line-by-line into the console (I adapted this code from a post by cor-el on Mozilla forums):

var tokendb = Cc["@mozilla.org/security/pk11tokendb;1"].createInstance(Ci.nsIPK11TokenDB);
var token = tokendb.getInternalKeyToken();
token.login(true);

var passwordmanager = Cc["@mozilla.org/login-manager;1"].getService(Ci.nsILoginManager);
var signons = passwordmanager.getAllLogins({});
var json = JSON.stringify(signons, null, 1);

I simply copy-pasted the value of the json variable here into a text editor and saved it to a file. I used an encrypted volume, so passwords didn't end up in clear-text on the drive.

I hope this will help anyone else to check their passwords against the database of leaks without exposing them to a third-party in the process. Fortunately for me, this check didn't reveal any really nasty surprises. I did find more hits in the database than I'm happy with, however all of them were due to certain poor password policies about which I can do very little.

Posted by Tomaž | Categories: Life | Comments »

Extending GIMP with Python talk

22.09.2018 18:47

This Thursday I presented a talk on writing GIMP plug-ins in Python at the monthly Python Meetup in Ljubljana. I gave a brief guide on how to get started and shared some tips that I personally would find useful when I first started playing with GIMP. I didn't go into the details of the various functions but tried to keep it high-level, showing most of the things live in the Python REPL window and demoing some of the plug-ins I've written.

I wanted to give this talk for two reasons: first was that I struggled to find such a high-level and up-to-date introduction into writing plug-ins. I though it would be useful for anyone else diving into GIMP's Python interface. I tried to prepare slides so that they can serve as a useful document on their own (you can find the slides here, and the example plug-in on GitHub). The slides also include some useful links for further reading.

Annotated "Hello, World!" GIMP plug-in.

The second reason was that I was growing unhappy with topics presented at these meetups. It seemed to me that recently most were revolving about web and Python at the workplace. I can understand why this is so. The event needs sponsors and they want to show potential new hires how awesome it is to work with them. The web is eating the world, Python meetups or otherwise, and it all leads to a kind of a monotonic series of events. So it's been a while since I've heard a fun subject discussed and I thought I might stir things up a bit.

To be honest, initially I was quite unsure whether this was good venue for the talk. I discussed the idea with a few friends that are also regular meetup visitors and their comments encouraged me to go on with it. Afterwards I was immensely surprised at the positive response. I had prepared a short tour of GIMP for the start of the talk, thinking that most people would not be familiar with it. It turned out that almost everyone in the audience used GIMP, so I skipped it entirely. I was really happy to hear that people found the topic interesting and it reminded me what a pleasure it is to give a well received talk.

Posted by Tomaž | Categories: Life | Comments »

Investigating the timer on a Hoover washing machine

20.07.2018 17:47

One day around last Christmas I stepped into the bathroom and found myself standing in water. The 20-odd years old Candy Alise washing machine finally broke a seal. For some time already I was suspecting that it wasn't rinsing the detergent well enough from the clothes and it was an easy decision to just scrap it. So I spent the holidays shopping for a new machine as well as taking apart and reassembling bathroom furniture that was in the way, and fixing various collateral damage on plumbing. A couple of weeks later and thanks to copious amounts of help from my family I was finally able to do laundry again without flooding the apartment in the process.

I bought a Hoover WDXOC4 465 AC combined washer-dryer. My choice was mostly based on the fact that a local shop had it on stock and its smaller depth compared to the old Candy meant a bit more floor space in the small bathroom. Even though I was trying to avoid it, I ended up with a machine with capacitive touch buttons, a NFC interface and a smartphone app.

Front panel on a Hoover WDXOC4 465 AC washer.

After half a year the machine works reasonably well. I gave up on Hoover's Android app the moment it requested to read my phone's address book, but thankfully all the features I care about can be used through the front panel. Comments I saw on the web that the dryer doesn't work have so far turned out to be unfounded. My only complaint is that 3 or 4 times it happened that the washer didn't do a spin cycle when it should. I suspect that the quality of embedded software is less then stellar. Sometimes I hear the inlet valve or the drum motor momentarily stop and engage again and I wonder if that's due to a bug or controller reset or if they are meant to work that way.

Anyway, as most similar machines, this one displays the remaining time until the end of the currently running program. There's a 7-segment display on the front panel that counts down hours and minutes. One of the weirder things I noticed is that the countdown seems to use Microsoft minutes. I would look at the display to see how many minutes are left, and then when I looked again later it would be seem to be showing more time remaining, not less. I was curious how that works and whether it was just my bad memory or whether the machine was really showing bogus values, so I decided to do an experiment.

I took a similar approach to my investigation of the water heater years ago. I loaded up the machine and set it up for a wash-only cycle (6 kg cottons program, 60°C, 1400 RPM spin cycle, stain level 2) and recorded the display on the front panel with a laptop camera. I then read out the timer values from the video with a simple Python program and compared them to the frame timestamps. The results are below: The real elapsed time is on the horizontal axis and the display readout is on the vertical axis. The dashed gray line shows what the timer should ideally display if the initial estimate would be perfect:

Countdown display on a Hoover washer versus real time.

The timer first shows a reasonable estimate of 152 minutes at the start of the program. The filled-in area on the graph around 5 minutes in is when the "Kg Mode" is active. At that time the display shows a little animation as the washer supposedly weighs the laundry. My OCR program got confused by that, so the time values there are invalid. However, after the display returns to the countdown, the time displayed is significantly lower, at just below 2 hours.

"Kg Mode" description from the Hoover manual.

Image by Candy Hoover Group S.r.l.

That seems fine at first - I didn't use a full 6 kg load, so it's reasonable to believe that the machine adjusted the wash time accordingly. However the minutes on the display then count down slower than real time. So much slower in fact that they more than make up for the difference and the machine ends the cycle after 156 minutes, 4 minutes later than the initial estimate.

You can also see that the display jumps up and down during the program, with the largest jump around 100 minutes in. Even when it seems to be running consistently, it will often count a minute down, then up and down again. You can see evidence of that on the graph below that shows differences on the timer versus time. I've verified this visually on the video and it's not an artifact of my OCR.

Timer display differences on a Hoover washer.

Why is it doing that? Is it just buggy software, an oscillator running slow or maybe someone figured that people would be happier when a machine shows a lower number and then slows down the passing minutes? Hard to say. Even if you don't tend to take videos of your washing machine, it's hard not to notice that the timer stands still at the last 1 minute for around 7 minutes.

Incidentally, does anyone know how the weighing function works? I find it hard to believe that they actually measure the weight with a load cell. They must have some cheaper way, perhaps by measuring the time it takes for water level rise up in the drum or something like that. The old Candy machine also advertised this feature and considering that was an electro-mechanical affair without sophisticated electronics it must be a relatively simple trick.

Pixel sampling locations for the 7-segment display OCR.

In conclusion, a few words on how I did the display OCR. I started off with the seven-segment-ocr project by Suyash Kumar I found on GitHub. Unfortunately it didn't work for me. It uses a very clever trick to do the OCR - it just looks at intensity profiles of two horizontal and one vertical line per character per frame. However because my video had low resolution and was recorded at a slight angle no amount of tweaking worked. In the end I just hacked up a script that sampled 24 hand-picked single pixels from the video. It then compared them to a hand-picked threshold and decoded the display values from there. Still, Kumar's project came in useful since I could reuse all of his OpenCV boilerplate.

Recording of a digital clock synchronized to the DCF77 signal.

Since I like to be sure, I also double-checked that my method of getting real time from video timestamps is accurate enough. Using the same setup I recorded a DCF77-synchronized digital clock for two hours. I estimated the relative error between video timestamps and the clock to be -0.15%, which means that after two hours, my timings were at most 11 seconds off. The Microsoft-minute effects I found in Hoover are much larger than that.

Posted by Tomaž | Categories: Life | Comments »

Faking adjustment layers with GIMP layer modes

12.03.2018 19:19

Drawing programs usually use a concept of layers. Each layer is like a glass plate you can draw on. The artwork appears in the program's window as if you were looking down through the stack of plates, with ink on upper layers obscuring that on lower layers.

Example layer stack in GIMP.

Adobe Photoshop extends this idea with adjustment layers. These do not add any content, but instead apply a color correction filter to lower layers. Adjustment layers work in real-time, the color correction gets applied seamlessly as you draw on layers below.

Support for adjustment layers in GIMP is a common question. GIMP does have a Color Levels tool for color correction. However, it can only be applied on one layer at a time. Color Levels operation is also destructive. If you apply it to a layer and want to continue drawing on it, you have to also adjust the colors of your brushes if you want to be consistent. This is often hard to do. It mostly means that you need to leave the color correction operation for the end, when no further changes will be made to the drawing layers and you can apply the correction to a flattened version of the image.

Adjustment layers are currently on the roadmap for GIMP 3.2. However, considering that Debian still ships with GIMP 2.8, this seems like a long way to go. Is it possible to have something like that today? I found some tips on the web on how to fake the adjustment layers using various layer modes. But these are very hand-wavy. Ideally what I would like to do is perfectly replicate the Color Levels dialog, not follow some vague instructions on what to do if I want the picture a bit lighter. That post did give me an idea though.

Layer modes allow you to perform some simple mathematical operations between pixel values on layers, like addition, multiplication and a handful of others. If you fill a layer with a constant color and set it to one of these layer modes, you are in fact transforming pixel values from layers below using a simple function with one constant parameter (the color on the layer). Color Levels operation similarly transforms pixel values by applying a function. So I wondered, would it be possible to combine constant color layers and layer modes in such a way as to approximate arbitrary settings in the color levels dialog?

Screenshot of the Color Levels dialog in GIMP.

If we look at the color levels dialog, there are three important settings: black point (b), white point (w) and gamma (g). These can be adjusted for red, green and blue channels individually, but since the operations are identical and independent, it suffices to focus a single channel. Also note that GIMP performs all operations in the range of 0 to 255. However, the calculations work out as if it was operating in the range from 0 to 1 (effectively a fixed point arithmetic is used with a scaling factor of 255). Since it's somewhat simpler to demonstrate, I'll use the range from 0 to 1.

Let's first ignore gamma (leave it at 1) and look at b and w. Mathematically, the function applied by the Color Levels operation is:

y = \frac{x - b}{w - b}

where x is the input pixel value and y is the output. On a graph it looks like this (you can also get this graph from GIMP with the Edit this Settings as Curves button):

Plot of the Color Levels function without gamma correction.

Here, the input pixel values are on the horizontal axis and output pixel values are on the vertical. This function can be trivially split into two nested functions:

y = f_2(f_1(x))
where
f_1(x) = x - b
f_2(x) = \frac{x}{w-b}

f1 shifts the origin by b and f2 increases the slope. This can be replicated using two layers on the top of the layers stack. GIMP documentation calls these masking layers:

  • Layer mode Subtract, filled with pixel value b,
  • Layer mode Divide, filled with pixel value w - b

Note that values outside of the range 0 - 1 are clipped. This happens on each layer, which makes things a bit tricky for more complicated layer stacks.

The above took care of the black and white point settings. What about gamma adjustment? This is a non-linear operation that gives more emphasis to darker or lighter colors. Mathematically, it's a power function with a real exponent. It is applied on top of the previous linear scaling.

y = x^g

GIMP allows for values of g between 0.1 and 10 in the Color Levels tool. Unfortunately, no layer mode includes an exponential function. However, the Soft light mode applies the following equation:

R = 1 - (1 - M)\cdot(1-x)
y = ((1-x)\cdot M + R)\cdot x

Here, M is the pixel value of the masking layer. If M is 0, this simplifies to:

y = x^2

So by stacking multiple such masking layers with Soft light mode, we can get any exponent that is a multiple of 2. This is still not really useful though. We want to be able to approximate any real gamma value. Luckily, the layer opacity setting opens up some more options. Layer opacity p (again in the range 0 - 1) basically does a linear combination of the original pixel value and the masked value. So taking this into account we get:

y = (1-p)x + px^2

By stacking multiple masking layers with opacity, we can get a polynomial function:

y = a_1 x + a_2 x^2 + x_3 x^3 + \dots

By carefully choosing the opacities of masking layers, we can manipulate the polynomial coefficients an. Polynomials of a sufficient degree can be a very good approximation for a power function with g > 1. For example, here is an approximation using the above method for g = 3.33, using 4 masking layers:

Polynomial approximation for the gamma adjustment function.

What about the g < 1 case? Unfortunately, polynomials don't give us a good approximation and there is no channel mode that involves square roots or any other usable function like that. However, we can apply the same principle of linear combination with opacity settings to multiple saturated divide operations. This effectively makes it possible to piecewise linearly approximate the exponential function. It's not as good as the polynomial, but with enough linear segments it can get very close. Here is one such approximation, for g = 0.33, again using 4 masking layers:

Piecewise linear approximation for the gamma adjustment function.

To test this all together in practice I've made a proof-of-concept GIMP plug-in that implements this idea for gray-scale images. You can get it on GitHub. Note that it needs a relatively recent Scipy and Numpy versions, so if it doesn't work at first, try upgrading them from PyPi. This is how the adjustment layers look like in the Layers window:

Fake color adjustment layer stack in GIMP.

Visually, results are reasonably close to what you get through the ordinary Color Layers tool, although not perfectly identical. I believe some of the discrepancy is caused by rounding errors. The following comparisons show a black-to-white linear gradient with a gamma correction applied. First stripe is the original gradient, second has the gamma correction applied using the built-in Color Levels tool and the third one has the same correction applied using the fake adjustment layer created by my plug-in:

Fake adjustment layer compared to Color Levels for gamma = 0.33.

Fake adjustment layer compared to Color Levels for gamma = 3.33.

How useful is this in practice? It certainly has the non-destructive, real-time property of the Photoshop's adjustment layer. The adjustment is immediately visible on any change in the drawing layers. Changing the adjustment settings, like the gamma value, is somewhat awkward, of course. You need to delete the layers created by the plug-in and create a new set (although an improved plug-in could assist in that). There is no live preview, like in the Color Levels dialog, but you can use Color Levels for preview and then copy-paste values into the plug-in. The multiple layers also clutter the layer list. Unfortunately it's impossible to put them into a layer group, since layer mode operations don't work on layers outside of a group.

My current code only works on gray-scale. The polynomial approximation uses layer opacity setting, and layer opacity can't be set separately for red, green and blue channels. This means that way of applying gamma adjustment can't be adapted for colors. However support for colors should be possible for the piecewise linear method, since in that case you can control the divider separately for each channel (since it's defined by the color fill on the layer). The opacity still stays the same, but I think it should be possible to make it work. I haven't done the math for that though.

Posted by Tomaž | Categories: Life | Comments »

FOSDEM 2018

08.02.2018 11:28

Gašper and I attended FOSDEM last weekend. It is an event about open source and free software development that happens each year in Brussels. I've known about it for some time, mostly because it was quite a popular destination among a group of Kiberpipa alumni a few years ago. I've never managed to join them. This year though I was determined to go. Partly because I couldn't get a ticket for 34C3 and was starting to miss this kind of a crowd, and partly because several topics on the schedule were relevant to one of my current projects.

Marietje Schaake on Next Generation Internet at FOSDEM 2018

FOSDEM turned out to be quite unlike any other event I've attended. I was surprised to hear that there are no tickets or registration. When we arrived at the event it was immediately apparent why that is the case. Basically, for two days the Université libre de Bruxelles simply surrenders their lecture rooms and hallways to software developers. Each room is dedicated to talks on a particular topic, such as software defined radio or Internet of things. Hallways are filled with booths, from companies offering jobs to volunteer projects gathering donations and selling t-shirts. I counted more than 50 tracks, distributed over 5 campus buildings. More than once I found myself lost in the unfamiliar passageways.

Opinions whether the event is too crowded or not seem to differ. I often found the halls unbearably packed and had to step out to get a breath of air, but I do have a low threshold for these kind of things. Mostly cold, wet and windy weather didn't help with crowds indoors either. The lecture rooms themselves had volunteers taking care they were not filled over capacity. This meant that once you got in a room, following the talks was a pleasant experience. However, most rooms had long queues in front. The only way to get in was to show up early in the morning and stay there. I didn't manage to get back into any smaller room after leaving for lunch, so I usually ended up in the biggest auditorium with the keynote talks.

Queues at the food court at FOSDEM 2018.

Speaking about keynotes. I enjoyed Simon Phipps's and Italo Vignoli's talk about the past 20 years of open source. They gave a thorough overview of the topic with some predictions for the future. I found myself thinking whether open source movement really was such a phenomenal success. Indeed it is everywhere behind pervasive web services of today, however the freedom for users to run, study and improve software at their convenience is mostly missing these days where software is hidden behind an opaque web server on someone else's computer. Liam Proven in Alternate histories of computing explored how every generation reinvents solutions and mistakes of their predecessors, with a focus on Lisp machines. It seems that one defining property of computer industry is that implementing a thing from scratch is invariably favored over understanding and building upon existing work. I also recommend Steven Goodwin's talk about how hard it is to get behavior right in simplest things like smart light switches. He gave nice examples why a smart appliance that has not been well thought out will cause more grief than a dumb old one.

From the more technical talks I don't have many recommendations. I haven't managed to get into most of the rooms I had planned for. From hanging around the Internet of things discussions one general sentiment I captured was that MQTT has become a de facto standard for any Internet-connected sensor or actuator. ESP8266 remains as popular as ever for home-brew projects. Moritz Fischer gave a fascinating, but very dense intro into a multitasking scheduler for ARM Cortex microcontrollers that was apparently custom developed by Google for use on their Chromebook laptops. Even though I'm fairly familiar with ARM, he lost me after the first 10 minutes. However if I ever need to look into multitasking on ARM microcontrollers, his slides seem to be a very good reference.

One of the rare moments of dry weather at FOSDEM 2018

I don't regret going to FOSDEM. It has been an interesting experience and the trip has been worth it just for the several ideas Gašper and I got there. I can't complain about the quality of the talks I've seen and although the whole thing seemed chaotic at times it must have been a gargantuan effort on the part of the volunteer team to organize. I will most likely not be going again next year though. I feel like this is more of an event for getting together with a team you've been closely involved with in developing a large open source project. Apart from drive-by patches, I've not collaborated in such projects for years now, so I often felt as an outsider that was more adding to the crowds than contributing to the discussion.

Posted by Tomaž | Categories: Life | Comments »

Tracking notebooks

27.01.2018 20:33

Two months ago, there was a post on Hacker News about keeping a lab notebook. If you like to keep written notes, the training slides from NIH might be interesting to flip through, even if you're not working in a scientific laboratory. In the ensuing discussion, user cgore mentioned that they recommend attaching a Bluetooth tracker on notebooks. In recent years I've developed a quite heavy note taking habit and these notebooks have become increasingly important to me. I carry one almost everywhere I go. I can be absentminded at times and had my share of nightmares about losing all my notes for the last 80 pages. So the idea of a Bluetooth tracker giving me a peace of mind seemed worth exploring further.

cgore mentioned the Tile tracker in their comment. The Tile Slim variant seemed to be most convenient for sticking to a cover of a notebook. After some research I found that the Slovenian company Chipolo was developing a similar credit card-sized device, but in November last year I couldn't find a way to buy one. The Wirecutter's review of trackers was quite useful. In the end, I got one Tile Slim for my notebook.

Tile Slim Bluetooth tracker.

This is the tracker itself. The logo on the top side is a button and there is a piezo beeper on the other side. It is indeed surprisingly thin (about 2.5 mm). Of course if you cover it with a piece of paper (or ten), the bump it makes under the pen is still hard to ignore when writing over it. I mostly use school notebooks with soft covers, so the first challenge was where to put it on the notebook for it to be minimally annoying. I also wanted it to be removable so I could reuse the Tile on a new notebook.

Tile Slim in a paper envelope on a notebook cover.

What I found to work best for me is to put the Tile in a paper pocket. I glue the pocket to the back cover of the notebook, near the spine. I still feel the edges when writing, but it's not too terrible. So far I was making the pockets from ordinary office paper (printable pattern) and they seem to hold up fine for the time the notebook is in active use. They do show noticeable wear though, so maybe using heavier paper (or duct tape) wouldn't be a bad idea. The covers of notebooks I use are thick enough that I haven't noticed that I would be pressing the button on the tile with my pen.

There is no simple way to disassemble the Tile and the battery is not replaceable nor rechargeable. Supposedly it lasts a year and after that you can get a new Tile at a discount. Unless, that is, you live in a country where they don't offer that. Do-it-yourself battery replacement doesn't look like an option either, although I will probably take mine apart when it runs out. I don't particularly like the idea of encouraging more throw-away electronics

Screenshots of the Tile app on Android.

I found the Android app that comes with the tracker quite confusing at first. Half of the screen space is dedicated to "Tips" and it has some kind of a walk-through mode when you first use it. It was not clear to me what was part of the walk-through and what was for real. I ended up with the app in a state where it insisted that my Tile is lost and stubbornly tried to push their crowd-sourced search feature. Some web browsing later I found that I needed to restart my phone, and after that I didn't notice the app erroneously losing contact with the Tile again. In any case, hopefully I won't be opening the app too often.

Once I figured them out, the tracker and the app did seem to do their job reasonably well. For example, if I left the notebook at the office and went home the app would tell me the street address of the office where I left it. I simulated a lost notebook several times and it always correctly identified the last seen address, so that seems encouraging. I'm really only interested in this kind of rough tracking. I'm guessing the phone records the GPS location when it doesn't hear the Bluetooth beacons from the tracker anymore. The Tile also has a bunch of features I don't care about: you can make your phone ring by pressing the button on the Tile or you can make the Tile play a melody. There's also a RSSI ranging kind of thing in the app where it will tell you how far from the Tile you are in the room, but it's as useless as all other such attempts I've seen.

I have lots of problems in a notebook.

With apologies to xkcd

The battery drain on the phone after setting up the app is quite noticeable. I haven't done any thorough tests, but my Motorola Moto G5 Plus went from around 80% charge at the end of the day to around 50%. While before I could usually go two days without a charger, now the battery will be in the red after 24 hours. I'm not sure whether that is the app itself or the fact that now Bluetooth on the phone must be turned on all the times. Android's battery usage screen lists neither the Tile app nor Bluetooth (nor anything else for that matter) as causing significant battery drain.

Last but not least, carrying the Tile on you is quite bad for privacy. The Tile website simply brushes such concerns aside, saying we are all being constantly tracked anyway. But the fact is that using this makes (at least) one more entity aware of your every move. The app does call home with your location, since you can look up current Tile location on the web (although for me, it says I don't own any Tiles, so maybe the call-home feature doesn't actually work). It's another example of an application that would work perfectly fine without any access to the Internet at all but for some reason needs to talk to the Cloud. The Tiles are also trivially trackable by anyone with a Bluetooth dongle and I wonder what kind of authentication is necessary for triggering that battery-draining play-the-melody thing.

I will probably keep using this Tile until it runs out. It does seem to be useful, although I have not yet lost a notebook. I'm not certain I will get another though. The throw-away nature of it is off-putting and I don't like being a walking Bluetooth beacon. However it does make me wonder how much thinner you could make one without that useless button and beeper, and perhaps adding a wireless charger. And also if some kind of a rolling code could be used to prevent non-authorized people from tracking your stuff.

Posted by Tomaž | Categories: Life | Comments »