Analyzing PIN numbers

13.10.2018 12:17

Since I already had a dump from haveibeenpwned.com on my drive from my earlier password check, I thought I could use this opportunity to do some more analysis on it. Six years ago DataGenetics blog posted a detailed analysis of 4-digit numbers that were found in password lists from various data breaches. I thought it would be interesting to try to reproduce some of their work and see if their findings still hold after a few years and with a significantly larger dataset.

DataGenetics didn't specify the source of their data, except that it contained 3.4 million four-digit combinations. Judging from the URL, their analysis was published in September 2012. I've done my analysis on the pwned-passwords-ordered-by-hash.txt file downloaded from haveibeenpwned.com on 6 October (inside the 7-Zip archive the file had a timestamp of 11 July 2018, 02:37:47). The file contains 517,238,891 SHA-1 hashes with associated frequencies. By searching for the SHA-1 hashes that correspond to the 4-digit numbers from 0000 to 9999, I found that all of them were present in the file. The total sum of their frequencies was 14,479,676 (see my previous post for the method I used to search the file). Hence my dataset was roughly 4 times the size of DataGenetics'.
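Reproducing the hash lookup is straightforward. Here is a minimal sketch of the first step, computing the SHA-1 digests for all 10000 candidate PINs (each hex digest can then be binary-searched in the sorted dump):

```python
import hashlib

# SHA-1 digests for all 10000 possible 4-digit PINs. Each hex digest can
# then be binary-searched in the sorted pwned-passwords-ordered-by-hash.txt
# dump to find the associated frequency.
pin_hashes = {
    pin: hashlib.sha1(pin.encode('ascii')).hexdigest().upper()
    for pin in ('%04d' % n for n in range(10000))
}

print(pin_hashes['1234'])   # -> 7110EDA4D09E062AA5E4A390B0A572AC0D2C0220
```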

Here are the top 20 most common numbers appearing in the dump, compared to the rank on the top 20 list from DataGenetics:

nnew  nold  PIN   frequency
   1     1  1234       8.6%
   2     2  1111       1.7%
   3        1342       1.1%
   4     3  0000       1.0%
   5     4  1212       0.5%
   6     8  4444       0.4%
   7        1986       0.4%
   8     5  7777       0.4%
   9    10  6969       0.4%
  10        1989       0.4%
  11     9  2222       0.3%
  12    13  5555       0.3%
  13        2004       0.3%
  14        1984       0.2%
  15        1987       0.2%
  16        1985       0.2%
  17    16  1313       0.2%
  18    11  9999       0.2%
  19    17  8888       0.2%
  20    14  6666       0.2%

This list looks similar to the results published by DataGenetics. The first two PINs are the same, but the distribution is a bit less skewed: in their results the four most popular PINs accounted for 20% of all PINs, while here they only make up 12%. It also seems that numbers that look like years (1986, 1989, 2004, ...) have become more popular. The only two such numbers in their top 20 list were 2000 and 2001.

Cumulative frequency of PINs

DataGenetics found that the number 2580 ranked highly, in position 22. They concluded that this indicates that many of these PINs were originally devised on devices with numeric keypads such as ATMs and phones (on those keypads, 2580 is straight down the middle column of keys), even though the source of their data were compromised websites, where users would more commonly type on a 104-key keyboard. In the haveibeenpwned.com dataset, 2580 ranks at position 65, so slightly lower. It is still in the top quarter by cumulative frequency.

Here are 20 least common numbers appearing in the dump, again compared to their rank on the bottom 20 list from DataGenetics:

 nnew  nold  PIN   frequency
 9981        0743   0.00150%
 9982        0847   0.00148%
 9983        0894   0.00147%
 9984        0756   0.00146%
 9985        0638   0.00146%
 9986        0934   0.00146%
 9987        0967   0.00145%
 9988        0761   0.00144%
 9989        0840   0.00142%
 9990        0736   0.00141%
 9991        0835   0.00141%
 9992        0639   0.00139%
 9993        0742   0.00139%
 9994        0939   0.00132%
 9995        0739   0.00129%
 9996        0849   0.00126%
 9997        0938   0.00125%
 9998        0837   0.00119%
 9999  9995  0738   0.00108%
10000        0839   0.00077%

Not surprisingly, most numbers don't appear on both lists. Since these have the lowest frequencies, even the smallest changes significantly alter the ordering. The least common number in DataGenetics' dump, 8068, here ranks at position 9302, so still pretty much at the bottom. I guess not many people choose their PINs after the iconic Intel CPU.

Here is a grid plot of the distribution, drawn in the same way as in DataGenetics' post. The vertical axis shows the right two digits of the PIN while the horizontal axis shows the left two digits. The color indicates the relative frequency on a log scale (blue - least frequent, yellow - most frequent).

Grid plot of the distribution of PINs
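Building such a grid is a matter of splitting each PIN into its two-digit halves. A small sketch, with freq as a hypothetical dict mapping PIN strings to counts (the numbers below are made up):

```python
def pin_grid(freq):
    # 100x100 grid: column index = left two digits of the PIN, row
    # index = right two digits, cell value = frequency of that PIN.
    grid = [[0] * 100 for _ in range(100)]
    for pin, count in freq.items():
        left, right = int(pin[:2]), int(pin[2:])
        grid[right][left] = count
    return grid

# grid[34][12] then holds the count for PIN 1234; a log-scale heatmap of
# the grid (e.g. matplotlib imshow with LogNorm) gives a plot like the
# one above.
grid = pin_grid({'1234': 100, '1111': 20})
```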

Many of the same patterns discussed in DataGenetics' post are also visible here:

  • The diagonal line shows the popularity of PINs where the left two and right two digits repeat (patterns like ABAB), with further symmetries superimposed on it (e.g. AAAA).

  • The area in the lower left corner shows numbers that can be interpreted as dates (vertically MMDD and horizontally DDMM). The resolution is good enough that you can actually see which months have 28, 30 or 31 days.

  • The strong vertical lines at 19 and 20 show numbers that can be interpreted as years. Years in the 2000s are more common in this dump than in DataGenetics'. Not surprising, since we're further into the 21st century than when their analysis was done.

  • Interestingly, there is a significant shortage of numbers that begin with 0, which can be seen as a dark vertical stripe on the left. A similar pattern is visible in DataGenetics' dump, although they don't comment on it. One possible explanation would be that some proportion of the dump had gone through a step that stripped leading zeros (such as a conversion from string to integer and back, maybe even an Excel table?).
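The leading-zero theory is easy to illustrate: a round trip through an integer type silently drops the zeros, e.g.:

```python
# A PIN stored as a string survives intact, but a round trip through an
# integer type (or a spreadsheet's number cell) drops the leading zero:
pin = "0738"
stripped = str(int(pin))
print(stripped)                 # -> 738, no longer a valid 4-digit PIN

# Restoring the fixed width requires explicit zero-padding:
print("%04d" % int(stripped))   # -> 0738
```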

In conclusion, the findings from DataGenetics' post still mostly seem to hold. Don't use 1234 for your PIN. Don't choose numbers that have symmetries in them or have years or dates in them. These all significantly increase the chances that someone will be able to guess them. And of course, don't re-use your SIM card or ATM PIN as a password on websites.

Another thing to note is that DataGenetics concluded that their analysis was possible because of leaks of clear-text passwords. However, PINs present a very small search space of only 10000 possible combinations. It was trivial for me to perform this analysis even though the haveibeenpwned.com dump only provides SHA-1 hashes, not clear-text passwords. With a warmed-up disk cache, the binary search only took around 30 seconds for all 10000 combinations.

Posted by Tomaž | Categories: Life | Comments »

Checking passwords against haveibeenpwned.com

07.10.2018 11:34

haveibeenpwned.com is a useful website that aggregates database leaks containing user names and passwords. They have an on-line form where you can check whether your email has been seen in publicly available database dumps. The form also tells you in which dumps they have seen your email. This gives you a clue as to which of your passwords has been leaked and should be changed immediately.

For a while now haveibeenpwned.com has reported that my email appears in a number of combined password and spam lists. I didn't find that surprising. My email address is publicly displayed on this website and is routinely scraped by spammers. If there was any password associated with it, I thought it had come from the 2012 LinkedIn breach. I knew my old LinkedIn password had been leaked, since the scam mails I get commonly mention it.

Screenshot of a scam email.

However, it came to my attention that some email addresses are in the LinkedIn leak, but not in these combo lists I appear in. This seemed to suggest that my appearance in those lists might not only be due to the old LinkedIn breach and that some of my other passwords could have been compromised. I thought it might be wise to double-check.

haveibeenpwned.com also provides an on-line form where they can directly check your password against their database. This seems like a really bad practice to encourage, regardless of their assurances of security, and I was not going to start sending my passwords off to a third party. Luckily the same page also provides a dump of SHA-1 hashed passwords that you can download and check locally.

I used Transmission to download the dump over BitTorrent. After uncompressing the 7-Zip file I ended up with a 22 GB text file with one SHA-1 hash per line:

$ head -n5 pwned-passwords-ordered-by-hash.txt
000000005AD76BD555C1D6D771DE417A4B87E4B4:4
00000000A8DAE4228F821FB418F59826079BF368:2
00000000DD7F2A1C68A35673713783CA390C9E93:630
00000001E225B908BAC31C56DB04D892E47536E0:5
00000006BAB7FC3113AA73DE3589630FC08218E7:2

I chose the ordered-by-hash version of the file, so the hashes are sorted alphabetically. The number after the colon appears to be the number of occurrences of the hash in their database; more popular passwords have a higher count.

The alphabetical order of the file makes it convenient to do an efficient binary search on it as-is. I found the hibp-pwlookup tool for searching the password database, but it requires you to import the data into PostgreSQL, so it seems that the author was not aware of this convenience. In fact, there's already a BSD command-line tool that knows how to do binary search on such text files: look (it's in the bsdmainutils package on Debian).
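For completeness, the same binary search is simple to replicate in pure Python if look isn't available: seek to the middle of the file, skip the partial line, compare the key of the next full line and halve the range. A sketch (the function name and interface are my own):

```python
def lookup(path, sha1_hex):
    """Binary search a sorted <hash>:<count> text file. Returns the count
    for the given SHA-1 hex digest, or None if it is not present."""
    target = sha1_hex.upper().encode('ascii')
    with open(path, 'rb') as f:
        f.seek(0, 2)               # seek to the end to get the file size
        lo, hi = 0, f.tell()
        while lo < hi:
            mid = (lo + hi) // 2
            if mid == 0:
                f.seek(0)
            else:
                f.seek(mid - 1)
                f.readline()       # align to the next line start
            line = f.readline()
            key = line.split(b':', 1)[0] if line else None
            if key == target:
                return int(line.split(b':')[1])
            elif key is None or key > target:
                hi = mid           # target line must start before mid
            else:
                lo = mid + 1       # target line starts after this one
    return None
```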

Unfortunately the current look binary in Debian is somewhat broken and bails out with a File too large error on files larger than 2 GB. It needs recompiling with a simple patch to its Makefile to work on this huge password dump. After I fixed this, I was able to quickly look up the hashes:

$ ./look -b 5BAA61E4 pwned-passwords-ordered-by-hash.txt
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3533661

By the way, Stephane on the Debian bug tracker also mentions a method for searching the file without uncompressing it first. Since I already had it uncompressed on the disk, I didn't bother. Next I had to automate the process, using a Python script similar to the following. The check() function returns True if the password in its argument is present in the database:

import subprocess
import hashlib

def check(passwd):
    """Return True if passwd appears in the haveibeenpwned.com dump."""
    s = passwd.encode('ascii')
    h = hashlib.sha1(s).hexdigest().upper()
    p = "pwned-passwords-ordered-by-hash.txt"

    try:
        # look does a binary search for lines starting with "<hash>:".
        o = subprocess.check_output([
            "./look",
            "-b",
            "%s:" % (h,),
            p])
    except subprocess.CalledProcessError as exc:
        # look exits with status 1 when no matching line is found.
        if exc.returncode == 1:
            return False
        else:
            raise

    # The matching line has the form "<hash>:<count>".
    l = o.split(b':')[1].strip()

    print("%s: %d" % (
        passwd,
        int(l.decode('ascii'))))

    return True

def main():
    check('password')

main()

Before I was able to check the passwords I keep stored in Firefox, I stumbled upon another hurdle. Recent versions do not provide any facility for exporting passwords from the password manager. There are some third-party tools for that, but I found it hard to trust them. I also had my doubts about how complete they are: Firefox has changed its password storage mechanisms several times over the years, and re-implementing the retrieval for all of them seems a non-trivial task.

In the end, I opted to get them from the Browser Console using the following JavaScript, entered line-by-line into the console (I adapted this code from a post by cor-el on the Mozilla forums):

var tokendb = Cc["@mozilla.org/security/pk11tokendb;1"].createInstance(Ci.nsIPK11TokenDB);
var token = tokendb.getInternalKeyToken();
token.login(true);

var passwordmanager = Cc["@mozilla.org/login-manager;1"].getService(Ci.nsILoginManager);
var signons = passwordmanager.getAllLogins({});
var json = JSON.stringify(signons, null, 1);

I simply copy-pasted the value of the json variable into a text editor and saved it to a file on an encrypted volume, so the passwords didn't end up in clear-text on the drive.

I hope this will help anyone else check their passwords against the database of leaks without exposing them to a third party in the process. Fortunately for me, this check didn't reveal any really nasty surprises. I did find more hits in the database than I'm happy with; however, all of them were due to certain poor password policies about which I can do very little.

Posted by Tomaž | Categories: Life | Comments »

Extending GIMP with Python talk

22.09.2018 18:47

This Thursday I presented a talk on writing GIMP plug-ins in Python at the monthly Python Meetup in Ljubljana. I gave a brief guide on how to get started and shared some tips that I would have found useful when I first started playing with GIMP. I didn't go into the details of the various functions but tried to keep it high-level, showing most of the things live in the Python REPL window and demoing some of the plug-ins I've written.

I wanted to give this talk for two reasons. The first was that I struggled to find such a high-level and up-to-date introduction to writing plug-ins, and I thought one would be useful for anyone else diving into GIMP's Python interface. I tried to prepare the slides so that they can serve as a useful document on their own (you can find the slides here, and the example plug-in on GitHub). The slides also include some useful links for further reading.

Annotated "Hello, World!" GIMP plug-in.

The second reason was that I was growing unhappy with the topics presented at these meetups. It seemed to me that recently most revolved around the web and Python at the workplace. I can understand why this is so. The event needs sponsors, and they want to show potential new hires how awesome it is to work with them. The web is eating the world, Python meetups included, and it all leads to a kind of monotonous series of events. It's been a while since I heard a fun subject discussed, and I thought I might stir things up a bit.

To be honest, I was initially quite unsure whether this was a good venue for the talk. I discussed the idea with a few friends who are also regular meetup visitors, and their comments encouraged me to go through with it. Afterwards I was immensely surprised at the positive response. I had prepared a short tour of GIMP for the start of the talk, thinking that most people would not be familiar with it. It turned out that almost everyone in the audience used GIMP, so I skipped it entirely. I was really happy to hear that people found the topic interesting, and it reminded me what a pleasure it is to give a well-received talk.

Posted by Tomaž | Categories: Life | Comments »

Getting ALSA sound levels from a command-line

01.09.2018 14:33

Sometimes I need to get an idea of the signal levels coming from an ALSA capture device, like the line-in on a sound card, when I only have an ssh session available. For example, I've recently been exploring a case of radio-frequency interference causing audible noise in the audio circuit of an embedded device. It's straightforward to load raw samples into a Jupyter notebook or Audacity and run some quick analysis (or simply listen to the recording on headphones). But what I've really been missing is a simple command-line tool that would show me some RMS numbers. That way I wouldn't need to transfer potentially large wav files around quite so often.

I felt like a command-line VU meter was something that should already exist, but finding anything like it on Google proved elusive. Over time I've ended up with a bunch of custom half-finished Python scripts. However, Python is slow, especially on low-powered ARM devices, so I was considering writing a more optimized version in C. Luckily I've recently come across this recipe for sox, which does exactly what I want. Sox is easily apt-gettable on all Debian flavors I commonly care about, and even on slow CPUs the following doesn't take noticeably longer than it takes to record the data:

$ sox -q -t alsa -d -n stats trim 0 5
             Overall     Left      Right
DC offset  -0.000059 -0.000059 -0.000046
Min level  -0.333710 -0.268066 -0.333710
Max level   0.273834  0.271820  0.273834
Pk lev dB      -9.53    -11.31     -9.53
RMS lev dB    -25.87    -26.02    -25.73
RMS Pk dB     -20.77    -20.77    -21.09
RMS Tr dB     -32.44    -32.28    -32.44
Crest factor       -      5.44      6.45
Flat factor     0.00      0.00      0.00
Pk count           2         2         2
Bit-depth      15/16     15/16     15/16
Num samples     242k
Length s       5.035
Scale max   1.000000
Window s       0.050

This captures 5 seconds of audio (trim 0 5) from the default ALSA device (-t alsa -d), runs it through the stats filter and discards the samples without saving them to a file (-n). The stats filter calculates some useful statistics and dumps them to standard error. A different capture device can be selected through the AUDIODEV environment variable.

The displayed values are pretty self-explanatory: min and max levels show extremes in the observed samples, scaled to ±1 (so they are independent of the number of bits in the ADC samples). Root-mean-square (RMS) statistics are scaled so that 0 dB is the full-scale of the ADC. In addition to the overall mean RMS level you also get peak and trough values measured over a short sliding window (the length of this window is configurable and is shown on the last line of the output). This gives you some idea of the dynamics in the signal as well as the overall volume. Descriptions of the other fields can be found in the sox(1) man page.
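As a sanity check on the dB figures, here is how the RMS level relates to raw samples, assuming they are already scaled to the ±1.0 full-scale range that sox uses:

```python
import math

def rms_level_db(samples):
    # RMS relative to full scale, expressed in dB (0 dB = full scale).
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

# A full-scale sine wave has an RMS of 1/sqrt(2), i.e. about -3.01 dB:
n = 1000
sine = [math.sin(2 * math.pi * i / n) for i in range(n)]
print(round(rms_level_db(sine), 2))   # -> -3.01
```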

Posted by Tomaž | Categories: Code | Comments »

Accuracy of a cheap USB multimeter

12.08.2018 11:08

Some years ago I bought this cheap USB gadget off Amazon. You connect it in-line with a USB-A device and it shows the voltage and current on the power lines of the USB bus. There were quite a few slightly different models available under various brands. Apart from Muker, mine doesn't have any specific model name visible (a similar product is currently listed as Muker V21 on Amazon). I chose this particular model because it looked like it had a nicer display than the competition. It also shows the time integral of the current to give an estimate of the charge transferred.

Muker V21 USB multimeter.

Given that it cost less than 10 EUR when I bought it, I didn't have high hopes as to its accuracy. I found it useful for quickly checking for life signs in a USB power supply, and for checking whether a smartphone switches to fast charge or not. However, I wouldn't want to use it for any kind of measurement where accuracy is important. Or would I? I kept wondering how accurate it really is.

For comparison I took a Voltcraft VC220 multimeter. True, this is not a particularly high-precision instrument either, but it does come with a list of measurement tolerances in its manual. At least I could check whether the Muker's readings fall within the error range of the VC220.

My first test measured open-circuit voltage. The Muker's input port was connected to a lab power supply and the output port was left empty. I recorded the voltage displayed by the VC220 and the Muker for a range of set values (the VC220 was measuring the voltage on the input port). Below approximately 3.3 V the Muker's display went blank, so that seems to be the lowest voltage it can measure. The graph below shows a comparison. Error bars show the VC220's measurement tolerance for the 20 V DC range I used in the experiment.

Comparison of voltage measurements.

In the second test I connected an electronic DC load to Muker's output port and varied the load current. I recorded the current displayed by the VC220 (on the output port) and Muker. Again, error bars on the graph below show the maximum VC220 error as per its manual.

Comparison of current measurements.

Finally, I was interested in how much resistance the Muker adds to the supply line. I noticed that some devices will not fast-charge when the Muker is connected in-line, but will do so when connected directly to the charger. The USB data lines seem to pass directly through the device, so USB DCP detection shouldn't be the cause of that.

I connected the Muker with a set of USB-A test leads to a lab power supply and a DC load. The total cable length was approximately 90 cm. With 2.0 A of current flowing through the setup I measured the voltages on both sides of the leads. I did the measurement with the Muker connected in the middle of the leads and with just the two leads connected together:

                          With Muker  Leads only
Current [A]                      2.0         2.0
Voltage, supply side [V]        5.25        5.25
Voltage, load side [V]          3.61        4.15
Total resistance [mΩ]            820         550

Muker V21 connected to USB test leads.
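The resistance figures in the table follow directly from Ohm's law on the voltage drop across the leads at the known current:

```python
# R = (V_supply - V_load) / I, using the measured values from the table.
current = 2.0                               # A
r_with_muker = (5.25 - 3.61) / current      # 0.82 ohm = 820 mOhm
r_leads_only = (5.25 - 4.15) / current      # 0.55 ohm = 550 mOhm

# The difference is the resistance added by the Muker itself:
r_muker = r_with_muker - r_leads_only
print(round(r_muker * 1000))                # -> 270 (mOhm)
```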

In summary, the current displayed by the Muker device seems to be reasonably correct. Measurements fell well within the tolerances of the VC220 multimeter, with the two instruments deviating by less than 20 mA over most of the range. Only at 1.8 A and above did the difference start to increase. Voltage readings seem much less accurate, and the Muker's measurements appeared too high. However, they did fall within the measurement tolerances between 4.2 and 5.1 V.

In regard to the increased resistance of the supply line, it seems that the Muker adds about 270 mΩ. I suspect a significant part of that is the contact resistance of the one additional USB-A connector pair in the line. The values I measured differed quite a lot if I wiggled the connectors.

Posted by Tomaž | Categories: Analog | Comments »

Investigating the timer on a Hoover washing machine

20.07.2018 17:47

One day around last Christmas I stepped into the bathroom and found myself standing in water. The 20-odd-year-old Candy Alise washing machine had finally broken a seal. I had suspected for some time that it wasn't rinsing the detergent well enough from the clothes, so it was an easy decision to just scrap it. I spent the holidays shopping for a new machine, taking apart and reassembling bathroom furniture that was in the way, and fixing various collateral damage to the plumbing. A couple of weeks later, thanks to copious amounts of help from my family, I was finally able to do laundry again without flooding the apartment in the process.

I bought a Hoover WDXOC4 465 AC combined washer-dryer. My choice was mostly based on the fact that a local shop had it in stock, and its smaller depth compared to the old Candy meant a bit more floor space in the small bathroom. Even though I was trying to avoid it, I ended up with a machine with capacitive touch buttons, an NFC interface and a smartphone app.

Front panel on a Hoover WDXOC4 465 AC washer.

After half a year the machine works reasonably well. I gave up on Hoover's Android app the moment it requested to read my phone's address book, but thankfully all the features I care about can be used through the front panel. Comments I saw on the web claiming that the dryer doesn't work have so far turned out to be unfounded. My only complaint is that 3 or 4 times the washer didn't do a spin cycle when it should have. I suspect that the quality of the embedded software is less than stellar. Sometimes I hear the inlet valve or the drum motor momentarily stop and engage again, and I wonder whether that's due to a bug or a controller reset, or whether they are meant to work that way.

Anyway, like most similar machines, this one displays the remaining time until the end of the currently running program. There's a 7-segment display on the front panel that counts down hours and minutes. One of the weirder things I noticed is that the countdown seems to use Microsoft minutes. I would look at the display to see how many minutes were left, and when I looked again later it would seem to be showing more time remaining, not less. I was curious how that works, and whether it was just my bad memory or the machine really was showing bogus values, so I decided to do an experiment.

I took a similar approach to my investigation of the water heater years ago. I loaded up the machine, set it up for a wash-only cycle (6 kg cottons program, 60°C, 1400 RPM spin cycle, stain level 2) and recorded the display on the front panel with a laptop camera. I then read out the timer values from the video with a simple Python program and compared them to the frame timestamps. The results are below: the real elapsed time is on the horizontal axis and the display readout is on the vertical axis. The dashed gray line shows what the timer would ideally display if the initial estimate were perfect:

Countdown display on a Hoover washer versus real time.

The timer first shows a reasonable estimate of 152 minutes at the start of the program. The filled-in area on the graph around 5 minutes in is when the "Kg Mode" is active. At that time the display shows a little animation as the washer supposedly weighs the laundry. My OCR program got confused by that, so the time values there are invalid. However, after the display returns to the countdown, the time displayed is significantly lower, at just below 2 hours.

"Kg Mode" description from the Hoover manual.

Image by Candy Hoover Group S.r.l.

That seems fine at first - I didn't use a full 6 kg load, so it's reasonable to believe that the machine adjusted the wash time accordingly. However, the minutes on the display then count down more slowly than real time. So much more slowly, in fact, that they more than make up for the difference, and the machine ends the cycle after 156 minutes, 4 minutes later than the initial estimate.

You can also see that the display jumps up and down during the program, with the largest jump around 100 minutes in. Even when it seems to be running consistently, it will often count a minute down, then up, and down again. You can see evidence of that in the graph below, which shows successive differences of the timer value over time. I've verified this visually in the video; it's not an artifact of my OCR.

Timer display differences on a Hoover washer.

Why is it doing that? Is it just buggy software, an oscillator running slow, or maybe someone figured that people would be happier when a machine shows a lower number at first and then slows down the passing minutes? Hard to say. Even if you don't tend to take videos of your washing machine, it's hard not to notice that the timer stands still at the last 1 minute for around 7 minutes.

Incidentally, does anyone know how the weighing function works? I find it hard to believe that they actually measure the weight with a load cell. They must have some cheaper way, perhaps measuring the time it takes for the water level to rise in the drum or something like that. The old Candy machine also advertised this feature, and considering that it was an electro-mechanical affair without sophisticated electronics, it must be a relatively simple trick.

Pixel sampling locations for the 7-segment display OCR.

In conclusion, a few words on how I did the display OCR. I started off with the seven-segment-ocr project by Suyash Kumar that I found on GitHub. Unfortunately it didn't work for me. It uses a very clever trick to do the OCR: it just looks at the intensity profiles of two horizontal and one vertical line per character per frame. However, because my video had low resolution and was recorded at a slight angle, no amount of tweaking worked. In the end I just hacked up a script that sampled 24 hand-picked individual pixels from the video, compared them to a hand-picked threshold and decoded the display values from there. Still, Kumar's project came in useful, since I could reuse all of his OpenCV boilerplate.
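The decoding step itself is tiny once the pixels are sampled. A sketch of the idea (the segment ordering and names are mine, not the actual script's): threshold the sampled intensities into a lit/unlit tuple and look it up in a table:

```python
# Segment tuples in (a, b, c, d, e, f, g) order: a is the top bar, b-f
# continue clockwise around the digit, g is the middle bar. 1 = lit.
SEGMENTS = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 0, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

def decode_digit(intensities, threshold):
    """Threshold 7 sampled pixel intensities and map them to a digit,
    or None if the pattern is not a valid digit (e.g. mid-animation)."""
    key = tuple(int(i > threshold) for i in intensities)
    return SEGMENTS.get(key)

# e.g. all segments bright except the middle bar reads as 0:
print(decode_digit([200, 210, 190, 205, 198, 202, 10], threshold=128))
```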

Recording of a digital clock synchronized to the DCF77 signal.

Since I like to be sure, I also double-checked that my method of getting real time from the video timestamps is accurate enough. Using the same setup I recorded a DCF77-synchronized digital clock for two hours. I estimated the relative error between the video timestamps and the clock to be -0.15%, which means that after two hours my timings were at most 11 seconds off. The Microsoft-minute effects I found in the Hoover are much larger than that.
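That bound is just the relative error multiplied by the length of the recording:

```python
# 0.15% relative error over a 2-hour (7200 s) recording:
max_offset = 2 * 3600 * 0.0015
print(round(max_offset, 1))   # -> 10.8, i.e. at most ~11 seconds
```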

Posted by Tomaž | Categories: Life | Comments »

Monitoring HomePlug AV devices

23.05.2018 18:51

Some time ago I wanted to monitor the performance of a network of Devolo dLAN devices. These are power-line Ethernet adapters. Each device looks like a power brick with a standard Ethernet RJ45 port. You can plug several of these into wall sockets around a building and theoretically they will together act as an Ethernet bridge, linking all their ports as if they were connected to a single network. The power-line network in question seemed to be having intermittent problems, but without some kind of a log it was hard to pin-point exactly what the problem was.

I have very little experience with power-line networks and some quick web searches yielded conflicting information about how these things work and what protocols are at work behind the curtains. Purely from the user perspective, the experience seems to be similar to wireless LANs. While individual devices have flashy numbers written on them, such as 500 Mbps, these are just theoretical "up to" throughputs. In practice, bandwidth of individual links in the network seems to be dynamically adjusted based on signal quality and is commonly quite a bit lower than advertised.

Devolo Cockpit screenshot.

Image by devolo.com

Devolo provides an application called Cockpit that allows you to configure the devices and visualize the power-line network. The feature I was most interested in was the real-time display of the physical layer bitrate for each individual link in the network. While the Cockpit is available for Linux, it is a user-friendly point-and-click graphical application, and chances were small that I would be able to integrate it into some kind of automated monitoring process. The prospect of decoding the underlying protocol seemed easier, so I did a packet capture with Wireshark while the Cockpit was fetching the bitrate data:

HomePlug AV protocol decoded in Wireshark.

Wireshark immediately showed the captured packets as part of the HomePlug AV protocol and provided a nice decode. This finally gave me a good keyword I could base my further web searches on, which revealed a helpful white paper with some interesting background technical information. The HomePlug AV physical layer apparently uses frequencies in the range of 2 - 28 MHz, using OFDM with an adaptive number of bits per modulation symbol. Network management is centralized, using a coordinator and a mix of CSMA/CA and TDMA access.

More importantly, the fact that the Wireshark decode showed the bitrate information in plain text gave me confidence that replicating the process of querying the network would be relatively straightforward. Note how the 113 Mbit/s in the decode directly corresponds to hex 0x71 in the raw packet contents. It appeared that only two packets were involved, a Network Info Request and a Network Info Confirm:
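In other words, the PHY rates appear to be carried as plain single-byte values in Mbit/s, so converting the raw values is a simple base conversion:

```python
# Rates in the Network Info Confirm appear to be single bytes in Mbit/s.
# 0x71 is the 113 Mbit/s seen in the Wireshark decode; 0x54 and 0x2b are
# examples of TX/RX values as printed by faifa (84 and 43 Mbit/s).
for raw in (0x71, 0x54, 0x2b):
    print('0x%02x -> %d Mbit/s' % (raw, raw))
```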

HomePlug AV Network Info Confirmation packet decode.

However, before diving into writing code from scratch, I came across the Faifa command-line tool on GitHub. The repository seems to be a source code dump from the now-defunct dev.open-plc.org web site. There is very little in terms of documentation or clues to its provenance. The last commit was in 2016. However, a browse through its source code revealed that it is capable of sending the 0xa038 Network Info Request packet and receiving and decoding the corresponding 0xa039 Network Info Confirm reply. This was exactly what I was looking for.

Some tweaking and a compile later, I was able to get the bitrate info from my terminal. Here I am querying one device in the power-line network (its Ethernet address is given in the -a parameter). The queried device returns the current network coordinator and a list of stations it is currently connected to, together with the receive and transmit bitrates for each of those connections:

# faifa -i eth4 -a xx:xx:xx:xx:xx:xx -t a038
Faifa for HomePlug AV (GIT revision master-5563f5d)

Started receive thread
Frame: Network Info Request (Vendor-Specific) (0xA038)

Dump:
Frame: Network Info Confirm (Vendor-Specific) (A039), HomePlug-AV Version: 1.0
Network ID (NID): xx xx xx xx xx xx xx
Short Network ID (SNID): 0x05
STA TEI: 0x24
STA Role: Station
CCo MAC: 
	xx:xx:xx:xx:xx:xx
CCo TEI: 0xc2
Stations: 1
Station MAC       TEI  Bridge MAC        TX   RX  
----------------- ---- ----------------- ---- ----
xx:xx:xx:xx:xx:xx 0xc2 xx:xx:xx:xx:xx:xx 0x54 0x2b
Closing receive thread

The original tool had some annoying problems that I needed to work around before deploying it to my monitoring system. Most of all, it operated by sending the query with the Ethernet broadcast address as the source. It then put the local network interface into promiscuous mode to listen for the broadcast replies. This seemed like bad practice and created problems for me, not least of which was log spam with repeated kernel warnings about promiscuous mode enters and exits. It's possible that the use of broadcasts was a workaround for a hardware limitation on some devices, but the devices I tested (dLAN 200 and dLAN 550) seem to reply just fine to queries from non-broadcast addresses.

I also fixed a race condition in the original tool caused by the way it received the replies. If multiple queries were running on the same network simultaneously, replies from different devices sometimes got mixed up. Finally, I fixed some rough corners regarding libpcap usage that prevented the multi-threaded Faifa process from exiting cleanly once a reply was received, and I added a -t command-line option for sending and receiving a single packet.

As usual, the improved Faifa tool is available in my fork on GitHub:

$ git clone https://github.com/avian2/faifa.git
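For use in a monitoring system, the interesting part of the tool's output is the station table. The following Python sketch pulls the TX and RX values out of it. The regular expression and the field layout are my assumptions based on the sample output above, and how the raw hex values map to actual Mbit/s is device-specific:

```python
import re

# Matches one row of the station table printed by faifa, e.g.:
# xx:xx:xx:xx:xx:xx 0xc2 xx:xx:xx:xx:xx:xx 0x54 0x2b
# ('x' is allowed among the MAC digits to cope with anonymized output)
STATION_RE = re.compile(
    r'^(?P<mac>(?:[0-9a-fx]{2}:){5}[0-9a-fx]{2})\s+'
    r'0x[0-9a-f]+\s+'                        # TEI
    r'(?:[0-9a-fx]{2}:){5}[0-9a-fx]{2}\s+'   # bridge MAC
    r'0x(?P<tx>[0-9a-f]+)\s+'
    r'0x(?P<rx>[0-9a-f]+)\s*$')

def parse_stations(output):
    """Return a list of (station MAC, TX, RX) tuples, with the
    bitrate fields decoded from hex into plain integers."""
    stations = []
    for line in output.splitlines():
        m = STATION_RE.match(line)
        if m:
            stations.append((m.group('mac'),
                             int(m.group('tx'), 16),
                             int(m.group('rx'), 16)))
    return stations
```

For the sample output above this decodes the single reported station's TX value 0x54 as 84 and its RX value 0x2b as 43.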

To conclude, here is an example of bitrate data I recorded using this approach. It shows transmit bitrates reported by one device in the network to two other devices (here numbered "station 1" and "station 2"). The data was recorded over the course of 9 days and the network was operating normally during this time:

Recorded PHY bitrate for two stations.

Even this graph shows some interesting things. Some devices (like the "station 1" here) seem to enter a power saving mode. Such devices don't appear in the network info reports, which is why data is missing for some periods of time. Even out of power saving mode, devices don't seem to update their reported bitrates if there is no data being transferred on that specific link. I think this is why the "station 2" here seems to have long periods where the reported bitrate remains constant.

Posted by Tomaž | Categories: Code | Comments »

On the output voltage of a real flyback converter

13.05.2018 13:18

I was recently investigating a small switch-mode power supply for a LED driver. When I was looking at the output voltage waveform on an oscilloscope it occurred to me that it looked very little like the typical waveforms that I remember from university lectures. So I thought it would be interesting to briefly explain here why it looks like that and where the different parts of the waveform come from.

The power supply I was looking at is based on the THX203H integrated circuit. It's powered from line voltage (230 V AC) and uses an isolated (off-line) flyback topology. The circuit is similar to the one shown for a typical application in the THX203H datasheet. The switching frequency is around 70 kHz. Below I modified the original schematic to remove the pi filter on the output which wasn't present in the circuit I was working with:

Schematic of a flyback converter based on THX203H.

When this power supply was loaded with its rated current of 1 A, this is what the output voltage on the output terminals looked like on an oscilloscope:

Output voltage of a flyback converter on an oscilloscope.

If you recall the introduction to switch-mode voltage converters, they operate by charging a coil from the input voltage and discharging it into the output. A regulator keeps the duty cycle of the switch adjusted so that the mean inductor current equals the load current. For flyback converters, the typical figure you might recall for discontinuous operation looks something like the following:

Idealized coil currents and output voltage in a flyback converter.

From top to bottom are the primary coil current, secondary coil current and output voltage, not drawn to scale on the vertical axis. First the ferrite core is charged by the increasing current in the primary coil. Then the controller turns off the primary current and the core discharges through the secondary coil into the output capacitor.

Note how the ripple on the oscilloscope looks more like the secondary current than the idealized output voltage waveform. The main realization here is that the ripple in this case is defined mostly by the output capacitor's equivalent series resistance (ESR) rather than its capacitance. The effect of the ESR is ignored in the idealized graphs above.

In this power supply, the 470 μF 25 V aluminum electrolytic capacitor on the output has an ESR somewhere around 0.1 Ω. The peak capacitor charging current is around 3 A, hence the maximum voltage drop on the ESR is around 300 mV. On the other hand, the ripple due to the capacitor charging and discharging is an order of magnitude smaller, at around 30 mV peak-to-peak in this specific case.
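These estimates follow from some back-of-the-envelope arithmetic. The values below are taken from the text; treating a full switching period as the capacitor discharge time is a worst-case simplification:

```python
C = 470e-6    # output capacitance [F]
ESR = 0.1     # capacitor equivalent series resistance [ohm]
I_peak = 3.0  # peak capacitor charging current [A]
I_load = 1.0  # rated load current [A]
f_sw = 70e3   # switching frequency [Hz]

# Ripple caused by the peak charging current across the ESR.
v_esr = I_peak * ESR

# Capacitive ripple: the load discharging C for (at most) one
# switching period.
v_cap = I_load / (C * f_sw)

print("ESR ripple: %.0f mV" % (v_esr * 1e3))          # 300 mV
print("capacitive ripple: %.0f mV" % (v_cap * 1e3))   # 30 mV
```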

Adding the ESR drop to the capacitor voltage gives a much better approximation of the observed output voltage. The break in the slope marks the point where the coil has stopped discharging. Before that point the slope is defined by the decaying current in the coil and the capacitor ESR. After that point, the slope is defined by the discharging output capacitor.

ESR voltage drop, capacitor voltage and output voltage in a flyback converter.

The only feature still missing is the high-frequency noise we see on the oscilloscope. This is caused by the switching done by the THX203H. Abrupt changes in the current cause the leakage magnetic flux in the transformer and the capacitances of the diodes to oscillate. Since the output filter has been removed as a cost-cutting measure, the ringing can be seen unattenuated on the output terminals. The smaller oscillations are caused by the primary switching on, while the larger oscillations that are obscuring the rising slope are caused by the THX203H switching off the primary coil.

Switching noise on the output voltage of a flyback converter.

Posted by Tomaž | Categories: Analog | Comments »

Switching window scaling in GNOME

01.05.2018 13:22

A while back I got a new work laptop: a 13" Dell XPS 9360. I was pleasantly surprised that installing the latest Debian Stretch with GNOME went smoothly and no special tweaks were needed to get everything up and running. The laptop works great and the battery life in Linux is a significant step up from my old HP EliteBook. The only real problem I noticed after a few months of use is weird behavior of the headphone jack, which often doesn't work for some unknown reason.

In any case, this is my first computer with a high-DPI screen. The 13-inch LCD panel has a resolution of 3200x1800, which means that text on a normal X window screen is almost unreadable without some kind of scaling. Thankfully, the GNOME that ships with Stretch has relatively good support for window scaling. You can set a scaling factor in the GNOME Tweak Tool and all windows will have their content scaled by an integer factor, making text and icons intelligible again.

Window scaling setting in GNOME Tweak Tool.

This setting works fine for me, at least for terminal windows and Firefox, which is what I mostly use on this computer. I've only noticed some minor cosmetic issues when I change this at run-time. Some icons and buttons in GNOME Shell (like the bar on the top of the screen or the settings menu on the upper-right) will sometimes look weird until the next reboot.

A bigger annoyance was the fact that I often use this computer with a normal (non-high-DPI) external monitor. I had to open up the Tweak Tool each time I connected or disconnected the monitor. Navigating huge scaled UI on the low-DPI external monitor or tiny UI on the high-DPI laptop panel got old really quick. It was soon obvious that changing that setting should be a matter of a single key press.

Finding a way to set window scaling programmatically was surprisingly difficult (not unlike my older effort in switching the audio output device). I tried a few different approaches, like setting some dconf keys, but none worked reliably. I ended up digging into the Tweak Tool source. This revealed that the Tweak Tool is built around a nice Python library that exposes the necessary settings as functions you can call from your own scripts. The rest was simple.

I ended up with the following Python script:

#!/usr/bin/python2.7

from gtweak.utils import XSettingsOverrides

def main():
    xsettings = XSettingsOverrides()

    # Toggle the scaling factor between 1 and 2.
    sf = xsettings.get_window_scaling_factor()
    if sf == 1:
        sf = 2
    else:
        sf = 1

    xsettings.set_window_scaling_factor(sf)

if __name__ == "__main__":
    main()

I have this script saved as toggle_hidpi and a shortcut set in GNOME Keyboard Settings so that Super-F11 runs it. Note that on the laptop's built-in keyboard this means pressing the Windows logo, Fn and F11 keys, due to the weird modern practice of hiding the normal function keys behind the Fn modifier. On an external USB keyboard, only Windows logo and F11 need to be pressed.

High DPI toggle shortcut in GNOME keyboard settings.

Posted by Tomaž | Categories: Code | Comments »

Faking adjustment layers with GIMP layer modes

12.03.2018 19:19

Drawing programs usually use a concept of layers. Each layer is like a glass plate you can draw on. The artwork appears in the program's window as if you were looking down through the stack of plates, with ink on upper layers obscuring that on lower layers.

Example layer stack in GIMP.

Adobe Photoshop extends this idea with adjustment layers. These do not add any content, but instead apply a color correction filter to the lower layers. Adjustment layers work in real time: the color correction gets applied seamlessly as you draw on the layers below.

Support for adjustment layers in GIMP is a common question. GIMP does have a Color Levels tool for color correction. However, it can only be applied on one layer at a time. Color Levels operation is also destructive. If you apply it to a layer and want to continue drawing on it, you have to also adjust the colors of your brushes if you want to be consistent. This is often hard to do. It mostly means that you need to leave the color correction operation for the end, when no further changes will be made to the drawing layers and you can apply the correction to a flattened version of the image.

Adjustment layers are currently on the roadmap for GIMP 3.2. However, considering that Debian still ships with GIMP 2.8, that seems like a long way off. Is it possible to have something like that today? I found some tips on the web on how to fake adjustment layers using various layer modes, but they are very hand-wavy. Ideally I would like to perfectly replicate the Color Levels dialog, not follow some vague instructions on how to make the picture a bit lighter. Those tips did give me an idea though.

Layer modes allow you to perform some simple mathematical operations between pixel values on layers, like addition, multiplication and a handful of others. If you fill a layer with a constant color and set it to one of these layer modes, you are in fact transforming pixel values from layers below using a simple function with one constant parameter (the color on the layer). Color Levels operation similarly transforms pixel values by applying a function. So I wondered, would it be possible to combine constant color layers and layer modes in such a way as to approximate arbitrary settings in the color levels dialog?

Screenshot of the Color Levels dialog in GIMP.

If we look at the color levels dialog, there are three important settings: black point (b), white point (w) and gamma (g). These can be adjusted for the red, green and blue channels individually, but since the operations are identical and independent, it suffices to focus on a single channel. Also note that GIMP performs all operations in the range of 0 to 255. However, the calculations work out as if it were operating in the range from 0 to 1 (effectively, fixed-point arithmetic with a scaling factor of 255 is used). Since it's somewhat simpler to demonstrate, I'll use the range from 0 to 1.

Let's first ignore gamma (leave it at 1) and look at b and w. Mathematically, the function applied by the Color Levels operation is:

y = \frac{x - b}{w - b}

where x is the input pixel value and y is the output. On a graph it looks like this (you can also get this graph from GIMP with the Edit these Settings as Curves button):

Plot of the Color Levels function without gamma correction.

Here, the input pixel values are on the horizontal axis and output pixel values are on the vertical. This function can be trivially split into two nested functions:

y = f_2(f_1(x))
where
f_1(x) = x - b
f_2(x) = \frac{x}{w-b}

f1 shifts the origin by b and f2 increases the slope. This can be replicated using two layers on the top of the layers stack. GIMP documentation calls these masking layers:

  • Layer mode Subtract, filled with pixel value b,
  • Layer mode Divide, filled with pixel value w - b

Note that values outside of the range 0 - 1 are clipped. This happens on each layer, which makes things a bit tricky for more complicated layer stacks.
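Putting the two masking layers together, the black and white point adjustment can be sketched in Python like this. This only models the arithmetic, it is not GIMP plug-in code; pixel values are in the 0 to 1 range as above:

```python
def clip(x):
    """Each layer operation saturates its result to the 0 - 1 range."""
    return max(0.0, min(1.0, x))

def levels(x, b, w):
    """Black point b and white point w, applied as two masking layers."""
    y = clip(x - b)        # Subtract layer filled with value b
    y = clip(y / (w - b))  # Divide layer filled with value w - b
    return y

print(levels(0.5, 0.25, 0.75))  # 0.5 - the midpoint is unchanged
print(levels(0.1, 0.25, 0.75))  # 0.0 - clipped to black
```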

The above took care of the black and white point settings. What about gamma adjustment? This is a non-linear operation that gives more emphasis to darker or lighter colors. Mathematically, it's a power function with a real exponent. It is applied on top of the previous linear scaling.

y = x^g

GIMP allows for values of g between 0.1 and 10 in the Color Levels tool. Unfortunately, no layer mode includes an exponential function. However, the Soft light mode applies the following equation:

R = 1 - (1 - M)\cdot(1-x)
y = ((1-x)\cdot M + R)\cdot x

Here, M is the pixel value of the masking layer. If M is 0, this simplifies to:

y = x^2

So by stacking multiple such masking layers with Soft light mode, we can get any exponent that is a multiple of 2. This is still not really useful though. We want to be able to approximate any real gamma value. Luckily, the layer opacity setting opens up some more options. Layer opacity p (again in the range 0 - 1) basically does a linear combination of the original pixel value and the masked value. So taking this into account we get:

y = (1-p)x + px^2

By stacking multiple masking layers with opacity, we can get a polynomial function:

y = a_1 x + a_2 x^2 + a_3 x^3 + \dots

By carefully choosing the opacities of the masking layers, we can manipulate the polynomial coefficients a_n. Polynomials of a sufficient degree can be a very good approximation for a power function with g > 1. For example, here is an approximation using the above method for g = 3.33, using 4 masking layers:

Polynomial approximation for the gamma adjustment function.
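The stacking described above is easy to model in Python. This is only a sketch of the arithmetic; finding the opacities that best approximate a given g is a separate numerical curve-fitting step (e.g. with scipy.optimize):

```python
def soft_light_black(x, p):
    """One Soft light masking layer filled with black (M = 0),
    applied with opacity p: a linear mix of x and x**2."""
    return (1.0 - p) * x + p * x * x

def apply_gamma_stack(x, opacities):
    """Compose several such layers; the result is a polynomial
    in x whose coefficients depend on the opacities."""
    for p in opacities:
        x = soft_light_black(x, p)
    return x

print(apply_gamma_stack(0.5, [1.0]))       # 0.25 - a single x**2 layer
print(apply_gamma_stack(0.5, [1.0, 1.0]))  # 0.0625 - two layers give x**4
```

Note that composing n fully opaque layers gives exponents that are powers of 2; it is the fractional opacities in between that fill in the intermediate polynomial shapes.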

What about the g < 1 case? Unfortunately, polynomials don't give us a good approximation and there is no layer mode that involves square roots or any other usable function like that. However, we can apply the same principle of linear combination with opacity settings to multiple saturated divide operations. This effectively makes it possible to approximate the power function piecewise linearly. It's not as good as the polynomial, but with enough linear segments it can get very close. Here is one such approximation, for g = 0.33, again using 4 masking layers:

Piecewise linear approximation for the gamma adjustment function.
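The piecewise linear variant can be sketched the same way. Again, this only models the arithmetic; the divisor and opacity values for the individual segments have to be found numerically:

```python
def divide_layer(x, d, p):
    """One saturated Divide masking layer filled with value d,
    applied with opacity p."""
    y = min(1.0, x / d)
    return (1.0 - p) * x + p * y

def apply_linear_stack(x, layers):
    """Compose (divisor, opacity) pairs; each layer is piecewise
    linear in x, so the composition is piecewise linear too."""
    for d, p in layers:
        x = divide_layer(x, d, p)
    return x

print(divide_layer(0.5, 0.5, 1.0))   # 1.0 - saturates at the divisor
print(divide_layer(0.25, 0.5, 0.5))  # 0.375 - halfway between x and 2x
```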

To test this all together in practice I've made a proof-of-concept GIMP plug-in that implements this idea for gray-scale images. You can get it on GitHub. Note that it needs relatively recent SciPy and NumPy versions, so if it doesn't work at first, try upgrading them from PyPI. This is how the adjustment layers look in the Layers window:

Fake color adjustment layer stack in GIMP.

Visually, results are reasonably close to what you get through the ordinary Color Layers tool, although not perfectly identical. I believe some of the discrepancy is caused by rounding errors. The following comparisons show a black-to-white linear gradient with a gamma correction applied. First stripe is the original gradient, second has the gamma correction applied using the built-in Color Levels tool and the third one has the same correction applied using the fake adjustment layer created by my plug-in:

Fake adjustment layer compared to Color Levels for gamma = 0.33.

Fake adjustment layer compared to Color Levels for gamma = 3.33.

How useful is this in practice? It certainly has the non-destructive, real-time property of Photoshop's adjustment layers. The adjustment is immediately visible on any change in the drawing layers. Changing the adjustment settings, like the gamma value, is somewhat awkward, of course. You need to delete the layers created by the plug-in and create a new set (although an improved plug-in could assist in that). There is no live preview like in the Color Levels dialog, but you can use Color Levels for preview and then copy-paste the values into the plug-in. The multiple layers also clutter the layer list. Unfortunately it's impossible to put them into a layer group, since layer mode operations don't work on layers outside of the group.

My current code only works on gray-scale images. The polynomial approximation uses the layer opacity setting, and layer opacity can't be set separately for the red, green and blue channels. This means that this way of applying gamma adjustment can't be adapted for colors. However, support for colors should be possible with the piecewise linear method, since there the divisor can be controlled separately for each channel (it's defined by the color fill on the layer). The opacity still stays the same for all channels, but I think it should be possible to make it work. I haven't done the math for that though.

Posted by Tomaž | Categories: Life | Comments »