Python gotcha

21.10.2008 21:22

Here's another simple way to add a subtle bug to your Python code that I've found recently.

Consider the following simple function that adds an element to an existing array, or creates a new array if none is specified in its arguments:

def foo(baz, bar = []):
        bar.append(baz)
        return bar

If you see a bug there, congratulations. It took me quite a while.

If not, here's an example program that uses this function and its output:

bar = foo('a')
bar = foo('b', bar)

print repr(bar)

# Start with a new list

bar = foo('c')
bar = foo('d', bar)

print repr(bar)
avian@galxpolx:~$ python example.py
['a', 'b']
['a', 'b', 'c', 'd']

Why doesn't foo() work as intended? It turns out that Python evaluates the default values of function arguments only once, when the function is defined and not each time the function is called.

Each time foo() is called without the second argument, the reference to the same array is passed to it. This array is empty at first, but every call to foo() adds an element that never gets removed.

So, the correct (and much less elegant) way is to create an empty array in the function body and the lesson learned is that only immutable objects should be used as default values in Python functions.

Posted by Tomaž | Categories: Code | Comments »

gnome-mount is the new ed

19.10.2008 17:21

It is bad enough when you encounter software that doesn't work. It's absolutely infuriating when it adds insult to injury and stubbornly refuses to do your bidding.

Historically, the most famous such stubborn piece of software is ed, the standard editor. As the urban legend goes, a typical user session looks like this:

golem$ ed
?
help
?
?
?
quit
?
exit
?
bye
?
hello? 
?
eat flaming death
?
^C
?
^C
?
^D
?

You could forgive the brave authors of the Unix dark ages for trading in a bit of user friendliness in exchange for precious core memory and teletype paper. And you can also be forgiven for thinking that today we are beyond such compromises.

Let's try an exercise similar to the one above with gnome-mount, a command-line utility that is used by various parts of Gnome Desktop to mount removable drives.

galxpolx$ gnome-mount --device /dev/hdb
gnome-mount 0.7
galxpolx$ gnome-mount --device /dev/hdb1
gnome-mount 0.7
galxpolx$ gnome-mount --device /dev/hdb1 --mount-point /media/sdcard
gnome-mount 0.7
galxpolx$ gnome-mount --device /dev/hdb1 --mount-point /media/sdcard --fstype vfat
gnome-mount 0.7
galxpolx$ gnome-mount --verbose --device /dev/hdb1
gnome-mount 0.7
galxpolx$ gnome-mount --verbose --hal-udi /org/freedesktop/Hal/devices/usb_device_cf2_6225_146030377350_if0
gnome-mount 0.7
galxpolx$ gnome-mount --eat-flaming-death
gnome-mount 0.7

At this point you can see that gnome-mount is as consistent in its output as ed. After several minutes of trying it has firmly established that you are in fact running version 0.7 of gnome-mount, but offers no additional clues on why it will not mount the freaking SD card.

Yes, I understand that gnome-mount is meant only to be ran by other processes and not by the user directly. However, the whole point of the Unix software model is that you have modular design, where each module can be individually tested manually. Giving no feedback on why an operation has failed is the worst kind of programming offense. And why is there a --verbose flag if it doesn't do anything?

In general, this is one of the things that bother me the most on the modern Linux Desktop. I don't expect everything to work out of the box. On the contrary, I'm prepared to research the issue and make corrections when something doesn't work. But with complicated systems in place on recent installations like udev and hal it happens more often than not that I hit a brick wall with software that will just not tell you what's wrong with it. It will simply just not work. With a problem description like that it's even futile to ask for help on a project mailing list until you've thoroughly studied the source code of several utilities.

It's been so last time with my efforts to make brightness buttons play nicely with gnome-power-manager on my Eee PC and it's also looking that way with the more recent problem with automatic mounting of removable volumes.

Posted by Tomaž | Categories: Code | Comments »

The not so great zero challenge

13.10.2008 21:27

The Great Zero Challenge tries to dispel the myth that you can recover any data from a hard drive that has been intentionally erased by overwriting its contents once with a single stream of zeros.

It's a nice idea with a problem. If it is possible to recover data, it's probably hugely expensive and it's highly unlikely that any company that is capable of doing that would take the challenge for a (recently increased) prize of $500. The money amount itself would probably be irrelevant if there would be a large media company backing it so that there would be some warranty of positive media coverage. Still, I congratulate the organizers for betting their money in order to dispel untruths as they put it.

Anyway, when I first heard of the challenge some time ago I noticed something interesting. The challenge says that you must identify the name of at least one of the files or folders on the disk. What if it would be possible to win the challenge without even touching the disk? Here's a enlarged portion of one of the censored screenshots available on the challenge website:

Sloppy censor, exhibit 1

Notice the dotted line? The censor has been a bit sloppy. That's one side of the selection box. So it looks like the folder was selected in the Explorer when the screenshot was made. This gives you a pretty good idea of the length of the folder name. More specifically, since the line is dotted, you know the width of the box is 62 or 63 pixels.

Windows uses a proportional font for file names (and by the looks of the screenshot they used the default Windows XP theme), so that further reduces the number of possible filenames. With a couple of trial screenshots I measured the width of all English letters and a short C program tried all possible combinations that included only letters.

The result? From a dictionary of English words, 2091 names matched. Far too many to be useful for guessing the correct name.

So knowing the length of a string rendered in proportional fonts isn't enough without some kind of a context. Is there any more information available in the screenshot to narrow the possibilities even further?

Sloppy censor, exhibit 2

Take a look at what else haven't been censored on the screenshot. There is one file with .gz extension, one with .tar extension and a directory. This suggests a decompressed distribution of one of the open source programs (for example something like "linux-2.6.23.tar.gz", "linux-2.6.23.tar" and "linux-2.6.24"). Size of the .tar file confirms that since it's approximately twice the size of the .gz file - a typical compression ratio for ASCII text or source code. The modification date of the .tar file suggests this is a fairly old release from the late 2006.

So, now all I need to find is an open source software that had a tar.gz release in November or December 2006, was 4.862 kB in size and had around 10 characters in the filename. Is there an easily searchable database of open source software out there that has this information? I haven't found one yet, but the Great Zero Challenge sure looks much less formidable when you look at it this way. And you don't even need to dust off that old electron microscope.

Posted by Tomaž | Categories: Ideas | Comments »

Slow to boot

09.10.2008 20:15

After reading that it's possible to boot an Eee PC into Linux in five seconds I started tweaking Debian installation on my 701 to see how close I can get to that impressive time. I didn't want to completely throw away any chances of cleanly upgrading packages so I tried to stay away from radical changes to the system like using a static /dev instead of udev for boot and patching X11.

Currently my boot time stands at 32 seconds (that is from the moment I press the power button to the GDM login screen). Far cry from the instant on action of Arjan van de Ven, but a bit better than a couple of minutes you get with a stock Debian install.

Out of those 32 seconds, 8 seconds are spent in the BIOS and GRUB boot loader. There's nothing I can do about those.

I reduced time spent loading the kernel by compiling a custom kernel. I started with a stock Debian kernel config (I'm currently using 2.6.23 because of all the problems with newer versions) and removed initrd (compiled-in SATA and filesystems support) and code that doesn't make sense on Eee like SMP support.

To reduce the time spent in the early boot I followed instructions on the Debian Wiki. I linked /bin/sh to /bin/dash, parallelized startup scripts and removed all services that aren't necessary for the system. I skipped readahead installation since I don't think it would help much with the low seek times of the solid state drive.

Then I installed bootchart to see if there is any more low hanging fruit. One thing that bootchart showed nicely was a sleep in /lib/udev/net.agent. This shell script is run by udev for every new detected network device and checks for any autoconfiguration directives for it in /etc/network/interfaces. For some reason it waits for loopback device to get accessible before doing anything, which means that on boot it halts udev initialization until kernel decides to bring lo up. Since I don't think I'll be hot-plugging any network devices into my Eee in the near future I commented out the relevant line in /etc/udev/rules.d/80-drivers.rules. This removed several seconds.

So, right now the boot chart looks like this:

Eee PC bootchart

You can see that the majority of time is spent in the kernel. Either in the boot process (before bootchartd started, left of the edge of the chart, or in module initialization (modprobe processes on the chart). There is little more I could do. Even with more aggressive boot parallelization further gains would be small. You can also see that CPU is constantly at 100% and that there is practically zero I/O wait time during boot, so my assumption about readahead was probably correct.

There is one thing on this graph I don't understand though. I replaced modprobe binary with a wrapper that logged modprobe start and end times (that's why the chart shows modprobe and modprobe.orig processes) to see exactly which modules are hogging the machine. bootchartd here shows that each modprobe takes several seconds, but my log shows that each modprobe exited after less that 100 ms. So either modprobe forks the process or bootchart gets confused somehow.

Posted by Tomaž | Categories: Code | Comments »

We have actually reached the future

05.10.2008 20:03

Recently I came across this logo on a Electronic Tube Handbook cover from 1953.

Philips electronic tube handbook cover circa 1953

Compare it with the current logo of NXP semiconductors, a former Philips semiconductor manufacturing division.

NXP semiconductors logo circa 2008

I'm no graphic designer, but I think changes in style very well reflect how much has changed during these 55 years in the electronics industry.

Posted by Tomaž | Categories: Life | Comments »

Capacitor charging for dummies

03.10.2008 19:56

Say you want to charge an empty capacitor to a certain voltage. It's pretty simple to determine analytically the values of all variables versus time in such a system and RC circuits are usually the subject of high-school physics. However even so there is an interesting point to a charging capacitor that is often omitted from the text books.

If you are using a constant voltage source (like in majority of practical cases), any circuit that charges a capacitor can be generalized into the following circuit:

Charging a capacitor through a resistive element

Ucc is the voltage of the source and C is capacitor capacitance. f(u) can be an arbitrary function, provided it describes a passive resistive element (in this case specifically f(u) >= 0 for u > 0 and f(0) = 0).

For example, an ideal resistor has:

f(u) = \frac{u}{R}

And a current source made with transistors can be approximated with:

f(u) = I_c

As you probably know, the energy of a capacitor with capacitance C charged to voltage Ucc is equal to:

W_{capacitor} = \frac{C \cdot U_{cc}^2}{2}

While the work done by the voltage source when charging it is equal to:

W_{source} = \int_0^{\infty} U_{cc} \cdot i \cdot dt = U_{cc} \cdot \int_0^{\infty} i \cdot dt = U_{cc} \cdot Q
W_{source} = U_{cc} \cdot U_{cc} \cdot C = C \cdot U_{cc}^2

You can see here that no matter what the f(u) function of the element in the circuit is, only half of the work done by the source goes to increasing the capacitor energy. The other half is dissipated as heat in the resistor, transistor or any other element connected between the source and the capacitor.

Why is that? One way to understand it is to consider how a small unit of charge gets transferred from the source at potential Ucc to the positive capacitor plate with potential of:

u_c = \frac{q_c}{C}

The very first charge unit has to fall into a potential well of a discharged capacitor (qc = 0). It looses potential energy by doing that and this energy lost must be converted to heat in the resistive element. Following charge units have to gradually loose less and less potential until the capacitor is fully charged and potentials on both sides are equal. This process doesn't depend on the resistance offered to the charge units by the material between the two potentials.

The obvious question here is what happens if you connect an empty capacitor directly to the voltage source? Well, that is a circuit that makes as much sense in circuit theory as statement 1 = 0 in mathematics. Generalizations that permit the use of circuit theory do not allow such an event and indeed it is also impossible in practice, since every capacitor has some series resistance. Even introducing mathematical tricks like the Dirac delta function doesn't give you a consistent answer.

So what if you're environmentally conscious and don't want to waste half of the energy on each charge cycle (or more probably, you're transferring so much energy over that capacitor that it becomes unpractical to cool surrounding elements)?

Since using only resistive elements is obviously out of the question, the circuit must use at least one additional reactive element, specifically a coil. For example like this:

Charging a capacitor through a coil

To start charging the capacitor, you close the switch. The circuit then forms a LC circuit that begins to oscillate. When the voltage on the capacitor reaches approximately half of the source voltage, you open the switch. The energy stored in the magnetic field in the coil will then charge the capacitor over the second half of the cycle and the diode will prevent the capacitor from discharging again through the coil. Here's how such a charge cycle looks like in a SPICE simulation:

Charging a capacitor through a coil

Upper trace: voltage on the diode. Lower trace: capacitor voltage Uc

Still, some energy is lost on the switch resistance and the diode, but this time that portion depends on variables that are at least to some extent under your control (i.e. "on" resistance of the switch, Uf of the diode). How to calculate that is left as an exercise for the reader.

In conclusion, you see that although ideal capacitors are advertised as lossless elements, charging them includes a hefty tax unless you're doing it in the right way. With low-power devices this doesn't matter much, but when you're pushing some non-trivial power through the capacitor it becomes a problem.

This is the reason why all practical switching DC-to-DC converters with a significant power output use coils as the reactive element whose terminals are switched. It's the consequence of the fact that the sources of power we use are closer to constant voltage than constant current.

In fact, the circuit above is identical to the buck converter. Note however, that if you're using it only for charging an initially empty capacitor, the final voltage can be higher than Ucc and depends on when you turn off the switch.

Posted by Tomaž | Categories: Analog | Comments »