Avian’s Blog

Electronics and Free Software

European Semantic Web Conference

30.05.2008 20:58

Andraž and I are flying to the 5th European Semantic Web Conference early tomorrow morning to see what is new in this field, and we are hoping to meet some interesting people.

A satellite photo of Tenerife shows some interesting geographical features (unfortunately Wikipedia says that the volcano hasn't been active for a couple of million years). I hope we'll have some opportunity to see a part of this landscape first hand.

Posted by Tomaž | Categories: Life

BeautifulSoup tips, part 2

25.05.2008 19:49

Here's another interesting catch in BeautifulSoup: you can iterate through a Tag's child nodes simply by using the Tag object as an iterable, for example in a for loop like this:

for t in tag:
	pass  # do something with t

However, what if tag is a NavigableString? If you're doing a recursive search through the tree, this will happen sooner or later. Since NavigableString doesn't have any child nodes, you would expect that this for loop would throw an exception, right? Well, not exactly.

Since NavigableString derives from the unicode class, it can also be used as an iterable object. This time, however, you'll iterate through the single characters of its contents (which are unicode objects themselves).

That was a source of some weird parsing errors in some of the code I was working on. So before iterating through a tag, check that it isn't an instance of NavigableString:

from BeautifulSoup import NavigableString

if not isinstance(tag, NavigableString):
	for t in tag:
		pass  # do something with t
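
To see the pitfall in isolation, here's a minimal sketch that doesn't need BeautifulSoup at all. FakeNavigableString and FakeTag are hypothetical stand-ins (a string subclass and a list subclass), not the library's classes; they only mimic the two iteration behaviours described above. It's written for Python 3, where str plays the role of unicode.

```python
class FakeNavigableString(str):
    """Hypothetical stand-in for NavigableString: a plain string subclass."""


class FakeTag(list):
    """Hypothetical stand-in for Tag: iterating yields its child nodes."""


def collect_text(node):
    """Recursively collect the text nodes, guarding against string iteration."""
    if isinstance(node, FakeNavigableString):
        # Leaf node: return it whole instead of iterating over its characters.
        return [str(node)]
    texts = []
    for child in node:
        texts.extend(collect_text(child))
    return texts


tree = FakeTag([
    FakeNavigableString("Hello"),
    FakeTag([FakeNavigableString("world")]),
])

# Without the isinstance() guard, iterating a string node silently works
# and yields single characters instead of raising an exception.
unguarded = list(FakeNavigableString("Hi"))

print(unguarded)
print(collect_text(tree))
```

The guard sits at the top of the recursive function, so every caller gets it for free.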

Posted by Tomaž | Categories: Code

Early summer math

22.05.2008 20:02

Getting caught in an early summer shower a while ago got me thinking. If you have to cover a certain distance through the rain and want to remain as dry as possible, is it better to walk slowly or run as fast as possible? It's a popular question and I remember seeing Mythbusters and Brainiac episodes on a similar topic. But since I didn't feel like experimenting, I tried to come up with a theoretical solution.

First there are some things that need to be defined:

I defined rain as a homogeneous mixture of air and water, moving downwards with a constant velocity (since raindrops reach their terminal velocity well before hitting the ground).

The measure of wetness is the amount of water accumulated on you during the exercise, and that amount is proportional to the volume of air/water mixture you displace during movement. The rationale here is that the air will move around you as you move through the rain while water droplets will stick to you since they cannot follow the air flow due to their inertia.

To make the calculation simpler I also assumed that the person has a rectangular shape (the following calculation is done in two dimensions, but accounting for the depth is trivial). You can think of the rectangle sides a and b as your projections to the vertical and horizontal plane.

Now with these things defined, it's pretty simple to get to a result. You basically have to calculate the volume of the hole you bore through the rain. Here is the situation with the ground as the frame of reference, where v_a is your velocity, v_b is the velocity of rain droplets and d is the distance you have to cross in rain.

It may be simpler to think with the water droplets as the frame of reference. In that case the rain is stationary and you are moving up and right through it with the velocity v_a - v_b.

The displaced volume is then the sum of volumes of three parallelograms, where h is the distance the rain falls during the time d / v_a that you spend crossing:

\mathcal{P} = \mathcal{P}_1 + \mathcal{P}_2 + \mathcal{P}_3
\mathcal{P} = a \cdot b + a \cdot h + b \cdot d
\mathcal{P} = a \cdot b + a \cdot \frac{v_b \cdot d}{v_a} + b \cdot d

Now if you look at these three terms: the first and the third one are constant. Only the second one depends on your velocity v_a and it's an inverse relationship.

So the conclusion of this purely theoretical endeavor is that the faster you run, the drier you'll be, but even if your speed goes to infinity, you're still going to get wet. Also note that the amount of water accumulated on your front side is the same regardless of your speed; it's only the amount that falls on the top of your head that varies.
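
The formula is easy to play with numerically. Here's a minimal sketch that evaluates it directly; the function name displaced_area and the sample numbers are made up for illustration, and the units are whatever you choose as long as they are consistent.

```python
def displaced_area(a, b, d, v_a, v_b):
    """Evaluate P = a*b + a*h + b*d with h = v_b * d / v_a.

    a multiplies the distance the rain falls (the "top of your head" term),
    b multiplies the distance d you cross (the "front side" term).
    """
    if v_a <= 0:
        raise ValueError("v_a must be positive, otherwise you never arrive")
    h = v_b * d / v_a  # how far the rain falls while you are crossing
    return a * b + a * h + b * d


# Illustrative numbers only: a = 1, b = 2, d = 10, rain speed v_b = 10.
walking = displaced_area(1, 2, 10, v_a=5, v_b=10)
running = displaced_area(1, 2, 10, v_a=10, v_b=10)
lower_bound = 1 * 2 + 2 * 10  # the two constant terms, a*b + b*d

print(walking)  # 42.0
print(running)  # 32.0
print(lower_bound)
```

Doubling the speed halves only the middle term, and no finite speed gets you below the constant part.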

Posted by Tomaž | Categories: Ideas

BeautifulSoup tips

20.05.2008 22:26

BeautifulSoup is a wonderful tool for parsing XML sludge gathered on the bottom of the internet. However you pay for versatility and resistance to broken markup with a big performance hit compared to proper XML parsers. The documentation acknowledges that and suggests that you only parse a part of the document - it doesn't offer any tips on what to do if you really must walk through the whole tree.

Today I found the following trick, with which I managed to speed up some code that walks entire BeautifulSoup document trees by almost a factor of four:

NavigableString and Tag objects offer a very useful string member which can be used to access the Unicode string contained in them.

This is very convenient, since you only access the string member no matter what kind of node you are processing. However, there is a catch: if the Tag object doesn't have a single NavigableString as its only child, the string member evaluates to None (as per the documentation). What happens behind the scenes is that the Tag object then doesn't have the string member in __dict__, so the Tag's __getattr__() method gets called. This does an expensive search through all of the Tag's attributes (because a BeautifulSoup Tag's XML attributes can also be accessed as Python members) before returning None. So hitting the string member when there isn't one does a lot of processing for nothing.

The second catch is that while the string member is implemented in __dict__ in Tag objects, it's implemented in a (fairly efficient) __getattr__() method in NavigableString.

So the following code:

s = tag.string

can be replaced with this equivalent, which was approximately 4 times faster in my case:

if isinstance(tag, NavigableString):
	s = tag.string
else:
	s = tag.__dict__.get('string', None)

Of course, using hacks like this means that your application now depends on library internals, so big fat warnings are probably in order around such code. The code above was tested with BeautifulSoup 3.0.5 - it's quite possible that newer versions will have things done differently.

Posted by Tomaž | Categories: Code

Blog update

18.05.2008 11:53

Two weeks ago I got the idea that maybe it's time to move away from Nanoblogger and start using some "real" blog software.

Well, after experimenting with Wordpress and MovableType for the last two weeks I must say that Nanoblogger still beats them all. Have you ever tried writing a post on an EeePC with those Javascript WYSIWYG editors? I'll describe my experiences in some other post; for now I just want to say that all blogging software sucks, Nanoblogger just sucks the least.

So yes, I'm staying with Nanoblogger. I've modified it a bit, added support for comments and I'm hoping to rewrite some parts of it in Perl to make it faster.

Oh, and there's also jsMath:

\oint_{\mathcal{A}} \vec D \cdot \vec{dA} = \sum_i Q_i

Anyway, if something's not working for you now, please leave a comment or drop me a mail.

Posted by Tomaž | Categories: Code

EeePC ATA hiccups

04.05.2008 17:06

Today I started getting errors like this on my Eee:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: cmd ef/05:fe:00:00:00/00:00:00:00:00/40 tag 0 cdb 0x0 data 0
         res 51/04:fe:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
ata2.00: configured for UDMA/33
ata2: EH complete
sd 1:0:0:0: [sda] 7815024 512-byte hardware sectors (4001 MB)
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

That's with a stock Debian 2.6.23 kernel which I've been running since I first installed Debian.

I just hope this doesn't mean the solid state drive is already reaching its maximum number of write cycles. So far these errors don't seem to have any effect other than annoying me when I'm using the console.

Update: Mystery solved. Some browsing through kernel source revealed that "ef" is ATA command ATA_CMD_SET_FEATURES. This sounded like something hdparm would do and indeed hdparm was being run from some misconfigured ACPI scripts I forgot to clean up yesterday.
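
For next time, here's a minimal sketch that pulls the command byte out of such a log line so it can be looked up. The function name ata_command_name and the one-entry lookup table are made up for illustration; the table only contains the command identified above, and the regular expression assumes the "cmd xx/" layout seen in the log.

```python
import re

# Only the command identified above; extend from the kernel headers as needed.
KNOWN_COMMANDS = {
    0xEF: "ATA_CMD_SET_FEATURES",
}


def ata_command_name(log_line):
    """Return a name for the first byte after 'cmd ' in a libata error line."""
    match = re.search(r"cmd ([0-9a-f]{2})/", log_line)
    if match is None:
        return None  # not a 'cmd' line
    code = int(match.group(1), 16)
    return KNOWN_COMMANDS.get(code, "unknown (0x%02x)" % code)


line = "ata2.00: cmd ef/05:fe:00:00:00/00:00:00:00:00/40 tag 0 cdb 0x0 data 0"
print(ata_command_name(line))
```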

Posted by Tomaž | Categories: Code

EeePC's hotkeys

03.05.2008 12:58

Yesterday I was wondering why I don't see the nice GNOME-styled OSD (like the one that pops up when I change the speaker volume) when I press the LCD brightness hotkeys on my EeePC (Fn-F3 and Fn-F4).

GNOME brightness OSD on EeePC

Well, it turned out that quite a few things needed fixing, among them the HAL configuration and GNOME Power Manager. However, even before I could start fixing things there was the non-trivial matter of finding out all the steps that happen after you press the hotkey and before the volume or LCD brightness gets changed. As far as I could see this isn't documented anywhere, so in case someone else wants to do something similar, here's what I found out.

Note that this is most likely Debian and EeePC specific and will probably change soon.

Step 1: Hardware

When you press an Fn-F combination, the laptop's hardware generates an ACPI event. This event gets picked up by acpid using the Linux kernel ACPI driver.

In EeePC's case Fn-F4 and Fn-F3 keypresses also directly modify the brightness setting without any software intervention. Not so with audio volume hotkeys.

Step 2: ACPI

acpid consults its configuration in /etc/acpi/events and runs a shell script that corresponds to the ACPI event.

This shell script runs acpi_fakekey, which inserts a keycode into the kernel's keyboard event FIFO. This in effect simulates a keypress on the keyboard.

It is done this way because different laptops have different means of reporting hotkeys (different ACPI events, scancodes, etc.). After this step all these events are translated into a standardized keycode in the event FIFO (like 224 for KEY_BRIGHTNESSDOWN - see /usr/share/acpi-support/key-constants).

Step 3: GNOME

Some GNOME component gets the keypress via an X event and reacts to it. In the case of the LCD brightness hotkeys the component is GNOME Power Manager and in the case of the audio volume hotkeys it's the GNOME Settings Daemon.

It might be that HAL also plays a role here. Maybe these applications get the events through dbus from HAL. lshal -m suggests that HAL also reacts to these keypresses, but I haven't dug deep enough into the code to see if GNOME listens to it.

There's also some weird business of how keycodes get translated into X events via xkb settings. A document on Ubuntu Wiki tries to clear this up, but I failed to see the exact connection between codes in X configuration and codes in acpi_fakekey.

Step 4: HAL

The GNOME component displays a box with the current status and sends a message to HAL to change the LCD brightness.

HAL has some configuration in /usr/share/hal/fdi/policy that tells it how to do that. For example, in the case of the LCD brightness setting it calls /usr/lib/hal/scripts/hal-system-lcd-set-brightness and /usr/lib/hal/scripts/linux/hal-system-lcd-set-brightness-linux.

In the case of audio volume, the GNOME Settings Daemon appears to change the setting itself, without resorting to HAL.

Also, since EeePC's hardware changes the brightness itself, this last step is unnecessary - all GNOME needs to do is display the box (and that's one of the things that needed fixing).
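
The whole chain can be summed up as a toy model. This is only a sketch of the steps described above, not code that talks to acpid, X or HAL: the ACPI event names are hypothetical placeholders (the real event strings differ between laptops), and the keycodes other than 224 are taken from the Linux input layer rather than from this post.

```python
# Keycodes from the Linux input layer; 224 is the value quoted above.
KEY_VOLUMEDOWN = 114
KEY_VOLUMEUP = 115
KEY_BRIGHTNESSDOWN = 224
KEY_BRIGHTNESSUP = 225

# Step 2: acpid and acpi_fakekey turn a laptop-specific ACPI event into a
# standardized keycode. The event names here are hypothetical placeholders.
ACPI_EVENT_TO_KEYCODE = {
    "brightness-down": KEY_BRIGHTNESSDOWN,
    "brightness-up": KEY_BRIGHTNESSUP,
    "volume-down": KEY_VOLUMEDOWN,
    "volume-up": KEY_VOLUMEUP,
}

# Step 3: which GNOME component reacts to which keycode.
KEYCODE_TO_COMPONENT = {
    KEY_BRIGHTNESSDOWN: "GNOME Power Manager",
    KEY_BRIGHTNESSUP: "GNOME Power Manager",
    KEY_VOLUMEDOWN: "GNOME Settings Daemon",
    KEY_VOLUMEUP: "GNOME Settings Daemon",
}

# Step 1 detail: on the EeePC the hardware changes brightness by itself.
HARDWARE_HANDLED = {KEY_BRIGHTNESSDOWN, KEY_BRIGHTNESSUP}


def trace_hotkey(acpi_event):
    """Return (keycode, GNOME component, whether software must apply the change)."""
    keycode = ACPI_EVENT_TO_KEYCODE[acpi_event]
    component = KEYCODE_TO_COMPONENT[keycode]
    software_must_change_setting = keycode not in HARDWARE_HANDLED
    return keycode, component, software_must_change_setting


print(trace_hotkey("brightness-down"))
print(trace_hotkey("volume-up"))
```

In both cases the component still has to draw the OSD box; the last flag only says whether it also has to apply the new value.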

Posted by Tomaž | Categories: Code

Python in operator revisited

02.05.2008 19:38

Hruške has some well-founded arguments against my last Python performance rant.

I stand corrected. I must have messed something up with my measurements. I have now repeated them on two different machines and I got the same results as Hruške. So the in operator in Python is indeed fast when the right hand operand is a dictionary.
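
For anyone who wants to repeat the measurement, here's a minimal sketch using the standard timeit module. This is not the benchmark from the original rant, just an illustration; the container size and repeat count are arbitrary, and the absolute numbers will differ from machine to machine.

```python
import timeit

N = 100000
as_list = list(range(N))
as_dict = dict.fromkeys(as_list)
needle = N - 1  # last element: the worst case for a linear scan of the list

# Time the same membership test against both containers.
list_seconds = timeit.timeit(lambda: needle in as_list, number=200)
dict_seconds = timeit.timeit(lambda: needle in as_dict, number=200)

print("list: %.6f s, dict: %.6f s" % (list_seconds, dict_seconds))
```

The list has to be scanned element by element, while the dictionary does a hash lookup, so the gap grows with N.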

I can only say in my defense that I thought this part of the Python documentation confirmed that the in operator works in a general way on all iterable objects:

__iter__()
Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.
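
What the quoted paragraph doesn't say is that the in operator first looks for a __contains__() method and only falls back to walking the iterator when there is none; dictionaries provide __contains__(). Here's a minimal sketch of that difference. OnlyIter and WithContains are made-up classes for illustration.

```python
class OnlyIter(object):
    """Container that supports 'in' only through iteration."""

    def __init__(self, items):
        self.items = list(items)
        self.steps = 0  # how many elements the iterator has produced

    def __iter__(self):
        for item in self.items:
            self.steps += 1
            yield item


class WithContains(OnlyIter):
    """Same container, but 'in' uses __contains__ instead of iterating."""

    def __init__(self, items):
        OnlyIter.__init__(self, items)
        self.lookup = set(self.items)

    def __contains__(self, item):
        return item in self.lookup  # hash lookup, no iteration


slow = OnlyIter([1, 2, 3])
fast = WithContains([1, 2, 3])

print(3 in slow, slow.steps)  # the iterator had to be walked
print(3 in fast, fast.steps)  # __contains__ answered directly
```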

As for his other comment about Wikiprep code size:

avian@toybox:~/dev/html2latex-1.1$ sloccount .

... snip ...

Totals grouped by language (dominant language first):
perl:           797 (100.00%)

Total Physical Source Lines of Code (SLOC)                = 797
avian@toybox:~/dev/wikiprep$ sloccount .

... snip ...

Totals grouped by language (dominant language first):
perl:          2060 (85.12%)
cpp:            220 (9.09%)
python:          50 (2.07%)
ansic:           49 (2.02%)
sh:              41 (1.69%)

Total Physical Source Lines of Code (SLOC)                = 2,420

So yeah, the Wikiprep Perl code, which I maintain for Zemanta, is still bigger.

That of course doesn't mean that there aren't any bigger Perl monstrosities in the wild. After a quick survey of free Perl software I have on the disk I found gtablix which has 6283 Perl SLOC and GCfilms with an unbelievable 25758 Perl SLOC (neither of which seems to be maintained BTW).

Posted by Tomaž | Categories: Code

Music format rant

01.05.2008 18:20

Today I wanted to burn a CD that I could listen to in my car. Since the CD player there supports the MP3 format this shouldn't be that hard, right? I'll just copy the music I already have on my hard disk (and which I ripped from my CD collection) to a data CD and that's it.

Well, not really. On my Debian machine, all the music got ripped into Ogg Vorbis (since that was the default setting in Sound Juicer). On my PowerBook, iTunes ripped everything into AAC (again, the default setting). I'm sure that if I had a machine running Windows, everything would be in some Microsoft proprietary format.

I know, I know. I should have checked the settings before feeding my computers all my CDs. Still, it's frustrating that I have to go through all that again.

And of course, I didn't have any such problems with music that I downloaded from the net (like free tracks from Machinae Supremacy, don't get me wrong).

Posted by Tomaž | Categories: Life