1984 was not an instruction manual

30.03.2009 8:46

News from the UK is getting scarier every day.

Last week two stories circulated in Slovenian newspapers. The first reported that the UK government is going to monitor conversations on all social networking sites. Well, considering that they already had plans to record all phone calls and internet traffic, it seems a bit late to start sounding the alarm now (aren't Facebook conversations part of internet traffic?).

However, combine that with the second article, about a proposal to add social networking sites to the primary school curriculum (in place of those useless mathematical skills, for instance).

Is it just me, or does that sound like a plan to make sure that future generations will enter all details about their private lives into social networking databases, where they can be conveniently mined by the state?

Posted by Tomaž | Categories: Life

Video of my Pleo talk

28.03.2009 17:11
Robot Pleo ali kako dinozavru zamenjati vrat (Robot Pleo, or how to replace a dinosaur's neck)

Two weeks ago I gave a talk at Kiberpipa about Ugobe's Pleo. I talked about my quest to fix Piki's broken neck and about the interesting technical details I found while poking around inside his belly.

If you missed the talk, the video is now available at Kiberpipa's site (in Slovene).

Posted by Tomaž | Categories: Life

Wordpress is a bad joke

22.03.2009 11:25

I've written a little over 400 entries on my blog in the last four years. Now it appears that's a bit more than Nanoblogger can handle. Updating the site takes longer and longer and, worst of all, I've been getting reports that the archive pages are badly broken.

I first tried to fix Nanoblogger, but that's nearly impossible. Imagine 2500 lines of bash code with no comments and functions that pass data around through global variables.

Obviously it's time to move the site to some other software. In fact, if I had looked at the core code earlier, I probably wouldn't have used Nanoblogger in the first place (the plug-in interface is relatively clean in comparison).

I toyed with the idea of writing my own replacement for Nanoblogger, because I really like managing the site from a command line and keeping all my posts in flat files. I even wrote some experimental code before I realized that I was just reinventing the wheel.

Colleagues at work suggested that I try Wordpress.

So I got a shared web host with Wordpress and started importing my data into it. I decided to write a Wordpress eXtended RSS (WXR) file containing all the posts and comments. There's no documentation for this format, but I managed to replicate it from examples using a short Perl script and XML::Writer.
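The export above was done in Perl with XML::Writer. For anyone who wants to try the same thing in PHP, a rough analogue uses the built-in XMLWriter class. This is only a sketch: `build_wxr` is a made-up name, the post array layout (`title`, `body`) is assumed, and the element set is a tiny subset of a real WXR file (no comments, categories or `wp:` elements).

```php
<?php
// Sketch: write an RSS 2.0 document in the spirit of a WXR export.
// Only <title> and <content:encoded> are written for each item; a real
// WXR file carries many more elements that are not shown here.
function build_wxr(array $posts): string {
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('rss');
    $w->writeAttribute('version', '2.0');
    $w->writeAttribute('xmlns:content', 'http://purl.org/rss/1.0/modules/content/');
    $w->startElement('channel');
    foreach ($posts as $post) {
        $w->startElement('item');
        // writeElement() escapes markup characters, so the output stays well-formed
        // no matter what HTML the post body contains.
        $w->writeElement('title', $post['title']);
        $w->writeElement('content:encoded', $post['body']);
        $w->endElement(); // item
    }
    $w->endElement(); // channel
    $w->endElement(); // rss
    $w->endDocument();
    return $w->outputMemory();
}
```

Like XML::Writer, a streaming writer makes it hard to produce anything but well-formed XML, which is why the file itself was an unlikely culprit.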

Well, it turned out that all my imported posts came out of Wordpress garbled, with line breaks inserted everywhere and, worst of all, badly broken HTML (for example <p<br/>>).

Since I couldn't find a reason for that in my WXR file (it checked out as well-formed XML; I could hardly produce anything else with XML::Writer), I started digging around the Wordpress code.

The first sign of trouble came when I stumbled upon the following function (wp-admin/import/wordpress.php:59):

function get_tag( $string, $tag ) {
	global $wpdb;
	preg_match("|<$tag.*?>(.*?)</$tag>|is", $string, $return);
	$return = preg_replace('|^<!\[CDATA\[(.*)\]\]>$|s', '$1', $return[1]);
	$return = $wpdb->escape( trim( $return ) );
	return $return;
}

Riiight. I guess nobody told the Wordpress developers that the eighth circle of hell is reserved for people who parse XML with regular expressions.
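To make the problem concrete, here is a minimal PHP sketch: a stripped-down copy of the regex from `get_tag()` above (the `$wpdb` escaping is dropped so it runs standalone) next to the same extraction done with PHP's DOMDocument. `regex_get_tag` and `get_item_titles` are names made up for this illustration.

```php
<?php
// Stripped-down copy of get_tag()'s matching logic, without $wpdb->escape().
function regex_get_tag(string $string, string $tag): string {
    preg_match("|<$tag.*?>(.*?)</$tag>|is", $string, $return);
    return trim(preg_replace('|^<!\[CDATA\[(.*)\]\]>$|s', '$1', $return[1] ?? ''));
}

// The same job with a real parser: collect the <title> of every <item>.
function get_item_titles(string $xml): array {
    libxml_use_internal_errors(true); // report failure via the return value, not warnings
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('input is not well-formed XML');
    }
    $titles = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $title = $item->getElementsByTagName('title')->item(0);
        // textContent decodes entities such as &amp; and unwraps CDATA sections.
        $titles[] = $title !== null ? $title->textContent : '';
    }
    return $titles;
}
```

Given `<rss><channel><item><titles>x</titles><title>A &amp; B</title></item></channel></rss>`, `regex_get_tag($xml, 'title')` returns `x</titles><title>A &amp; B`: the pattern happily matches the start of `<titles>` and leaves the entity undecoded. `get_item_titles($xml)` returns `['A & B']`.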

OK, maybe I can forgive them one mistake. This is, after all, just code for importing entries, used once in the lifetime of a Wordpress installation.

Still, their funny way of reimplementing a real XML parser wasn't the cause of my problems, so I dug deeper. When I got to wp-includes/formatting.php, I didn't know whether to laugh or cry:

$pee = preg_replace('|<p>\s*?</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace
$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
$pee = preg_replace( '|<p>|', "$1<p>", $pee );
$pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee); // don't pee all over a tag
$pee = preg_replace("|<p>(<li.+?)</p>|", "$1", $pee); // problem with nested lists
$pee = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $pee);

and so on, for 2300 more lines like that. You can see where this is going.

This is code that gets executed for every piece of text that is posted through Wordpress. It's a miracle it works as well as it does!
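As an illustration of why text-level regex passes over HTML are fragile (not a claim about which of those lines caused the broken posts), here is one way markup like `<p<br/>>` can arise: a line-break pass that treats HTML as plain text meets a newline that sits inside a tag. Both functions are hypothetical, written in PHP for this example.

```php
<?php
// Naive pass: every newline becomes <br/>, including newlines inside tags.
function naive_br(string $text): string {
    return preg_replace('/\n/', '<br/>', $text);
}

// Slightly less naive: split the input into tag and text tokens and only
// touch the text. Still a heuristic (a '>' inside an attribute value breaks
// it), but it keeps tags out of the text transformation.
function tag_aware_br(string $html): string {
    // The capture group makes preg_split() keep the tags, so even indices
    // hold text and odd indices hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = str_replace("\n", '<br/>', $part);
        }
    }
    return implode('', $parts);
}
```

`naive_br("<p\n>Hello</p>")` yields `<p<br/>>Hello</p>`, while `tag_aware_br("<p\n>Hello\nworld</p>")` leaves the tag alone and yields `<p\n>Hello<br/>world</p>`. A real fix would work on a parsed document tree rather than on strings.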

This convinced me that going from Nanoblogger to Wordpress isn't much of an improvement when it comes to code quality.

And the third thing that makes Wordpress a joke is security. It's little more than an afterthought. There's the wonderful security record of web applications written in PHP, and then there's the fact that there is no way to make Wordpress secure. Out of the box it doesn't even support SSL, and the plug-in that adds it only does so for the page where you enter your password.

That's only marginally better than having no SSL at all. What good is securing the password entry page when the whole administrative interface, including the login cookie that grants access to it, is sent in the clear? Also, I find the practice of having code that is capable of modifying itself (by design!) very questionable.
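The complaint boils down to a policy: every administrative page, not only the login form, should be served over HTTPS. Here is a minimal sketch of that policy in plain PHP. It is not a Wordpress API or plug-in; `https_redirect_target` is a hypothetical helper, and the protected path prefixes are assumptions based on the usual `/wp-admin/` and `/wp-login.php` locations.

```php
<?php
// Hypothetical helper: if an administrative page was requested over plain
// HTTP, return the HTTPS URL to redirect to; otherwise return null.
function https_redirect_target(array $server, array $protectedPrefixes = ['/wp-admin/', '/wp-login.php']): ?string {
    $isHttps = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    if ($isHttps) {
        return null; // already encrypted, nothing to do
    }
    $uri = $server['REQUEST_URI'] ?? '/';
    foreach ($protectedPrefixes as $prefix) {
        // Cover every admin page, not just the one with the password form.
        if (strpos($uri, $prefix) === 0) {
            // HTTP_HOST is client-supplied; a real setup would use a configured host name.
            return 'https://' . ($server['HTTP_HOST'] ?? 'localhost') . $uri;
        }
    }
    return null;
}

// Possible use at the top of an admin entry point:
// $target = https_redirect_target($_SERVER);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Redirecting is only half of it: the authentication cookie should also be set with the `secure` flag (the sixth argument of PHP's `setcookie()`), so the browser never sends it over plain HTTP in the first place.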

If I used Wordpress, I could say goodbye to posting anything from events like the CCC, where you can't trust the network.

As far as I'm concerned, that's it for Wordpress. I'm never looking in that direction again.

I guess when everyone is rolling around on pentagons you sometimes have to reinvent the wheel.

Posted by Tomaž | Categories: Code

Acronym explanations for the Desktop

02.03.2009 20:42

Here's a little hack called Acrozym that I threw together in a moment of inspiration.

The world gross product of acronyms is still rising, and it's hard to keep track of all the abbreviations, especially when you're reading text from a field you're not familiar with.

I found myself in such a situation this morning at work when I tried to read a paper about Internet advertising (that's about the maximum yearly dose of the topic I can handle). I kept going back and forth between the document and Wikipedia.

Incidentally, providing acronym explanations for blog posts was one of the features requested for the Zemanta service a while ago.

My Wikitag engine, which provides the automatic in-text linking part of the Suggest service, is certainly capable of this task. When suggesting links, it often provides explanatory links to Wikipedia for acronyms; in fact, part of it is specially designed to recognize them. So the only thing left to do was to write a convenient interface to this part of the Zemanta API for the X11 desktop.

Since text with acronyms can appear in the most unexpected places, I decided to use the most universal way of supplying text: the X11 middle-click clipboard (the PRIMARY selection).
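Reading the middle-click clipboard is easy to sketch from the command line. The original tool is Perl; this PHP version assumes the `xclip` utility is installed and asks it to print the PRIMARY selection. `read_primary_selection` is a hypothetical name, and the command is a parameter only so it can be swapped out (for instance in a test).

```php
<?php
// Return the current X11 PRIMARY selection (the text last highlighted with
// the mouse), or '' if nothing is selected or the command fails.
// Assumes xclip is installed; the default command is an illustration.
function read_primary_selection(string $cmd = 'xclip -o -selection primary 2>/dev/null'): string {
    $out = shell_exec($cmd);
    // shell_exec() gives null (or false) when there is no output or on failure.
    return is_string($out) ? trim($out) : '';
}

// Example:
// $text = read_primary_selection();
```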

Here's how it works: you select any piece of text on the screen, just as you would if you wanted to copy it with the middle-click method. You then run Acrozym (e.g. via a launcher on the panel) and in a second or two a window pops up with explanations of any acronyms in the text you selected.

Acrozym screenshot

Since this isn't an ordinary phrase search (you get different explanations depending on the context), you also need to include a sentence or two around the acronym. Otherwise the engine will not be able to disambiguate the meaning, and you will get either no explanation or a wrong one.
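Because the explanation depends on context, what matters is the sentence around each acronym, not the bare token. The following PHP sketch shows one naive way to pair acronym-like tokens with the sentence they occur in. It is hypothetical and not how the Wikitag engine recognizes acronyms: "a capital letter followed by one or more capitals or digits" is a crude heuristic, and the sentence splitter only looks at `.`, `!` and `?`.

```php
<?php
// Map each acronym-like token to the first sentence it appears in.
function acronyms_with_context(string $text): array {
    // Naive sentence split: break after ., ! or ? followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $result = [];
    foreach ($sentences as $sentence) {
        // A capital letter followed by one or more capitals or digits.
        if (preg_match_all('/\b[A-Z][A-Z0-9]+\b/', $sentence, $m)) {
            foreach ($m[0] as $acronym) {
                if (!isset($result[$acronym])) {
                    $result[$acronym] = $sentence;
                }
            }
        }
    }
    return $result;
}
```

For `'The CCC is in Berlin. I used SSL there! Nothing here.'` this returns `['CCC' => 'The CCC is in Berlin.', 'SSL' => 'I used SSL there!']`, which is exactly the kind of context a bare `CCC` lookup would lack.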

The code is available here if you want to play with it. As I said, it's a hack written in a couple of hours, but it seems to work pretty well for me (patches are welcome, of course).

Installation instructions are included in the tarball. The code is 90 lines of Perl, and you'll probably need to install two modules from CPAN before running it. It uses libnotify for display, so it should work across desktop environments (I've only tested it on GNOME, though).
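The display step can be sketched in PHP by shelling out to `notify-send`, libnotify's command-line client, which is assumed to be installed. `format_explanations` and `notify` are hypothetical helper names, and the acronym => explanation array is assumed to come from whatever lookup precedes this step.

```php
<?php
// Build the notification body, one "ACRONYM: explanation" line per entry.
function format_explanations(array $explanations): string {
    if (count($explanations) === 0) {
        return 'No acronyms found in the selection.';
    }
    $lines = [];
    foreach ($explanations as $acronym => $meaning) {
        $lines[] = $acronym . ': ' . $meaning;
    }
    return implode("\n", $lines);
}

// Show the text with notify-send; returns true if the command exited with 0.
function notify(string $title, string $body): bool {
    // escapeshellarg() keeps arbitrary selected text from being run as shell code.
    $cmd = 'notify-send ' . escapeshellarg($title) . ' ' . escapeshellarg($body);
    exec($cmd, $output, $status);
    return $status === 0;
}

// Example:
// notify('Acrozym', format_explanations(['CPAN' => 'Comprehensive Perl Archive Network']));
```

Some notification daemons may interpret the body as simple markup, so a stray `&` or `<` in an explanation might need escaping (for example with `htmlspecialchars()`) before it is passed in.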

Posted by Tomaž | Categories: Code