25.11.2007 19:06

wikiprep.pl sure takes a long time...

Posted by Tomaž | Categories: Life | Comments »

More statistics

22.11.2007 23:27

Zemanta's department of lies, damn lies and statistics presents: a histogram of lengths of articles on English Wikipedia and lengths of blog posts and news articles on the rest of the internet:

Histogram of Wikipedia article lengths

Histogram of blog posts and news articles

Posted by Tomaž | Categories: Life | Comments »


22.11.2007 3:54

Last Friday I went with the rest of Zemanta's team to see the famous crack in the floor of the Tate Modern gallery. Considering that I read Stephen Baxter's Moonseed recently, the timing couldn't have been better.

The gallery occupies the building of a former power station and this particular art installation is placed in its massive turbine hall.

The sheer size of the hall is impressive. You can still see some large steam pipes that were cut off at the walls. I can't help thinking that this place probably looked more impressive when it was filled with machines than it does now, serving as a gallery for modern art.

On the other hand Shibboleth (the formal name of the installation) looks equally impressive. It spans the whole length of the hall - it starts at the entrance and disappears below the far wall - and looks very realistic, down to the smallest detail. Both edges of the crevice really look like they once fit together. I haven't seen any clues on how this was made - even the hairline cracks at the edges look like they formed in the material of the floor. I expected to see some marks where they dug out a larger groove and then filled it with concrete, but now I have no idea how they managed to dig such deep and narrow grooves into the concrete floor.

The realism breaks down only when you look closely at the inner walls of the crack which are too smooth to be natural and where you can see the iron mesh that reinforces the concrete.

I failed to see how this installation addresses a long legacy of racism and colonialism that underlies the modern world which is, as I learned from a sign on a wall, the message that the artist wanted to convey with her work.

Posted by Tomaž | Categories: Ideas | Comments »

Ivan Cankar

20.11.2007 1:41

English Wikipedia says this about Slovenian writer Ivan Cankar:

The son of a village tailor, he studied electrical engineering in Vienna, and lived there for some time as a freelance writer.

Is it possible that in four years of Slovenian literature courses in high school nobody cared to mention that? I'm sure I would remember that one interesting fact from lectures that were otherwise filled with (to me at least) uninteresting interpretations of his works.

Slovenian and German Wikipedias seem to disagree though. They say that he studied architecture for a while before switching to literature.

Posted by Tomaž | Categories: Life | Comments »

My very own Perl bug

19.11.2007 0:30

They say that every hacker finds a bug in Perl once in his life. It seems I just found mine.

It started when I was hacking on Wikiprep, the Perl script we're using to parse Wikipedia. After I made some trivial change and ran it on a sample of Wikipedia pages I was greeted with a Segmentation fault message on the console. Since this was the first time I saw Perl do that, I quickly ran the same thing on another machine (I still don't trust Leopard on my laptop), this time a trusted Debian GNU/Linux development server. When I got the same result I knew I was on to something.

After a couple of hours of testing I came up with this short script that consistently crashed Perl on every machine I tried:

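# Replace every [[...]] link in the referenced string with the value
# returned by b(), passing it the current match position.
sub a {
	my ($ref) = @_;

	$$ref =~ s/\[\[[^\[]*\]\]/&b(pos($$ref))/eg;
}

# Callback: ignores its argument and returns an empty replacement.
sub b {
	my ($a) = @_;
	return "";
}

# Read the whole UTF-8 encoded test file into $a.
$a = "";
open(FILE, "<sf.dat");
binmode(FILE,  ':utf8');
while(<FILE>) {
	$a .= $_;
}

# Run the substitution on the file contents; this is where Perl crashes.
&a(\$a);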

sf.dat in this case is a UTF-8 encoded file with some links in MediaWiki markup (you can see it here).

After consulting with some people on the #perl IRC channel we found a workaround and my work on Wikipedia could continue. The crash seems to be caused by the pos() function, which can be replaced with lookups in the @+ and @- arrays.
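A minimal sketch of what such a workaround might look like, assuming the callback only needs the offsets of the match. The names strip_links and record_link and the sample string are made up for illustration; they are not taken from Wikiprep.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the callback b() in the crashing script:
# records the offsets it is given and returns an empty replacement.
my @seen;
sub record_link {
    my ($start, $end) = @_;
    push @seen, [$start, $end];
    return "";
}

# Same substitution as in the crashing script, but reading the match
# offsets from @- and @+ instead of calling pos().
sub strip_links {
    my ($ref) = @_;
    $$ref =~ s/\[\[[^\[]*\]\]/record_link($-[0], $+[0])/eg;
}

# Made-up sample input with two MediaWiki-style links.
my $text = "a[[x]]b[[y]]";
strip_links(\$text);

print "$text\n";
print scalar(@seen), " links removed\n";
```

Here $-[0] and $+[0] are the offsets of the start and the end of the whole match, so the callback still learns where each link was found without pos() ever being called inside the substitution.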

I haven't yet reported this issue since I was told Perl 5.10 is just around the corner, which might have this issue already fixed.

Posted by Tomaž | Categories: Code | Comments »

Cool switches

12.11.2007 18:24

The next electronic thing I build will definitely have one of these things on it.

(found on CNET's list of Top ten off switches)

Posted by Tomaž | Categories: Ideas | Comments »

Making Wikipedia a better place

11.11.2007 22:48

... one speedy delete at a time.

I noticed some weird aliases today that seemed to be polluting Zemanta's semantic database. After making sure these terms actually exist on Wikipedia, Jure and I got on the #wikipedia IRC channel and started bugging people. A few minutes later the world became a better place and the following redirects linking to Apple's iPhone were speedy deleted:

  • Jesus Phone
  • Jesus-Phone
  • Uphone
  • Ophone
  • Eye phone

Now the only remaining questionable alias for the iPhone is the God Machine, which actually has a citation behind it.

Posted by Tomaž | Categories: Life | Comments »

Grass cutter

04.11.2007 15:59

Remember Advanced Lawnmower Simulator? It was a simple and boring game for the Sinclair Spectrum where you had to push your lawnmower over a field. It was also an elaborate hoax performed by the editors of Your Sinclair magazine.

Advanced Lawnmower Simulator

This is what it used to look like in the 80s.

Now it seems that some people did have some fun with it after all. There appears to be a remake built into a £30 50-games-in-one hand-held console called Gamespower 50. They also enhanced the graphics a bit:

Grass Cutter

However, it still appears to be as simple and boring as the original. There's a review you can see on YouTube (Dr. Ashen talks about the Grass Cutter around 4:00 into the video).

Posted by Tomaž | Categories: Life | Comments »


03.11.2007 14:25

Two days ago, Apple users at Zemanta got a shiny box with the new version of Mac OS X inside. Since I was still running OS X 10.3 on my PowerBook I made a backup and upgraded without hesitation.

Shiny Leopard box

At first glance Leopard looked much too shiny for my taste. A second look revealed just how much of a resource hog it is. My 60 GB disk is now 30% occupied by the operating system alone (the upgrade took approximately 11 GB of additional disk space). 768 MB of RAM seems to be just enough to keep the system ticking.

The default system also looks much too shiny for my taste. One of the first things I did was to turn the dock back to its previous, non-3D non-shiny look (so much for not having to use the command line on a Mac). Fortunately I was somehow spared the transparent menu bar (it seems that is much harder to disable). Perhaps it's the old hardware I'm using. On the other hand I still get a blurred background behind menus. It's really a minimal visual change, but I'm sure they did that only to show that a Mac can do that just as easily as Vista.

Since I never used 10.4 this is also the first time I've seen Spotlight and Dashboard. The first one is great for starting applications, for example. In my experience it's not so great for finding documents, because I always get a ton of search results from various C and Python source files I have on the disk (the same problem I had with Beagle on Linux). For the Dashboard on the other hand I can't see any good use. I currently only have iStat Pro there. I still use sticky notes and the calculator as standalone applications.

For some new features I have the feeling that they are there just so that Apple's marketing department could say that they added more than 300 new features. Quick Look is one such example - isn't it easier to double click a file and open it than Control-clicking it and selecting Quick Look (which takes about as long to load up as Preview anyway)? Or that flip-through-the-album mode that Finder has now. Just more shiny things with no practical value.

Spaces is nice. I missed virtual desktops on Macs. It still has bugs though. If I have for example terminal windows on two desktops and switch to Terminal from some other application with Ctrl-Tab the system will take me to some random desktop with a terminal on it. Ctrl-` will also just cycle between windows on the current desktop, not all desktops.

Regarding application compatibility: Leopard's X11 is terribly broken. I installed Tiger's X11 but problems remain. GIMP isn't working, for example, and OpenOffice sometimes gives me a "Command timed out" error and sometimes crashes after I type in a couple of words. Also Vi for OS X works worse on Leopard (for example the Ctrl-6 shortcut stopped working). MacVim works perfectly (and has a nicer icon). Other things appear to be working, although Jure says that the upgrade broke his MacPorts and MySQL installations.

A big surprise was that the ssh client they ship with Leopard now pops up a graphical window asking for a password / passphrase (probably through ssh-agent). I'm not sure I like this - command line utilities should stay in the command line.

SSH pop-up

A couple of new features also strongly reminded me of Vista. For example the new Mac OS X is constantly asking me to allow or cancel some actions. I don't know how Apple can make fun of Microsoft in their commercials about this when they aren't any better. For example, on the first day of using Leopard I had to allow an application to run for the first time, an application downloaded from the Internet to run, an application to access the Internet and an application to accept connections from the Internet ...

Both Jure and I also noticed a strange side-effect of the upgrade: we both seem to be making more typing mistakes than before. I'm guessing that Leopard has some new system of filtering keystrokes and that it no longer registers very short key presses or something. Also I have the feeling that the keyboard repeat rate is lower than before. I have no benchmarks to prove it though.

Posted by Tomaž | Categories: Life | Comments »