My very own Perl bug

19.11.2007 0:30

They say that every hacker once in his life finds a bug in Perl. It seems I just found mine.

It started when I was hacking on Wikiprep, the Perl script we're using to parse Wikipedia. After I made some trivial change and ran it on a sample of Wikipedia pages, I was greeted with a "Segmentation fault" message on the console. Since this was the first time I had seen Perl do that, I quickly ran the same thing on another machine (I still don't trust Leopard on my laptop), this time a trusted Debian GNU/Linux development server. When I got the same result, I knew I was on to something.

After a couple of hours of testing I came up with this short script that consistently crashed Perl on every machine I tried:

sub a {
	my ($ref) = @_;

	$$ref =~ s/\[\[[^\[]*\]\]/&b(pos($$ref))/eg;
}

sub b {
	my ($a) = @_;
	return "";
}

$a = "";
open(FILE, "<sf.dat");
binmode(FILE,  ':utf8');
while(<FILE>) {
	$a .= $_;
}
close(FILE);

&a(\$a);

sf.dat in this case is a UTF-8 encoded file with some links in MediaWiki markup (you can see it here).

After some consulting with people on the #perl IRC channel we found a workaround, and my work on Wikipedia could continue. The crash seems to be caused by the pos() function, which can be replaced with lookups in the @- and @+ arrays.
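For illustration, here is a minimal sketch of what such a workaround could look like. It is not the exact change that went into Wikiprep: the names strip_links and replace_link and the sample string are made up for this example. Inside the replacement part of s///e, $-[0] holds the offset where the current match starts and $+[0] the offset just past its end, so the position can be handed to the callback without calling pos() at all.

```perl
use strict;
use warnings;

# Hypothetical stand-in for b() in the script above: it records the
# offsets it receives and returns the replacement text (an empty string).
my @seen;
sub replace_link {
    my ($start, $end) = @_;
    push @seen, [$start, $end];
    return "";
}

# Same substitution as in a(), but the match position is read from the
# @- and @+ arrays instead of pos().
sub strip_links {
    my ($ref) = @_;
    $$ref =~ s/\[\[[^\[]*\]\]/replace_link($-[0], $+[0])/eg;
}

# Made-up sample input; the real script reads sf.dat instead.
my $text = "See [[Perl]] and [[Larry Wall|its author]].";
strip_links(\$text);

print "$text\n";
printf "link at offsets %d..%d\n", @$_ for @seen;
```

The offsets refer to the original string, not to the partially rewritten one, which is usually what you want when you need to know where a link was found.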

I haven't reported this issue yet, since I was told that Perl 5.10 is just around the corner and might already have it fixed.

Posted by Tomaž | Categories: Code
