My very own Perl bug
They say that every hacker finds a bug in Perl once in his life. It seems I just found mine.
It started when I was hacking on Wikiprep, the Perl script we're using to parse Wikipedia. After I made some trivial change and ran it on a sample of Wikipedia pages, I was greeted with a Segmentation fault message on the console. Since this was the first time I had seen Perl do that, I quickly ran the same thing on another machine (I still don't trust Leopard on my laptop), this time a trusted Debian GNU/Linux development server. When I got the same result, I knew I was on to something.
After a couple of hours of testing I came up with this short script that consistently crashed Perl on every machine I tried:
    sub a {
        my ($ref) = @_;
        $$ref =~ s/\[\[[^\[]*\]\]/&b(pos($$ref))/eg;
    }

    sub b {
        my ($a) = @_;
        return "";
    }

    $a = "";
    open(FILE, "<sf.dat");
    binmode(FILE, ':utf8');
    while(<FILE>) {
        $a .= $_;
    }
    close(FILE);
    &a(\$a);
sf.dat in this case is a UTF-8 encoded file with some links in MediaWiki markup (you can see it here).
After consulting with people on the #perl IRC channel, we found a workaround and my work on Wikipedia could continue. The crash seems to be caused by the pos() function, which can be replaced with lookups in the @- and @+ arrays.
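For illustration, here is a minimal sketch of what such a workaround can look like. It is not the actual Wikiprep change: the helper name strip_links and the sample string are made up for this example. It relies only on the documented behaviour that $-[0] and $+[0] hold the start and end offsets of the last successful match, so the replacement code never has to call pos().

```perl
use strict;
use warnings;

# Hypothetical helper (not from Wikiprep): removes [[...]] links from the
# string behind $ref and returns the [start, end] offsets of each match.
sub strip_links {
    my ($ref) = @_;
    my @offsets;
    $$ref =~ s{\[\[[^\[]*\]\]}{
        # Read the match offsets from the special arrays instead of pos().
        push @offsets, [ $-[0], $+[0] ];
        "";    # the value of the last statement is the replacement text
    }eg;
    return @offsets;
}

# Made-up sample input standing in for the contents of sf.dat.
my $text  = "a [[x]] b [[yy]] c";
my @found = strip_links(\$text);

printf "match from %d to %d\n", @$_ for @found;
print "$text\n";
```

The offsets are counted in characters, not bytes, so the same code should apply to text read through the :utf8 layer.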
I haven't reported this issue yet, since I was told that Perl 5.10 is just around the corner and might already have it fixed.