26.03.2008 18:27
Juggling parentheses in Perl
Matching balanced constructs (like some text that must contain matching pairs of parentheses) is a classical example where traditional regular expressions fail. It isn't surprising then that the Perl FAQ says that you should use text parsing functions from Text::Balanced module when you need to do that.
Perl however also provides extended regular expressions that among a lot of other things also support this kind of matches. Theory says that such extended expressions can't be compiled to finite automata and so typically have worse complexity than the traditional O(n) - something that is also mentioned in another part of Perl documentation.
So when I was confronted today with a problem that involved matching balanced "{{{" and "}}}" pairs (guess what kind of markup uses them) I naturally followed the suggestion in the FAQ.
Well, it turned out that that suggestion isn't one of the best ones. Text::Balanced is slow. Actually it's so slow that the identical code that uses the extended regular expression from Regexp::Common::balanced is around 50 times faster.
So much for the difference between theory and practice.
Posted by Tomaž | Categories: Code
