canspice.org


OSCON 2007: Advanced Parsing Techniques, by Mark-Jason Dominus

Mark-Jason Dominus kicks off Monday’s afternoon tutorial with Advanced Parsing Techniques (for Perl).

What’s parsing? It’s the process of taking unstructured input (such as a sequence of characters) and turning it into a data structure. Parse::RecDescent is a closed-system parser, but we’re going to look at an open one: HOP::Parser.

An example of something we might want to parse is a mathematical function that a user has entered on a web page, e.g. (x^2 + 3*x) * sin(x*2) + 14. An easy solution is to use eval to turn the user’s input into compiled Perl code. The problem is that eval runs whatever the user typed, so malformed or malicious input can do anything Perl can do. An alternative is to implement your own evaluator for expressions. The first step is to take the string and turn it into a list of tokens, a process called lexing. Perl is good at this because of its regular expression engine.
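
To make the lexing step concrete, here is a minimal sketch of a regex-driven tokenizer. It is written in Python rather than Perl and is not taken from the talk; the token names (NUMBER, IDENT, OP) and the tokenize function are my own assumptions for illustration.

```python
import re

# Token patterns, tried in order. The names are illustrative assumptions,
# not anything shown in the tutorial.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/^()]"),
    ("SPACE",  r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (type, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("3*x + 14"))
```

Anything the patterns don't recognise raises an error instead of being passed along, which is the point of doing this rather than handing the raw string to eval.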

When you’re parsing, you generally need a grammar, which describes the expressions and tokens and how they relate to one another. Parse::RecDescent works this way: it is a recursive-descent parser generator, in which each grammar rule becomes a function.
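
The "each grammar rule becomes a function" idea can be sketched by hand. What follows is a small recursive-descent parser in Python, not the code from the tutorial and not the Parse::RecDescent or HOP::Parser API. The grammar, the Parser class and the tuple-shaped tree are all assumptions chosen for illustration, and unary minus is left out to keep it short.

```python
import re


def tokenize(text):
    # Minimal lexer: numbers, names, and single-character operators.
    # Note that findall silently skips characters it does not recognise.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    # expression := term (('+' | '-') term)*
    def expression(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    # term := factor (('*' | '/') factor)*
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    # factor := base ('^' factor)?   (right-associative)
    def factor(self):
        node = self.base()
        if self.peek() == "^":
            self.take()
            node = ("^", node, self.factor())
        return node

    # base := NUMBER | NAME '(' expression ')' | NAME | '(' expression ')'
    def base(self):
        token = self.take()
        if token == "(":
            node = self.expression()
            self.take(")")
            return node
        if token[0].isdigit():
            return ("num", float(token))
        if token[0].isalpha() or token[0] == "_":
            if self.peek() == "(":
                self.take("(")
                argument = self.expression()
                self.take(")")
                return ("call", token, argument)
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("3*x + 14"))
```

Operator precedence falls out of which rule calls which: expression calls term, term calls factor, so multiplication binds tighter than addition without any precedence table.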

So far we’ve done a lot of parsing (including a lot of examples that aren’t included here), but what about evaluation? Enter a bunch of examples of how to do this that I’m not going to reproduce. Go buy his book. :-)
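
For a rough flavour of what evaluation involves (this is my own Python sketch, not one of the examples from the talk or the book), here is a recursive walk over a parse tree. The tuple layout of the tree, the evaluate function and the FUNCTIONS table are assumptions made for this illustration.

```python
import math
import operator

# Binary operators and the functions a user is allowed to call.
# Both tables are illustrative assumptions.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}


def evaluate(node, variables):
    """Evaluate a tree of nested tuples such as ("+", left, right)."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return variables[node[1]]
    if kind == "call":
        return FUNCTIONS[node[1]](evaluate(node[2], variables))
    left = evaluate(node[1], variables)
    right = evaluate(node[2], variables)
    return BINARY_OPS[kind](left, right)


# The tree a parser might produce for 3*x + 14.
tree = ("+", ("*", ("num", 3.0), ("var", "x")), ("num", 14.0))
print(evaluate(tree, {"x": 2.0}))  # 20.0
```

Because only names in the two tables can ever be invoked, user input can't reach anything beyond the arithmetic you chose to expose.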
