←back to thread

Parser Combinators Beat Regexes

(entropicthoughts.com)

122 points mooreds | 2 comments | 09 Apr 25 21:53 UTC | HN request time: 0.448s | source

1. pklausler ◴[10 Apr 25 14:28 UTC] No.43644154[source]▶

>>43638422 (OP) #

Parser combinators work well for big messy problems as well as regexes; I used a custom library of combinators to write the Fortran parser for LLVM. They’re great for error recovery and for disambiguating tricky situations, of which Fortran has more than its share; I ended up parsing cooked characters without a tokenization phase.

One hard part is getting good performance when the same production is being tried as part of multiple higher level alternatives and backtracking is taking place. You can get into trouble for bad input cases, like expressions with thousands of operands being used in a context that isn’t clear until you get to characters after the expression. But there’s ways to solve that problem, so I recommend combinators to you if you need to write a parser.

replies(1): >>43646027 #

2. pklausler ◴[10 Apr 25 17:13 UTC] No.43646027[source]▶

>>43644154 (TP) #

(adding to the above)

Another downside of parser combinators, at least the ones that I wrote as a basis for Fortran parsing, is that if you implement them as C++ template classes, the types get astonishingly long in their names. This leads to long compilation times and makes debuggers very hard to use. Both of these concerns only affect the developer, of course, and seemed to me to be acceptable so long as the parser is fast and robust.

I'd use parser combinators again if I ever need to write a big parser, although I might not use overloaded operators (">>", "/") and confusing function names ("many" vs. "some") inherited from Haskell's Parsec library.