Parser Combinators Beat Regexes

(entropicthoughts.com)

122 points mooreds | 2 comments | 09 Apr 25 21:53 UTC


yen223 ◴[10 Apr 25 00:49 UTC] No.43639551[source]▶

Parser combinators are one of those ideas that are really powerful in practice but will never be mainstream, because they had the bad fortune of coming from the functional programming world. It's a shame, too, because parser combinators are what made parsers click for me.

Take the hard problem of parsing a language, break it down into the much simpler problem of reading tokens, then write simple "combinators" (and, or, one_or_many, etc etc) to combine these simple parsers into larger more complex parsers.

You could almost build an entire parser combinator library from scratch from memory, because none of the individual components are complicated.
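To make the "from scratch" claim concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not any particular library's API: the names `token`, `and_`, `or_` and `one_or_many` simply mirror the combinators named above, and a parser is assumed to be a function from `(text, pos)` to either `(value, new_pos)` or `None` on failure.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed representation: a parser maps (text, pos) to (value, new_pos),
# or returns None when it fails to match at pos.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def and_(*parsers: Parser) -> Parser:
    """Run the parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def or_(*parsers: Parser) -> Parser:
    """Try each parser in order and return the first success."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def one_or_many(p: Parser) -> Parser:
    """Apply p repeatedly; succeed only if it matched at least once."""
    def parse(text: str, pos: int):
        values = []
        while True:
            result = p(text, pos)
            if result is None:
                break
            value, new_pos = result
            values.append(value)
            if new_pos == pos:  # no input consumed; stop to avoid looping forever
                break
            pos = new_pos
        return (values, pos) if values else None
    return parse


# Small parsers combined into a larger one: number "+" number
digit = or_(*(token(c) for c in "0123456789"))
number = one_or_many(digit)
sum_expr = and_(number, token("+"), number)
```

Calling `sum_expr("12+345", 0)` returns `([['1', '2'], '+', ['3', '4', '5']], 6)`: the nested values plus the position where parsing stopped. A fuller library would add a way to transform results (a `map` combinator) and error reporting, which this sketch leaves out.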

replies(13): >>43639775 #>>43639805 #>>43639834 #>>43640597 #>>43641009 #>>43641205 #>>43641459 #>>43641675 #>>43642100 #>>43642148 #>>43643853 #>>43644151 #>>43650405 #

thaumasiotes ◴[10 Apr 25 01:30 UTC] No.43639775[source]▶

>>43639551 #

> Take the hard problem of parsing a language, break it down into the much simpler problem of reading tokens, then write simple "combinators" (and, or, one_or_many, etc etc) to combine these simple parsers into larger more complex parsers.

That's a terrible example for "parser combinators beat regexes"; those are the three regular operations. (Except that you use zero_or_many in preference to one_or_many.)
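For reference, the three regular operations are concatenation, union, and Kleene star, and each has a direct regex spelling. The sketch below uses Python's standard `re` module; the helper `matches` and the pattern names are made up for illustration. It also shows that one_or_many is not a separate primitive: it is concatenation plus zero_or_many.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


CONCAT = r"ab"                  # "and": a followed by b
UNION = r"a|b"                  # "or": a or b
STAR = r"(?:ab)*"               # zero_or_many: ab repeated zero or more times
ONE_OR_MANY = r"(?:ab)(?:ab)*"  # one_or_many built from concatenation and star

# The derived pattern accepts the same strings as the built-in + operator
# on these samples.
samples = ["", "ab", "abab", "aba", "ba"]
same_as_plus = all(
    matches(ONE_OR_MANY, s) == matches(r"(?:ab)+", s) for s in samples
)
```

So a parser built only from and, or, and repetition over fixed tokens recognizes nothing a regex could not.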

replies(1): >>43641710 #

1. mrkeen ◴[10 Apr 25 07:57 UTC] No.43641710[source]▶

>>43639775 #

Can I see the (implied) counter-example?

A regex which turns a language's source code into an AST?

replies(1): >>43642623 #

2. thaumasiotes ◴[10 Apr 25 10:46 UTC] No.43642623[source]▶

>>43641710 (TP) #

What implied counterexample? Regular expressions define the simplest language type a CS education generally covers. You would expect anything to be more capable.

But you wouldn't prove it by demonstrating that you can recognize regular languages. That's the thing regular expressions are good at!
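A more convincing demonstration would use a language that is not regular. Balanced parentheses are the textbook case: no regular expression in the formal sense can match them, because the nesting depth is unbounded. The sketch below is one way to do it with combinators in Python, under the same assumptions as any toy version: the names `token`, `and_`, `zero_or_many` and `lazy` are hypothetical, and a parser is a function from `(text, pos)` to `(value, new_pos)` or `None`. The only addition beyond the regular operations is recursion, which `lazy` makes possible.

```python
def token(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def and_(*parsers):
    """Run the parsers in sequence; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def zero_or_many(p):
    """Apply p as often as it succeeds; always succeeds itself."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # failure or no progress
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def lazy(thunk):
    """Defer looking up a parser until parse time, so rules can refer to themselves."""
    return lambda text, pos: thunk()(text, pos)


# group := "(" group* ")"   (the rule refers to itself, which regexes cannot do)
group = and_(token("("), zero_or_many(lazy(lambda: group)), token(")"))
balanced = zero_or_many(group)


def is_balanced(text):
    """True if text consists entirely of properly nested parentheses."""
    value, end = balanced(text, 0)
    return end == len(text)
```

With this, `is_balanced("(()())")` is `True` while `is_balanced("(()")` and `is_balanced(")(")` are `False`. Some regex engines offer recursion extensions that can handle this too, but those go beyond regular expressions as the term is used above.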
