Parser Combinators Beat Regexes

(entropicthoughts.com)

122 points mooreds | 4 comments | 09 Apr 25 21:53 UTC | HN request time: 0.792s | source

Show context

o11c ◴[10 Apr 25 00:17 UTC] No.43639374[source]▶

Note that most implementations of both parser combinators and regexes can fail very badly (exponential time). Never use either on untrusted input, unless you can prove your implementation lets you stay linear.

replies(4): >>43639552 #>>43639845 #>>43640240 #>>43641298 #

1. giraffe_lady ◴[10 Apr 25 00:49 UTC] No.43639552[source]▶

>>43639374 #

PEGs don't have this problem right? Linear time over length of input? Though I guess most peg libraries probably aren't "pure" pegs to get around its limitations and may reintroduce this issue?

replies(3): >>43639629 #>>43639698 #>>43650782 #

2. yen223 ◴[10 Apr 25 01:02 UTC] No.43639629[source]▶

>>43639552 (TP) #

PEGs don't have this problem only because they place restrictions on the grammar.

In practice, this isn't a problem, but it does require you write grammar rules in a specific way (e.g. no left-recursion)

3. o11c ◴[10 Apr 25 01:16 UTC] No.43639698[source]▶

>>43639552 (TP) #

No, PEG is prone to exactly this in its default form. Parser combinator usually means something PEG-ish anyway.

If you memoize (packrat), it improves to polynomial time (not sure what, but it's not linear; that's false advertising and a fraudulent claim), but you're still stuck with the vulnerability to bugs in the grammar.

A better idea is to create a parser-combinator-style API on top of an LL or LR backend, for guaranteed linear time and only enough stack space for the deepest part of the AST.

4. vrighter ◴[11 Apr 25 05:32 UTC] No.43650782[source]▶

>>43639552 (TP) #

Only if a packrat parser is implemented, which requires a lot of memory.

↑