←back to thread

Parser Combinators Beat Regexes

(entropicthoughts.com)
120 points mooreds | 4 comments | | HN request time: 0s | source
Show context
o11c ◴[] No.43639374[source]
Note that most implementations of both parser combinators and regexes can fail very badly (exponential time). Never use either on untrusted input, unless you can prove your implementation lets you stay linear.
replies(4): >>43639552 #>>43639845 #>>43640240 #>>43641298 #
1. giraffe_lady ◴[] No.43639552[source]
PEGs don't have this problem right? Linear time over length of input? Though I guess most peg libraries probably aren't "pure" pegs to get around its limitations and may reintroduce this issue?
replies(3): >>43639629 #>>43639698 #>>43650782 #
2. yen223 ◴[] No.43639629[source]
PEGs don't have this problem only because they place restrictions on the grammar.

In practice, this isn't a problem, but it does require you write grammar rules in a specific way (e.g. no left-recursion)

3. o11c ◴[] No.43639698[source]
No, PEG is prone to exactly this in its default form. Parser combinator usually means something PEG-ish anyway.

If you memoize (packrat), it improves to polynomial time (not sure what, but it's not linear; that's false advertising and a fraudulent claim), but you're still stuck with the vulnerability to bugs in the grammar.

A better idea is to create a parser-combinator-style API on top of an LL or LR backend, for guaranteed linear time and only enough stack space for the deepest part of the AST.

4. vrighter ◴[] No.43650782[source]
Only if a packrat parser is implemented, which requires a lot of memory.