Parser Combinators Beat Regexes

(entropicthoughts.com)

122 points mooreds | 1 comments | 09 Apr 25 21:53 UTC | HN request time: 0.209s | source

Show context

o11c ◴[10 Apr 25 00:17 UTC] No.43639374[source]▶

Note that most implementations of both parser combinators and regexes can fail very badly (exponential time). Never use either on untrusted input, unless you can prove your implementation lets you stay linear.

replies(4): >>43639552 #>>43639845 #>>43640240 #>>43641298 #

internet_points ◴[10 Apr 25 06:42 UTC] No.43641298[source]▶

>>43639374 #

This is one of the reasons I've been afraid to use parser combinators too heavily. With regular (finite-state) languages I know their time usage, with parser combinators I have no clue, and there are so many different libraries and implementations and some are monadic and some are applicative and few mention worst-case. There are benchmarks https://gitlab.com/FinnBender/haskell-parsing-benchmarks#byt... but what kinds of "dangerous operations" do I have to watch out for when writing with parser combinators?

(I guess part of the problem is just that regular languages have been studied since Kleene had a full head of hair, while parser combinators were more or less unknown until the 80's.)

replies(2): >>43641727 #>>43643059 #

1. wyager ◴[10 Apr 25 12:15 UTC] No.43643059[source]▶

>>43641298 #

The evaluation model of most of these parser combinator libraries is simple enough that you can just think directly about the call tree. It's difficult to get "surprised" like you can with PCREs. E.g. (x+x+)+y as a regex may blow up, but "many1 (many1 x >> many1 x) >> y" will just evaluate forwards a single time, greedily, and fail, at least with any haskell parser combinator library I can think of.

↑