Parser Combinators Beat Regexes

(entropicthoughts.com)

122 points mooreds | 1 comments | 09 Apr 25 21:53 UTC


o11c ◴[10 Apr 25 00:17 UTC] No.43639374[source]▶

Note that most implementations of both parser combinators and regexes can fail very badly (exponential time). Never use either on untrusted input, unless you can prove your implementation lets you stay linear.

replies(4): >>43639552 #>>43639845 #>>43640240 #>>43641298 #

thaumasiotes ◴[10 Apr 25 01:46 UTC] No.43639845[source]▶

>>43639374 #

Only PCREs are exponential time, in service of a feature you basically never need. Regexes are always linear time.

They can take exponential space, though, so I'm not sure why knowing you'll be able to process the data in linear time is supposed to keep you safe.

replies(4): >>43639936 #>>43639940 #>>43640180 #>>43640937 #

1. pjscott ◴[10 Apr 25 02:03 UTC] No.43639940[source]▶

>>43639845 #

This depends on the implementation of the regex engine; most are potentially superlinear in time, since backtracking is the easiest way to implement matching, and it's quite fast until suddenly it isn't. I always check the algorithm being used before I use regular expressions in production. I was surprised how many use a recursive descent strategy!
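To make the "fast until suddenly it isn't" point concrete, here is a rough sketch of what a backtracking engine effectively does for the pattern `(a|aa)*b`. This is a hand-written toy, not the code of any real regex library; `backtrack_steps` and `match_from` are made-up names, and the step counter only stands in for running time.

```python
def backtrack_steps(text):
    """Count the steps a naive backtracking matcher for (a|aa)*b takes on text.

    Hypothetical toy matcher, hard-coded for this one pattern.
    """
    steps = 0

    def match_from(i):
        nonlocal steps
        steps += 1
        # Loop body, first alternative: consume one 'a' and try again.
        if text.startswith("a", i) and match_from(i + 1):
            return True
        # Loop body, second alternative: consume 'aa' and try again.
        if text.startswith("aa", i) and match_from(i + 2):
            return True
        # Leave the loop: the rest of the input must be exactly 'b'.
        return text[i:] == "b"

    match_from(0)
    return steps


if __name__ == "__main__":
    # Inputs with no trailing 'b' force every split of the 'a's to be tried.
    for n in (5, 10, 15, 20):
        print(n, backtrack_steps("a" * n))
```

The step count on a failing input of n `a`s follows a Fibonacci-style recurrence, so each extra character multiplies the work by roughly 1.6. A matching input such as `aab` finishes in a handful of steps, which is why this kind of problem tends to go unnoticed in testing.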

(Also, BTW, you can deal with the exponential space issue by using an NFA instead of a DFA – it’s slower that way, but the memory space required is reliably linearly proportional to the size of the regex.)
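A minimal sketch of the NFA approach, assuming a tiny regex subset (literals, `.`, `|`, `*`, `+`, `?`, parentheses) and whole-string matching only. The names `parse`, `compile_nfa`, `add_thread` and `full_match` are invented for this example, and the instruction layout is just one common way to do it, not how any particular engine works. The set of live states can never exceed the number of instructions, which is where the linear-in-the-regex memory bound comes from.

```python
def parse(pattern):
    """Parse the pattern into a small tuple-based AST."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alt():
        nonlocal pos
        node = cat()
        while peek() == "|":
            pos += 1
            node = ("alt", node, cat())
        return node

    def cat():
        node = ("empty",)
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, repeat())
        return node

    def repeat():
        nonlocal pos
        node = atom()
        while peek() is not None and peek() in "*+?":
            node = ({"*": "star", "+": "plus", "?": "opt"}[peek()], node)
            pos += 1
        return node

    def atom():
        nonlocal pos
        c = peek()
        pos += 1
        if c == "(":
            node = alt()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ("any",) if c == "." else ("lit", c)

    node = alt()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return node


def compile_nfa(node, prog):
    """Append NFA instructions for node to prog (a list of tuples)."""
    kind = node[0]
    if kind == "lit":
        prog.append(("char", node[1]))
    elif kind == "any":
        prog.append(("any",))
    elif kind == "cat":
        compile_nfa(node[1], prog)
        compile_nfa(node[2], prog)
    elif kind == "alt":
        split = len(prog)
        prog.append(None)  # patched once both branch addresses are known
        compile_nfa(node[1], prog)
        jmp = len(prog)
        prog.append(None)
        prog[split] = ("split", split + 1, len(prog))
        compile_nfa(node[2], prog)
        prog[jmp] = ("jmp", len(prog))
    elif kind == "star":
        split = len(prog)
        prog.append(None)
        compile_nfa(node[1], prog)
        prog.append(("jmp", split))
        prog[split] = ("split", split + 1, len(prog))
    elif kind == "plus":
        start = len(prog)
        compile_nfa(node[1], prog)
        prog.append(("split", start, len(prog) + 1))
    elif kind == "opt":
        split = len(prog)
        prog.append(None)
        compile_nfa(node[1], prog)
        prog[split] = ("split", split + 1, len(prog))
    # kind == "empty" emits nothing


def add_thread(prog, states, pc):
    """Add pc and everything reachable from it without consuming input."""
    stack = [pc]
    while stack:
        pc = stack.pop()
        if pc in states:
            continue  # already present; this also breaks empty loops
        states.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            stack.append(op[1])
        elif op[0] == "split":
            stack.extend(op[1:])


def full_match(pattern, text):
    """Return True if the whole text matches, tracking a set of NFA states."""
    prog = []
    compile_nfa(parse(pattern), prog)
    prog.append(("match",))
    current = set()
    add_thread(prog, current, 0)
    for ch in text:
        following = set()
        for pc in current:
            op = prog[pc]
            if op[0] == "any" or (op[0] == "char" and op[1] == ch):
                add_thread(prog, following, pc + 1)
        current = following  # never holds more than len(prog) states
    return len(prog) - 1 in current
```

With this, `full_match("(a+)+b", "a" * 40)` returns `False` after about 40 passes over a four-state set, where a backtracking matcher would try an exponential number of ways to split up the `a`s. The cost is per-character bookkeeping over the whole state set, which is the "slower that way" trade-off.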
