←back to thread

698 points jgrahamc | 2 comments | | HN request time: 0.571s | source
Show context
hn_throwaway_99 ◴[] No.20425667[source]
One thing that was interesting to me:

The outage was caused by a regex that ended up doing a lot of backtracking, which caused PCRE, the regex engine, to essentially handle a runaway expression.

This reminded me of a HN post from a couple months back by the author of Google Code Search, and how it worked: https://swtch.com/~rsc/regexp/regexp4.html . Interestingly, he wrote his own regex engine, RE2, specifically because PCRE and others did not use real automata and he needed a way to do arbitrary regex search safely.

replies(4): >>20425803 #>>20426118 #>>20426331 #>>20426651 #
bsder ◴[] No.20426118[source]
The problem is that a deterministic regex engine (deterministic finite automata or DFA) is strictly less powerful than a non-deterministic one (NFA). DFA's can't backtrack, for example. In addition, DFA's can be quite a bit slower for certain inputs and matches.
replies(6): >>20426140 #>>20426328 #>>20426344 #>>20426421 #>>20426551 #>>20427243 #
1. Firerouge ◴[] No.20426140[source]
Perhaps a parser exists that can determine if an input regex is runaway backtrack prone, and can automatically switch to a deterministic algorithm?
replies(1): >>20426282 #
2. ehaliewicz2 ◴[] No.20426282[source]
Just check if it uses backreferences, otherwise it can be implemented via NFA/DFA.