←back to thread

698 points jgrahamc | 1 comments | | HN request time: 0s | source
Show context
hn_throwaway_99 ◴[] No.20425667[source]
One thing that was interesting to me:

The outage was caused by a regex that ended up doing a lot of backtracking, which caused PCRE, the regex engine, to essentially handle a runaway expression.

This reminded me of a HN post from a couple months back by the author of Google Code Search, and how it worked: https://swtch.com/~rsc/regexp/regexp4.html . Interestingly, he wrote his own regex engine, RE2, specifically because PCRE and others did not use real automata and he needed a way to do arbitrary regex search safely.

replies(4): >>20425803 #>>20426118 #>>20426331 #>>20426651 #
bsder ◴[] No.20426118[source]
The problem is that a deterministic regex engine (deterministic finite automata or DFA) is strictly less powerful than a non-deterministic one (NFA). DFA's can't backtrack, for example. In addition, DFA's can be quite a bit slower for certain inputs and matches.
replies(6): >>20426140 #>>20426328 #>>20426344 #>>20426421 #>>20426551 #>>20427243 #
1. Thorrez ◴[] No.20426421[source]
I don't know what you mean by "DFA's can't backtrack". Maybe you mean DFAs don't support backreferences, which is true, but NFAs don't support backreferences either.

I believe if r is the size of the regex and d is the size of the data, an NFA is O(r) to compile and O(rd) to execute, while a DFA is O(2^r) to compile and O(d) to execute. So DFAs are slower to compile, but faster to execute.