←back to thread

698 points jgrahamc | 2 comments | | HN request time: 0.449s | source
Show context
hn_throwaway_99 ◴[] No.20425667[source]
One thing that was interesting to me:

The outage was caused by a regex that ended up doing a lot of backtracking, which caused PCRE, the regex engine, to essentially handle a runaway expression.

This reminded me of a HN post from a couple months back by the author of Google Code Search, and how it worked: https://swtch.com/~rsc/regexp/regexp4.html . Interestingly, he wrote his own regex engine, RE2, specifically because PCRE and others did not use real automata and he needed a way to do arbitrary regex search safely.

replies(4): >>20425803 #>>20426118 #>>20426331 #>>20426651 #
bsder ◴[] No.20426118[source]
The problem is that a deterministic regex engine (deterministic finite automata or DFA) is strictly less powerful than a non-deterministic one (NFA). DFA's can't backtrack, for example. In addition, DFA's can be quite a bit slower for certain inputs and matches.
replies(6): >>20426140 #>>20426328 #>>20426344 #>>20426421 #>>20426551 #>>20427243 #
1. auscompgeek ◴[] No.20426328[source]
Actually, it is proven that NFAs and DFAs are equally expressive. See https://en.wikipedia.org/wiki/Powerset_construction
replies(1): >>20426662 #
2. bsder ◴[] No.20426662[source]
"You are technically correct. The best kind of correct."

In theory, your statement is perfectly correct. However, quoting that reference:

"However, if the NFA has n states, the resulting DFA may have up to 2^n states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs."

This means that in practice, DFAs are larger, slower, and sometimes can't be run at all if complex enough.

However, this was my mistake. I remembered (vaguely) the 2^n issue and didn't follow up to make sure I was accurate.

And I completely spaced on the fact that neither NFA's nor DFA's handle backreferences without extension.