If you memoize (packrat), it improves to polynomial time (I'm not sure of the exact bound, but it's not the linear time that's usually advertised; that claim feels like false advertising), but you're still stuck with the vulnerability to bugs in the grammar.
A better idea is to create a parser-combinator-style API on top of an LL or LR backend, for guaranteed linear time and only enough stack space for the deepest part of the AST.
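To make that concrete, here's a rough sketch of the shape I have in mind (toy code, not any real library's API; Grammar, nullable, and firstSet are names I made up). Because the surface is applicative rather than monadic, the grammar is just a static value, so a backend can analyze it up front, e.g. compute nullability and FIRST sets for an LL(1) table, before driving the actual parse in linear time:

    {-# LANGUAGE GADTs #-}

    module GrammarSketch where

    import qualified Data.Set as Set

    -- A grammar is a static value: there is no monadic bind, so a backend
    -- can walk the whole structure before parsing any input.
    data Grammar a where
      Terminal :: Char -> Grammar Char
      Pure     :: a -> Grammar a
      Seq      :: Grammar (a -> b) -> Grammar a -> Grammar b
      Alt      :: Grammar a -> Grammar a -> Grammar a

    instance Functor Grammar where
      fmap f = Seq (Pure f)

    instance Applicative Grammar where
      pure  = Pure
      (<*>) = Seq

    -- The payoff: the kinds of static analyses an LL(1) backend needs,
    -- computed without running the parser.  (A real version would also
    -- need a construct for named/recursive rules and a table-driven
    -- driver that does the actual linear-time parse.)
    nullable :: Grammar a -> Bool
    nullable (Terminal _) = False
    nullable (Pure _)     = True
    nullable (Seq f x)    = nullable f && nullable x
    nullable (Alt l r)    = nullable l || nullable r

    firstSet :: Grammar a -> Set.Set Char
    firstSet (Terminal c) = Set.singleton c
    firstSet (Pure _)     = Set.empty
    firstSet (Seq f x)
      | nullable f        = firstSet f `Set.union` firstSet x
      | otherwise         = firstSet f
    firstSet (Alt l r)    = firstSet l `Set.union` firstSet r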
DFAs can take exponential space, though (the classic example is (a|b)*a(a|b)^(n-1): the NFA needs about n states, but the minimal DFA needs 2^n), so I'm not sure why knowing you'll be able to process the data in linear time is supposed to keep you safe.
(Also, BTW, you can deal with the exponential space issue by using an NFA instead of a DFA – it’s slower that way, but the memory space required is reliably linearly proportional to the size of the regex.)
This includes most flavors of regexp you find in the wild: Python’s re module, JavaScript regular expressions, Ruby’s regular expressions, Perl, PCRE, and even basic and extended REs used in grep.
Russ Cox has written some very accessible posts on the linear-in-input behavior of NFA simulation and on Thompson's construction for turning regular expressions into NFAs [0]. There's also quite a bit of literature on the Glushkov construction (another way to build an NFA from a regular expression) [1] that's worth reading if you find the topic interesting.
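If you want the flavor of it without reading the whole series, here's a toy Haskell rendition of the idea (my own sketch, not Cox's code; Regex, NFA, and matches are just names I picked): build a Thompson-style NFA, then simulate it by carrying the whole set of live states across the input, so there is no backtracking and the working set is bounded by the size of the regex:

    module NfaSketch where

    import qualified Data.IntMap.Strict as IM
    import qualified Data.IntSet        as IS

    data Regex = Eps | Lit Char | Seq Regex Regex | Alt Regex Regex | Star Regex

    -- Thompson-style NFA: integer states, character edges, epsilon edges,
    -- one start state and one accept state.
    data NFA = NFA
      { charEdges :: IM.IntMap [(Char, Int)]
      , epsEdges  :: IM.IntMap [Int]
      , startSt   :: Int
      , acceptSt  :: Int
      }

    -- Compile a Regex into an NFA, threading a fresh-state counter by hand.
    compile :: Regex -> NFA
    compile re = let (_, ce, ee, s, a) = go 0 re in NFA ce ee s a
      where
        -- go counter regex = (counter', charEdges, epsEdges, start, accept)
        go n Eps     = (n + 2, IM.empty, IM.singleton n [n + 1], n, n + 1)
        go n (Lit c) = (n + 2, IM.singleton n [(c, n + 1)], IM.empty, n, n + 1)
        go n (Seq r1 r2) =
          let (n1, ce1, ee1, s1, a1) = go n  r1
              (n2, ce2, ee2, s2, a2) = go n1 r2
          in ( n2
             , IM.unionWith (++) ce1 ce2
             , IM.insertWith (++) a1 [s2] (IM.unionWith (++) ee1 ee2)
             , s1, a2 )
        go n (Alt r1 r2) =
          let (n1, ce1, ee1, s1, a1) = go n  r1
              (n2, ce2, ee2, s2, a2) = go n1 r2
              s = n2; a = n2 + 1
              new = IM.fromListWith (++) [(s, [s1]), (s, [s2]), (a1, [a]), (a2, [a])]
          in (n2 + 2, IM.unionWith (++) ce1 ce2, IM.unionsWith (++) [ee1, ee2, new], s, a)
        go n (Star r) =
          let (n1, ce1, ee1, s1, a1) = go n r
              s = n1; a = n1 + 1
              new = IM.fromListWith (++) [(s, [s1]), (s, [a]), (a1, [s1]), (a1, [a])]
          in (n1 + 2, ce1, IM.unionWith (++) ee1 new, s, a)

    -- Epsilon-closure of a set of states (naive fixpoint; fine for a sketch).
    closure :: NFA -> IS.IntSet -> IS.IntSet
    closure nfa = grow
      where
        grow seen =
          let new = IS.fromList
                      [ t | q <- IS.toList seen
                          , t <- IM.findWithDefault [] q (epsEdges nfa)
                          , not (t `IS.member` seen) ]
          in if IS.null new then seen else grow (seen `IS.union` new)

    -- Advance the whole live-state set on one input character.
    step :: NFA -> IS.IntSet -> Char -> IS.IntSet
    step nfa states c = closure nfa (IS.fromList
      [ t | q <- IS.toList states
          , (c', t) <- IM.findWithDefault [] q (charEdges nfa)
          , c' == c ])

    -- One left-to-right pass: time is O(input length * number of states),
    -- and the live set never exceeds the number of NFA states, which is
    -- linear in the size of the regex.  No backtracking anywhere.
    matches :: Regex -> String -> Bool
    matches re input = acceptSt nfa `IS.member` final
      where
        nfa   = compile re
        final = foldl (step nfa) (closure nfa (IS.singleton (startSt nfa))) input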
Both Go and Rust have non-backtracking regular expression libraries, and you can find solid non-backtracking C and C++ libraries for regular expressions (eg: libfsm and Hyperscan).
0: https://swtch.com/~rsc/regexp/

1: My favorite is _Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences_ by Gonzalo Navarro and Mathieu Raffinot
Any re dialect which supports backtracking necessarily has a non-linear worst case, and while a select few have very high resilience against exponential backtracking (e.g. never managed to make postgres fall over) most can be made to fail with a pattern a few characters long.
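For anyone wondering where the blowup comes from, here's a deliberately naive toy backtracking matcher (nothing like how real engines are engineered, but the worst case has the same shape): the Alt and Star cases retry the other branch on failure, so even a pattern as small as (a|a)* run against a string of a's ending in a b explores exponentially many ways to fail:

    module BacktrackSketch where

    data Regex = Eps | Lit Char | Seq Regex Regex | Alt Regex Regex | Star Regex

    -- accept r s k: try to match a prefix of s against r, handing the rest
    -- of the input to the continuation k; return True on the first success.
    accept :: Regex -> String -> (String -> Bool) -> Bool
    accept Eps       s k = k s
    accept (Lit c)   s k = case s of
                             (x:xs) | x == c -> k xs
                             _               -> False
    accept (Seq a b) s k = accept a s (\rest -> accept b rest k)
    accept (Alt a b) s k = accept a s k || accept b s k       -- backtracking point
    accept (Star r)  s k = k s || accept r s loop             -- and another one
      where loop rest = rest /= s && accept (Star r) rest k   -- (guard against empty matches)

    matches :: Regex -> String -> Bool
    matches r s = accept r s null

    -- (a|a)*: each 'a' can be consumed by either branch, so a failing match
    -- against "aaa...ab" explores 2^n paths before giving up.
    pathological :: Regex
    pathological = Star (Alt (Lit 'a') (Lit 'a'))

    -- matches pathological (replicate 25 'a' ++ "b") is already painfully
    -- slow, and every extra 'a' roughly doubles the running time.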
FA-based engines are getting more popular, but they’re far from universal.
(I guess part of the problem is just that regular languages have been studied since Kleene had a full head of hair, while parser combinators were more or less unknown until the 80's.)
Basically, a monadic parser combinator can have code that "inspects" a previously parsed value (context-sensitivity) but an applicative parser cannot.
Imagine an input like "3 alice bob dave", with a number and then that many names.
We want to parse that into

    data Parsed = Parsed { count :: Int, names :: [Name] }
Example, a monadic parser:

    parsed = do
      count <- parseInt
      names <- parseN count name
      return (Parsed count names)
You need to know the value of count before you keep parsing. Context-sensitive.

Applicative parsers don't let you "inspect" the parsed values. You can do stuff like

    Parsed <$> parseInt <*> many name
But if it's not clear where in the input the list of names ends without looking at the output of `parseInt`, you're hosed. There's no way to inspect the output of `parseInt` while you are parsing with an applicative parser.

You could do something like:

    Parsed <$> (1 <$ literal "1") <*> replicateM 1 name
      <|> Parsed <$> (2 <$ literal "2") <*> replicateM 2 name
      <|> ...
where you have an alternative case for each possible number, but obviously this does not generalize to parse realistic inputs.

Technically, you can use Haskell's laziness to parse this particular grammar efficiently enough using applicatives+alternatives to construct an infinite parse tree, but that's kind of an advanced edge case that won't work in most languages.
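For completeness, here's the monadic version as a self-contained toy you can actually run (the Parser type and the parseInt / name / parseN helpers are hand-rolled stand-ins, not from any particular library):

    module CountThenNames where

    import Control.Monad (ap, replicateM)
    import Data.Char (isAlpha, isDigit, isSpace)

    newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

    instance Functor Parser where
      fmap f (Parser p) = Parser $ \s -> fmap (\(a, rest) -> (f a, rest)) (p s)

    instance Applicative Parser where
      pure a = Parser $ \s -> Just (a, s)
      (<*>)  = ap

    instance Monad Parser where
      Parser p >>= f = Parser $ \s -> do
        (a, s') <- p s
        runParser (f a) s'

    type Name = String

    data Parsed = Parsed { count :: Int, names :: [Name] } deriving Show

    spaces :: Parser ()
    spaces = Parser $ \s -> Just ((), dropWhile isSpace s)

    parseInt :: Parser Int
    parseInt = Parser $ \s -> case span isDigit s of
      ("",     _ ) -> Nothing
      (digits, s') -> Just (read digits, s')

    name :: Parser Name
    name = Parser $ \s -> case span isAlpha s of
      ("",   _ ) -> Nothing
      (word, s') -> Just (word, s')

    -- Run a parser exactly n times, skipping leading whitespace each time.
    parseN :: Int -> Parser a -> Parser [a]
    parseN n p = replicateM n (spaces *> p)

    -- The monadic parser from above: the *value* of count decides how much
    -- more input to consume, which is exactly the context-sensitivity an
    -- applicative-only parser can't express.
    parsed :: Parser Parsed
    parsed = do
      count <- parseInt
      names <- parseN count name
      return (Parsed count names)

    -- runParser parsed "3 alice bob dave"
    --   ==> Just (Parsed {count = 3, names = ["alice","bob","dave"]}, "")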
And then it does lead back to your "????" - which presumably stands for the answer to the question: what's the simplest abstraction that lets you build a "parser" (would this still be using combinators?) that is powerful enough to parse regular languages but, by design, not powerful enough to parse context-free languages?