
Parser Combinators Beat Regexes

(entropicthoughts.com)
120 points mooreds | 6 comments
o11c ◴[] No.43639374[source]
Note that most implementations of both parser combinators and regexes can fail very badly (exponential time). Never use either on untrusted input, unless you can prove your implementation lets you stay linear.
replies(4): >>43639552 #>>43639845 #>>43640240 #>>43641298 #
internet_points ◴[] No.43641298[source]
This is one of the reasons I've been afraid to use parser combinators too heavily. With regular (finite-state) languages I know their time usage, with parser combinators I have no clue, and there are so many different libraries and implementations and some are monadic and some are applicative and few mention worst-case. There are benchmarks https://gitlab.com/FinnBender/haskell-parsing-benchmarks#byt... but what kinds of "dangerous operations" do I have to watch out for when writing with parser combinators?

(I guess part of the problem is just that regular languages have been studied since Kleene had a full head of hair, while parser combinators were more or less unknown until the 80's.)
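One concrete thing to watch for, sketched under assumptions: nested repetition over the same input under a parser that explores every alternative, the combinator analogue of the regex `(a+)+`. The `Parser` type below is a hypothetical "list of successes" parser written only for this illustration, not the type of any particular library; real libraries differ in how much they backtrack.

```haskell
import Control.Applicative (Alternative (..))

-- Hypothetical "list of successes" parser: it returns every way of parsing
-- a prefix of the input, so ambiguity shows up as extra results.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [ (f a, rest) | (a, rest) <- p s ]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  Parser pf <*> Parser pa = Parser $ \s ->
    [ (f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1 ]

instance Alternative Parser where
  empty = Parser (const [])
  -- Keep the results of both branches: full backtracking.
  Parser p <|> Parser q = Parser $ \s -> p s ++ q s

-- Match exactly one given character.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x : rest) | x == c -> [(c, rest)]
  _                   -> []

-- Analogue of the regex (a+)+ : a repetition nested inside a repetition.
nested :: Parser [String]
nested = some (some (char 'a'))

-- Analogue of (a+)+b : on input with no 'b', every split is tried and rejected.
nestedThenB :: Parser [String]
nestedThenB = nested <* char 'b'

-- Number of distinct parses of a run of n 'a' characters.
countParses :: Int -> Int
countParses n = length (runParser nested (replicate n 'a'))
```

In this sketch `countParses n` grows as 2^n - 1 (one parse for every way of splitting every prefix into non-empty groups), so `nestedThenB` does exponential work before it can report failure on a long run of `a`s. Libraries that commit to the first successful branch avoid some of this, but that is a property of the implementation rather than of the combinators.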

replies(2): >>43641727 #>>43643059 #
mrkeen ◴[] No.43641727[source]
> I've been afraid to use parser combinators

> With regular (finite-state) languages I know their time usage

Are you talking about parsers or grammars?

replies(1): >>43643009 #
1. wyager ◴[] No.43643009[source]
There's a correspondence between ???/applicative/monadic parsers and regular/context-free/context-sensitive grammars.
replies(2): >>43643282 #>>43644232 #
2. internet_points ◴[] No.43643282[source]
Now I'm really curious what ??? will turn out to be :-D
3. BoiledCabbage ◴[] No.43644232[source]
Interesting, could you give some more detail?
replies(1): >>43644669 #
4. wyager ◴[] No.43644669[source]
I tried googling to find an article, and I found some stuff explaining it, but this seems to be deeper lore than I thought it was.

Basically, a monadic parser combinator can have code that "inspects" a previously parsed value (context-sensitivity) but an applicative parser cannot.

Imagine an input like "3 alice bob dave", with a number and then that many names.

We want to parse

   data Parsed = Parsed {count :: Int, names :: [Name]}
Example: monadic parser:

  count <- parseInt
  names <- parseN count name
  return (Parsed count names)
You need to know the value of count before you keep parsing. Context-sensitive.

Applicative parsers don't let you "inspect" the parsed values. You can do stuff like

  Parsed <$> parseInt <*> many name

But if it's not clear where in the input the list of names ends without looking at the output of `parseInt`, you're hosed. There's no way to inspect the output of `parseInt` while you are parsing with an applicative parser.

You could do something like:

      Parsed 1 <$ literal "1" <*> replicateM 1 name
  <|> Parsed 2 <$ literal "2" <*> replicateM 2 name
  <|> ...
where you have an alternative case for each possible number, but obviously this does not generalize to parse realistic inputs.

Technically, you can use Haskell's laziness to parse this particular grammar efficiently enough using applicatives+alternatives to construct an infinite parse tree, but that's kind of an advanced edge case that won't work in most languages.
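For anyone who wants to try the comparison, here is a minimal self-contained sketch. The `Parser` type and the helpers `word`, `literal`, `parseInt`, `name`, `parsedM` and `parsedA` are hypothetical, written only for this illustration and not taken from any library. `parsedM` uses `>>=` to feed the parsed count into the rest of the parse; `parsedA` uses only `Applicative` and `Alternative`, so it has to enumerate the counts it supports up front.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (replicateM)
import Data.Char (isAlpha, isDigit, isSpace)
import Data.Foldable (asum)

type Name = String

data Parsed = Parsed { count :: Int, names :: [Name] }
  deriving (Eq, Show)

-- Hypothetical minimal parser: consume a prefix and return a value plus the
-- remaining input, or fail with Nothing.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

instance Monad Parser where
  -- The next parser is computed from the value just parsed.
  Parser p >>= f = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> runParser (f a) s'

instance Alternative Parser where
  empty = Parser (const Nothing)
  -- Try the left parser; fall back to the right one only if it fails.
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    r       -> r

-- Skip leading whitespace, then take a non-empty run of matching characters.
word :: (Char -> Bool) -> Parser String
word ok = Parser $ \s -> case span ok (dropWhile isSpace s) of
  ([], _)   -> Nothing
  (w, rest) -> Just (w, rest)

-- Match one whitespace-delimited word exactly.
literal :: String -> Parser String
literal lit = Parser $ \s -> case runParser (word (not . isSpace)) s of
  Just (w, rest) | w == lit -> Just (w, rest)
  _                         -> Nothing

parseInt :: Parser Int
parseInt = read <$> word isDigit

name :: Parser Name
name = word isAlpha

-- Monadic: the number of names to read depends on the parsed count.
parsedM :: Parser Parsed
parsedM = do
  n  <- parseInt
  ns <- replicateM n name
  return (Parsed n ns)

-- Applicative only: one statically built alternative per supported count.
parsedA :: Int -> Parser Parsed
parsedA limit = asum
  [ Parsed n <$ literal (show n) <*> replicateM n name | n <- [0 .. limit] ]
```

Loaded in GHCi, `runParser parsedM "3 alice bob dave"` and `runParser (parsedA 5) "3 alice bob dave"` both succeed with the same `Parsed` value, while `runParser (parsedA 2) "3 alice bob dave"` fails because no alternative for a count of 3 was built. That bound is the part that does not generalize.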

replies(1): >>43645679 #
5. BoiledCabbage ◴[] No.43645679{3}[source]
Thanks, and yes that makes perfect sense. It's a useful way to think about the problem space.

And then it does lead back to your "???", which presumably represents the answer to the question: what's the simplest abstraction that allows one to build a "parser" (would this still be using combinators?) that is powerful enough to parse regular languages but, by design, not powerful enough to parse context-free languages?

replies(1): >>43647041 #
6. wyager ◴[] No.43647041{4}[source]
Yeah, exactly. I don't know what it would look like. It would be nice if the answer was "functorial", since that's at the bottom of the functor-applicative-monad hierarchy, but functor doesn't provide any way to actually combine the combinators, so that's out.