Parser Combinators Beat Regexes

(entropicthoughts.com)

122 points mooreds | 1 comments | 09 Apr 25 21:53 UTC | HN request time: 0.243s | source

Show context

torginus ◴[10 Apr 25 07:06 UTC] No.43641440[source]▶

Functional programming languages are uniquely suitable for writing parsers for grammars, their whole feature set (discriminated unions, pattern matching etc.) is very suited to turning text into ASTs. It's not often that one needs to parse a custom DSL grammar.

That said, I'd say it's a pretty rare occasion to have that need, and other languages have great libraries for accomplishing the same, I'm partial to Chevrotain for Javascript (https://chevrotain.io/docs/), which has a small DSL for describing grammars.

Parser combinators are both less efficient and less featureful than bespoke solutions, which is a problem when you're writing a parser for your custom language, and you want to add extra features, like resilience, or rich error reporting.

To add a bit of snark, functional languages seem to best suited for writing parsers/compilers for other functional programming languages, which is pretty telling of the inward-facing functional culture.

replies(1): >>43641602 #

mrkeen ◴[10 Apr 25 07:38 UTC] No.43641602[source]▶

>>43641440 #

> Parser combinators are both less efficient and less featureful than bespoke solutions, which is a problem when you're writing a parser for your custom language, and you want to add extra features, like resilience, or rich error reporting.

This is untrue. PCs are the bespoke solutions. The article shows how to introduce a feature like state. Parser generators are the technique which just dump fixed-functionality source code on you.

replies(1): >>43641778 #

1. torginus ◴[10 Apr 25 08:09 UTC] No.43641778[source]▶

>>43641602 #

Parser combinators define a recursive parsing implementation, as well as declaration syntax. While it's very flexible in terms of what it can parse, it has no fixed guarantees on making progress on the token stream or having a bounded lookahead size before they can parse anything.

A hand written LL(1) parser, (for a suitable grammar) is much more efficient than a parser combinator, afaik most production languages use a hand-written parser, with parser combinators not really being used in high-effort implementations.

I have no extensive experience in surveying parser generators (only using them, and they seemed to be fast enough), I'd guess it's up to the implementation on how efficient parsing code is, with my ballpark assumption being that it's worse than handwritten code.

↑