Parser Combinators Beat Regexes

1. arn3n ◴[10 Apr 25 00:34 UTC] No.43639465[source]▶

“In other languages, it would be considered overkill to write a full parser when a simple regex can do the same thing. In Haskell, writing a parser is no big deal. We just do it and move on with our lives.”

I see a long code file filled with comments and long paragraph-level explanations. I think I’d rather just learn and use regex.

replies(5): >>43639538 #>>43639912 #>>43639965 #>>43641791 #>>43644069 #

2. OneDeuxTriSeiGo ◴[10 Apr 25 00:47 UTC] No.43639538[source]▶

>>43639465 (TP) #

I mean, that's the nature of article code, no? You are writing for a generic audience and want to be very explicit about what your code is doing in order to teach that audience.

For your average Haskell programmer you could drop all of those comments, since the code isn't really doing anything that couldn't be determined by just reading it.

3. codebje ◴[10 Apr 25 01:58 UTC] No.43639912[source]▶

>>43639465 (TP) #

The main advantage of recursive descent parsing is readability. Forget the machinery of the parser, which is (a) trivial enough that AI will generate it correctly for you with next to no prompting, and (b) widely available in libraries anyway. The win is writing a parser that reads like the grammar it implements, which means changes to your grammar are easy to apply to your parser.

Regexes are effectively write-only. If you change the grammar parsed by a regex, you're probably going to discard the regex and make a new one.
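To make "reads like the grammar" concrete, here is a minimal sketch in Python. The toy arithmetic grammar and every name in it (`tokenize`, `Parser`, `evaluate`) are assumptions made up for illustration; none of it comes from the article. Each grammar rule becomes one method with the same name, so a change to a rule maps to a change in one place.

```python
import re

# Toy grammar (an assumption for illustration, not taken from the article):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"

# Tokens are a regular language, so a small regex is a fine fit for the lexer.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expr := term (("+" | "-") term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor (("*" | "/") factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # factor := NUMBER | "(" expr ")"
    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Adding, say, a unary minus would mean touching only `factor`, which is the kind of local change that is hard to make inside one large regex.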

4. layer8 ◴[10 Apr 25 02:07 UTC] No.43639965[source]▶

>>43639465 (TP) #

Whenever I write a regex, I end up with comments roughly ten times longer than the regex. That being said, regular expressions are often the right tool for the job (i.e. parsing a regular language, as opposed to a context-free language or whatever), just the syntax becomes unreadable rather quickly. I’m sure you could build a nicer regular-expression syntax in Haskell.

replies(3): >>43640776 #>>43640864 #>>43642797 #

5. CBLT ◴[10 Apr 25 05:01 UTC] No.43640776[source]▶

>>43639965 #

I love the verbose flag[0] to regex, so I can write comments inline.

[0] https://docs.python.org/3/library/re.html#re.VERBOSE
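For anyone who hasn't used it, a small sketch of what that looks like. The pattern itself (a signed decimal number) and the name `NUMBER` are hypothetical, picked only to show the flag:

```python
import re

# Hypothetical pattern for illustration: a signed decimal number such as "-12.5".
NUMBER = re.compile(
    r"""
    [-+]?           # optional sign
    \d+             # integer part
    (?: \. \d+ )?   # optional fractional part
    """,
    re.VERBOSE,
)

print(bool(NUMBER.fullmatch("-12.5")))
print(bool(NUMBER.fullmatch("12.")))
```

With `re.VERBOSE`, unescaped whitespace in the pattern is ignored and `#` starts a comment, so a literal space or `#` has to be escaped or put in a character class.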

6. f1shy ◴[10 Apr 25 05:16 UTC] No.43640864[source]▶

>>43639965 #

Yes. Regexes tend to become write-only rather fast. One solution is commenting, but the result is still complex. What I like to do now (in C) is define the parts separately. Just a crude example to get the idea:

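   #include <regex.h>

   // STRing: matches anything inside quotes (single or double)
   #define STR "[\"'](.*)[\"']"
   // NUMber: matches decimal or hexadecimal numbers
   #define NUM "([[:digit:]]x?[[:xdigit:]]*)"

   regex_t reg_exp;
   // Adjacent string literals are concatenated, so the pattern is STR followed by NUM.
   // regcomp() returns nonzero on failure.
   int rc = regcomp(&reg_exp, STR NUM, REG_EXTENDED | REG_ICASE);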

So at the end I compose the RE with the various parts, which are documented separately.

7. zokier ◴[10 Apr 25 11:29 UTC] No.43642797[source]▶

>>43639965 #

> just the syntax becomes unreadable rather quickly. I’m sure you could build a nicer regular-expression syntax in Haskell.

Of course, regular expressions are really more of a category of expressions, and the traditional Kleene star notation is only one of many options; regular expressions do not inherently need to use that specific syntax.

Pomsky and VerbalExpressions are just some examples of alternative syntaxes for regex. Apparently there is even a port of VerbalExpressions for Haskell:

https://github.com/VerbalExpressions/HaskellVerbalExpression...

replies(1): >>43643224 #

8. qrobit ◴[10 Apr 25 12:43 UTC] No.43643224{3}[source]▶

>>43642797 #

I looked at the VerbalExpressionJS[1] example and it looks like combining parsers to me. If you need to make a regex more verbose, it is better to use a parser combinator library when one is available. The benefits of regex over parser combinators, other than compactness, aren't obvious to me.

[1]: <https://github.com/VerbalExpressions/JSVerbalExpressions/tre...>
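For comparison, a rough sketch of the parser combinator style in Python. This is not the API of any real library: every name here (`char`, `literal`, `seq`, `many1`, `fmap`, `version`) is invented for the example, and a parser is assumed to be a plain function from `(text, pos)` to `(value, new_pos)`, or `None` on failure.

```python
# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.
# All combinator names below are made up for this sketch.

def char(pred):
    """Match one character satisfying pred."""
    def parse(text, pos):
        if pos < len(text) and pred(text[pos]):
            return text[pos], pos + 1
        return None
    return parse

def literal(s):
    """Match the exact string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def seq(*parsers):
    """Run parsers one after another and collect their values in a list."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def many1(parser):
    """Apply parser one or more times (parser must consume input)."""
    def parse(text, pos):
        values = []
        while (result := parser(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return (values, pos) if values else None
    return parse

def fmap(fn, parser):
    """Transform the value produced by parser."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse

# Roughly the regex (\d+)\.(\d+)\.(\d+), but built from named pieces.
digit = char(lambda c: c in "0123456789")
number = fmap(lambda ds: int("".join(ds)), many1(digit))
dot = literal(".")
version = fmap(lambda v: (v[0], v[2], v[4]), seq(number, dot, number, dot, number))

print(version("1.22.333", 0))
print(version("1.22", 0))
```

The trade-off is visible even at this size: the regex is one line, while the combinator version is longer but each piece has a name, returns typed values (ints rather than strings), and can be reused or tested separately.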

9. bazoom42 ◴[10 Apr 25 14:21 UTC] No.43644069[source]▶

>>43639465 (TP) #

Sounds like you think the comments and explanations are the problem? You can write regexes with comments and parsers without. Regexes are not generally known to be self-explanatory, except for trivial cases like \d+