Parser Combinators Beat Regexes

(entropicthoughts.com)
120 points by mooreds | 8 comments
1. DadBase No.43639894
Parser combinators are great until you need to parse something real, like CSV with embedded newlines and Excel quotes. That’s when you reach for the reliable trio: awk, duct tape, and prayer.
replies(2): No.43640949, No.43640977
2. masklinn No.43640949
Seems like exactly the situation where you’d want parsers, because they can do any form of quoting/escaping you want and have no reason to care about newlines any more than they do about any other character.
replies(1): No.43643957
3. iamevn No.43640977
I don't follow why parser combinators would be a bad tool for CSV. It seems like one would specify a CSV parser as (pardon the pseudocode):

  separator = ','
  quote = '"'
  quoted_quote = '""'
  newline = '\n'
  plain_field = sequence(char_except(either(separator, quote, newline)))
  quoted_field = quote + sequence(either(char_except(quote), quoted_quote)) + quote 
  field = either(quoted_field, plain_field)
  row = sequence_with_separator(field, separator)
  csv = sequence_with_separator(row, newline)

Seems fairly natural to me, although I'll readily admit I haven't had to write a CSV parser before, so I'm surely glossing over some detail.
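The pseudocode above translates fairly directly into running code. What follows is a minimal Python sketch, not any library's API: since no parser-combinator library is assumed, the combinators (`literal`, `char_except`, `either`, `many`, `seq`, `mapped`, `sep_by`) are hypothetical helpers defined inline, with `many` and `sep_by` standing in for the pseudocode's `sequence` and `sequence_with_separator`. It handles commas, `""` escapes and newlines inside quoted fields, but not `\r\n` line endings.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical minimal combinators, not a real library's API: a parser takes
# (text, pos) and returns (value, new_pos) on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def literal(s: str) -> Parser:
    """Match the exact string s."""
    def p(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return p

def char_except(excluded: str) -> Parser:
    """Match one character that is not in `excluded`."""
    def p(text, pos):
        if pos < len(text) and text[pos] not in excluded:
            return (text[pos], pos + 1)
        return None
    return p

def either(*parsers: Parser) -> Parser:
    """Try alternatives in order; the first success wins."""
    def p(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return p

def many(parser: Parser) -> Parser:
    """Zero or more repetitions (the pseudocode's `sequence`); always succeeds."""
    def p(text, pos):
        items = []
        # Stop on failure, or if the parser succeeds without consuming input.
        while (result := parser(text, pos)) is not None and result[1] != pos:
            value, pos = result
            items.append(value)
        return (items, pos)
    return p

def seq(*parsers: Parser) -> Parser:
    """Run parsers in order; succeed with the list of their values."""
    def p(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return (values, pos)
    return p

def mapped(parser: Parser, f: Callable[[Any], Any]) -> Parser:
    """Apply f to the value of a successful parse."""
    def p(text, pos):
        result = parser(text, pos)
        return None if result is None else (f(result[0]), result[1])
    return p

def sep_by(parser: Parser, sep: Parser) -> Parser:
    """One or more `parser` separated by `sep` (`sequence_with_separator`)."""
    def p(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        items, pos = [result[0]], result[1]
        while (s := sep(text, pos)) is not None:
            result = parser(text, s[1])
            if result is None:
                break  # leave the dangling separator unconsumed
            items.append(result[0])
            pos = result[1]
        return (items, pos)
    return p

# The grammar from the pseudocode, nearly line for line.
separator = literal(",")
quote = literal('"')
quoted_quote = mapped(literal('""'), lambda _: '"')  # "" inside quotes is one "
newline = literal("\n")
plain_field = mapped(many(char_except(',"\n')), "".join)
quoted_field = mapped(
    seq(quote, many(either(char_except('"'), quoted_quote)), quote),
    lambda parts: "".join(parts[1]),  # keep only the characters between the quotes
)
field = either(quoted_field, plain_field)
row = sep_by(field, separator)
csv_file = sep_by(row, newline)

def parse_csv(text: str) -> List[List[str]]:
    if text.endswith("\n"):  # simplification: a final newline ends the last row
        text = text[:-1]
    rows, pos = csv_file(text, 0)  # always succeeds: a plain field may be empty
    if pos != len(text):
        raise ValueError(f"cannot parse input at offset {pos}: {text[pos:pos + 20]!r}")
    return rows
```

For example, `parse_csv('name,quote\n"Smith, J.","He said ""hi""\nthen left"')` should yield two rows, the second being `['Smith, J.', 'He said "hi"\nthen left']`. Because quoting lives entirely inside `field`, the row and file rules never need to know about it, and an unterminated quote makes `parse_csv` raise `ValueError` at the quote's offset instead of silently consuming the rest of the input.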
replies(2): No.43641113, No.43643933
4. kqr No.43641113
I think GP was sarcastic. We have these great technologies available but people end up using duct tape and hope anyway.
replies(1): No.43643935
5. DadBase No.43643933
Ah, you've clearly never had to parse a CSV exported from a municipal parking database in 2004. Quoted fields inside quoted fields, carriage returns mid-name, and a column that just says "ERROR" every 37th row. Your pseudocode would flee the scene.
6. DadBase No.43643935
Exactly. At some point every parser combinator turns into a three-line awk script that runs perfectly as long as the moon is waning and the file isn’t saved from Excel for Mac.
7. DadBase No.43643957
Parsers can handle it, sure, but then you blink and you're ten layers deep trying to explain why a single unmatched quote ate the rest of the file. Sometimes a little awk and a strong coffee gets you further.
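One way to keep an unmatched quote from silently eating the rest of the file, sketched here in Python as an illustration rather than anything proposed in the thread: make the quoted-field parser remember where the quote was opened and report that position when the input runs out. `ParseError` and `quoted_field` are hypothetical names.

```python
from typing import Tuple


class ParseError(Exception):
    """Hypothetical error type carrying a human-readable location."""


def quoted_field(text: str, pos: int) -> Tuple[str, int]:
    """Parse a quoted CSV field whose opening quote is at text[pos].

    Returns (value, index just past the closing quote). Inside the quotes,
    "" stands for one literal quote and newlines are ordinary characters.
    """
    if not text.startswith('"', pos):
        raise ParseError(f"expected a quote at offset {pos}")
    start = pos  # remember where the quote was opened, for error messages
    chars = []
    pos += 1
    while pos < len(text):
        if text[pos] != '"':
            chars.append(text[pos])
            pos += 1
        elif text.startswith('""', pos):  # escaped quote
            chars.append('"')
            pos += 2
        else:  # closing quote
            return "".join(chars), pos + 1
    # Ran off the end of the input: blame the opening quote, not the end of file.
    line = text.count("\n", 0, start) + 1
    column = start - (text.rfind("\n", 0, start) + 1) + 1
    raise ParseError(f"unterminated quoted field opened at line {line}, column {column}")
```

For instance, `quoted_field('a,b\nc,"oops\nd,e', 6)` raises `ParseError` pointing at line 2, column 3, where the stray quote sits, rather than at the end of the input.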
replies(1): No.43648879
8. maxbond No.43648879
Use what's working for you, but one of the strengths of parser combinators is managing abstraction so that you can reason locally rather than having to manage things "ten layers deep." That's more of a problem in imperative approaches, where you manually manage a complex state machine and can indeed end up with very deep `if` hierarchies.

Parser combinators are good at letting you express the grammar as a relationship among functions, so that the compiler handles the nitty-gritty, error-prone bits. (Regexes also abstract away these details, of course.)
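For comparison with the parenthetical about regexes, here is a minimal Python sketch that expresses the field rule from the earlier pseudocode as a single regular expression; `FIELD` and `parse_row_regex` are hypothetical names. The regex hides the per-character work for one field just as well, but the row structure is a hand-written loop around it, and a malformed row simply stops early, leaving the caller to notice that `pos` did not reach the end of the line.

```python
import re
from typing import List, Tuple

# One CSV field, same rule as the pseudocode: a quoted field (where "" is an
# escaped quote and newlines are allowed) or a run of characters that are not
# a separator, quote, or newline.
FIELD = re.compile(r'"((?:[^"]|"")*)"|([^,"\n]*)')


def parse_row_regex(text: str, pos: int = 0) -> Tuple[List[str], int]:
    """Parse one row starting at pos; return (fields, position where it stopped)."""
    fields = []
    while True:
        m = FIELD.match(text, pos)  # never None: the second branch can match ""
        quoted, plain = m.group(1), m.group(2)
        fields.append(quoted.replace('""', '"') if quoted is not None else plain)
        pos = m.end()
        if not text.startswith(",", pos):
            return fields, pos  # newline, end of input, or something unparseable
        pos += 1
```

Here `parse_row_regex('a,"oops')` returns `(['a', ''], 2)` with no error: the unterminated quote is only detectable because parsing stopped short.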