Parser Combinators Beat Regexes

(entropicthoughts.com)
120 points mooreds | 4 comments
DadBase ◴[] No.43639894[source]
Parser combinators are great until you need to parse something real, like CSV with embedded newlines and Excel quotes. That’s when you reach for the reliable trio: awk, duct tape, and prayer.
replies(2): >>43640949 #>>43640977 #
1. iamevn ◴[] No.43640977[source]
I don't follow why parser combinators would be a bad tool for CSV. It seems like one would specify a CSV parser as (pardon the pseudocode):

  separator = ','
  quote = '"'
  quoted_quote = '""'
  newline = '\n'
  plain_field = sequence(char_except(either(separator, quote, newline)))
  quoted_field = quote + sequence(either(char_except(quote), quoted_quote)) + quote
  field = either(quoted_field, plain_field)
  row = sequence_with_separator(field, separator)
  csv = sequence_with_separator(row, newline)
Seems fairly natural to me, although I'll readily admit I haven't had to write a CSV parser before so I'm surely glossing over some detail.
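The pseudocode above can be sketched as runnable Python. This is a minimal illustration under stated assumptions, not a robust CSV parser: the helper names (`literal`, `char_except`, `either`, `many`, `seq`, `sep_by`, `fmap`, `parse_csv`) are hypothetical and not taken from any library, a parser is assumed to be a function from `(text, pos)` to `(value, new_pos)` or `None`, and only `\n` is treated as a row terminator, as in the pseudocode.

```python
from typing import Any, Callable, List, Optional, Tuple

# Assumption: a parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def literal(s: str) -> Parser:
    """Match the exact string s."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def char_except(forbidden: str) -> Parser:
    """Match any single character that is not in `forbidden`."""
    def parse(text, pos):
        if pos < len(text) and text[pos] not in forbidden:
            return text[pos], pos + 1
        return None
    return parse

def either(*parsers: Parser) -> Parser:
    """Try each parser in order and return the first success."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p: Parser) -> Parser:
    """Apply p zero or more times; p must consume input when it succeeds."""
    def parse(text, pos):
        values = []
        while (result := p(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by p with f."""
    def parse(text, pos):
        result = p(text, pos)
        return None if result is None else (f(result[0]), result[1])
    return parse

def sep_by(item: Parser, sep: Parser) -> Parser:
    """Parse items separated by sep, e.g. fields separated by commas."""
    def parse(text, pos):
        result = item(text, pos)
        if result is None:
            return [], pos
        value, pos = result
        values = [value]
        while (s := sep(text, pos)) is not None:
            nxt = item(text, s[1])
            if nxt is None:
                break
            value, pos = nxt
            values.append(value)
        return values, pos
    return parse

# The grammar, mirroring the pseudocode line by line.
separator = literal(",")
quote = literal('"')
quoted_quote = fmap(literal('""'), lambda _: '"')   # "" inside quotes means one "
newline = literal("\n")
plain_field = fmap(many(char_except(',"\n')), "".join)
quoted_field = fmap(
    seq(quote, many(either(char_except('"'), quoted_quote)), quote),
    lambda parts: "".join(parts[1]),                 # drop the surrounding quotes
)
field = either(quoted_field, plain_field)
row = sep_by(field, separator)
csv = sep_by(row, newline)

def parse_csv(text: str) -> List[List[str]]:
    """Parse the whole input, raising if anything is left unconsumed."""
    rows, pos = csv(text, 0)
    if pos != len(text):
        raise ValueError(f"unparsed input at offset {pos}")
    return rows
```

With this sketch, `parse_csv('"Doe, Jane","said ""hi""\nthen left"')` yields a single row of two fields, with the comma, the doubled quotes, and the embedded newline all handled inside the quoted fields. Because the grammar follows the pseudocode literally, input that ends in a trailing newline produces a final row containing one empty field, and an unterminated quote raises `ValueError` rather than recovering.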
replies(2): >>43641113 #>>43643933 #
2. kqr ◴[] No.43641113[source]
I think GP was being sarcastic. We have these great technologies available, but people end up using duct tape and hope anyway.
replies(1): >>43643935 #
3. DadBase ◴[] No.43643933[source]
Ah, you've clearly never had to parse a CSV exported from a municipal parking database in 2004. Quoted fields inside quoted fields, carriage returns mid-name, and a column that just says "ERROR" every 37th row. Your pseudocode would flee the scene.
4. DadBase ◴[] No.43643935[source]
Exactly. At some point every parser combinator turns into a three-line awk script that runs perfectly as long as the moon is waning and the file isn’t saved from Excel for Mac.