←back to thread

Parser Combinators Beat Regexes

(entropicthoughts.com)
122 points mooreds | 1 comments | | HN request time: 0.405s | source
Show context
DadBase ◴[] No.43639894[source]

Parser combinators are great until you need to parse something real, like CSV with embedded newlines and Excel quotes. That’s when you reach for the reliable trio: awk, duct tape, and prayer.

replies(2): >>43640949 #>>43640977 #
iamevn ◴[] No.43640977[source]

I don't follow why parser combinators would be a bad tool for CSV. It seems like one would specify a CSV parser as (pardon the pseudocode):

  separator = ','
  quote = '"'
  quoted_quote = '""'
  newline = '\n'
  plain_field = sequence(char_except(either(separator, quote, newline)))
  quoted_field = quote + sequence(either(char_except(quote), quoted_quote)) + quote 
  field = either(quoted_field, plain_field)
  row = sequence_with_separator(field, separator)
  csv = sequence_with_separator(row, newline)

Seems fairly natural to me, although I'll readily admit I haven't had to write a CSV parser before so I'm surely glossing over some detail.

replies(2): >>43641113 #>>43643933 #
1. DadBase ◴[] No.43643933[source]

Ah, you've clearly never had to parse a CSV exported from a municipal parking database in 2004. Quoted fields inside quoted fields, carriage returns mid-name, and a column that just says "ERROR" every 37th row. Your pseudocode would flee the scene.