Parse, Don't Validate (2019)

(lexi-lambda.github.io)
389 points melse | 1 comments
bruce343434 ◴[] No.27640435[source]
This still sounds like validation but with extra steps. (or less?)
replies(3): >>27640629 #>>27640647 #>>27642145 #
plesiv ◴[] No.27640629[source]
The post is saying:

- don't drop the info gathered from checks while validating, but keep track of it

- if you do this, you'll effectively be parsing

- parsing is more powerful than validating

"Extra steps" would be keeping track of info gathered from checks.

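A minimal Python sketch of that difference, assuming a made-up example of reading a TCP port from a string. The names `validate_port`, `parse_port`, and `Port` are hypothetical and do not come from the post:

```python
from dataclasses import dataclass


def _looks_like_port(text: str) -> bool:
    # Shared syntactic check: 1 to 5 ASCII digits.
    return text.isascii() and text.isdigit() and len(text) <= 5


def validate_port(text: str) -> bool:
    """Validation: runs the checks, then drops what it learned."""
    return _looks_like_port(text) and 1 <= int(text) <= 65535


@dataclass(frozen=True)
class Port:
    """By convention, only parse_port constructs this."""
    number: int


def parse_port(text: str) -> Port:
    """Parsing: runs the same checks, but keeps the result in a type."""
    if not _looks_like_port(text):
        raise ValueError(f"not a port number: {text!r}")
    number = int(text)
    if not 1 <= number <= 65535:
        raise ValueError(f"port out of range: {number}")
    return Port(number)
```

After `validate_port("8080")` the caller still holds a plain string and has to remember it was checked. After `parse_port("8080")` the caller holds a `Port`, so later code can ask for a `Port` and skip the re-check.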
replies(1): >>27641503 #
bruce343434 ◴[] No.27641503[source]
Right. My takeaway was "verify and validate once, then put it in a specially marked datastructure, or if your language allows it, make the type system guarantee some conditions of the data, then work with that from there". Where does parsing come into the picture?
replies(3): >>27642118 #>>27642191 #>>27642202 #
kortex ◴[] No.27642191[source]
> verify and validate once, then put it in a specially marked datastructure

Not to steal vvillena's thunder, but that's pretty much the dictionary definition of "parsing":

> analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.

Parsing is taking some collection of symbols, and emitting some other structure that obeys certain rules. Those symbols need not be text, they can be any abstract "thing". A symbol could be a full-blown data structure - you can parse a List into a NotEmptyList, where there's some associated grammar with the NEL that's a stricter version of the List grammar.
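As a rough illustration of the List to NotEmptyList case, here is a small Python sketch. `NonEmptyList`, `parse_non_empty`, and `first` are hypothetical names chosen for this example, and Python only enforces the guarantee by convention plus a type checker, not at runtime:

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NonEmptyList(Generic[T]):
    """A list whose shape makes 'at least one element' impossible to violate."""
    head: T
    tail: Tuple[T, ...]


def parse_non_empty(items: Sequence[T]) -> NonEmptyList[T]:
    """Check emptiness once and record the answer in the result's shape."""
    if len(items) == 0:
        raise ValueError("expected at least one element")
    return NonEmptyList(items[0], tuple(items[1:]))


def first(nel: NonEmptyList[T]) -> T:
    """No emptiness check needed: the input type already rules it out."""
    return nel.head
```

The "stricter grammar" here is just the required `head` field: an ordinary list may be empty, a `NonEmptyList` cannot be.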

replies(1): >>27642332 #
justinpombrio ◴[] No.27642332{3}[source]
Haha, "Those symbols need not be text", you say, right after quoting a definition that says they need to be "a string or text"!

There's a field of study called "parsing", which studies "parsers". Hundreds of papers. Very well defined problem: turning a list of symbols into a tree-shaped parse tree (or data structure). The defining aspect of parsing, that makes it difficult and an interesting thing to study, is that you're starting with a list and ending with a tree. If you're converting tree to tree (that is, a typical data structure to a typical data structure), all the problems vanish (or change drastically) and all the parsing techniques are inapplicable.
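To make the list-to-tree shape concrete, here is a toy recursive-descent sketch in Python for a made-up grammar of parenthesized groups. The grammar and the names `parse_tree` and `_parse_at` are assumptions for illustration only:

```python
from typing import List, Tuple, Union

# A tree is either a leaf token or a list of subtrees.
Tree = Union[str, List["Tree"]]


def parse_tree(tokens: List[str]) -> Tree:
    """Turn a flat token list into a nested tree, or raise ValueError."""
    tree, pos = _parse_at(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token at position {pos}")
    return tree


def _parse_at(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """Parse one tree starting at pos; return it and the next position."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == ")":
        raise ValueError(f"unexpected ')' at position {pos}")
    if token != "(":
        return token, pos + 1  # a leaf
    children: List[Tree] = []
    pos += 1  # skip "("
    while pos < len(tokens) and tokens[pos] != ")":
        child, pos = _parse_at(tokens, pos)
        children.append(child)
    if pos >= len(tokens):
        raise ValueError("missing ')'")
    return children, pos + 1  # skip ")"
```

For example, `parse_tree(["(", "a", "(", "b", "c", ")", ")"])` returns `["a", ["b", "c"]]`: the nesting that was only implied by the flat token order becomes explicit structure.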

I'm kind of annoyed that people are starting to use the word "parse" metaphorically. Bit by bit, precise words turn fuzzy. Alas, it will be a lost battle.

replies(2): >>27643530 #>>27645199 #
1. samatman ◴[] No.27645199{4}[source]
Sure, but the list of symbols can be an arbitrary collection where the symbols are 0 and 1.

Voila, now your 'string' is 'binary data' not 'text'.

Parsing binary data is my bread and butter, so I might be biased but: it works fine.

Anything which comes over the wire is a string, anything which comes out of store is a string. If you're using something like protobufs, that's great, because having to marshal/serialize/parse along every process boundary is expensive and probably unnecessary.

But at some point, and anywhere on the 'surface' of the system, data has to be un-flattened into a shape. That's parsing.