- don't drop the info gathered from checks while validating, but keep track of it
- if you do this, you'll effectively be parsing
- parsing is more powerful than validating
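Here is a minimal Rust sketch of the difference. It uses a hypothetical `Port` type, and the names `validate_port` and `Port::parse` are made up for illustration. The validator runs the check and returns a bare `bool`. The parser runs the same check but returns a value that carries what the check learned. In a real crate, `Port` would live in its own module, so its private field would stop outside code from building one without going through `parse`.

```rust
/// Validating: the check runs, but its result is thrown away as a bare bool.
/// A caller that later needs the number has to parse the string again.
fn validate_port(s: &str) -> bool {
    s.parse::<u16>().map(|p| p != 0).unwrap_or(false)
}

/// Parsing: the same check, but the information it gathered is kept
/// in a type that (outside this module) can only be built by passing the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(u16);

impl Port {
    pub fn parse(s: &str) -> Option<Port> {
        match s.parse::<u16>() {
            Ok(0) | Err(_) => None, // this sketch rejects port 0 and non-numbers
            Ok(p) => Some(Port(p)),
        }
    }

    pub fn get(self) -> u16 {
        self.0
    }
}
```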
"Extra steps" would be keeping track of info gathered from checks.
Not to steal vvillena's thunder, but that's pretty much the dictionary definition of "parsing"
> analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.
Parsing is taking some collection of symbols and emitting some other structure that obeys certain rules. Those symbols need not be text; they can be any abstract "thing". A symbol could be a full-blown data structure: you can parse a List into a NotEmptyList (NEL), where the NEL has an associated grammar that is a stricter version of the List grammar.
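Below is a Rust sketch of the List → NotEmptyList case. It assumes a hand-rolled `NonEmpty<T>` type; Haskell's `Data.List.NonEmpty.nonEmpty :: [a] -> Maybe (NonEmpty a)` does the same job. Storing the head separately makes the empty list unrepresentable, so `first` needs no `Option`.

```rust
/// A list that is non-empty by construction: the head is stored separately,
/// so no value of this type can have zero elements.
#[derive(Debug, Clone, PartialEq)]
pub struct NonEmpty<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmpty<T> {
    /// Parse a plain Vec into a NonEmpty, failing only on the empty case.
    pub fn parse(mut v: Vec<T>) -> Option<NonEmpty<T>> {
        if v.is_empty() {
            return None;
        }
        let head = v.remove(0); // O(n) shift; acceptable for a sketch
        Some(NonEmpty { head, tail: v })
    }

    /// No Option here: the type already guarantees an element exists.
    pub fn first(&self) -> &T {
        &self.head
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }
}
```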
I also find this a confusing use of the word "parse". It isn't explained in the post, and I think "parse, don't validate" is a poor slogan as a result. The traditional slogan is "make illegal states unrepresentable", though that's a somewhat narrower concept.
There's a field of study called "parsing", which studies "parsers". Hundreds of papers. It's a very well-defined problem: turning a list of symbols into a parse tree (or some other tree-shaped data structure). The defining aspect of parsing, that makes it difficult and an interesting thing to study, is that you're starting with a list and ending with a tree. If you're converting a tree to a tree (that is, one typical data structure to another), all the problems vanish (or change drastically) and none of the parsing techniques apply.
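Here is a small recursive-descent parser that makes the list-to-tree distinction concrete. It is sketched in Rust for a hypothetical S-expression grammar (`expr := atom | "(" expr* ")"`). The input is a flat list of tokens and the output is a `Tree`. The work lies in recovering the nesting that the flat list only implies.

```rust
/// Parse tree for a tiny S-expression grammar:
///   expr := atom | "(" expr* ")"
#[derive(Debug, PartialEq)]
pub enum Tree {
    Atom(String),
    List(Vec<Tree>),
}

/// Flat input: split the source into a list of symbols.
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

/// Recursive descent: consume tokens starting at `pos` and return one subtree.
fn parse_expr(tokens: &[String], pos: &mut usize) -> Result<Tree, String> {
    let tok = tokens.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    match tok.as_str() {
        "(" => {
            let mut children = Vec::new();
            loop {
                match tokens.get(*pos).map(String::as_str) {
                    Some(")") => {
                        *pos += 1;
                        return Ok(Tree::List(children));
                    }
                    Some(_) => children.push(parse_expr(tokens, pos)?),
                    None => return Err("missing ')'".to_string()),
                }
            }
        }
        ")" => Err("unexpected ')'".to_string()),
        atom => Ok(Tree::Atom(atom.to_string())),
    }
}

/// Parse a whole input: exactly one expression, with no trailing tokens.
pub fn parse(src: &str) -> Result<Tree, String> {
    let tokens = tokenize(src);
    let mut pos = 0;
    let tree = parse_expr(&tokens, &mut pos)?;
    if pos != tokens.len() {
        return Err("trailing input".to_string());
    }
    Ok(tree)
}
```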
I'm kind of annoyed that people are starting to use the word "parse" metaphorically. Bit by bit, precise words turn fuzzy. Alas, it will be a lost battle.
Ha yeah nice catch, that's why I added that in there. In this case the dictionary is slightly wrong.
> The defining aspect of parsing, that makes it difficult and an interesting thing to study, is that you're starting with a list and ending with a tree.
Ah, I didn't know that! Great bit to learn.
In that case, I will say that the phrase "increase a data structure's rules" is a bit ambiguous.
I think my statement is still correct in that "a symbol could be a data structure," right? Like you could take a list of dicts and emit a tree of dicts.
But wait, a list is a kind of tree, or rather, there is a parse tree of recursive head/tail branches. So I think you could still argue List -> NotEmptyList is a parse, because a NEL requires a "head" element to be present and zero or one NEL as its "tail."
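The head/tail reading can be written down directly. In this sketch, `Nel` is a hypothetical type and not from any library. A `Nel` is a mandatory head plus zero or one `Nel` as its tail, and parsing a slice follows that recursive structure.

```rust
/// A NEL as a mandatory head plus zero or one NEL as its tail.
#[derive(Debug, PartialEq)]
pub struct Nel<T> {
    head: T,
    tail: Option<Box<Nel<T>>>,
}

/// Parse a slice by its head/tail structure: an empty slice has no parse,
/// and a non-empty slice is a head followed by the parse of the rest (if any).
pub fn parse_nel<T: Clone>(items: &[T]) -> Option<Nel<T>> {
    let (head, rest) = items.split_first()?;
    Some(Nel {
        head: head.clone(),
        tail: parse_nel(rest).map(Box::new),
    })
}

impl<T> Nel<T> {
    /// Count the head plus every element reachable through the tail chain.
    pub fn len(&self) -> usize {
        1 + self.tail.as_ref().map_or(0, |t| t.len())
    }
}
```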
Yeah, I guess. Parser combinator libraries like Haskell's Parsec and Rust's Nom are typically parametric over the type of "characters". Realistically, though, I don't think I've ever seen anyone use one of those libraries for input that wasn't text-like; do you have a use case in mind?
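Here is a hand-rolled Rust sketch of what "parametric over the type of characters" means. It is not the Parsec or nom API. The two tiny combinators work over a slice of any token type `T`, so the tokens can be integers, enum values, or anything else instead of characters.

```rust
/// A parser result generic over the token type `T`:
/// the parsed output plus the unconsumed input.
type ParseResult<'a, T, O> = Option<(O, &'a [T])>;

/// Consume one token if it satisfies `pred`.
fn satisfy<'a, T: Clone>(input: &'a [T], pred: impl Fn(&T) -> bool) -> ParseResult<'a, T, T> {
    match input.split_first() {
        Some((first, rest)) if pred(first) => Some((first.clone(), rest)),
        _ => None,
    }
}

/// Apply `p` repeatedly until it fails; always succeeds, possibly with zero items.
/// Assumes `p` consumes input whenever it succeeds, otherwise this loops forever.
fn many<'a, T, O>(
    mut input: &'a [T],
    p: impl Fn(&'a [T]) -> ParseResult<'a, T, O>,
) -> (Vec<O>, &'a [T]) {
    let mut out = Vec::new();
    while let Some((o, rest)) = p(input) {
        out.push(o);
        input = rest;
    }
    (out, input)
}
```

For example, `many(&[2, 4, 6, 7, 8][..], |i| satisfy(i, |x: &i32| x % 2 == 0))` consumes the leading even numbers and returns `([2, 4, 6], [7, 8])`.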
> But wait, a list is a kind of tree, or rather, there is a parse tree of recursive head/tail branches.
Yes, so you can run into parsing problems when working with trees, if you work really hard at it. But if you do the correct action is "reconsider your life choices" and not "use parsing theory".
Voilà, now your 'string' is 'binary data', not 'text'.
Parsing binary data is my bread and butter, so I might be biased but: it works fine.
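Here is a small example of parsing binary data, sketched for a hypothetical wire format: a 1-byte tag, a 2-byte big-endian length, then that many payload bytes. The function checks lengths up front and returns a typed `Record` plus the unconsumed bytes, so later code never has to re-check the framing.

```rust
/// A record in a hypothetical wire format:
///   1 byte tag | 2 bytes big-endian length N | N bytes payload
#[derive(Debug, PartialEq)]
pub struct Record<'a> {
    pub tag: u8,
    pub payload: &'a [u8],
}

/// Parse one record from the front of `input`, returning it and the remaining bytes.
pub fn parse_record(input: &[u8]) -> Result<(Record<'_>, &[u8]), String> {
    let (&tag, rest) = input.split_first().ok_or("missing tag")?;
    if rest.len() < 2 {
        return Err("truncated length".to_string());
    }
    let len = u16::from_be_bytes([rest[0], rest[1]]) as usize;
    let rest = &rest[2..];
    if rest.len() < len {
        return Err(format!("payload needs {} bytes, got {}", len, rest.len()));
    }
    let (payload, rest) = rest.split_at(len);
    Ok((Record { tag, payload }, rest))
}
```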
Anything that comes over the wire is a string; anything that comes out of storage is a string. If you're using something like protobufs, that's great, because having to marshal/serialize/parse at every process boundary is expensive and probably unnecessary.
But at some point, and anywhere on the 'surface' of the system, data has to be un-flattened into a shape. That's parsing.
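Here is a sketch of that un-flattening at the boundary. It assumes a hypothetical `name,age` line format and a hypothetical `User` type. The raw string is turned into a `User` once, and anything malformed is rejected right there instead of deeper in the system.

```rust
/// The "shape" used inside the system (hypothetical example type).
#[derive(Debug, PartialEq)]
pub struct User {
    pub name: String,
    pub age: u8,
}

/// Un-flatten a line like "alice,30" at the boundary. Past this point,
/// code handles `User`, never the raw string.
pub fn parse_user(line: &str) -> Result<User, String> {
    let (name, age) = line
        .trim()
        .split_once(',')
        .ok_or_else(|| format!("expected 'name,age', got {:?}", line))?;
    let name = name.trim();
    if name.is_empty() {
        return Err("empty name".to_string());
    }
    let age = age
        .trim()
        .parse::<u8>()
        .map_err(|e| format!("bad age {:?}: {}", age, e))?;
    Ok(User { name: name.to_string(), age })
}
```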