←back to thread

Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 2 comments | | HN request time: 0.413s | source
Show context
bruce343434 ◴[] No.35053912[source]
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #
crabbone ◴[] No.35054514[source]
It's not just about these limitations.

In order to be useful, type systems need to be simple, but there's no such restrictions on rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation from the data that an value is a prime number? -- How are they going to encode this in their type systems? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, if we want to ensure that all ids in XML nodes are unique. Or that the last digit of SSN is a checksum of the previous digits using some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and it's really preferable to validate emails by calling a number of predicates on them.

And, of course, these aren't the only examples: password validation (the annoying part that asks for capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to password field, while also giving helpful error messages s.a. "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.

replies(10): >>35054557 #>>35054562 #>>35054640 #>>35054916 #>>35054920 #>>35055046 #>>35055734 #>>35055902 #>>35056302 #>>35057473 #
1. PhilipRoman ◴[] No.35055902[source]
I think your comment was unfairly downvoted without objective reasons. This is a real issue with advanced type systems and the current solutions are not very good (although they can be practical in some cases) - you can either automatically decorate constructors with assertion code (slow) or trust external input (unsafe, something like __builtin_unreachable in C). And after you're done with that, good luck getting a deterministic and fast type checker which can verify proofs (which you have to write yourself) about arbitrary theorems in your program. Yes, I'm aware there exist languages that can do this to a degree but there is a good reason why they aren't used in mainstream software.

I genuinely wonder how one would write a proof in something like Agda, that

    parseJson("{foo:"+encodeJson(someObject)+"}") 
always succeeds
replies(1): >>35060965 #
2. crabbone ◴[] No.35060965[source]
I wish this was the first such case. But, what I see happen way too often is this:

Some dude comes up with another data definition language (DDL) that uses ML-style types. Everyone jumps from their seats in standing ovation. And in the end we get another useless configuration language that cannot come anywhere close to the needs of application developers, and so they pedal away on their squared-wheel bicycles of hand-rolled very custom data validation procedures.

This is even more disheartening because we already have created tools that made some very good progress into systematic input validation. And they were with us since the down of programming (well, almost, we had SQL since early 70's, then we also had Prolog, then we had various XML schema languages, and finally TLA+). It's amazing how people keep ignoring solutions that achieved so much compared to ensuring that a list isn't empty... and yet present it as the way forward...