Parse, don't validate (2019)

(lexi-lambda.github.io)

398 points declanhaigh | 1 comments | 07 Mar 23 08:47 UTC

bruce343434 [07 Mar 23 10:54 UTC] No.35053912

Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", and pattern matching.

The point is to parse the input into a structure that always upholds the predicates you care about, so you don't end up programming defensively with ifs and asserts everywhere.
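A minimal Haskell sketch of that idea, assuming a hypothetical text format such as "circle 2.0" or "rect 3 4" (`Shape`, `parseShape`, and `area` are illustrative names, not from the article): the raw string is parsed once into a sum type, and downstream code pattern-matches on it instead of re-checking.

```haskell
import Text.Read (readMaybe)

-- Hypothetical domain type: each constructor carries only the fields it needs.
data Shape
  = Circle Double        -- radius, guaranteed > 0 by parseShape
  | Rect Double Double   -- width and height, guaranteed > 0 by parseShape
  deriving (Show, Eq)

-- Parse a raw line such as "circle 2.0" or "rect 3 4".
-- Every check (known keyword, numeric fields, positive sizes) happens here, once.
parseShape :: String -> Maybe Shape
parseShape input = case words input of
  ["circle", r] -> Circle <$> positive r
  ["rect", w, h] -> Rect <$> positive w <*> positive h
  _ -> Nothing
  where
    positive :: String -> Maybe Double
    positive s = case readMaybe s of
      Just x | x > 0 -> Just x
      _ -> Nothing

-- Downstream code just pattern-matches; no ifs or asserts about the input.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h
```

Adding a new shape means adding a constructor, and GHC's incomplete-pattern warning then points at every function that has not handled it yet.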

crabbone [07 Mar 23 12:18 UTC] No.35054514

>>35053912

It's not just about these limitations.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation is that a value in the data is a prime number? How are they going to encode that in their type system? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that can't be implemented even with very elaborate type systems. For example, ensuring that all ids of XML nodes are unique, or that the last digit of an SSN is a checksum of the previous digits computed with some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and that it's really preferable to validate emails by calling a number of predicates on them.
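As a hedged counterpoint to the XML example, a sketch: uniqueness can be captured by parsing into a structure that cannot hold duplicate keys. `Node` is a hypothetical stand-in for whatever an XML library produces, `indexById` is an illustrative name, and `Data.Map.Strict` comes from the `containers` package that ships with GHC.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical node record; a real XML library would have its own type.
data Node = Node { nodeId :: String, nodeLabel :: String }
  deriving (Show, Eq)

-- Parse a list of nodes into a map keyed by id, failing on the first duplicate.
-- A Map cannot hold two entries with the same key, so holding the result is
-- evidence that no two input nodes shared an id.
indexById :: [Node] -> Either String (Map.Map String Node)
indexById = go Map.empty
  where
    go acc [] = Right acc
    go acc (n : ns)
      | Map.member (nodeId n) acc = Left ("duplicate id: " ++ nodeId n)
      | otherwise = go (Map.insert (nodeId n) n acc) ns
```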

And, of course, these aren't the only examples. Take password validation (the annoying part that asks for a capital letter, a digit, a special character): I want to see the author implement a parser for the possible inputs to a password field that also gives helpful error messages such as "you forgot to use a digit". Although I don't doubt it's possible, the resulting code would be an abomination compared to the code that does the usual thing, i.e. just checks whether a character is in a set of characters.
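A sketch of a parser that still produces specific error messages, assuming hypothetical rules (at least 8 characters, an uppercase letter, a digit, and a special character from an assumed set). `ValidPassword`, `PasswordError`, `parsePassword`, and `describe` are illustrative; the checks are the same character-set predicates a validator would use, and the only difference is that success returns a distinct type.

```haskell
import Data.Char (isDigit, isUpper)

-- In a real project this would sit in its own module that exports
-- ValidPassword (but not its constructor), parsePassword and passwordText.
newtype ValidPassword = ValidPassword String

data PasswordError
  = TooShort
  | MissingUppercase
  | MissingDigit
  | MissingSpecial
  deriving (Show, Eq)

-- Assumed set of "special" characters, purely for illustration.
specialChars :: String
specialChars = "!@#$%^&*"

-- Run every check and collect all failures, so the user sees each problem at once.
parsePassword :: String -> Either [PasswordError] ValidPassword
parsePassword s
  | null errors = Right (ValidPassword s)
  | otherwise = Left errors
  where
    errors =
      [ err
      | (ok, err) <-
          [ (length s >= 8, TooShort)
          , (any isUpper s, MissingUppercase)
          , (any isDigit s, MissingDigit)
          , (any (`elem` specialChars) s, MissingSpecial)
          ]
      , not ok
      ]

passwordText :: ValidPassword -> String
passwordText (ValidPassword s) = s

-- Human-readable messages for the UI.
describe :: PasswordError -> String
describe TooShort = "use at least 8 characters"
describe MissingUppercase = "you forgot to use a capital letter"
describe MissingDigit = "you forgot to use a digit"
describe MissingSpecial = "you forgot to use a special character"
```

For example, `parsePassword "Password1"` returns `Left [MissingSpecial]`, which `describe` turns into "you forgot to use a special character".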

ocharles [07 Mar 23 12:38 UTC] No.35054640

>>35054514

> OP is delusional if they think that their approach can be made practical. I mean, what if the expectation is that a value in the data is a prime number? How are they going to encode that in their type system? And this is just a trivial example.

People get too caught up in thinking that the type _has_ to express intricate properties; it doesn't. How am I going to express the expectation that something is prime? With the following closed API:

  module Prime where

  data PrimeNumber

  parsePrime :: Int -> Maybe PrimeNumber
  toInt :: PrimeNumber -> Int

Now the problem is that _leaving_ this API forgets information. Whether or not that is a problem is a different question, and very dependent on the context.

The same applies to your comment about passwords. One can quite easily create a closed module that exports a ValidPassword type whose only constructor function simply performs runtime character tests on a string.

I want to stress that this approach makes a trade-off (as I mentioned earlier, leaving the API forgets information and forces you to re-parse). It puts this design somewhere in the middle of a spectrum. At one extreme we have primitive obsession and shotgun parsing everywhere. In the middle, with this approach, we push the parsing into a sane place and try to hold on to the parsed values as long as possible. At the other extreme we need dependent types or sophisticated encodings where the value carries a lot more information (and here we move towards propositions as types).
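For the far end of that spectrum, a common illustration (not from the thread) is a length-indexed vector built with GHC's `DataKinds` and `GADTs` extensions. The length lives in the type, so calling `vhead` on an empty vector is a compile-time error instead of a run-time check. `Nat`, `Vec`, `vhead`, and `vtoList` are defined here for illustration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Kind (Type)

-- Type-level natural numbers (DataKinds promotes Z and S to the type level).
data Nat = Z | S Nat

-- A list whose length n is tracked in its type.
data Vec (n :: Nat) (a :: Type) where
  VNil :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Total: the type only admits vectors of length at least one.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x

-- Forget the length to get an ordinary list back.
vtoList :: Vec n a -> [a]
vtoList VNil = []
vtoList (VCons x xs) = x : vtoList xs
```

Writing `vhead VNil` anywhere in the program fails to type-check, because `Vec 'Z a` does not match `Vec ('S n) a`.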

crabbone [07 Mar 23 15:36 UTC] No.35056482

>>35054640

Your type doesn't describe prime numbers. You just named it "prime number", but there's no proof or any other guarantee that it's a prime number.

> People get too caught up in thinking that the type _has_ to express intricate properties

Where do you get this from? Did you even read what you are replying to? I never said anything like that... What I'm saying is that the approach taken by OP is worthless when it comes to real-life uses of validation.

So, continuing with your example: you will either end up doing validation instead of parsing (i.e. you will implement parsePrime as a validator function), or you will not actually validate that your input is a prime number... The whole point OP was trying to make is that they wanted to capture the constraints on data in a type describing those constraints, but outside of trivial examples such as the non-empty list used by OP, that leads to programs that are either impossible or extremely complex.

> One can quite easily create a closed module that encapsulates a ValidPassword

And, again, that would be doing _validation_, not parsing. I'm not sure whether you even understand what the conflict here is, or whether you are somehow agreeing with me without saying so.

chowells [07 Mar 23 18:02 UTC] No.35058705

>>35056482

If you can't create a value of type PrimeNumber that doesn't contain a prime number, there's a bit more to it than naming. Not all type-level guarantees need to come from structural properties of the type. They can also come from structural properties of the environment of the type. Providing no public constructor is such a property.

The example was written rather badly, though. It should have pointed out that the module was exporting the type and a couple helper functions, but not the data constructor.
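A runnable version of the closed API discussed above, as a sketch: because this is a single file, the export list appears only as a comment, and `isPrime` is a simple trial-division helper assumed for illustration. In a separate `Prime` module, exporting `PrimeNumber` without its data constructor is what makes `parsePrime` the only way for other modules to obtain a value.

```haskell
-- In its own file this would start with:
--   module Prime (PrimeNumber, parsePrime, toInt) where
-- Exporting the type but not its constructor means other modules
-- can only get a PrimeNumber through parsePrime.

newtype PrimeNumber = PrimeNumber Int
  deriving (Show, Eq)

-- Trial division up to the square root; adequate for small inputs.
isPrime :: Int -> Bool
isPrime n
  | n < 2 = False
  | otherwise = all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])

-- The only way in: the check runs once, and success is recorded in the type.
parsePrime :: Int -> Maybe PrimeNumber
parsePrime n
  | isPrime n = Just (PrimeNumber n)
  | otherwise = Nothing

-- The way out: the result is a plain Int and the primality evidence is lost.
toInt :: PrimeNumber -> Int
toInt (PrimeNumber n) = n
```

Leaving the API forgets information: `parsePrime (toInt p + 2)` gives back a `Maybe PrimeNumber`, so the caller has to re-parse, which is the trade-off described earlier in the thread.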

But despite that, the key point was correct. Validating is examining a piece of data and returning "good" or "bad". Parsing is returning a new piece of data which encodes the goodness property at the type level, or failing to return anything. It's a better paradigm because the language prevents you from forgetting what situation you're in.
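The distinction can be shown side by side, as a sketch with a hypothetical `Username` rule (non-empty, at most 16 characters, alphanumeric only); the rule and all names are assumptions for illustration. The validator returns a `Bool` the caller is free to ignore; the parser reuses that same check but returns a new type that downstream functions can demand.

```haskell
import Data.Char (isAlphaNum)

-- Validation: examine the data and answer "good" or "bad".
validUsername :: String -> Bool
validUsername s = not (null s) && length s <= 16 && all isAlphaNum s

-- Parsing: run the same check, but on success return a new type that
-- records the result. (Its constructor would be hidden in a real module.)
newtype Username = Username String
  deriving (Show, Eq)

parseUsername :: String -> Maybe Username
parseUsername s
  | validUsername s = Just (Username s)
  | otherwise = Nothing

-- Downstream code asks for a Username, so it cannot be handed an unchecked String.
greeting :: Username -> String
greeting (Username s) = "Hello, " ++ s
```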

crabbone [07 Mar 23 19:54 UTC] No.35060307

>>35058705

> If you can't create a value of type PrimeNumber that doesn't contain a prime number

Because you wrote a validation function, the exact thing OP told you not to do. Hooray?!

The goal of OP was to create a type that incorporates constraints on data, just as in their non-empty list example they created a type whose definition itself contains the constraint, so that it's impossible to construct a value of that type that is an empty list.
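For contrast, the structural approach described here can be sketched as a list type whose shape rules out emptiness. base already provides this as `Data.List.NonEmpty` (`data NonEmpty a = a :| [a]`); the local `NonEmptyList`, `fromList`, `neHead`, and `neToList` below are a simplified stand-in for illustration.

```haskell
-- A head element plus a possibly empty tail: there is no constructor
-- for the empty case, so every value has at least one element.
data NonEmptyList a = a :| [a]
  deriving (Show, Eq)

infixr 5 :|

-- The only check happens when converting from an ordinary list.
fromList :: [a] -> Maybe (NonEmptyList a)
fromList [] = Nothing
fromList (x : xs) = Just (x :| xs)

-- Total: there is no empty case to handle.
neHead :: NonEmptyList a -> a
neHead (x :| _) = x

neToList :: NonEmptyList a -> [a]
neToList (x :| xs) = x : xs
```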

You did the opposite. You created a type without any constraints whatsoever, and then added a validation function to make sure you only create values accepted by that function. So... you kind of proved my point: it's nigh impossible to write a program intelligible to human beings that has a "prime number" type, and that's why we use validation: it's easy to write and easy to understand.

Your type isn't even a natural number, let alone a prime number.

chowells [07 Mar 23 20:49 UTC] No.35061011

>>35060307

Are you aware that it's impossible to do any kind of parsing without validating the data? Saying "you have a validation function" is not some sort of disproof of parsing.

Parsing is an additional job on top of validation: providing type-level evidence that the data is good. That's what makes it valuable. It's not some theoretical difference in power. It's better software engineering.
