
Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points by declanhaigh | 1 comment
bruce343434 (No.35053912)
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", and pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.
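A minimal Java sketch of that idea, assuming Java 17 records stand in for the immutable "data classes" and `Optional` for a simple sum type. `NonEmptyList` and `parse` are hypothetical names, loosely mirroring the article's `NonEmpty` / `parseNonEmpty`:

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical NonEmptyList: the head is a required, non-null component,
// so an instance cannot represent an empty list.
record NonEmptyList<T>(T head, List<T> tail) {
    NonEmptyList {
        Objects.requireNonNull(head, "head");
        tail = List.copyOf(tail); // immutable copy; rejects null elements
    }

    // "Parse": check emptiness once and keep the result of the check in the
    // return type, instead of returning void/boolean and forgetting it.
    static <E> Optional<NonEmptyList<E>> parse(List<E> xs) {
        if (xs.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new NonEmptyList<>(xs.get(0), xs.subList(1, xs.size())));
    }

    int size() {
        return 1 + tail.size();
    }
}
```

Code that receives a `NonEmptyList` can call `head()` without an emptiness check or an assert, because that check already happened once, inside `parse`.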

crabbone (No.35054514)
It's not just about these limitations.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation of the data is that a value is a prime number? How are they going to encode this in their type system? And this is just a trivial example.
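For a predicate the type checker cannot express directly, such as primality, a common answer in Java and ML-family languages is a "smart constructor": hide the constructor and expose only a checking function. A minimal sketch follows; `Prime` and `parse` are hypothetical names. The compiler proves nothing about primality here; it only guarantees that every `Prime` value went through `parse` exactly once.

```java
import java.util.Optional;

// Hypothetical Prime type: the constructor is private, so the only way to
// obtain a Prime is through parse(), which runs the primality check.
final class Prime {
    private final long value;

    private Prime(long value) {
        this.value = value;
    }

    static Optional<Prime> parse(long n) {
        if (n < 2) {
            return Optional.empty();
        }
        // Trial division up to sqrt(n); d <= n / d avoids overflowing d * d.
        // Fine for illustration, far too slow for very large n.
        for (long d = 2; d <= n / d; d++) {
            if (n % d == 0) {
                return Optional.empty();
            }
        }
        return Optional.of(new Prime(n));
    }

    long value() {
        return value;
    }
}
```

The check is still an ordinary runtime predicate; what changes is that its result is recorded in the type, so downstream code takes a `Prime` instead of re-checking a `long`.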

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, we may want to ensure that all ids in XML nodes are unique, or that the last digit of an SSN is a checksum of the previous digits computed with some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and that it's really preferable to validate emails by calling a number of predicates on them.
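The uniqueness constraint can be handled in the same style: check it once while building a structure that cannot hold duplicates, here a map keyed by id. This is a hypothetical sketch; `Node` is a stand-in for a real XML element, and `UniqueIdIndex` is an illustrative name, not part of any XML library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an XML element: just an id and a tag name.
record Node(String id, String tag) {}

// Hypothetical wrapper whose only factory rejects duplicate ids, so code
// that receives a UniqueIdIndex never has to re-check uniqueness.
final class UniqueIdIndex {
    private final Map<String, Node> byId;

    private UniqueIdIndex(Map<String, Node> byId) {
        this.byId = byId;
    }

    static Optional<UniqueIdIndex> parse(List<Node> nodes) {
        Map<String, Node> byId = new HashMap<>();
        for (Node n : nodes) {
            // putIfAbsent returns the existing node if this id was already seen.
            if (byId.putIfAbsent(n.id(), n) != null) {
                return Optional.empty();
            }
        }
        return Optional.of(new UniqueIdIndex(Map.copyOf(byId)));
    }

    Optional<Node> lookup(String id) {
        return Optional.ofNullable(byId.get(id));
    }
}
```

Returning `Optional` throws away which id collided; a more useful parser would return the offending id as an error value instead.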

And, of course, these aren't the only examples: password validation (the annoying part that asks for capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to password field, while also giving helpful error messages s.a. "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.

mrkeen (No.35056302)
> password validation (the annoying part that asks for capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to password field, while also giving helpful error messages s.a. "you forgot to use a digit").

This is what Applicative Functors were born to do. Here's a good article on it: https://www.baeldung.com/vavr-validation-api

Check the types:

    public Validation<Seq<String>, User> validateUser(...)
Even though it's called "validation", it's still the approach the OP recommends.

It reads as "you will either get back a User, or a Seq of Strings holding the validation errors instead".

Contrast this with the wrong way of doing things:

    User user = new User(seqOfStrings);
    user.validate();
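A rough, library-free sketch of the error-accumulating style, using the password rules from the quoted comment. `Checked`, `Password` and the messages are hypothetical; this is not vavr's `Validation` API, and it accumulates errors by hand rather than by combining independent validations applicatively.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-rolled stand-in for Validation<Seq<String>, T>:
// either a parsed value or a list of error messages.
sealed interface Checked<T> {
    record Valid<T>(T value) implements Checked<T> {}
    record Invalid<T>(List<String> errors) implements Checked<T> {}
}

// Hypothetical parsed type: only obtainable via parse(), so holding a
// Password means every rule below has already passed.
final class Password {
    private final String raw;

    private Password(String raw) {
        this.raw = raw;
    }

    static Checked<Password> parse(String input) {
        List<String> errors = new ArrayList<>();
        // Every rule runs, so the caller sees all problems at once.
        if (input.length() < 8) {
            errors.add("use at least 8 characters");
        }
        if (input.chars().noneMatch(Character::isUpperCase)) {
            errors.add("you forgot to use a capital letter");
        }
        if (input.chars().noneMatch(Character::isDigit)) {
            errors.add("you forgot to use a digit");
        }
        if (input.chars().allMatch(Character::isLetterOrDigit)) {
            errors.add("you forgot to use a special character");
        }
        if (errors.isEmpty()) {
            return new Checked.Valid<>(new Password(input));
        }
        return new Checked.Invalid<>(List.copyOf(errors));
    }

    String raw() {
        return raw;
    }
}
```

Each rule is still a plain "is a character in this set" predicate; the only difference from `user.validate()` is that the outcome comes back as a value the caller must inspect.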
crabbone (No.35056528)
No. it's not the approach OP recommends. And that's why it's called validation. I have no idea why would you question that. OP wants to capture constraints on data as ML-style types. But, ML-style types have very limited expressive power, and, when it comes to real-life situations, are practically useless outside of the most trivial cases.
mrkeen (No.35058426)
> No. it's not the approach OP recommends.

It absolutely is.

> I have no idea why would you question that.

I did not question [that they were different approaches]; I explained, through example and counter-example, why they were the same approach. I will try again.

Alexis wrote both 'validate' and 'parse' examples in ML-style types:

    validateNonEmpty :: [a] -> IO ()            -- ML-typed 'validate'

    parseNonEmpty :: [a] -> IO (NonEmpty a)     -- ML-typed 'parse'
More from the article:

    The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.
I chose OO-style types for my samples, because there's a large fraction of HN users who dismiss ML-ish stuff as academic, or "practically useless outside of the most trivial cases".

    // OO-typed 'validate' (my straw man)
    class User {
        // returns void aka '()' aka "the type that contains no information"
        void validateUser() throws InvalidUserEx {...}          
    }

    /* OO-typed 'parse' (as per my baeldung link)
     * "gives the caller access to the information it learned"
     * In this case it gives back MORE than just the User,
     * it also gives back 'why it went wrong', per your request above for password validation
     * (In contrast with parseNonEmpty which just throws an exception.)
     */
    class UserValidator {
        Validation<Seq<String>, User> validateUser(...) {...}   
    }
> But, ML-style types have very limited expressive power

Hindley-Milner types are a goddamned crown jewel of computer science.