
Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 2 comments
bruce343434 ◴[] No.35053912[source]
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.
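
As a minimal sketch of that idea in TypeScript (the names `NonEmpty`, `parseNonEmpty`, and `head` are made up for illustration), the emptiness check can live in one parse function, after which the type carries the guarantee:

```typescript
// A list that is statically known to have at least one element.
type NonEmpty<T> = [T, ...T[]]

// Parse step: the only place where emptiness is checked.
function parseNonEmpty<T>(items: T[]): NonEmpty<T> | undefined {
    if (items.length === 0) return undefined
    const [first, ...rest] = items
    return [first as T, ...rest]
}

// Downstream code needs no emptiness check: the type guarantees a first element.
function head<T>(items: NonEmpty<T>): T {
    return items[0]
}

const parsedList = parseNonEmpty([3, 1, 2])
const firstOrNone = parsedList === undefined ? "empty" : head(parsedList)
```

Callers of `head` never write an `if` or an assert for the empty case; the only branch is at the point where the raw input is parsed.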

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #
crabbone ◴[] No.35054514[source]
It's not just about these limitations.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation from the data is that a value is a prime number? -- How are they going to encode this in their type systems? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, if we want to ensure that all ids in XML nodes are unique. Or that the last digit of SSN is a checksum of the previous digits using some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and it's really preferable to validate emails by calling a number of predicates on them.

And, of course, these aren't the only examples: password validation (the annoying part that asks for a capital letter, a digit, a special character? -- I want to see the author implement a parser to parse possible inputs to a password field, while also giving helpful error messages such as "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.
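
For concreteness, here is a rough sketch of the password case written in the parsing style. The `Password` brand, `ParseResult`, `rules`, and `parsePassword` are hypothetical names, and the three rules are only the ones mentioned above:

```typescript
// Hypothetical branded type: the intent is that only parsePassword produces a Password.
type Password = string & { readonly __brand: "Password" }

type ParseResult =
    | { ok: true; password: Password }
    | { ok: false; errors: string[] }

// Each rule is still "is some character in a set of characters".
const rules: Array<{ pattern: RegExp; message: string }> = [
    { pattern: /[A-Z]/, message: "you forgot to use a capital letter" },
    { pattern: /[0-9]/, message: "you forgot to use a digit" },
    { pattern: /[^A-Za-z0-9]/, message: "you forgot to use a special character" },
]

function parsePassword(input: string): ParseResult {
    // Collect a message for every rule the input fails.
    const errors = rules
        .filter((rule) => !rule.pattern.test(input))
        .map((rule) => rule.message)
    return errors.length === 0
        ? { ok: true, password: input as Password }
        : { ok: false, errors }
}
```

The checks themselves are the same character-set tests; the difference is only that success returns a `Password` value instead of a boolean, so later code can require that type.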

replies(10): >>35054557 #>>35054562 #>>35054640 #>>35054916 #>>35054920 #>>35055046 #>>35055734 #>>35055902 #>>35056302 #>>35057473 #
PartiallyTyped ◴[] No.35054916[source]
> I mean, what if the expectation from the data is that a value is a prime number? -- How are they going to encode this in their type systems? And this is just a trivial example.

In TypeScript we can define

    type Prime = number

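    function isPrime(value: number): value is Prime {
        // trial division here; a sieve would also work
        if (!Number.isInteger(value) || value < 2) return false
        for (let d = 2; d * d <= value; d++) {
            if (value % d === 0) return false
        }
        return true
    }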
From here, you may have e.g.

    function foo(value: Prime, ...) {

    }
And it will be type checked.

    function fooOrFail(v: number) {
        if (isPrime(v))
            foo(v)
        else 
            throw new TypeError()
    }
replies(2): >>35055148 #>>35056821 #
crabbone ◴[] No.35056821[source]
You are doing validation! Your type has none of the properties of a type of prime numbers. You either didn't read the article in OP, or didn't understand what OP was arguing for.

Yes, it's fine, if you want to validate your input in this way -- I have no problems with it. It's just that you are doing validation, not parsing, at least not in the terms OP used them.

replies(1): >>35057766 #
PartiallyTyped ◴[] No.35057766[source]
The argument of OP is that you should first construct your input such that it adheres to a very specific type that contains all the information that you require, e.g. nonEmpty, and then allow that to go through the rest of your code.

Am I mistaken?

My mistake in the above snippets is precisely that TypeScript cannot make the type more specific, i.e. `number` to `Prime`, because `type Prime=number` only creates an alias. I am not creating a type that is a more specific version of number, just another name for it.

Had I actually created a proper type, the parsing would have been correct. The parsing component is happening in the outer function because at some point I need to make the generic input more specific, and then allow it to flow through the rest of the program. Am I mistaken?
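
A minimal sketch of what a "proper type" could look like, assuming the common branding trick (the `__brand` property and the `parsePrime` name are illustrative, not from the article):

```typescript
// The phantom __brand property makes Prime distinct from a plain number.
type Prime = number & { readonly __brand: "Prime" }

// The one place where a number can become a Prime.
function parsePrime(value: number): Prime | undefined {
    if (!Number.isInteger(value) || value < 2) return undefined
    for (let d = 2; d * d <= value; d++) {
        if (value % d === 0) return undefined
    }
    return value as Prime
}

function foo(value: Prime): number {
    return value * 2 // a Prime is still usable as a number
}

// Calling foo(7) directly would be a compile-time error:
// a plain number is not assignable to Prime.
const seven = parsePrime(7)
const doubled = seven === undefined ? undefined : foo(seven)
```

With the alias version, `foo(8)` compiles; with the brand, the only way to obtain a `Prime` is through `parsePrime` (or an explicit cast).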

replies(2): >>35059519 #>>35060995 #
1. crabbone ◴[] No.35060995[source]
> first construct your input

My man... if I, the author of my program, was constructing the input, I wouldn't need no validation. Input isn't meant to be constructed by the program's author, it's supposed to be processed...

replies(1): >>35061690 #
2. PartiallyTyped ◴[] No.35061690[source]
Construct was not the correct word. The intention was to express that you do need to parse the object into something more specific that captures the properties that you require.

Take for example APIGatewayProxyEvent [1], which has a property `queryStringParameters` with type:

    export interface APIGatewayProxyEventQueryStringParameters {
        [name: string]: string | undefined;
    }
You can then create a branded type like

    type AuthCodeEvent = APIGatewayProxyEvent & {
         queryStringParameters: {
             code: string;
             state: string;
         };
    };
The branded type here means that as soon as you verify that the event has the structure above, you can assume that it is correct in the code that handles these specific cases.
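
Roughly, the verification step could be a user-defined type guard. This is only a sketch: `ProxyEvent` is a cut-down stand-in I am assuming for the real `APIGatewayProxyEvent`, and `isAuthCodeEvent`, `handleAuthCode`, and `handle` are made-up names.

```typescript
// Minimal stand-in for the real event type, reduced to the one field used here.
interface ProxyEvent {
    queryStringParameters: { [name: string]: string | undefined } | null
}

type AuthCodeEvent = ProxyEvent & {
    queryStringParameters: { code: string; state: string }
}

// Type guard: the single place where the structure is verified.
function isAuthCodeEvent(ev: ProxyEvent): ev is AuthCodeEvent {
    const params = ev.queryStringParameters
    return params !== null
        && typeof params.code === "string"
        && typeof params.state === "string"
}

function handleAuthCode(ev: AuthCodeEvent): string {
    // No undefined checks needed: both fields are plain strings here.
    return `${ev.queryStringParameters.code}:${ev.queryStringParameters.state}`
}

function handle(ev: ProxyEvent): string {
    return isAuthCodeEvent(ev) ? handleAuthCode(ev) : "400 Bad Request"
}
```

Everything past `handle` takes an `AuthCodeEvent`, so the `string | undefined` handling happens exactly once.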

Though as the blog author mentioned in the other chain, the TS compiler is not particularly sound, so it's probably entirely possible to mess up the structure and break the type without the compiler knowing about it.

[1] https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...