←back to thread

Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 3 comments | | HN request time: 0s | source
Show context
bruce343434 ◴[] No.35053912[source]
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #
crabbone ◴[] No.35054514[source]
It's not just about these limitations.

In order to be useful, type systems need to be simple, but there's no such restrictions on rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation from the data that an value is a prime number? -- How are they going to encode this in their type systems? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, if we want to ensure that all ids in XML nodes are unique. Or that the last digit of SSN is a checksum of the previous digits using some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and it's really preferable to validate emails by calling a number of predicates on them.

And, of course, these aren't the only examples: password validation (the annoying part that asks for capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to password field, while also giving helpful error messages s.a. "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.

replies(10): >>35054557 #>>35054562 #>>35054640 #>>35054916 #>>35054920 #>>35055046 #>>35055734 #>>35055902 #>>35056302 #>>35057473 #
PartiallyTyped ◴[] No.35054916[source]
> I mean, what if the expectation from the data that an value is a prime number? -- How are they going to encode this in their type systems? And this is just a trivial example.

In TypeScript we can define

    type Prime = number

    function isPrime(value: number) value is Prime {
        // run sieve
    }
From here, you may have e.g.

    function foo(value: Prime, ...) {

    }
And it will be typed checked.

    function fooOrFail(v: number) {
        if (isPrime(v))
            foo(v)
        else 
            throw new TypeError()
    }
replies(2): >>35055148 #>>35056821 #
crabbone ◴[] No.35056821[source]
You are doing validation! Your type has no properties of a type of prime numbers. You either didn't read the article in OP, or didn't understand what OP was arguing for.

Yes, it's fine, if you want to validate your input in this way -- I have no problems with it. It's just that you are doing validation, not parsing, at least not in the terms OP used them.

replies(1): >>35057766 #
PartiallyTyped ◴[] No.35057766[source]
The argument of OP is that you should first construct your input such that it adheres to a very specific type that contains all the information that you require, e.g. nonEmpty, and then allow that to go through the rest of your code.

Am I mistaken?

My mistake in the above snippets is precisely that TypeScript can not make the type more specific, i.e. Number to Prime, because `type Prime=number` is only creating an alias. I am not creating a type that is a more specific version of number but an alias.

Had I actually created a proper type, the parsing would have been correct. The parsing component is happening in the outer function because at some point I need to make the generic input more specific, and then allow it to flow through the rest of the program. Am I mistaken?

replies(2): >>35059519 #>>35060995 #
lexi-lambda ◴[] No.35059519{3}[source]
You are sort of mistaken. I wrote a followup blog post that discusses what you are describing at some length: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

However, TypeScript does not really provide any facility for nominal types, which in my opinion is something of a failure of the language, especially considering that it is at odds with the semantics of `class` and `instanceof` in dynamically-typed JavaScript (which have generative semantics). Other statically typed languages generally provide some form of nominal typing, even gradually typed ones. Flow even provided nominal types in JavaScript! But TypeScript is generally also quite unsound (https://twitter.com/lexi_lambda/status/1621973087192236038), so the type system doesn’t really provide any guarantees, anyway.

That said, TypeScript programmers have developed a way to emulate nominal typing using “brands”, which does allow you to obtain some of these benefits within the limitations of TS’s type system. You can search for “TypeScript branded types” to find some explanations.

replies(1): >>35060390 #
PartiallyTyped ◴[] No.35060390{4}[source]
This is fantastic, thank you very much!

When I wrote GP I had in mind branding as the "right" way to get those benefits - though I was unaware of the name - however I see that it is still limited by TS' compiler's limitations.

So then going back to the initial snippets, my issue is that the Prime type is essentially behaving like newtype, thus the inner calls can not actually rely on the value actually being prime, yes?

I have to admit that quite a few of the things in the blog are beyond my current understanding. Do you have any recommended reading for post grads with rudimentary understanding of Haskell who would like to get deeper into type systems?

replies(1): >>35060621 #
lexi-lambda ◴[] No.35060621{5}[source]
Haskell’s `newtype` keyword defines a genuinely fresh (nominal) type that is distinct from all other types. There is no direct analogue in TypeScript, but using branded types would be the closest you could get. That’s still not quite the same because TypeScript doesn’t really allow the same strong encapsulation guarantees that Haskell provides (which, to be clear, many other languages provide as well!), but it’s a decent approximation.

The problem with your `Prime` type is that it is just a type alias: a new way to refer to the exact same type. It’s totally interchangeable with `number`, so any `number` is necessarily also a `Prime`… which is obviously not very helpful. (As it happens, the Haskell equivalent of that would be basically identical, since Haskell also uses the `type` keyword to declare a type alias.)

As for recommended reading, it depends on what you’d like to know, really. There are lots of different perspectives on type systems, and there’s certainly a lot of stuff you can learn if you want to! But I think most working programmers probably don’t benefit terribly much from the theory (though it can certainly be interesting if you’re into that sort of thing). Perhaps you could tell me which things you specifically find difficult to understand? That would make it easier for me to provide suggestions, and it would also be useful to me, as I do try to make my blog posts as accessible as possible!

replies(1): >>35061206 #
PartiallyTyped ◴[] No.35061206{6}[source]
I'd be interested in more about Generic as that section went over my head though it's not an issue with your blog but rather inept knowledge on my part.

I find it quite interesting though I never had the time to study it further until now, so any recommendations are appreciated!

replies(1): >>35061524 #
lexi-lambda ◴[] No.35061524{7}[source]
Generic is quite specific to Haskell, so it is probably difficult to explain without a little more understanding of Haskell-like type systems. (Rust has some similar capabilities, so that would help, too.) I wouldn’t worry about it too much, though; it doesn’t contain any particularly deep knowledge about type systems in general.
replies(1): >>35061742 #
1. PartiallyTyped ◴[] No.35061742{8}[source]
Okay, is there like a book or some other resource besides your awesome blog that you'd recommend for people looking to get into this some more?
replies(1): >>35068057 #
2. lexi-lambda ◴[] No.35068057[source]
Well, like I said, the subject is extremely broad, so it is difficult to give concrete suggestions without knowing what specifically you’d like to get into. But I can give some potential options.

If you’d like to learn Haskell, I think https://www.cis.upenn.edu/~cis1940/spring13/ is still a pretty nice resource. It is quick and to the point, and it provides some exercises to work through. There are lots of things in the Haskell ecosystem that you could explore if you wanted to after getting a handle on the basics.

If you want to learn about programming languages and type systems, you could read Programming Languages: Application and Interpretation (https://cs.brown.edu/courses/cs173/2012/book/), which has a chapter on type systems. Alternatively, if you want a more thorough treatment of type systems, you could read Types and Programming Languages by Benjamin Pierce. However, both PLAI and TAPL are textbooks, and they are primarily intended to be used as supporting material in a university course with an instructor. I think PLAI is relatively accessible, but TAPL is more likely to be a challenge without some existing background in programming languages.

replies(1): >>35069344 #
3. PartiallyTyped ◴[] No.35069344[source]
The textbooks are exactly what I needed! Thank you!