Parse, don't validate (2019)

1. jameshart ◴[07 Mar 23 13:24 UTC] No.35055031[source]▶

This post absolutely captures an essential truth of good programming.

Unfortunately, it conceals it behind some examples that - while they do a good job of illustrating the generality of its applicability - don’t show as well how to use this in your own code.

Most developers are not writing their own head method on the list primitive - they are trying to build a type that encapsulates a meaningful entity in their own domain. And, let’s be honest, most developers are also not using Haskell.

As a result I have not found this article a good one to share with junior developers to help them understand how to design types to capture the notion of validity, and to replace validation with narrowing type conversions (which amount to ‘parsing’ when the original type is something very loose like a string, a JSON blob, or a dictionary).

That's even though those practices absolutely follow from what is described here.
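To make "narrowing type conversion" concrete, here is a minimal sketch in TypeScript of turning a loose JSON blob into a domain type. Everything in it (`Order`, `LineItem`, `ParseResult`, `parseOrder`, `firstSku`) is a made-up name for illustration, not something from the article:

```typescript
// Hypothetical domain types; none of these names come from the article.
type NonEmptyArray<T> = [T, ...T[]];

interface LineItem {
  sku: string;
  quantity: number;
}

interface Order {
  customerId: string;
  items: NonEmptyArray<LineItem>; // an order with no items is unrepresentable
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseLineItem(raw: unknown): LineItem | undefined {
  if (!isRecord(raw)) return undefined;
  const { sku, quantity } = raw;
  if (typeof sku !== "string" || sku.length === 0) return undefined;
  if (typeof quantity !== "number" || !Number.isInteger(quantity) || quantity < 1) {
    return undefined;
  }
  return { sku, quantity };
}

// The narrowing conversion: loose `unknown` in, precise `Order` out (or a reason why not).
function parseOrder(raw: unknown): ParseResult<Order> {
  if (!isRecord(raw)) return { ok: false, error: "order must be an object" };
  const { customerId, items } = raw;
  if (typeof customerId !== "string" || customerId.length === 0) {
    return { ok: false, error: "customerId must be a non-empty string" };
  }
  if (!Array.isArray(items)) return { ok: false, error: "items must be an array" };

  const parsed: LineItem[] = [];
  for (const item of items) {
    const lineItem = parseLineItem(item);
    if (lineItem === undefined) return { ok: false, error: "invalid line item" };
    parsed.push(lineItem);
  }
  const [first, ...rest] = parsed;
  if (first === undefined) {
    return { ok: false, error: "order must contain at least one item" };
  }
  return { ok: true, value: { customerId, items: [first, ...rest] } };
}

// Downstream code takes an Order and never re-checks for emptiness.
function firstSku(order: Order): string {
  return order.items[0].sku;
}
```

The checking happens once, at the boundary; after that the `Order` type itself carries the guarantee, so `firstSku` needs no defensive code.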

Does anyone know of a good resource that better anchors these concepts in practical examples?

replies(3): >>35056114 #>>35058281 #>>35059886 #

2. asimpletune ◴[07 Mar 23 15:05 UTC] No.35056114[source]▶

>>35055031 (TP) #

I think it's really hard to learn from reading, unfortunately. It's one of those things where if you get it, you get it, but it kind of takes personal experience to fully grok it. I guess because there are a lot of subtle differences.

3. epolanski ◴[07 Mar 23 17:34 UTC] No.35058281[source]▶

>>35055031 (TP) #

> And, let’s be honest, most developers are also not using Haskell.

Everything in that post applies to the most common programming language out there: TypeScript.

And several popular others such as Rust, Kotlin or Scala.

replies(4): >>35058565 #>>35058974 #>>35059010 #>>35059347 #

4. elfprince13 ◴[07 Mar 23 17:53 UTC] No.35058565[source]▶

>>35058281 #

It also applies to C++ and Java!

5. tialaramex ◴[07 Mar 23 18:20 UTC] No.35058974[source]▶

>>35058281 #

And parse-don't-validate is often very nice to work with in Rust. I can describe how to turn some UTF-8 text into my type Foo in a function:

  impl std::str::FromStr for Foo {
    type Err = ReasonsItIsNotAFoo;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        /* etc. */
    }
  }

And then whenever I've got a string which I know ought to be a Foo, I can:

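  // expect() takes a plain &str and does not interpolate, so format the message via panic! instead
  let foo: Foo = string.parse().unwrap_or_else(|_| panic!("This {string:?} ought to be a Foo but it isn't"));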

Since we said foo is a Foo, by inference the parsing of string needs to either succeed with a Foo, or fail while trying, so it calls that FromStr implementation we wrote earlier to achieve that.

6. jakear ◴[07 Mar 23 18:23 UTC] No.35059010[source]▶

>>35058281 #

Not quite: TypeScript provides options beyond what the author of this article details that IMO are superior, at least in some cases. Instead of just "throw an error or return ()" or "throw an error or return NonEmpty<T>", you can declare a function's return type as "throws iff the argument isn't NonEmpty" or "true iff the argument is NonEmpty".

Compare:

    // Check the length rather than list[0]: a list may legitimately contain undefined.
    function validateNonEmpty<T>(list: T[]): void {
      if (list.length === 0)
        throw new Error("list cannot be empty")
    }

    function parseNonEmpty<T>(list: T[]): [T, ...T[]] {
      if (list.length > 0) {
        return list as [T, ...T[]]
      } else {
        throw new Error("list cannot be empty")
      }
    }

    function assertNonEmpty<T>(list: T[]): asserts list is [T, ...T[]] {
      if (list.length === 0) throw new Error("list cannot be empty")
    }

    function checkEmptiness<T>(list: T[]): list is [T, ...T[]] {
      return list.length > 0
    }

    declare const arr: number[]

    // Error: Object is possibly undefined (requires noUncheckedIndexedAccess)
    console.log(arr[0].toLocaleString())

    const parsed = parseNonEmpty(arr)
    // No error
    console.log(parsed[0].toLocaleString())

    if (checkEmptiness(arr)) {
      // No error
      console.log(arr[0].toLocaleString())
    }

    assertNonEmpty(arr)
    // No error
    console.log(arr[0].toLocaleString())

For me the `${arg} is ${type}` approach is superior as you are writing the validation once and can pass the precise mechanism for handling of the error to the caller, who tends to have a better idea of what to do in degenerate cases (sometimes throwing a full on Exception is appropriate, but sometimes a different form of recovery is better).
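As a rough sketch of that last point, here is one guard reused by callers that each pick their own recovery. The names `isNonEmpty`, `firstOrThrow`, `firstOrDefault` and `summarize` are hypothetical, chosen only for this example:

```typescript
type NonEmpty<T> = [T, ...T[]];

// The check is written once, as a type guard (the `list is ...` form).
function isNonEmpty<T>(list: T[]): list is NonEmpty<T> {
  return list.length > 0;
}

// Caller 1: an empty list is a programming error here, so throw.
function firstOrThrow(readings: number[]): number {
  if (!isNonEmpty(readings)) throw new Error("readings cannot be empty");
  return readings[0]; // narrowed: index 0 is known to exist
}

// Caller 2: an empty list is expected sometimes, so fall back to a default.
function firstOrDefault(readings: number[], fallback: number): number {
  return isNonEmpty(readings) ? readings[0] : fallback;
}

// Caller 3: an empty list is just another state to report.
function summarize(readings: number[]): string {
  if (isNonEmpty(readings)) {
    return `first reading: ${readings[0].toFixed(1)}`;
  }
  return "no readings yet";
}
```

None of the three callers repeats the emptiness logic; they only decide what the degenerate case means for them.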

replies(2): >>35059948 #>>35070563 #

7. jameshart ◴[07 Mar 23 18:46 UTC] No.35059347[source]▶

>>35058281 #

Absolutely - the advice is highly applicable in most modern widely used languages.

My point was merely that presenting the examples in Haskell, and talking about lists in a very functional, lispy, cons-ish kind of way, makes the article less accessible for programmers who are using more object-oriented type systems.

8. lexi-lambda ◴[07 Mar 23 19:21 UTC] No.35059886[source]▶

>>35055031 (TP) #

> As a result I have not found this article a good one to share with junior developers to help them understand how to design types to capture the notion of validity, and to replace validation with narrowing type conversions (which amount to ‘parsing’ when the original type is something very loose like a string, a JSON blob, or a dictionary).

This is sort of true. It is a good technique, but it is a different technique. I went into how it is different in quite some detail in this followup blog post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

I think a common belief among programmers is that the true constructive modeling approach presented in the first blog post is not practical in languages that aren’t Haskell, so they do the “smart constructor” approach discussed in the link above instead. However, I think that isn’t actually true, it’s just a difference in how respective communities think about their type systems. In fact, you can definitely do constructive data modeling in other type systems, and I gave some examples using TypeScript in this blog post: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...

replies(1): >>35061090 #

9. lexi-lambda ◴[07 Mar 23 19:26 UTC] No.35059948{3}[source]▶

>>35059010 #

There is really no difference between doing this and returning a `Maybe`, which is the standard Haskell pattern, except that the `Maybe` result also allows the result to be structurally different rather than simply a refinement of the input type. In a sense, the TypeScript approach is a convenience feature that allows you to write a validation function that returns `Bool`, which normally erases the gained information, yet still preserve the information in the type system.

This is quite nice in situations where the type system already supports the refinement in question (which is true for this NonEmpty example), but it stops working as soon as you need to do something more complicated. I think sometimes programmers using languages where the TS-style approach is idiomatic can get a little hung up on that, since in those cases, they are more likely to blame the type system for being “insufficiently powerful” when in fact it’s just that the convenience feature isn’t sufficient in that particular case. I presented an example of one such situation in this followup blog post: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...
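To illustrate the "structurally different" point, here is a small sketch where a `list is ...` style predicate could not do the job, because the output is not a refinement of the input type. `Endpoint` and `parseEndpoint` are hypothetical names, and `Endpoint | undefined` stands in for Haskell's `Maybe Endpoint`:

```typescript
// Hypothetical example type; not from the article or the linked posts.
interface Endpoint {
  host: string;
  port: number;
}

// A string goes in, a record comes out: no type predicate on `input` can express that.
function parseEndpoint(input: string): Endpoint | undefined {
  const match = /^([A-Za-z0-9.-]+):(\d{1,5})$/.exec(input);
  if (match === null) return undefined;
  const host = match[1];
  const portText = match[2];
  if (host === undefined || portText === undefined) return undefined;
  const port = Number(portText);
  if (port < 1 || port > 65535) return undefined;
  return { host, port };
}

const endpoint = parseEndpoint("example.org:8080");
if (endpoint !== undefined) {
  // The evidence of validity is the value itself: a number, not a string that passed a check.
  console.log(endpoint.port + 1);
}
```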

replies(1): >>35070550 #

10. jameshart ◴[07 Mar 23 20:56 UTC] No.35061090[source]▶

>>35059886 #

Thanks for responding - just to reiterate, I am a big fan of this original post, and indeed your other writing - my only critique here is that I'm looking for ways to make the insights in them more transparent to, particularly, people who aren't well-positioned to analogize how to apply Haskell concepts to other languages.

I see you read 'narrowing type conversions' rather literally in my statement - that might be my making my own analogy that doesn't go over very well. I literally mean using 'constructive modeled types' is a way to create true type-narrowing conversions, in the sense that a 'nonempty list' is a narrower type than 'list', or 'one to five' is a narrower type than 'int'.
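A minimal sketch of the 'one to five' case in TypeScript, assuming a union of literals is an acceptable model (the names `OneToFive`, `parseOneToFive` and `renderRating` are invented for this example):

```typescript
// 'one to five' as a type with exactly five inhabitants, rather than a number plus a check.
type OneToFive = 1 | 2 | 3 | 4 | 5;

const ONE_TO_FIVE: readonly OneToFive[] = [1, 2, 3, 4, 5];

// Narrowing conversion from the wide type (number) to the narrow one.
// `find` returns the matching element typed as OneToFive, so no cast is needed.
function parseOneToFive(n: number): OneToFive | undefined {
  return ONE_TO_FIVE.find((candidate) => candidate === n);
}

// Consumers accept only the narrow type and need no range check of their own.
function renderRating(rating: OneToFive): string {
  return "*".repeat(rating) + "-".repeat(5 - rating);
}

const rating = parseOneToFive(3);
if (rating !== undefined) {
  console.log(renderRating(rating));
}
```

Passing a plain `number` to `renderRating` is a compile-time error, which is the sense in which the conversion is "narrowing".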

replies(1): >>35067979 #

11. lexi-lambda ◴[08 Mar 23 12:01 UTC] No.35067979{3}[source]▶

>>35061090 #

You know what, you’re right—I misread your original comment. I was just going through this thread and replying to a number of comments making that particular misconception, since it is particularly common, but upon taking a closer look, you were saying something else. I apologize!

As for the difficulty in applying these ideas in other languages, I am sympathetic. The problem I always run into is that there is necessarily a tension between (a) presentations that are accessible to working programmers, (b) explanations that distill the essential ideas so they aren’t coupled to particular languages or language features, and (c) examples small enough to be clarifying and to fit in a blog post. Haskell is certainly not the best choice along that first axis, but it is quite exceptionally good along the second two.

For a somewhat concrete example of what I mean, see this comment I wrote a few years ago that translates the NonEmpty example into Java: https://news.ycombinator.com/item?id=21478322 I think the added verbosity and added machinery really does detract significantly from understanding. Meanwhile, a TypeScript translation would make a definition like this one quite tempting:

    type NonEmpty<T> = [T, ...T[]]

However, I find this actually obscures application of the technique because it doesn’t scale to more complex examples (for the reasons I discussed at quite some length in https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).

There are probably ways to thread this needle, but I don’t think any one “solution” is by any means obviously the best. I think the ways that other people have adapted the ideas to their respective ecosystems are probably a decent compromise.

12. epolanski ◴[08 Mar 23 15:54 UTC] No.35070550{4}[source]▶

>>35059948 #

Hi lexi.

Just wanted to say that fp-ts (now effect-ts, a ZIO port to TypeScript) author Giulio Canti is a great fan of your "parse don't validate" article. He's linked it many times in the TypeScript and functional programming channels (such as the fp slack).

Needless to say, both the fp-ts-derived io-ts library and the effect-ts schema library[1] are quite advanced parsers (and in the case of schema, there's decoding, encoding, APIs, guards, arbitraries and many other nice things I haven't seen in any functional language).

[1]https://github.com/Effect-TS/schema

13. epolanski ◴[08 Mar 23 15:55 UTC] No.35070563{3}[source]▶

>>35059010 #

You can also simply parse with a type guard in TypeScript.

Or do something more advanced like implement Decoders/Encoders.
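For a feel of the decoder idea, here is a hand-rolled sketch. It is not the io-ts or effect-ts API (those are richer and shaped differently); `Decoder`, `str`, `num`, `arrayOf`, `User` and `decodeUser` are all hypothetical names, and encoders are left out:

```typescript
// A hypothetical minimal Decoder: unknown in, typed value or undefined out.
type Decoder<A> = (input: unknown) => A | undefined;

const str: Decoder<string> = (input) =>
  typeof input === "string" ? input : undefined;

const num: Decoder<number> = (input) =>
  typeof input === "number" && Number.isFinite(input) ? input : undefined;

// Combinator: lift a decoder for items into a decoder for arrays of items.
function arrayOf<A>(item: Decoder<A>): Decoder<A[]> {
  return (input) => {
    if (!Array.isArray(input)) return undefined;
    const out: A[] = [];
    for (const element of input) {
      const decoded = item(element);
      if (decoded === undefined) return undefined;
      out.push(decoded);
    }
    return out;
  };
}

interface User {
  name: string;
  scores: number[];
}

// A decoder for a domain type, assembled from the smaller decoders.
const decodeUser: Decoder<User> = (input) => {
  if (typeof input !== "object" || input === null) return undefined;
  const record = input as Record<string, unknown>;
  const name = str(record["name"]);
  const scores = arrayOf(num)(record["scores"]);
  if (name === undefined || scores === undefined) return undefined;
  return { name, scores };
};
```

The libraries mentioned above build on the same shape but also report why decoding failed instead of returning a bare `undefined`.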