
    Parse, don't validate (2019)

    (lexi-lambda.github.io)
    398 points by declanhaigh | 11 comments
    Octokiddie ◴[] No.35055969[source]
    I like how the author boils the idea down into a simple comparison between two alternative approaches to a simple task: getting the first element of a list. Two alternatives are presented: parseNonEmpty and validateNonEmpty. From the article:

    > The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.

    This might not seem like much of a distinction, but it has far-reaching implications downstream:

    > These two functions elegantly illustrate two different perspectives on the role of a static type system: validateNonEmpty obeys the typechecker well enough, but only parseNonEmpty takes full advantage of it. If you see why parseNonEmpty is preferable, you understand what I mean by the mantra “parse, don’t validate.”

    parseNonEmpty is better because after a caller gets a NonEmpty it never has to check the boundary condition of empty again. The first element will always be available, and this is enforced by the compiler. Not only that, but functions the caller later calls never need to worry about the boundary condition, either.

    The entire concern over taking the first element of an empty list (and handling the runtime errors that result from failing to meet the boundary condition) simply disappears for the developer.
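For concreteness, the two functions being compared look roughly like this. This is a sketch using Maybe so the snippet is self-contained; the article's versions signal failure in IO instead.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Checks the condition, then throws the evidence away.
validateNonEmpty :: [a] -> Maybe ()
validateNonEmpty (_:_) = Just ()
validateNonEmpty []    = Nothing

-- Checks the same condition, but records it in the return type.
parseNonEmpty :: [a] -> Maybe (NonEmpty a)
parseNonEmpty (x:xs) = Just (x :| xs)
parseNonEmpty []     = Nothing
```

A caller holding a `NonEmpty a` can take the head with no further checks, which is exactly the point being made above.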

    replies(3): >>35056624 #>>35056955 #>>35058253 #
    waynesonfire ◴[] No.35056624[source]
    Does this scale? What if you have 10 other conditions?
    replies(6): >>35056897 #>>35056901 #>>35057510 #>>35057732 #>>35059327 #>>35061385 #
    1. dangets ◴[] No.35056901[source]
    I would think stacking 10 generic conditions wouldn't scale if you are trying to mix and match arbitrarily. If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

    In practice, I find it is usually something like an 'UnvalidatedCustomerInfo' being parsed into a 'CustomerInfo' object, where you can validate all of the fields at the same time (phone number, age, non-null name, etc.). Once you have parsed it into the internal 'CustomerInfo' type, you don't need to keep re-validating the expected invariants everywhere this type is used. There was a good video on this that I wish I could find again, where the presenter gave the example of using String for a telephoneNumber field instead of a dedicated type. Any code that used the telephoneNumber field would have to re-parse and validate it "to be safe", since the type alone didn't guarantee it was in the expected format.
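A minimal sketch of that untrusted-in, trusted-out boundary. All type and field names here are hypothetical, and the checks are deliberately simplistic:

```haskell
-- Raw, untrusted input, e.g. straight from a request body.
data UnvalidatedCustomerInfo = UnvalidatedCustomerInfo
  { rawName  :: String
  , rawPhone :: String
  }

newtype Name        = Name String
newtype PhoneNumber = PhoneNumber String

-- Once constructed, a CustomerInfo never needs re-validation downstream.
data CustomerInfo = CustomerInfo
  { name  :: Name
  , phone :: PhoneNumber
  }

parseCustomerInfo :: UnvalidatedCustomerInfo -> Either String CustomerInfo
parseCustomerInfo u = do
  n <- if null (rawName u)
         then Left "name must not be empty"
         else Right (Name (rawName u))
  p <- if not (null (rawPhone u)) && all (`elem` "0123456789") (rawPhone u)
         then Right (PhoneNumber (rawPhone u))
         else Left "phone number must be non-empty digits"
  pure (CustomerInfo n p)
```

All field checks happen in one place, at the boundary, and the rest of the program only ever sees the trusted type.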

    The topic of having untrusted external types and internal trusted types is also explained in the book `Domain Modeling Made Functional` which I highly recommend.

    replies(4): >>35057468 #>>35058092 #>>35059831 #>>35063117 #
    2. hansvm ◴[] No.35057468[source]
    In Zig at least, the types-as-values framework means it'd be pretty easy to support arbitrary mixing and matching. If nobody beats me to it, I'll make a library to remove the boilerplate for that idea this weekend.
    replies(1): >>35105217 #
    3. JamesSwift ◴[] No.35058092[source]
    You would also run into a form of "type erasure" in most cases and would need a combinatorial explosion of methods unless the type system was able to 'decorate' the generic parameter with additional information.

    e.g. imagine a `parseNonEmpty` and a `parseAllEven` method. Both take a list and return either a `NonEmpty` or an `AllEven` type. If I call `parseNonEmpty` and get a `NonEmpty`, then pass it to `parseAllEven`, I have now erased the `NonEmpty`, as I'm left with only an `AllEven`. I would need a `parseAllEvenFromNonEmpty` to take a `NonEmpty` and return an `AllEven & NonEmpty`.
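The erasure can be made concrete in Haskell, which has no intersection types; the refinement type names here are hypothetical:

```haskell
import Data.List.NonEmpty (NonEmpty (..), toList)

newtype AllEven = AllEven [Int]

parseAllEven :: [Int] -> Maybe AllEven
parseAllEven xs
  | all even xs = Just (AllEven xs)
  | otherwise   = Nothing

-- Feeding a NonEmpty through parseAllEven "erases" the non-emptiness:
--   parseAllEven (toList ne) :: Maybe AllEven
-- To keep both facts, a combined type and a dedicated parser are needed:
newtype NonEmptyAllEven = NonEmptyAllEven (NonEmpty Int)

parseAllEvenFromNonEmpty :: NonEmpty Int -> Maybe NonEmptyAllEven
parseAllEvenFromNonEmpty ne
  | all even (toList ne) = Just (NonEmptyAllEven ne)
  | otherwise            = Nothing
```

Each new combination of properties needs its own wrapper and parser, which is the combinatorial explosion being described.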

    4. lexi-lambda ◴[] No.35059831[source]
    > If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

    This is overthinking it. Usually, when people are not used to doing constructive data modeling, they get caught up on this idea that they need to have a datatype that represents their data in some canonical representation. If you need a type that represents an even number, then clearly you must define a type that is an ordinary integer, but rules out all odd numbers, right?

    Except you don’t have to do that! If you need a number to always be even (for some reason), that suggests you are storing the wrong thing. Instead, store half that number (e.g. store a radius instead of a diameter). Now all integers are legal values, and you don’t need a separate type. Similarly, if you want to store an even number greater than 100, then use a natural number type (i.e. a type that only allows non-negative integers; Haskell calls this type `Natural`), subtract 102, and store half the result. This means that, for example, 0 represents 102, 1 represents 104, 2 represents 106, 3 represents 108, etc.

    If you think this way, then there is no need to introduce a million new types for every little concept. You’re just distilling out the information you actually need. Of course, if this turns out to be a really important concept in your domain, then you can always add a wrapper type to make the distinction more formal:

        -- (Natural comes from Numeric.Natural)
        newtype EvenGreaterThan100 = EvenGreaterThan100 Natural
    
        evenGreaterThan100ToInteger :: EvenGreaterThan100 -> Integer
        evenGreaterThan100ToInteger (EvenGreaterThan100 n) = (toInteger n * 2) + 102
    
        integerToEvenGreaterThan100 :: Integer -> Maybe EvenGreaterThan100
        integerToEvenGreaterThan100 n
          | n <= 100  = Nothing
          | otherwise = case (n - 102) `quotRem` 2 of
              (q, 0) -> Just (EvenGreaterThan100 (fromInteger q))
              (_, _) -> Nothing
    
        -- round trip: integerToEvenGreaterThan100 104 == Just (EvenGreaterThan100 1)
        --             evenGreaterThan100ToInteger (EvenGreaterThan100 1) == 104
    
    Of course, this type seems completely ridiculous like this, and it is. But that’s because no real program needs “an even number greater than one hundred”. That’s just a random bag of arbitrary constraints! A real type would correspond to a domain concept, which would have a more useful name and a more useful API, anyway.

    I wrote a followup blog post here that goes into more detail about this style of data modeling, with a few more examples: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...

    replies(1): >>35062414 #
    5. esrauch ◴[] No.35062414[source]
    What if you want it to only be less than 100, though? Not everything is so easily expressible the way you're saying.
    replies(2): >>35062811 #>>35068165 #
    6. Quekid5 ◴[] No.35062811{3}[source]
    You can go as far (or stop as short) as the application warrants. The more static evidence you want, the more cumbersome it's going to become. That seems like a natural progression, but the point is that you get to choose the trade-off. It's often easier to prove a special case than a more general case.

    (Certainly, Haskell is probably not the most concise language for this kind of thing. LiquidHaskell adds interesting proof capabilities w.r.t. arithmetic.)

    Regardless, even just parsing at the boundary and using an opaque type

        MySuperSpecialInt = Int @Even Min(2) Max(100)
    
    (or whatever syntax you want) is still better than just using Int. At least you'll know that you'll always be handed a value that's in range (post-parsing).
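In plain Haskell, that opaque type can be sketched as a newtype with a smart constructor. The name and bounds are taken from the pseudocode above; in a real module, the `MySuperSpecialInt` constructor would not be exported, so parsing would be the only way to obtain a value:

```haskell
-- An Int known to be even and within [2, 100].
newtype MySuperSpecialInt = MySuperSpecialInt Int

unMySuperSpecialInt :: MySuperSpecialInt -> Int
unMySuperSpecialInt (MySuperSpecialInt n) = n

-- The single entry point: checks happen exactly once, at the boundary.
parseMySuperSpecialInt :: Int -> Maybe MySuperSpecialInt
parseMySuperSpecialInt n
  | even n && n >= 2 && n <= 100 = Just (MySuperSpecialInt n)
  | otherwise                    = Nothing
```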
    7. ambicapter ◴[] No.35063117[source]
    > In practice, I find it is usually something like 'UnvalidatedCustomerInfo' being parsed into a 'CustomerInfo' object, where you can validate all of the fields at the same time

    In my experience you usually can't validate them all at the same time. For example, address: you usually don't validate that until after the customer has selected items, and then you find out that some items won't deliver to their area, so a basket that was previously in a Valid state is now Invalid.

    replies(2): >>35063216 #>>35155316 #
    8. initplus ◴[] No.35063216[source]
    Whether an item can be delivered to a particular address seems like a separate concern from whether the address is correct/real.
    9. lexi-lambda ◴[] No.35068165{3}[source]
    Yes, certainly. Constructive data modeling is useful for many things, but it’s not a panacea. Other techniques are useful in cases where it’s not practical; I discuss some of the tradeoffs in this followup post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
    10. hansvm ◴[] No.35105217[source]
    https://github.com/hmusgrave/pdv
    11. przemo_li ◴[] No.35155316[source]
    Valid basket does not imply valid order.

    Sometimes just splitting stuff into more types, each appropriate to a specific portion of the pipeline, is a win.
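One way to sketch that pipeline split in Haskell; all names and the delivery rule here are invented for illustration:

```haskell
import Data.List (isInfixOf)

-- Each stage's output type records what has been established so far,
-- so later stages cannot receive a value that skipped an earlier check.
newtype Address          = Address String
newtype ValidBasket      = ValidBasket [String]      -- passed basket checks
newtype DeliverableOrder = DeliverableOrder [String] -- also passed delivery checks

-- Hypothetical rule: everything ships except to PO boxes.
checkDeliverable :: Address -> ValidBasket -> Either String DeliverableOrder
checkDeliverable (Address a) (ValidBasket items)
  | "PO Box" `isInfixOf` a = Left "cannot deliver to PO boxes"
  | otherwise              = Right (DeliverableOrder items)
```

A valid basket and a deliverable order are distinct types, so "valid basket does not imply valid order" is enforced by the compiler rather than by convention.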