
Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points by declanhaigh | 4 comments
Octokiddie ◴[] No.35055969[source]
I like how the author boils the idea down into a simple comparison between two alternative approaches to a simple task: getting the first element of a list. Two alternatives are presented: parseNonEmpty and validateNonEmpty. From the article:

> The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.

This might not seem like much of a distinction, but it has far-reaching implications downstream:

> These two functions elegantly illustrate two different perspectives on the role of a static type system: validateNonEmpty obeys the typechecker well enough, but only parseNonEmpty takes full advantage of it. If you see why parseNonEmpty is preferable, you understand what I mean by the mantra “parse, don’t validate.”

parseNonEmpty is better because after a caller gets a NonEmpty it never has to check the boundary condition of empty again. The first element will always be available, and this is enforced by the compiler. Not only that, but functions the caller later calls never need to worry about the boundary condition, either.

The whole question of the first element of an empty list (and handling the runtime errors that result from failing to meet the boundary condition) disappears as a developer concern.
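
The two functions read roughly like this in Haskell (a sketch based on the article, simplified to return Maybe rather than throwing in IO):

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Checks the property, then throws the knowledge away:
validateNonEmpty :: [a] -> Maybe ()
validateNonEmpty (_ : _) = Just ()
validateNonEmpty []      = Nothing

-- Checks the same property, but preserves it in the return type:
parseNonEmpty :: [a] -> Maybe (NonEmpty a)
parseNonEmpty (x : xs) = Just (x :| xs)
parseNonEmpty []       = Nothing
```

Once a caller holds a NonEmpty a, functions like Data.List.NonEmpty.head are total, so the empty-list case is ruled out by the compiler rather than re-checked at every call site.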

replies(3): >>35056624 #>>35056955 #>>35058253 #
waynesonfire ◴[] No.35056624[source]
Does this scale? What if you have 10 other conditions?
replies(6): >>35056897 #>>35056901 #>>35057510 #>>35057732 #>>35059327 #>>35061385 #
ElevenLathe ◴[] No.35056897[source]
I think what you're getting at is that it seems ponderous to have types with names like NonEmptyListWhereTheThirdElementIsTheIntegerFourAndTheOtherElementsAreStringsOfLengthSixOrUnder. The answer is that you shouldn't do that; instead, name the type after something in the problem domain (of whatever the program is about), like WidgetDescription or whatever.
replies(2): >>35058402 #>>35062502 #
dwohnitmok ◴[] No.35062502{3}[source]
No, the trickier problem is that without dependent types you are forced into a very specific, linear chain of validations, or else you have to deal with a combinatorial explosion of functions and types.

To take your type as an example, you could imagine a function

  validation : String -> Maybe FinalWidget
but maybe `validation` is really big and unwieldy, and you want to reuse parts of it elsewhere, so you break it down into a pipeline of

  -- Let's say a RawWidget is, operationally, a non-empty string
  validation0 : String -> Maybe RawWidget
  -- Let's say a RefinedWidget is a string consisting only of capital letters
  validation1 : RawWidget -> Maybe RefinedWidget
  -- A FinalWidget is a non-empty string of capital letters that has no whitespace
  validation2 : RefinedWidget -> Maybe FinalWidget
This is over-constrained. You don't really want to force yourself into a scenario where you must call validation0, then validation1, and finally validation2, because maybe in another code path it's more expedient to do the checks in another order. But the types don't line up if you do them in another order. And maybe you don't really care about `RawWidget` and `RefinedWidget`, but you're forced to create them just to be able to build up to a `FinalWidget`.

This is where dependent types would really help relax those constraints.
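
To make the linearity concrete, here is that pipeline as runnable Haskell (widget names from the comment above; the concrete check inside each stage is an assumption for illustration):

```haskell
import Control.Monad ((>=>))
import Data.Char (isSpace, isUpper)

-- Hypothetical wrappers matching the commenter's pipeline:
newtype RawWidget     = RawWidget String     deriving (Eq, Show)
newtype RefinedWidget = RefinedWidget String deriving (Eq, Show)
newtype FinalWidget   = FinalWidget String   deriving (Eq, Show)

-- Stage 0: a RawWidget is a non-empty string.
validation0 :: String -> Maybe RawWidget
validation0 s
  | null s    = Nothing
  | otherwise = Just (RawWidget s)

-- Stage 1: a RefinedWidget consists only of capital letters.
validation1 :: RawWidget -> Maybe RefinedWidget
validation1 (RawWidget s)
  | all isUpper s = Just (RefinedWidget s)
  | otherwise     = Nothing

-- Stage 2: a FinalWidget additionally has no whitespace.
validation2 :: RefinedWidget -> Maybe FinalWidget
validation2 (RefinedWidget s)
  | any isSpace s = Nothing
  | otherwise     = Just (FinalWidget s)

-- The one and only composition the types permit: 0, then 1, then 2.
validation :: String -> Maybe FinalWidget
validation = validation0 >=> validation1 >=> validation2
```

Swapping stages, e.g. `validation1 >=> validation0`, simply doesn't typecheck, because validation0 wants a String while validation1 produces a RefinedWidget; that is the rigidity being described.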

replies(1): >>35062924 #
Quekid5 ◴[] No.35062924{4}[source]
I don't disagree that dependent types would help (and would be really cool for lots of other uses!), but let's consider what the usual validation rules we really need are: non-empty, basic interval constraints (non-negative/positive), only contains a certain set of characters... simple stuff like that, usually. If we're going wild, an interesting case would be effectful validation and how that fits in. In practice, what happens with any non-basic validation is that the server says 4xx, try again.

Anyway, validation/parsing is mostly pretty simple stuff where the "validate" bit is a simple function... and function composition works just fine.

(Assuming you can name the result type of your parse/validate individually according to your domain.)

replies(1): >>35064351 #
dwohnitmok ◴[] No.35064351{5}[source]
Without dependent types you can't check your common constraints in an order-independent way.

You end up with four choices:

1. Have a single function that does all the constraint checking at once

2. Have a single linear order where each constraint check feeds into the next but only in that order

3. Acquiesce to a combinatorial explosion of functions that check every possible combination of those constraints

4. Give up keeping track of the constraints at a type level.
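
Choice 3 can be made concrete with just two constraints (all types and checks here are hypothetical): every subset of constraints needs its own wrapper type, plus dedicated upgrade functions between them, so n constraints cost on the order of 2^n definitions:

```haskell
import Data.Char (isUpper)

-- One wrapper per subset of constraints: {non-empty}, {all-caps}, {both}.
-- With 10 constraints this becomes ~2^10 wrappers plus upgrade functions.
newtype NonEmptyStr        = NonEmptyStr String        deriving (Eq, Show)
newtype AllCapsStr         = AllCapsStr String         deriving (Eq, Show)
newtype NonEmptyAllCapsStr = NonEmptyAllCapsStr String deriving (Eq, Show)

checkNonEmpty :: String -> Maybe NonEmptyStr
checkNonEmpty s
  | null s    = Nothing
  | otherwise = Just (NonEmptyStr s)

checkAllCaps :: String -> Maybe AllCapsStr
checkAllCaps s
  | all isUpper s = Just (AllCapsStr s)  -- vacuously true for ""
  | otherwise     = Nothing

-- To combine evidence you need a separate upgrade function per ordering:
capsOfNonEmpty :: NonEmptyStr -> Maybe NonEmptyAllCapsStr
capsOfNonEmpty (NonEmptyStr s)
  | all isUpper s = Just (NonEmptyAllCapsStr s)
  | otherwise     = Nothing

nonEmptyOfCaps :: AllCapsStr -> Maybe NonEmptyAllCapsStr
nonEmptyOfCaps (AllCapsStr s)
  | null s    = Nothing
  | otherwise = Just (NonEmptyAllCapsStr s)
```

Both paths reach NonEmptyAllCapsStr, but only by writing both upgrade functions; that duplication is exactly the explosion being described.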

replies(1): >>35075015 #
Quekid5 ◴[] No.35075015[source]
I do think you can... just via phantom type parameters and type-level programming. In Scala you'd probably use Refined.

(But I'm no expert, admittedly, and it isn't an actual problem of much consequence in practical programming in Haskell or Scala. Opaque types do the 80% part of 80-20 just fine.)

replies(1): >>35075984 #
dwohnitmok ◴[] No.35075984[source]
You can't with phantom type parameters and type-level programming alone, although you can get close. Neither Scala's nor Haskell's Refined lets you do what I'm thinking of.

You can get very close with type-level sets, although at that point compile times probably go through the roof, and you're basically emulating row types.

  def wrapIntoRefined(str: String): Refined[String, Unit]

  def validate0[A](str: Refined[String, A]): Either[Error, Refined[String, And[Condition0, A]]]

  def validate1[A](str: Refined[String, A]): Either[Error, Refined[String, And[Condition1, A]]]

  // This requires ordering Condition0 before Condition1 but if we resorted 
  // to a type-level set we could get around that problem
  def process(input: Refined[String, And[Condition1, And[Condition0, Unit]]]): Unit

  // But linearity is still required in some sense. We can't e.g. do our checks
  // in a parallel fashion. You still need to pipe one function right after another
The central problem is if you have two validation functions

  def validate0(str: String): Refined[String, Condition0]

  def validate1(str: String): Refined[String, Condition1]
if you try to recombine them downstream, you don't know that `Refined[String, Condition0]` and `Refined[String, Condition1]` actually refer to the same underlying `String`. They could be refined on two completely separate strings. To tie them to a single runtime String requires dependent types.

You can approximate this in Scala with path-dependent types, but it's very brittle and breaks in all sorts of ways.
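
For reference, the Haskell analogue of that approximation is the "ghosts of departed proofs" trick: a rank-2 phantom name ties every piece of evidence to one specific runtime String, so the checks can run in any order. A minimal sketch (all names here are illustrative, not a real library API, and it shares the same ergonomic brittleness):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Char (isUpper)

newtype Named n a = Named a   -- a value tagged with phantom name n
data NonEmptyP n  = NonEmptyP -- evidence: the String named n is non-empty
data AllCapsP n   = AllCapsP  -- evidence: the String named n is all caps

-- The only way to introduce a name; the rank-2 type means two different
-- withName calls can never share a name, so evidence can't be mixed up.
withName :: a -> (forall n. Named n a -> r) -> r
withName x k = k (Named x)

checkNonEmpty :: Named n String -> Maybe (NonEmptyP n)
checkNonEmpty (Named s) = if null s then Nothing else Just NonEmptyP

checkAllCaps :: Named n String -> Maybe (AllCapsP n)
checkAllCaps (Named s) = if all isUpper s then Just AllCapsP else Nothing

-- Consumes both proofs about the *same* string, produced in either order;
-- a proof about a differently named string won't unify on n.
process :: Named n String -> NonEmptyP n -> AllCapsP n -> String
process (Named s) _ _ = s
```

The two checks are independent functions with no forced pipeline order, but unlike plain Refined, the shared n guarantees both proofs describe one underlying String.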

> isn't an actual problem of much consequence in practical programming in Haskell or Scala. Opaque types do the 80% bit of 80-20 just fine.

I think this is only true because there isn't a production-ready dependently typed language to show how to use these patterns effectively. In much the same way that "parse, don't validate" isn't really a concern of much consequence in older-style Java code because sum types aren't really a thing there, I firmly believe that if there were an ergonomic way of taking advantage of them, these sorts of dependently typed tagged types would show up all over the place.

replies(1): >>35103371 #
Quekid5 ◴[] No.35103371{3}[source]
> I think this is only true because there isn't a production-ready dependently typed language [...]

Now this I definitely agree with. I want to see what's possible!