←back to thread

Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 1 comments | | HN request time: 0.228s | source
Show context
Octokiddie ◴[] No.35055969[source]
I like how the author boils the idea down into a simple comparison between two alternative approaches to a simple task: getting the first element of a list. Two alternatives are presented: parseNonEmpty and validateNonEmpty. From the article:

> The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.

This might not seem like much of a distinction, but it has far-reaching implications downstream:

> These two functions elegantly illustrate two different perspectives on the role of a static type system: validateNonEmpty obeys the typechecker well enough, but only parseNonEmpty takes full advantage of it. If you see why parseNonEmpty is preferable, you understand what I mean by the mantra “parse, don’t validate.”

parseNonEmpty is better because after a caller gets a NonEmpty it never has to check the boundary condition of empty again. The first element will always be available, and this is enforced by the compiler. Not only that, but functions the caller later calls never need to worry about the boundary condition, either.

The entire concern over the first element of an empty list (and handling the runtime errors that result from failure to meet the boundary condition) disappear as a developer concern.

replies(3): >>35056624 #>>35056955 #>>35058253 #
waynesonfire ◴[] No.35056624[source]
Does this scale? What if you have 10 other conditions?
replies(6): >>35056897 #>>35056901 #>>35057510 #>>35057732 #>>35059327 #>>35061385 #
1. dwohnitmok ◴[] No.35059327[source]
This is where dependent types would shine.

  function processUserInput(input: String, requirement: InputReqs[input]): Unit = ...

  type InputReqs[input] = {
    // We'll say that a Proposition takes Boolean expressions and turn them into types
    notTooLong: Proposition[length(input) < 128],
    authorIsNotBlocked: AuthorIsNotBlocked[input],
    sanitized: Sanitized[input],
    ...
  }
where you might have the following functions (which are all examples of parse don't validate)

  function checkIfAuthorIsBlocked(author: String, input: String): Maybe[AuthorIsNotBlocked[input]] = ...

  // Create a pair that contains both the sanitized String and a tag that it has been sanitized
  function sanitizeString(input: String): (output: String, Sanitized[output]) = ...
where just by types alone I know that e.g. length checking must occur after sanitization (because sanitizeString generates a new `output` that is distinct from `input`) and don't have to write down in docs somewhere that you might cause a bug if you check lengths before sanitization because maybe sanitization changes the length of the input.

Note that this is also strictly stronger than a simple precondition/postcondition system or some sort of assertion system because properties of the input that we care about may not be observable at runtime/from the input alone (e.g. AuthorIsNotBlocked can't be asserted based only on input: you'd have to change the runtime representation of input to include that information).