Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 2 comments
bruce343434 ◴[] No.35053912[source]
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", and pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #
lolinder ◴[] No.35058185[source]
Sum types are used here for error handling, but if your language has a different error handling convention you can and should just use that.

In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails. Since the only way to get an instance of the class is through the factory method, you've made illegal states unrepresentable and know that the class always holds to its invariants. No methods on instances of that class will throw exceptions from then on, so you've successfully applied "Parse, Don't Validate" without needing sum types.
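
A minimal sketch of that pattern, assuming a hypothetical `Username` type whose invariant is "trimmed and non-blank" (the class name, the invariant, and the choice of `IllegalArgumentException` are illustrative assumptions, not something from the article):

```java
// Hypothetical example type: a trimmed, non-blank username.
final class Username {
    private final String value;

    // Private constructor: the only route to an instance is parse().
    private Username(String value) {
        this.value = value;
    }

    // Static factory: parses raw input and throws if the invariant cannot be met.
    static Username parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("username must not be null");
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("username must not be blank");
        }
        return new Username(trimmed);
    }

    // No mutators, so the value stays trimmed and non-blank after parsing.
    String value() {
        return value;
    }
}

final class UsernameDemo {
    public static void main(String[] args) {
        Username u = Username.parse("  alice ");
        System.out.println(u.value()); // prints "alice"
        try {
            Username.parse("   ");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Code that receives a `Username` never has to re-check for blank input; the failure can only happen at the boundary where `parse` is called.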

The point of the article isn't the particular implementation in Haskell; it's the concept of pushing all data error states to the boundaries of your code, which applies anywhere as long as you translate it into the idioms of your language.

replies(2): >>35060043 #>>35079571 #
lexi-lambda ◴[] No.35060043[source]
> In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails.

This is similar, and is indeed quite useful in many cases, but it’s not quite the same. I explained why in this comment: https://news.ycombinator.com/item?id=35059886 (The comment is talking about TypeScript, but really everything there also applies to Java.)

replies(1): >>35060531 #
1. lolinder ◴[] No.35060531[source]
Thanks for the reply! I wasn't at all expecting one from you.

If I'm understanding the difference correctly, it's that the constructive data modeling approach can be proven entirely in the type system without any trust in the library code, while the Java approach I recommended depends on there being no other way to construct an instance of the class, which can be tricky to guarantee. Is that accurate?

replies(1): >>35061617 #
2. lexi-lambda ◴[] No.35061617[source]
Yes, that’s about right. But really do read the followup blog post (https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...), as it explains that in much more depth! In particular, it says:

> To some readers, these pitfalls may seem obvious, but safety holes of this sort are remarkably common in practice. This is especially true for datatypes with more sophisticated invariants, as it may not be easy to determine whether the invariants are actually upheld by the module’s implementation. Proper use of this technique demands caution and care:

> * All invariants must be made clear to maintainers of the trusted module. For simple types, such as NonEmpty, the invariant is self-evident, but for more sophisticated types, comments are not optional.

> * Every change to the trusted module must be carefully audited to ensure it does not somehow weaken the desired invariants.

> * Discipline is needed to resist the temptation to add unsafe trapdoors that allow compromising the invariants if used incorrectly.

> * Periodic refactoring may be needed to ensure the trusted surface area remains small. It is all too easy for the responsibility of the trusted module to accumulate over time, dramatically increasing the likelihood of some subtle interaction causing an invariant violation.

> In contrast, datatypes that are correct by construction suffer none of these problems. The invariant cannot be violated without changing the datatype definition itself, which has rippling effects throughout the rest of the program to make the consequences immediately clear. Discipline on the part of the programmer is unnecessary, as the typechecker enforces the invariants automatically. There is no “trusted code” for such datatypes, since all parts of the program are equally beholden to the datatype-mandated constraints.

They are both quite useful techniques, but it’s important to understand what you’re getting (and, perhaps more importantly, what you’re not).
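
For contrast with a private constructor plus factory, here is a rough Java rendering of a correct-by-construction `NonEmpty`. This is a sketch under assumptions: the record and its method names are hypothetical, it needs Java 16+ for records, and Java's `null` and mutable lists mean the guarantee is weaker than in Haskell. The point is only that the shape of the type (one required `head` plus a possibly empty `tail`) rules out the empty case, so the constructor can stay public and no module has to be trusted to check anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical NonEmpty: one required element plus a possibly empty rest.
// There is no combination of (non-null) arguments that yields zero elements,
// so the invariant lives in the type's shape rather than in a checked factory.
record NonEmpty<T>(T head, List<T> tail) {

    // Total: there is always a head, so no exception or Optional is needed.
    T first() {
        return head;
    }

    // Flattens back to an ordinary list, which always has at least one element.
    List<T> toList() {
        List<T> all = new ArrayList<>();
        all.add(head);
        all.addAll(tail);
        return all;
    }
}

final class NonEmptyDemo {
    public static void main(String[] args) {
        NonEmpty<String> dirs = new NonEmpty<>("/etc", List.of("/usr/local/etc"));
        System.out.println(dirs.first());         // prints "/etc"
        System.out.println(dirs.toList().size()); // prints 2
    }
}
```

Changing the invariant here means changing the record's fields, which breaks every caller at compile time; with the factory approach, a new method inside the class could quietly weaken the invariant without any caller noticing.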