Stop writing CLI validation. Parse it right the first time

(hackers.pub)

203 points dahlia | 2 comments | 06 Sep 25 18:20 UTC

jmull ◴[06 Sep 25 22:14 UTC] No.45153373[source]▶

> Think about it. When you get JSON from an API, you don't just parse it as any and then write a bunch of if-statements. You use something like Zod to parse it directly into the shape you want. Invalid data? The parser rejects it. Done.

Isn’t writing code and using zod the same thing? The difference being who wrote the code.

Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.

replies(4): >>45154508 #>>45154791 #>>45155254 #>>45156821 #

bigstrat2003 ◴[07 Sep 25 02:16 UTC] No.45154791[source]▶

>>45153373 #

Yeah, the "parse, don't validate" advice seems vacuous to me because of this. Someone is doing that validation. I think the advice would perhaps be phrased better as "try to not reimplement popular libraries when you could just use them".

replies(6): >>45154814 #>>45155095 #>>45155795 #>>45156024 #>>45163088 #>>45177851 #

lock1 ◴[07 Sep 25 06:04 UTC] No.45155795[source]▶

>>45154791 #

When I first saw the "Parse, don't validate" title, it struck me as a catchy but perhaps unnecessarily clever catchphrase. It's catchy, yes, but it felt too ambiguous to be meaningful to anyone outside the target audience (Haskellers in this case).

That said, I fully agree with the article content itself. It basically just boils down to:

When you create a program, eventually you'll need to process input data and check whether it is valid. In a C-like language, you have two options:

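  // Option 1: a procedure that only checks; the outcome is not recorded in any type.
  void validate(struct Data d);

  // Option 2: a function that returns a distinct type as evidence of validation.
  struct ValidatedData;
  ValidatedData validate(struct Data d);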

"Parse, don't validate" is just trying to say don't do `void validate(struct Data d)` (procedure with `void`), but do `ValidatedData validate(struct Data d)` (function returning `ValidatedData`) instead.

It doesn't mean you need to explicitly create or name everything as a "parser". It doesn't mean "don't validate" either; inside `ValidatedData validate(struct Data d)` you'll still have "validation" logic similar to that of its `void` procedure counterpart.

Specifically, the article tries to teach folks to use the type system to their advantage. Rather than praying that nobody ever forgets to invoke `validate(d)` at every single call site, make the type signature accept only the `ValidatedData` type, so the compiler will complain loudly if future maintainers try to shove a `Data` value into it. This strategy offloads the mental burden of remembering things from the dev to the compiler.

I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to the non-FP community what the distinction between "parse" and "validate" is, let alone what a "parser combinator" is. Yet somehow other articles keep reusing this same catchphrase.

replies(2): >>45158854 #>>45159046 #

1. andreygrehov ◴[07 Sep 25 15:28 UTC] No.45159046[source]▶

>>45155795 #

What is ValidatedData? A subset of the Data that is valid? This makes no sense to me. The way I see it is you use ‘validate’ when the format of the data you are validating is the exact same format you are gonna be working with right after, meaning the return type doesn’t matter. The return type implies transformation – a write operation per se, whereas validation is always a read operation only.

replies(1): >>45161636 #

2. lock1 ◴[07 Sep 25 20:01 UTC] No.45161636[source]▶

>>45159046 (TP) #

  > What is ValidatedData? A subset of the Data that is valid?

Usually, but not necessarily. `validate()` might add some additional information too, for example: `validationTime`.

More often than not, in a real case of applying algebraic data types & "Parse, don't validate", it's something like `Option<ValidatedData>` or `Result<ValidatedData,PossibleValidationError>`, borrowing Rust's names. `Option` & `Result` expand the set of values the function can return to cover the possibility of failure in the validation process, but that is independent of the possible values that `ValidatedData` itself can contain.
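A rough TypeScript sketch of that `Result`-shaped return type. Everything here is hypothetical and only for illustration: the `host`/`port` fields, the port rule, the `ValidationError` shape, and the hand-rolled `Result` are assumptions, not something from the article or from any library.

```typescript
// Hypothetical raw input: any string/number pair is representable.
type Data = { host: string; port: number };

// Hypothetical validated form; `validatedAt` mirrors the `validationTime` idea.
type ValidatedData = {
  readonly host: string;
  readonly port: number;
  readonly validatedAt: Date;
};

type ValidationError = { readonly reason: string };

// A hand-rolled Result: either a value or an error, tagged by `ok`.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

function validate(d: Data): Result<ValidatedData, ValidationError> {
  if (d.host.length === 0) {
    return { ok: false, error: { reason: "host must not be empty" } };
  }
  if (!Number.isInteger(d.port) || d.port < 1 || d.port > 65535) {
    return { ok: false, error: { reason: "port must be an integer in 1..65535" } };
  }
  return { ok: true, value: { host: d.host, port: d.port, validatedAt: new Date() } };
}

const good = validate({ host: "example.com", port: 8080 });
const bad = validate({ host: "example.com", port: 0 });
```

The failure case lives in `Result`, not in `ValidatedData`: once the caller has checked `ok`, the value it holds has no "maybe invalid" state left in it.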

  > The way I see it is you use ‘validate’ when the format of the data you are validating is the exact same format you are gonna be working with right after, meaning the return type doesn’t matter.

The main point of "Parse, don't validate" is to distinguish between the "machine-level data representation" and the "possible set of values" of a type, and to make use of this "possible set of values" property.

Your "the exact same format" point is correct; oftentimes, the underlying data representation of a type is exactly the same pre- & post-validation. But more often than not, the "possible set of values" of `ValidatedData` is a subset of that of `Data`. These two different "possible sets of values" are given their own names in the form of the types `Data` and `ValidatedData`.

This distinction is actually very handy because types can be checked automatically by the (nominal) type system. If you make the `ValidatedData` constructor private & the only way to produce an instance is the function `ValidatedData validate(Data)`, then in any part of the codebase, there's no way any `ValidatedData` instance is malformed (assuming `validate` doesn't have bugs).
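A minimal sketch of the private-constructor idea in TypeScript. The `email` field, the deliberately simplistic regex, and the `send` function are made up for illustration. It relies on the fact that TypeScript treats classes with private members nominally, so a plain object with the same public shape is not accepted in place of the class.

```typescript
// Hypothetical raw input type.
interface Data {
  email: string;
}

class ValidatedData {
  // Private constructor and private field: code outside this class can
  // neither call `new ValidatedData(...)` nor fake an instance structurally.
  private constructor(private readonly raw: string) {}

  get email(): string {
    return this.raw;
  }

  // The only way to obtain an instance; returns null on invalid input.
  static validate(d: Data): ValidatedData | null {
    // Deliberately simplistic check, for illustration only.
    return /^[^@\s]+@[^@\s]+$/.test(d.email) ? new ValidatedData(d.email) : null;
  }
}

// Downstream code asks for ValidatedData, so it cannot be handed raw Data.
function send(to: ValidatedData): string {
  return `sending to ${to.email}`;
}

const v = ValidatedData.validate({ email: "a@example.com" });
const rejected = ValidatedData.validate({ email: "not-an-email" });
// send({ email: "a@example.com" }); // would not type-check: not a ValidatedData
```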

Extra note: I forgot to mention that the "Parse, don't validate" article implicitly assumes a nominal type system, where two objects having an equivalent "data representation" doesn't mean they have the same type. This differs from TypeScript's structural type system, where as long as the "data representation" is the same, both objects are considered to have the same type.

TypeScript will happily accept something like this because of structural typing:

  type T1 = { x: string };
  type T2 = { x: string };
  function f(t: T1): void { /* ... */ }
  const t2: T2 = { x: "foo" };
  f(t2); // accepted: T1 and T2 have the same structure

Languages with nominal type systems, like Haskell or Java, will reject such expressions:

  class T1 { String x; }
  class T2 { String x; }
  void f(T1 t) { /* ... */ }
  // f(new T2()); // Compile error: type mismatch

Because of this, the idea of using a type as a "possible set of values" probably feels unintuitive to TypeScript folks, as everything is structurally typed and a different type feels synonymous with a different "underlying data representation" there.

You can simulate this "same structure, but different meaning" concept of a nominal type system in TypeScript with a somewhat hacky workaround using Symbol.
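One common form of that workaround is a "branded" type keyed by a `unique symbol`. The sketch below is only an illustration: the names `validatedBrand`, `use`, and the non-empty rule are made up, and the brand exists purely at the type level (nothing is added to the object at runtime).

```typescript
// Hypothetical brand: a unique symbol that is declared but never created at runtime.
declare const validatedBrand: unique symbol;

type Data = { x: string };

// Same runtime representation as Data, plus a phantom property keyed by the brand.
type ValidatedData = Data & { readonly [validatedBrand]: true };

function validate(d: Data): ValidatedData | undefined {
  // Made-up rule: x must be non-empty. The cast is the single trusted spot.
  return d.x.length > 0 ? (d as ValidatedData) : undefined;
}

function use(d: ValidatedData): number {
  return d.x.length;
}

const raw: Data = { x: "foo" };
const checked = validate(raw);
// use(raw); // would not type-check: Data lacks the brand property
```

Unlike a private constructor, this is a convention and not a guarantee: anyone can write `as ValidatedData` elsewhere, so it helps to keep the cast inside a single module.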

  > The return type implies transformation – a write operation per se, whereas validation is always a read operation only

Why does the return type need to imply transformation, and why is "validation" here always read-only? A no-op function returns the exact same value you give it (in other words, the identity transformation), and Java & JavaScript procedures never guarantee a read-only operation.
