
    203 points dahlia | 27 comments
    1. jmull ◴[] No.45153373[source]
    > Think about it. When you get JSON from an API, you don't just parse it as any and then write a bunch of if-statements. You use something like Zod to parse it directly into the shape you want. Invalid data? The parser rejects it. Done.

    Isn’t writing code and using zod the same thing? The difference being who wrote the code.

    Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.

    replies(4): >>45154508 #>>45154791 #>>45155254 #>>45156821 #
    2. akoboldfrying ◴[] No.45154508[source]
    Yes, both are writing code. But nearly all the time, the constraints you want to express can be expressed with zod, and in that case using zod means you write less code, and the code you do write is more correct.

    > Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.

    Yes, judgement is required to make depending on zod (or any library) worthwhile. This is not different in principle from trusting those same things hold for TypeScript, or Node, or V8, or the C++ compiler V8 was compiled with, or the x86_64 chip it's running on, or the laws of physics.

    replies(1): >>45155279 #
    3. bigstrat2003 ◴[] No.45154791[source]
    Yeah, the "parse, don't validate" advice seems vacuous to me because of this. Someone is doing that validation. I think the advice would perhaps be phrased better as "try to not reimplement popular libraries when you could just use them".
    replies(6): >>45154814 #>>45155095 #>>45155795 #>>45156024 #>>45163088 #>>45177851 #
    4. remexre ◴[] No.45154814[source]
    The difference between parse and validate is

        function parse(x: Foo): Bar { ... }
    
        const y = parse(x);
    
    and

        function validate(x: Foo): void { ... }
    
        validate(x);
        const y = x as Bar;
    
    Zod has a parser API, not a validator API.
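    To make the sketch above concrete, here is a minimal TypeScript version with hypothetical `Foo`/`Bar` shapes filled in (not Zod itself, just the same two signatures):

    ```typescript
    // Foo is the loose input shape, Bar the strict result.
    type Foo = { age: unknown };
    type Bar = { age: number };

    // Parser: returns a Bar, or throws. The caller gets a new,
    // trustworthy value; there is no path that keeps the raw input.
    function parse(x: Foo): Bar {
      if (typeof x.age !== "number") throw new Error("age must be a number");
      return { age: x.age };
    }

    // Validator: only checks; the caller must then assert the type themselves.
    function validate(x: Foo): void {
      if (typeof x.age !== "number") throw new Error("age must be a number");
    }

    const y1 = parse({ age: 42 }); // y1: Bar, guaranteed by construction
    validate({ age: 42 });
    const y2 = { age: 42 } as Bar; // y2: Bar, guaranteed only by a cast
    ```

    The cast in the second version is exactly the hole: nothing ties `validate` having run to `y2` being a `Bar`.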
    5. dwattttt ◴[] No.45155095[source]
    Sibling says this with code, but to distil the advice: reflect the result of your validation in the type system.

    Then instead of validating a loose type & still using the loose type, you're parsing it from a loose type into a strict type.

    The key point is you never need to look at a loose type and think "I don't need to check this is valid, because it was checked before"; the type system tracks that for you.

    replies(1): >>45155874 #
    6. MrJohz ◴[] No.45155254[source]
    I think the key part, although the author doesn't quite make it explicit, is that (a) the parsing happens all up front, rather than weaving validation and logic together, and (b) the parsing creates a new structure that encodes the invariants of the application, so that the rest of the application no longer needs to check anything.

    Whether you do that with Zod or manually or whatever isn't important, the important thing is having a preprocessing step that transforms the data and doesn't just validate it.

    replies(2): >>45155837 #>>45161323 #
    7. jmull ◴[] No.45155279[source]
    Sure... the laws of physics last broke backwards compatibility at the Big Bang, Zod last broke backwards compatibility a few months ago.
    8. lock1 ◴[] No.45155795[source]
    When I first saw the "Parse, don't validate" title, it struck me as catchy but perhaps unnecessarily clever. It felt too ambiguous to be meaningful for anyone outside the target audience (Haskellers, in this case).

    That said, I fully agree with the article content itself. It basically just boils down to:

    When you create a program, eventually you'll need to process input data and check whether it is valid or not. In a C-like language, you have two options:

      void validate(struct Data d);
    
    or

      struct ValidatedData;
      struct ValidatedData validate(struct Data d);
    
    "Parse, don't validate" is just trying to say don't do `void validate(struct Data d)` (procedure with `void`), but do `ValidatedData validate(struct Data d)` (function returning `ValidatedData`) instead.

    It doesn't mean you need to explicitly create or name everything as a "parser". It also doesn't mean "don't validate" either; in `ValidatedData validate(struct Data d)` you'll eventually have "validation" logic similar to the procedure `void` counterpart.

    Specifically, the article tries to teach folks to utilize the type system to their advantage. Rather than praying to never forget invoking `validate(d)` on every single call site, make the type signature only accept `ValidatedData` type so the compiler will complain loudly if future maintainers try to shove `Data` type to it. This strategy offloads the mental burden of remembering things from the dev to the compiler.
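    A sketch of that strategy in TypeScript (the examples above are C-style; the names here are illustrative, and the private field is what makes the type behave nominally in TypeScript):

    ```typescript
    type Data = { email: string };

    class ValidatedData {
      // A private member makes this type effectively nominal: a structurally
      // similar plain object is no longer assignable to ValidatedData.
      private readonly brand = true;
      private constructor(readonly email: string) {}

      // The only way to obtain a ValidatedData, so every instance is
      // valid by construction (assuming this check itself is correct).
      static validate(d: Data): ValidatedData {
        if (!d.email.includes("@")) throw new Error("invalid email");
        return new ValidatedData(d.email);
      }
    }

    // Downstream code accepts only the validated type; the compiler
    // complains loudly if anyone tries to shove a raw Data in.
    function send(d: ValidatedData): string {
      return `sending to ${d.email}`;
    }

    send(ValidatedData.validate({ email: "a@b.com" }));
    // send({ email: "nope" }); // compile error: not a ValidatedData
    ```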

    I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to people outside the FP community what the distinction between "parse" and "validate" is, let alone what a "parser combinator" is. Yet articles keep reusing this same catchphrase.

    replies(2): >>45158854 #>>45159046 #
    9. makeitdouble ◴[] No.45155837[source]
    The base assumption is that parsing upfront costs less than validating along the way. I think it's a common case, but not common enough to apply it as a generic principle.

    For instance if validating parameter values requires multiple trips to a DB or other external system, weaving the calls in the logic can spare duplicating these round trips. Light "surface" validation can still be applied, but that's not what we're talking about here I think.

    replies(2): >>45155976 #>>45157256 #
    10. 8n4vidtmkvmk ◴[] No.45155874{3}[source]
    Everyone seems hung up on the type system, but I think the validity of the data is the important part. I'd still want to convert strings to ints, trim whitespace, drop extraneous props and all of that jazz even if I was using plain JS without types.

    I still wouldn't need to check the inputs again because I know it's already been processed, even if the type system can't help me.
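    A sketch of that processing step (illustrative field names), where the cleanup happens at runtime whether or not a type checker is watching:

    ```typescript
    // Normalize once at the boundary; downstream code never re-checks.
    function normalize(input: { name: string; age: string; extra?: unknown }) {
      return {
        name: input.name.trim(),      // trim whitespace
        age: parseInt(input.age, 10), // convert string to int
        // extraneous props like `extra` are simply dropped
      };
    }

    const clean = normalize({ name: "  Ada ", age: "36", extra: "junk" });
    // clean is { name: "Ada", age: 36 }
    ```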

    replies(2): >>45156099 #>>45160080 #
    11. MrJohz ◴[] No.45155976{3}[source]
    It's not about costing less, it's about program structure. The goal should be to move from interface type (in this case a series of strings passed on the command line) to internal domain type (where we can use rich data types and enforce invariants like "if server, then all server properties are specified") as quickly as possible. That way, more of the application can be written to use those rich data types, avoiding errors or unnecessary defensive programming.

    Even better, that conversion from interface type to internal type should ideally happen at one explicit point in the program - a function call which rejects all invalid inputs and returns a type that enforces the invariants we're interested in. That way, we have a clean boundary between the outside world and the inside one.

    This isn't a performance issue at all, it's closer to the "imperative shell, functional core" ideas about structuring your application and data.

    12. yakshaving_jgt ◴[] No.45156024[source]
    Parsing includes validation.

    The point is you don’t check that your string only contains valid characters and then continue passing that string through your system. You parse your string into a narrower type, and none of the rest of your system needs to be programmed defensively.

    To describe this advice as “vacuous” says more about you than it does about the author.

    13. dwattttt ◴[] No.45156099{4}[source]
    The type isn't just there to make things easy to understand when you write the validation; it's for you a year later when you need to make a change deep inside the codebase, far from where the data was validated. Or for someone else who's never even seen the validation code.

    I'm hung up on the type system because it's a great way to convey the validity of the data; it follows the data around as it flows through your program.

    I don't (yet) use TypeScript, but jsdoc and linting give me enough type checking for my needs.

    replies(2): >>45158166 #>>45193593 #
    14. ◴[] No.45156821[source]
    15. lmm ◴[] No.45157256{3}[source]
    > if validating parameter values requires multiple trips to a DB or other external system, weaving the calls in the logic can spare duplicating these round trips

    Sure, but probably at the cost of leaving everything in a horribly inconsistent state when you error out partway through. Which is almost always not worth it.

    16. k3vinw ◴[] No.45158166{5}[source]
    jsdoc types are better than nothing. You could switch to using Typescript today and it will understand them.
    17. Lvl999Noob ◴[] No.45158854{3}[source]
    The difference, in my opinion, is that you received the cli args in the form

        some_cli <some args> --some-option --no-some-option

    Before parsing, the argument array contains both the flags to enable and disable the option. Validation would either throw an error or accept it as either enabled or disabled. But importantly, it wouldn't change the arguments. If the assumption is that the last option overwrites anything before it then the cli command is valid with the option disabled.

    And now, correct behaviour relies on all the code using that option to always make the same assumption.

    Parsing, on the other hand, would create a new config where `option` is an enum - either enabled, disabled, or not given. No confusion about multiple flags. It provides a single view for the rest of the program of what the input config was.

    Whether that parsing is done by a third-party library or first-party code, declaratively or imperatively, is beside the point.
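    A hypothetical last-one-wins version of that flag parsing in TypeScript:

    ```typescript
    // The parsed result is a single three-valued enum; downstream code
    // never sees the raw argument array again.
    type OptionState = "enabled" | "disabled" | "unset";

    function parseOption(args: string[]): OptionState {
      let state: OptionState = "unset";
      for (const a of args) {
        if (a === "--some-option") state = "enabled";
        else if (a === "--no-some-option") state = "disabled";
      }
      return state;
    }

    // Both flags given: the last one wins, once, at parse time.
    const resolved = parseOption(["--some-option", "--no-some-option"]);
    // resolved is "disabled"
    ```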

    18. andreygrehov ◴[] No.45159046{3}[source]
    What is ValidatedData? A subset of the Data that is valid? This makes no sense to me. The way I see it, you use ‘validate’ when the format of the data you are validating is the exact same format you are going to be working with right after, meaning the return type doesn’t matter. A return type implies transformation (a write operation, so to speak), whereas validation is always a read-only operation.
    replies(1): >>45161636 #
    19. Lvl999Noob ◴[] No.45160080{4}[source]
    Pure js without typescript also has "types". Typescript doesn't give you nominal types either. It's only structural. So when you say that you "know it's already been processed", you just have a mental type of "Parsed" vs "Raw". With a type system, it's like you have a partner dedicated to tracking that. But without that, it doesn't mean you aren't doing any parsing or type tracking of your own.
    replies(1): >>45193575 #
    20. 1718627440 ◴[] No.45161323[source]
    But when you parse all arguments first before throwing error messages, you can create much better error messages, since they can be more holistic. To do that you need to represent the invalid configuration as a type.
    replies(2): >>45161894 #>>45163050 #
    21. lock1 ◴[] No.45161636{4}[source]

      > What is ValidatedData? A subset of the Data that is valid?
    
    Usually, but not necessarily. `validate()` might add some additional information too, for example: `validationTime`.

    More often than not, in a real case of applying algebraic data type & "Parse, don't validate", it's something like `Option<ValidatedData>` or `Result<ValidatedData,PossibleValidationError>`, borrowing Rust's names. `Option` & `Result` expand the possible return values that function can return to cover the possibility of failure in the validation process, but it's independent from possible values that `ValidatedData` itself can contain.
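    In TypeScript terms, that `Result` shape might look like this (a sketch; the names mirror the Rust vocabulary above):

    ```typescript
    type ValidatedData = { email: string; validationTime: number };
    type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

    // Failure is part of the signature, independent of which values
    // ValidatedData itself can hold.
    function validate(d: { email: string }): Result<ValidatedData, string> {
      if (!d.email.includes("@")) return { ok: false, error: "missing @" };
      // validate() can also add information, e.g. validationTime.
      return { ok: true, value: { email: d.email, validationTime: Date.now() } };
    }

    const r = validate({ email: "a@b.com" });
    if (r.ok) {
      // Only inside this branch does the compiler let us touch r.value.
      console.log(r.value.email);
    }
    ```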

      > The way I see it is you use ‘validate’ when the format of the data you are validating is the exact same format you are gonna be working with right after, meaning the return type doesn’t matter.
    
    The main point of "Parse, don't validate" is to distinguish between "machine-level data representation" vs "possible set of values" of a type and utilize this "possible set of values" property.

    Your "the exact same format" point is correct; oftentimes, the underlying data representation of a type is exactly the same between pre- & post-validation. But more often than not "possible set of values" of `ValidatedData` is a subset of `Data`. These 2 different "possible set of values" are given their own names in the form of a type `Data` and `ValidatedData`.

    This distinction is actually very handy because types can be checked automatically by a (nominal) type system. If you make the `ValidatedData` constructor private and the only way to produce one is the function `ValidatedData validate(Data)`, then nowhere in the codebase can a `ValidatedData` instance be malformed (assuming `validate` itself doesn't have bugs).

    Extra note: I forgot to mention that the "Parse, don't validate" article implicitly assumes a nominal type system, where two objects with equivalent data representation don't necessarily have the same type. This differs from TypeScript's structural type system, where as long as the structure is the same, both objects are considered to have the same type.

    Typescript will happily accept something like this because of structural typing:

      type T1 = { x: string };
      type T2 = { x: string };
      function f(t: T1): void { ... }
      const t2: T2 = { x: "foo" };
      f(t2); // OK: T1 and T2 are structurally identical
    
    While nominal type systems like Haskell or Java will reject such expressions

      class T1 { String x; }
      class T2 { String x; }
      void f(T1 t) { ... }
      // f(new T2()); // Compile error: type mismatch
    
    Because of this, the idea of using a type as a "possible set of values" probably feels unintuitive to TypeScript folks, since everything is structurally typed there and a different type feels synonymous with a different "underlying data representation".

    You can simulate this "same structure, but different meaning" concept of nominal type system in Typescript with some hacky workaround with Symbol.
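    One common branding pattern (an assumption on my part; the comment above only says "some hacky workaround with Symbol") uses a `unique symbol` property that exists only at the type level:

    ```typescript
    // The symbol is declared but never materialized at runtime; it exists
    // purely to make the type nominal.
    declare const validated: unique symbol;
    type ValidatedData = { x: string } & { [validated]: true };

    function validate(d: { x: string }): ValidatedData {
      if (d.x.length === 0) throw new Error("empty");
      return d as ValidatedData; // the one place the brand is applied
    }

    function use(d: ValidatedData): string {
      return d.x;
    }

    use(validate({ x: "foo" }));
    // use({ x: "foo" }); // compile error: missing the brand
    ```

    The brand adds no runtime cost; it only prevents un-validated values from type-checking.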

      > The return type implies transformation – a write operation per se, whereas validation is always a read operation only
    
    Why does a return type need to imply transformation, and why is "validation" here always read-only? A no-op function returns the exact same value you give it (in other words, the identity transformation), and neither Java nor Javascript procedures guarantee a read-only operation.
    22. 12_throw_away ◴[] No.45161894{3}[source]
    > To do that you need to represent the invalid configuration as a type

    Right - and one thing that keeps coming up for me is that, if you want to maintain complex invariants, it's quite natural to express them in terms of the domain object itself (or maybe, ugh, a DTO with the same fields), rather than in terms of input constraints.

    23. geon ◴[] No.45163050{3}[source]
    Sure. Then you return that validated data structure from the parsing function and never touch the invalid data structure again. That's exactly what "Parse, don't validate" means.
    24. geon ◴[] No.45163088[source]
    This might be a clearer phrasing: "Parse and validate ONCE AND FOR ALL, instead of sprinkling validation everywhere you need to access the data."

    But I suppose it isn't as catchy.

    25. antonvs ◴[] No.45177851[source]
    > Someone is doing that validation.

    The difference is (a) where and how validation happens, and (b) the type of the final result.

    A parser is a function producing structured values - values of some type, usually different from the input type. In contrast, a validator is a predicate that only checks constraints on existing values.

    For example, a parser can parse an email address into a variable of type EmailAddress. If the parser succeeds at doing that, assuming you're using a language with a decent type system, you now have a variable which is statically guaranteed to be an email address - not a string which you have to trust has passed validation at some point in the past.

    This is part of the "Make illegal states unrepresentable" approach which allows for static debugging - debugging your code at compile time. It's a very powerful way to produce reliable systems with robust, statically proven guarantees.

    But as Alexis King (who coined the phrase "Parse, don't validate") wrote, "Unless you already know what type-driven design is, my catchy slogan probably doesn’t mean all that much to you."

    26. hdjrudni ◴[] No.45193575{5}[source]
    I don't think that's what people are talking about when they say types. They're talking about TypeScript types, not mental models of object structure.
    27. hdjrudni ◴[] No.45193593{5}[source]
    Don't get me wrong, I love TypeScript types. And if I didn't have TypeScript, I'd use jsdoc.

    I'm just saying that TypeScript and jsdoc don't actually do any runtime enforcement. It's important that the library does that part, with or without types.