Stop writing CLI validation. Parse it right the first time

(hackers.pub)

203 points dahlia | 1 comments | 06 Sep 25 18:20 UTC | HN request time: 0s | source

Show context

jmull ◴[06 Sep 25 22:14 UTC] No.45153373[source]▶

> Think about it. When you get JSON from an API, you don't just parse it as any and then write a bunch of if-statements. You use something like Zod to parse it directly into the shape you want. Invalid data? The parser rejects it. Done.

Isn’t writing code and using zod the same thing? The difference being who wrote the code.

Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.

replies(4): >>45154508 #>>45154791 #>>45155254 #>>45156821 #

bigstrat2003 ◴[07 Sep 25 02:16 UTC] No.45154791[source]▶

>>45153373 #

Yeah, the "parse, don't validate" advice seems vacuous to me because of this. Someone is doing that validation. I think the advice would perhaps be phrased better as "try to not reimplement popular libraries when you could just use them".

replies(6): >>45154814 #>>45155095 #>>45155795 #>>45156024 #>>45163088 #>>45177851 #

lock1 ◴[07 Sep 25 06:04 UTC] No.45155795[source]▶

>>45154791 #

When I first saw "Parse, don't validate" title, it struck me as a catchy but perhaps unnecessarily clever catchphrase. It's catchy, yes, but it felt too ambiguous to be meaningful for anyone outside of the target audience (Haskellers in this case).

That said, I fully agree with the article content itself. It basically just boils down to:

When you create a program, eventually you'll need to process & check whether input data is valid or not. In C-like language, you have 2 options

  void validate(struct Data d);

  struct ValidatedData;
  ValidatedData validate(struct Data d);

"Parse, don't validate" is just trying to say don't do `void validate(struct Data d)` (procedure with `void`), but do `ValidatedData validate(struct Data d)` (function returning `ValidatedData`) instead.

It doesn't mean you need to explicitly create or name everything as a "parser". It also doesn't mean "don't validate" either; in `ValidatedData validate(struct Data d)` you'll eventually have "validation" logic similar to the procedure `void` counterpart.

Specifically, the article tries to teach folks to utilize the type system to their advantage. Rather than praying to never forget invoking `validate(d)` on every single call site, make the type signature only accept `ValidatedData` type so the compiler will complain loudly if future maintainers try to shove `Data` type to it. This strategy offloads the mental burden of remembering things from the dev to the compiler.

I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to non-FP community what the distinction between "parser" and "validate", let alone "parser combinator". Yet somehow other articles keep reusing this same catchphrase.

replies(2): >>45158854 #>>45159046 #

1. Lvl999Noob ◴[07 Sep 25 15:08 UTC] No.45158854[source]▶

>>45155795 #

The difference, in my opinion, is that you received the cli args in the form

``` some_cli <some args> --some-option --no-some-option ```

Before parsing, the argument array contains both the flags to enable and disable the option. Validation would either throw an error or accept it as either enabled or disabled. But importantly, it wouldn't change the arguments. If the assumption is that the last option overwrites anything before it then the cli command is valid with the option disabled.

And now, correct behaviour relies on all the code using that option to always make the same assumption.

Parsing, on the other hand, would put create a new config where `option` is an enum - either enabled or disabled or not given. No confusion about multiple flags or anything. It provides a single view for the rest of the program of what the input config was.

Whether that parsing is done by a third party library or first party code, declaratively or imperatively, is besides the point.

↑