Stop writing CLI validation. Parse it right the first time

(hackers.pub)

jmull [06 Sep 25 22:14 UTC] No.45153373

> Think about it. When you get JSON from an API, you don't just parse it as any and then write a bunch of if-statements. You use something like Zod to parse it directly into the shape you want. Invalid data? The parser rejects it. Done.

Isn’t writing code and using zod the same thing? The only difference is who wrote the code.

Of course, you hope zod is robust, tested, supported, extensible, and documented well enough that you can work out how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s API changes.
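For comparison, here is roughly what "writing the code" yourself might look like for one JSON shape. It is a minimal TypeScript sketch of a hand-rolled parse function. The `User` shape and the names `parseUser` and `UserParseError` are hypothetical. A Zod schema along the lines of `z.object({ id: z.number().int(), name: z.string(), tags: z.array(z.string()) })` would express similar checks declaratively.

```typescript
// Hand-rolled "parse, don't validate" for one JSON shape (hypothetical User).
interface User {
  id: number;
  name: string;
  tags: string[];
}

class UserParseError extends Error {}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accepts untyped input (e.g. the result of JSON.parse) and either returns a
// fully typed User or throws; callers never see a half-checked object.
function parseUser(input: unknown): User {
  if (!isRecord(input)) throw new UserParseError("expected an object");
  const { id, name, tags } = input;
  if (typeof id !== "number" || !Number.isInteger(id)) {
    throw new UserParseError("id: expected an integer");
  }
  if (typeof name !== "string") {
    throw new UserParseError("name: expected a string");
  }
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    throw new UserParseError("tags: expected an array of strings");
  }
  // Build a fresh object so unexpected extra keys are not carried along.
  return { id, name, tags: [...tags] };
}
```

Either way, the caller ends up with a `User` or an exception, never an untyped value to probe with if-statements. What a library adds is a shared, tested vocabulary for these checks, along with the maintenance trade-offs described above.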

replies(4): >>45154508 #>>45154791 #>>45155254 #>>45156821 #

1. MrJohz [07 Sep 25 03:54 UTC] No.45155254

>>45153373

I think the key part, although the author doesn't quite make it explicit, is that (a) the parsing happens all up front, rather than weaving validation and logic together, and (b) the parsing creates a new structure that encodes the invariants of the application, so that the rest of the application no longer needs to check anything.

Whether you do that with Zod, manually, or some other way isn't important; what matters is having a preprocessing step that transforms the data rather than just validating it.
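The distinction between validating and transforming can be shown for a single CLI argument in this minimal TypeScript sketch. The `Port` brand and the function names are hypothetical and not taken from the article. `isValidPort` only answers yes or no. `parsePort` returns a new value whose type records that the check has already happened.

```typescript
// Validation: answers yes/no, but the caller still holds a plain string
// and has to convert it (and possibly re-check it) later.
function isValidPort(raw: string): boolean {
  const n = Number(raw);
  return /^\d+$/.test(raw) && n >= 1 && n <= 65535;
}

// Parsing: produces a new value whose type says the check was done.
// The brand exists only at compile time; at runtime a Port is a number.
type Port = number & { readonly __brand: "Port" };

function parsePort(raw: string): Port {
  if (!/^\d+$/.test(raw)) throw new Error(`not a number: ${raw}`);
  const n = Number(raw);
  if (n < 1 || n > 65535) throw new Error(`port out of range: ${raw}`);
  return n as Port;
}

// Downstream code can demand a Port, so an unchecked number cannot be
// passed without going through parsePort (or an explicit cast).
function listenAddress(host: string, port: Port): string {
  return `${host}:${port}`;
}
```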

replies(2): >>45155837 #>>45161323 #

2. makeitdouble [07 Sep 25 06:17 UTC] No.45155837

>>45155254 (TP)

The base assumption is that parsing up front costs less than validating along the way. I think that's a common case, but not common enough to apply as a generic principle.

For instance, if validating parameter values requires multiple trips to a DB or other external system, weaving those calls into the logic can avoid duplicating the round trips. Light "surface" validation can still be applied, but that's not what we're talking about here, I think.
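Here is a sketch of where the duplicated round trip comes from, and of one way an up-front parse can sidestep it. The idea is that the parsing step keeps the record it fetched instead of returning only yes or no. `Project`, `ProjectStore`, `parseDeploy`, and the `--dry-run` flag are hypothetical. The store is synchronous and in-memory for brevity, whereas a real DB client would be async.

```typescript
interface Project {
  id: string;
  owner: string;
}

// Stand-in for the external system; each find() is one "round trip".
interface ProjectStore {
  find(id: string): Project | undefined;
}

// The parsed command holds the resolved Project, not just its id string.
interface DeployCommand {
  project: Project;
  dryRun: boolean;
}

function parseDeploy(argv: string[], store: ProjectStore): DeployCommand {
  const id = argv.find((a) => !a.startsWith("--"));
  if (id === undefined) throw new Error("missing project id");
  const project = store.find(id); // the only lookup
  if (project === undefined) throw new Error(`unknown project: ${id}`);
  return { project, dryRun: argv.includes("--dry-run") };
}

// No second lookup here: "the project exists" is carried by the type.
function describeDeploy(cmd: DeployCommand): string {
  const verb = cmd.dryRun ? "would deploy" : "deploying";
  return `${verb} ${cmd.project.id} for ${cmd.project.owner}`;
}
```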

replies(2): >>45155976 #>>45157256 #

3. MrJohz [07 Sep 25 06:51 UTC] No.45155976

>>45155837

It's not about costing less, it's about program structure. The goal should be to move from interface type (in this case a series of strings passed on the command line) to internal domain type (where we can use rich data types and enforce invariants like "if server, then all server properties are specified") as quickly as possible. That way, more of the application can be written to use those rich data types, avoiding errors or unnecessary defensive programming.

Even better, that conversion from interface type to internal type should ideally happen at one explicit point in the program: a function call that rejects all invalid inputs and returns a type that enforces the invariants we're interested in. That way, we have a clean boundary point between the outside world and the inside one.

This isn't a performance issue at all, it's closer to the "imperative shell, functional core" ideas about structuring your application and data.
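The "if server, then all server properties are specified" invariant can be sketched in TypeScript as a discriminated union produced by a single boundary function. The flag names (`--server`, `--host`, `--port`, `--url`) and the names `Options`, `parseOptions`, and `summarize` are hypothetical. This is a hand-rolled illustration, not the article's parser.

```typescript
// Internal domain type: server mode cannot exist without host and port.
type Options =
  | { mode: "server"; host: string; port: number }
  | { mode: "client"; url: string };

// The single boundary point: raw argv strings in, Options out (or an error).
// Flags other than the ones checked below are accepted and ignored here.
function parseOptions(argv: string[]): Options {
  const flags = new Map<string, string>();
  let server = false;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--server") {
      server = true;
    } else if (arg.startsWith("--")) {
      const value = argv[i + 1];
      if (value === undefined) throw new Error(`${arg} needs a value`);
      flags.set(arg.slice(2), value);
      i++; // skip the value we just consumed
    } else {
      throw new Error(`unexpected argument: ${arg}`);
    }
  }
  if (server) {
    const host = flags.get("host");
    const port = Number(flags.get("port")); // NaN if --port is missing
    if (host === undefined) throw new Error("--server requires --host");
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error("--server requires a valid --port");
    }
    return { mode: "server", host, port };
  }
  const url = flags.get("url");
  if (url === undefined) throw new Error("client mode requires --url");
  return { mode: "client", url };
}

// Core logic switches on the tag; no defensive checks for missing fields.
function summarize(opts: Options): string {
  switch (opts.mode) {
    case "server":
      return `listening on ${opts.host}:${opts.port}`;
    case "client":
      return `connecting to ${opts.url}`;
  }
}
```

Everything after `parseOptions` works with `Options` only, so the invariant is checked once instead of at every use site.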

4. lmm [07 Sep 25 11:27 UTC] No.45157256

>>45155837

> if validating parameter values requires multiple trips to a DB or other external system, weaving the calls in the logic can spare duplicating these round trips

Sure, but probably at the cost of leaving everything in a horribly inconsistent state when you error out partway through, which is almost always not worth it.
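The partial-failure problem can be shown in a small TypeScript sketch. An in-memory `Map` stands in for the database, and the `key=value` argument format is hypothetical. Both functions share the same `parseAssignment` check. The only difference is whether every argument is parsed before the first write.

```typescript
type Store = Map<string, number>;

// Turns one "key=value" argument into a typed pair, or throws.
function parseAssignment(arg: string): [string, number] {
  const [key, raw] = arg.split("=");
  const value = Number(raw);
  if (!key || !raw || !Number.isFinite(value)) {
    throw new Error(`bad argument: ${arg}`);
  }
  return [key, value];
}

// Interleaved: parse and apply one argument at a time. An error on the
// third argument leaves the first two writes in place.
function applyInterleaved(store: Store, args: string[]): void {
  for (const arg of args) {
    const [key, value] = parseAssignment(arg);
    store.set(key, value);
  }
}

// Parse everything first, then apply. If any argument is bad, this throws
// before the first write, so the store is left unchanged.
function applyParsed(store: Store, args: string[]): void {
  const assignments = args.map(parseAssignment);
  for (const [key, value] of assignments) store.set(key, value);
}
```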

5. 1718627440 [07 Sep 25 19:21 UTC] No.45161323

>>45155254 (TP)

But if you parse all the arguments before reporting any errors, you can produce much better error messages, since they can take the whole configuration into account. To do that, you need to represent the invalid configuration as a type.
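One way to read "represent the invalid configuration as a type" is sketched below in TypeScript. A loosely typed `RawConfig` holds whatever was passed. `parseConfig` then returns either a fully valid `Config` or the complete list of problems, including one rule that spans two fields. All type, field, and flag names here are hypothetical.

```typescript
// Everything the user might have passed, possibly invalid or incomplete.
interface RawConfig {
  port?: string;
  tlsCert?: string;
  tlsKey?: string;
}

// The validated domain type: no optional or unchecked fields.
interface Config {
  port: number;
  tls: { cert: string; key: string } | null;
}

type ParseResult =
  | { ok: true; config: Config }
  | { ok: false; errors: string[] };

// Checks every field and collects all problems instead of stopping at the first.
function parseConfig(raw: RawConfig): ParseResult {
  const errors: string[] = [];

  const port = Number(raw.port ?? "8080"); // default when --port is absent
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push(`--port: expected 1-65535, got "${raw.port}"`);
  }

  // A holistic rule: cert and key only make sense together.
  let tls: Config["tls"] = null;
  if (raw.tlsCert !== undefined && raw.tlsKey !== undefined) {
    tls = { cert: raw.tlsCert, key: raw.tlsKey };
  } else if (raw.tlsCert !== undefined || raw.tlsKey !== undefined) {
    errors.push("--tls-cert and --tls-key must be given together");
  }

  return errors.length > 0
    ? { ok: false, errors }
    : { ok: true, config: { port, tls } };
}
```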

replies(2): >>45161894 #>>45163050 #

6. 12_throw_away [07 Sep 25 20:36 UTC] No.45161894

>>45161323

> To do that you need to represent the invalid configuration as a type

Right - and one thing that keeps coming up for me is that, if you want to maintain complex invariants, it's quite natural to express them in terms of the domain object itself (or maybe, ugh, a DTO with the same fields), rather than in terms of input constraints.

7. geon [07 Sep 25 23:10 UTC] No.45163050

>>45161323

Sure. Then you return that validated data structure from the parsing function and never touch the invalid data structure again. That's exactly what "Parse, don't validate" means.