Stop writing CLI validation. Parse it right the first time

(hackers.pub)

203 points dahlia | 5 comments | 06 Sep 25 18:20 UTC


12_throw_away ◴[07 Sep 25 01:29 UTC] No.45154518[source]▶

I like this advice, and yeah, I always try to make illegal states unrepresentable, possibly even to a fault.

The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?

replies(6): >>45154618 #>>45154627 #>>45155518 #>>45155610 #>>45155934 #>>45156178 #

ffsm8 ◴[07 Sep 25 04:59 UTC] No.45155518[source]▶

>>45154518 #

I think you're looking at it too literally - what people usually mean by "making invalid state unrepresentable" applies to the main application, which holds your domain code and should be separate from your inputs.

He even gives the example of Zod, which is a validation library that he defines to be a parser.

What he wants to say: "I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema"

replies(2): >>45155854 #>>45157423 #

MrJohz ◴[07 Sep 25 12:01 UTC] No.45157423[source]▶

>>45155518 #

> I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema

But that _is_ parsing, at least in the sense of "parse, don't validate". It's about turning inputs into real objects representing the domain code that you're about to be working with. The result is still going to be a DTO of some description, but it will be a DTO with guaranteed invariants that are useful to you. For example, a post request shouldn't be parsed into a user object just because it shares a lot of fields in common with a user. Instead it should become a DTO that fulfils the invariants that make sense for that DTO. Some of those invariants are simple (like "dates should be valid" -> the DTO contains Date objects, not strings), and some will be more complex, like the "if the server is active, then the port also needs to be provided" restriction from the article.

This is one of the key ideas behind Zod - it isn't just trying to validate whether an object matches a certain schema, but it converts the result into a type that accurately expresses the invariants that must be in place if the object is valid.
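To make the two invariants above concrete, here is a minimal hand-written TypeScript sketch. It is not Zod and not code from the article; `RawRequest`, `ParsedRequest`, `ServerState`, `parseRequest` and `describeServer` are all hypothetical names invented for the illustration.

```typescript
// Hypothetical input shape: everything is still loosely typed here.
type RawRequest = { createdAt: string; serverActive: boolean; port?: number };

// The parsed DTO: a port exists exactly when the server is active.
type ServerState =
  | { active: false }
  | { active: true; port: number };

// The date is a real Date object, not a string.
type ParsedRequest = { createdAt: Date; server: ServerState };

function parseRequest(raw: RawRequest): ParsedRequest | undefined {
  const createdAt = new Date(raw.createdAt);
  if (isNaN(createdAt.getTime())) {
    return undefined; // "dates should be valid"
  }
  if (!raw.serverActive) {
    return { createdAt, server: { active: false } };
  }
  if (raw.port === undefined) {
    return undefined; // active server without a port
  }
  return { createdAt, server: { active: true, port: raw.port } };
}

// Downstream code can only read the port after checking `active`;
// reading it from an inactive server does not type-check.
function describeServer(req: ParsedRequest): string {
  return req.server.active ? `listening on ${req.server.port}` : "stopped";
}
```

This version just returns `undefined` on failure, so it says nothing yet about reporting errors; it only shows what the output type guarantees once parsing has succeeded.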

replies(1): >>45157724 #

1. ffsm8 ◴[07 Sep 25 12:49 UTC] No.45157724[source]▶

>>45157423 #

I don't disagree with the desire to get a good API like that. I was just pointing out that this was the core of what the author wanted. 12_throw_away was correctly pointing out that _true_ parsing and making invalid state unrepresentable forces you to error out on the first mismatch, which makes it impossible to raise multiple issues. The only way around that is to allow invalid state during the input phase.

Zod also allows invalid state as input, then attempts to shoehorn it into the desired schema, which still runs the validations the author was complaining about - just not in code he wrote.

replies(2): >>45160015 #>>45160595 #

2. Lvl999Noob ◴[07 Sep 25 17:09 UTC] No.45160015[source]▶

>>45157724 (TP) #

Why does "true" parsing have to error out on the very first problem? It is more than possible (though maybe not easy) to keep parsing and collecting errors as they appear. Zod, the example given in the post, does it.

replies(1): >>45161342 #

3. MrJohz ◴[07 Sep 25 17:58 UTC] No.45160595[source]▶

>>45157724 (TP) #

I don't know that I understand why parsing necessarily has to error out on the first mismatch. Good parsers will collect errors as they go along.

Zod does take in invalid state as input, but that is what a parser does. In this case, the parser is `any -> T` as opposed to `string -> T`, but that's still a parsing operation.
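A rough sketch of what an error-collecting `unknown -> T` parser can look like, written by hand in TypeScript. This is not Zod's API; `Issue`, `Parsed`, `User` and `parseUser` are hypothetical names, and the two field rules are made up for the example.

```typescript
type Issue = { path: string; message: string };

// Either a valid value or a list of issues, never a half-valid value.
type Parsed<T> = { ok: true; value: T } | { ok: false; issues: Issue[] };

type User = { name: string; age: number };

function parseUser(input: unknown): Parsed<User> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, issues: [{ path: "", message: "expected an object" }] };
  }
  const record = input as Record<string, unknown>;
  const issues: Issue[] = [];

  // Check every field and keep going after a failure instead of returning early.
  const name = record["name"];
  if (typeof name !== "string" || name.length === 0) {
    issues.push({ path: "name", message: "expected a non-empty string" });
  }
  const age = record["age"];
  if (typeof age !== "number" || !isFinite(age) || age < 0 || Math.floor(age) !== age) {
    issues.push({ path: "age", message: "expected a non-negative integer" });
  }

  // A User is only constructed when no issue was recorded.
  if (typeof name === "string" && typeof age === "number" && issues.length === 0) {
    return { ok: true, value: { name, age } };
  }
  return { ok: false, issues };
}
```

Calling `parseUser({ name: "", age: -1 })` reports both problems in one pass, while the success branch still only ever carries a well-formed `User`.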

replies(1): >>45161698 #

4. 1718627440 ◴[07 Sep 25 19:23 UTC] No.45161342[source]▶

>>45160015 #

Because then it would need to represent invalid data in its output type.

5. 12_throw_away ◴[07 Sep 25 20:10 UTC] No.45161698[source]▶

>>45160595 #

Well, if you want to collect errors, then you need to have a way to store the transformed input in a form that allows you to check the invariants, which can be arbitrarily complex. So naturally there must be some intermediate representations that allow illegal states. And there must be functions that take these IRs that return either domain objects or lists of errors.

So, having used this thread to rubber-duck about how the principle of "parse-don't-validate" works with the principle of "provide good error messages", I'm arriving at these rules, which are really more about encapsulation than parsing:

1. Encapsulate both parsing and validation in a single function: `parse(RawInput) -> Result<ValidDomainObject,ListOfErrors>`

2. Ideally, `parse` is implemented by a robust parsing/validation library for the type of input that you're dealing with. It will create some intermediate representations that you need not concern yourself with.

3. If there isn't a good parser library for your use case, your implementation of `parse` will necessarily contain intermediate representations of potentially illegal state. This is both fine and unavoidable, just don't let them leak out of your parser.
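The three rules above could be sketched like this for a tiny CLI, assuming TypeScript and an invented command line with only `--host <name>` and `--port <n>`. `Config`, `Result`, the `draft` object and the error wording are hypothetical, and the flag handling is deliberately naive (it reads tokens in pairs).

```typescript
type Config = { host: string; port: number };
type Result<T, E> = { ok: true; value: T } | { ok: false; errors: E };

function parse(argv: string[]): Result<Config, string[]> {
  // Intermediate representation: may hold missing or nonsensical values.
  // It is local to this function, so it cannot leak to callers.
  const draft: { host?: string; port?: string } = {};
  const errors: string[] = [];

  // Naive scan: tokens are read as (flag, value) pairs.
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag !== "--host" && flag !== "--port") {
      errors.push(`unknown option: ${flag}`);
    } else if (value === undefined) {
      errors.push(`${flag} needs a value`);
    } else if (flag === "--host") {
      draft.host = value;
    } else {
      draft.port = value;
    }
  }

  // Check the invariants on the draft, still collecting rather than bailing out.
  const port = draft.port === undefined ? NaN : Number(draft.port);
  if (draft.host === undefined) {
    errors.push("--host is required");
  }
  if (draft.port === undefined) {
    errors.push("--port is required");
  } else if (Math.floor(port) !== port || port < 1 || port > 65535) {
    errors.push(`invalid port: ${draft.port}`);
  }

  if (errors.length > 0 || draft.host === undefined) {
    return { ok: false, errors };
  }
  return { ok: true, value: { host: draft.host, port } };
}
```

Callers only ever see a complete `Config` or the full list of errors; for example, `parse(["--port", "99999", "--colour", "red"])` yields three errors (unknown option, missing host, invalid port) rather than stopping at the first.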
