Stop writing CLI validation. Parse it right the first time

1. 12_throw_away ◴[07 Sep 25 01:29 UTC] No.45154518[source]▶

I like this advice, and yeah, I always try to make illegal states unrepresentable, possibly even to a fault.

The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?

replies(6): >>45154618 #>>45154627 #>>45155518 #>>45155610 #>>45155934 #>>45156178 #

2. ambicapter ◴[07 Sep 25 01:45 UTC] No.45154618[source]▶

>>45154518 (TP) #

Most validation libraries worth their salt give you options to deal with this sort of thing? They'll hand you an aggregate error with an 'errors' array, or they'll let you write an error message "prettify-er" to make a particular validation error easier to read.
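
A minimal TypeScript sketch of the "aggregate error plus prettifier" idea. The `Issue` and `AggregateValidationError` shapes and the `prettify` name are made up for illustration; real libraries name and structure these differently.

```typescript
// Hypothetical shapes, not any particular library's API.
interface Issue {
  path: string[];  // where in the input the problem was found
  message: string; // what was wrong there
}

interface AggregateValidationError {
  errors: Issue[]; // every problem found, not just the first
}

// A "prettifier": turn the machine-readable list into one readable report.
function prettify(err: AggregateValidationError): string {
  return err.errors
    .map((issue) => {
      const where = issue.path.length > 0 ? issue.path.join(".") : "(root)";
      return `- ${where}: ${issue.message}`;
    })
    .join("\n");
}

const example: AggregateValidationError = {
  errors: [
    { path: ["server", "port"], message: "expected a number" },
    { path: [], message: "unknown option --prot" },
  ],
};
const report = prettify(example);
```

Here `report` holds one line per issue, so the user sees both problems in a single run.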

replies(2): >>45154990 #>>45155372 #

3. akoboldfrying ◴[07 Sep 25 01:46 UTC] No.45154627[source]▶

>>45154518 (TP) #

Agree. It should definitely be possible to get error messages on par with what TypeScript gives you when you try to assign an object literal to an incompatibly typed variable; whether that's currently the case, and how difficult it would be to get there if not, I don't know.

4. Thaxll ◴[07 Sep 25 02:55 UTC] No.45154990[source]▶

>>45154618 #

This works if all errors are self-contained; stopping at the first one is fine too.

5. pmarreck ◴[07 Sep 25 04:21 UTC] No.45155372[source]▶

>>45154618 #

Right, but that's validation, and this article is talking about parsing (not validating) into an already-correct structure by making invalid inputs unrepresentable.

So maybe the reason why they were able to reduce the code is because they lost the ability to do good error reporting.

replies(3): >>45156653 #>>45157249 #>>45157290 #

6. ffsm8 ◴[07 Sep 25 04:59 UTC] No.45155518[source]▶

>>45154518 (TP) #

I think you're looking at it too literally. What people usually mean by "making invalid state unrepresentable" applies to the main application, which holds your domain code and should be separate from your inputs.

He even gives the example of zod, which is a validation library he defines to be a parser.

What he wants to say: "I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema"

replies(2): >>45155854 #>>45157423 #

7. adinisom ◴[07 Sep 25 05:19 UTC] No.45155610[source]▶

>>45154518 (TP) #

If talking about UI, the flip side is not to harm the user's data. So despite containing errors, it needs to be representable, even if it can't be passed further along to back-end systems.

For parsing specifically, there's literature on error recovery to try to make progress past the error.

8. 8n4vidtmkvmk ◴[07 Sep 25 06:21 UTC] No.45155854[source]▶

>>45155518 #

Zod might be a validation library, but it also does type coercion and transforms. I believe that's what the author means by a parser.

replies(1): >>45156900 #

9. geysersam ◴[07 Sep 25 06:43 UTC] No.45155934[source]▶

>>45154518 (TP) #

Maybe you can use his `or` construct to allow a `--server` without `--port`, but then also add a default `error_message` property.

After parsing you check if `error_message` exists and raise that error.

10. mark38848 ◴[07 Sep 25 07:34 UTC] No.45156178[source]▶

>>45154518 (TP) #

Just use optparse-applicative in PureScript. Applicatives are great for this and the library gives it to you for free.

replies(1): >>45156700 #

11. jpc0 ◴[07 Sep 25 09:17 UTC] No.45156653{3}[source]▶

>>45155372 #

How is getting an error array not making invalid input unrepresentable?

You either get the correctly parsed data or you get an error array. The incorrect input was never represented in code, vs a 0 value being returned or even worse random gibberish.

A trivial example: 1/0 should return DivisionByZero, not 0 or infinity or NaN or whatever else. You can then decide in your UI whether that is a case you want to handle as an error or as an edge case, but the parser knows that it is not possible to represent.
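
To make the division example concrete, here is a small TypeScript sketch. The `DivResult` type and the `divide` and `show` helpers are hypothetical names for illustration. Plain `1 / 0` in JavaScript evaluates to `Infinity`, which is exactly the kind of value the comment argues should never come back.

```typescript
// The result is either a number or a named error, never a made-up value.
type DivResult =
  | { tag: "ok"; value: number }
  | { tag: "err"; error: "DivisionByZero" };

function divide(a: number, b: number): DivResult {
  if (b === 0) {
    return { tag: "err", error: "DivisionByZero" };
  }
  return { tag: "ok", value: a / b };
}

// The caller (for example the UI layer) decides how to present each case.
function show(r: DivResult): string {
  if (r.tag === "ok") {
    return String(r.value);
  }
  return `error: ${r.error}`;
}
```

Because the error is a separate variant, a caller cannot read `value` without first checking `tag`.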

12. bradrn ◴[07 Sep 25 09:25 UTC] No.45156700[source]▶

>>45156178 #

> Just use optparse-applicative in PureScript.

Or in Haskell!

13. goku12 ◴[07 Sep 25 10:08 UTC] No.45156900{3}[source]▶

>>45155854 #

Apparently not. The author cites the example of JSON parsing for APIs. You usually don't split it into a generic parse into native data types followed by validating the result in memory (unless you're in a dynamically typed language and don't use a validation schema). Instead, the expected native data type of the result (composed using structs, enums, unions, vectors, etc.) is defined first, and then you try to parse the JSON into that data type. Any JSON errors and schema violations will error out in a single step.

14. lmm ◴[07 Sep 25 11:25 UTC] No.45157249{3}[source]▶

>>45155372 #

You parse into an applicative validation structure, combine those together, and then once you've brought everything together you handle that as either erroring out with all the errors or continuing with the correct config. It's easier to do that with a parsing approach than a validating approach, not harder.
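
A hand-rolled TypeScript sketch of that applicative shape, assuming nothing beyond the language itself. `Validation`, `valid`, `invalid`, `map2`, `parseHost`, and `parsePort` are illustrative names, not the API of optparse-applicative or any other library.

```typescript
type Validation<T> =
  | { tag: "ok"; value: T }
  | { tag: "errors"; errors: string[] };

const valid = <T>(value: T): Validation<T> => ({ tag: "ok", value });
const invalid = (...errors: string[]): Validation<never> => ({ tag: "errors", errors });

// Combine two independent validations. Unlike a short-circuiting chain,
// errors from both sides are kept.
function map2<A, B, C>(
  va: Validation<A>,
  vb: Validation<B>,
  f: (a: A, b: B) => C
): Validation<C> {
  if (va.tag === "ok" && vb.tag === "ok") {
    return valid(f(va.value, vb.value));
  }
  const left = va.tag === "errors" ? va.errors : [];
  const right = vb.tag === "errors" ? vb.errors : [];
  return { tag: "errors", errors: left.concat(right) };
}

function parseHost(raw: string): Validation<string> {
  return raw.length > 0 ? valid(raw) : invalid("host must not be empty");
}

function parsePort(raw: string): Validation<number> {
  const n = Number(raw);
  const isPort = n % 1 === 0 && n >= 1 && n <= 65535;
  return isPort ? valid(n) : invalid(`invalid port: ${raw}`);
}

// Both inputs are bad, and both complaints survive the combination.
const both = map2(parseHost(""), parsePort("99999"), (host, port) => ({ host, port }));
```

The config object is only ever built inside `f`, so it exists only when every piece parsed cleanly.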

15. Ygg2 ◴[07 Sep 25 11:33 UTC] No.45157290{3}[source]▶

>>45155372 #

Parsers can be made to not fail on the first error. You return either a parsed structure or an array of the errors found.

The HTML5 parser is notoriously forgiving of errors. See the adoption agency algorithm.

16. MrJohz ◴[07 Sep 25 12:01 UTC] No.45157423[source]▶

>>45155518 #

> I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema

But that _is_ parsing, at least in the sense of "parse, don't validate". It's about turning inputs into real objects representing the domain code that you're about to be working with. The result is still going to be a DTO of some description, but it will be a DTO with guaranteed invariants that are useful to you. For example, a POST request shouldn't be parsed into a user object just because it shares a lot of fields with a user. Instead it should become a DTO that fulfills the invariants that make sense for that DTO. Some of those invariants are simple (like "dates should be valid" -> the DTO contains Date objects, not strings), and some will be more complex, like the "if the server is active, then the port also needs to be provided" restriction from the article.
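
A rough TypeScript illustration of the "dates should be valid" invariant, written by hand rather than with Zod. `CreatePostDto` and `parseCreatePost` are hypothetical names, and error reporting is left out to keep the sketch short.

```typescript
// The DTO carries a real Date, so downstream code never re-checks a string.
interface CreatePostDto {
  title: string;
  publishAt: Date;
}

// Takes untrusted input and returns the DTO only if the invariants hold.
function parseCreatePost(input: unknown): CreatePostDto | null {
  if (typeof input !== "object" || input === null) {
    return null;
  }
  const raw = input as { [key: string]: unknown };
  const title = raw["title"];
  const publishAtRaw = raw["publishAt"];
  if (typeof title !== "string" || typeof publishAtRaw !== "string") {
    return null;
  }
  const publishAt = new Date(publishAtRaw);
  if (isNaN(publishAt.getTime())) {
    return null; // the string was not a parseable date
  }
  return { title, publishAt };
}
```

Anything that receives a `CreatePostDto` can call `publishAt.getTime()` directly, because a value of that type cannot hold an unparsed date string.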

This is one of the key ideas behind Zod - it isn't just trying to validate whether an object matches a certain schema, but it converts the result into a type that accurately expresses the invariants that must be in place if the object is valid.

replies(1): >>45157724 #

17. ffsm8 ◴[07 Sep 25 12:49 UTC] No.45157724{3}[source]▶

>>45157423 #

I don't disagree with the desire to get a good API like that. I was just pointing out that this was the core of the desire the author had, as 12_throw_away was correctly pointing out that _true_ parsing and making invalid state unrepresentable forces you to error out on the first mismatch, which makes it impossible to raise multiple issues. The only way around that is to allow invalid state during the input phase.

Zod also allows invalid state as input, then attempts to shoehorn it into the desired schema, which still runs the validations the author was complaining about, just not in code he wrote.

replies(2): >>45160015 #>>45160595 #

18. Lvl999Noob ◴[07 Sep 25 17:09 UTC] No.45160015{4}[source]▶

>>45157724 #

Why does "true" parsing have to error out on the very first problem? It is more than possible (though maybe not easy) to keep parsing and collecting errors as they appear. Zod, as the given example in the post, does it.

replies(1): >>45161342 #

19. MrJohz ◴[07 Sep 25 17:58 UTC] No.45160595{4}[source]▶

>>45157724 #

I don't know that I understand why parsing necessarily has to error out on the first mismatch. Good parsers will collect errors as they go along.

Zod does take in invalid state as input, but that is what a parser does. In this case, the parser is `any -> T` as opposed to `string -> T`, but that's still a parsing operation.

replies(1): >>45161698 #

20. 1718627440 ◴[07 Sep 25 19:23 UTC] No.45161342{5}[source]▶

>>45160015 #

Because then it would need to represent invalid data in its output type.

21. 12_throw_away ◴[07 Sep 25 20:10 UTC] No.45161698{5}[source]▶

>>45160595 #

Well, if you want to collect errors, then you need to have a way to store the transformed input in a form that allows you to check the invariants, which can be arbitrarily complex. So naturally there must be some intermediate representations that allow illegal states. And there must be functions that take these IRs and return either domain objects or lists of errors.

So, having used this thread to rubber-duck about how the principle of "parse-don't-validate" works with the principle of "provide good error messages", I'm arriving at these rules, which are really more about encapsulation than parsing:

1. Encapsulate both parsing and validation in a single function: `parse(RawInput) -> Result<ValidDomainObject,ListOfErrors>`

2. Ideally, `parse` is implemented by a robust parsing/validation library for the type of input that you're dealing with. It will create some intermediate representations that you need not concern yourself with.

3. If there isn't a good parser library for your use case, your implementation of `parse` will necessarily contain intermediate representations of potentially illegal state. This is both fine and unavoidable, just don't let them leak out of your parser.
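
Rule 3 can be sketched by hand in TypeScript. This is only an illustration under assumptions: the `--server` and `--port` flags come from the thread, while the `Config` type, the error wording, and the rule that `--port` needs `--server` are invented for the example.

```typescript
type Result<T, E> = { tag: "ok"; value: T } | { tag: "err"; errors: E };

// Domain type: "server mode without a port" cannot be constructed.
type Config =
  | { mode: "client" }
  | { mode: "server"; port: number };

function parse(argv: string[]): Result<Config, string[]> {
  // Intermediate representation: it may hold illegal combinations,
  // but it never leaves this function.
  const ir: { server: boolean; port: string | null } = { server: false, port: null };
  const errors: string[] = [];

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--server") {
      ir.server = true;
    } else if (arg === "--port") {
      const next = argv[i + 1];
      if (next === undefined) {
        errors.push("--port requires a value");
      } else {
        ir.port = next;
        i++; // skip the value we just consumed
      }
    } else {
      errors.push(`unknown option: ${arg}`);
    }
  }

  // Check invariants on the IR, collecting every failure.
  let port: number | null = null;
  if (ir.port !== null) {
    const n = Number(ir.port);
    if (n % 1 === 0 && n >= 1 && n <= 65535) {
      port = n;
    } else {
      errors.push(`invalid port: ${ir.port}`);
    }
  }
  if (ir.server && ir.port === null) errors.push("--server requires --port");
  if (!ir.server && ir.port !== null) errors.push("--port requires --server");

  if (errors.length > 0) return { tag: "err", errors };
  return ir.server && port !== null
    ? { tag: "ok", value: { mode: "server", port } }
    : { tag: "ok", value: { mode: "client" } };
}
```

Callers only ever see a `Config` or the full list of errors; the loosely typed `ir` object stays private to `parse`.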