The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?
So maybe the reason why they were able to reduce the code is because they lost the ability to do good error reporting.
He even gives the example of zod, which is a validation library he defines to be a parser.
What he wants to say is: "I don't want to write my own validation in a CLI; give me a good API that first validates the inputs and then converts them into my declared schema."
For parsing specifically, there's literature on error recovery to try to make progress past the error.
You either get the correctly parsed data or you get an error array. The incorrect input is never represented in code, as opposed to a 0 value being returned or, even worse, random gibberish.
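A minimal sketch of that "data or error array" shape, in TypeScript. The names here (`Result`, `Signup`, `parseSignup`) are made up for illustration and don't come from any library. The point is that every field gets checked before the function decides, so one bad field doesn't hide another.

```typescript
// Hypothetical names throughout; this is an illustration, not a library API.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; errors: E[] };

interface Signup {
  name: string;
  age: number;
}

function parseSignup(input: unknown): Result<Signup, string> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["expected an object"] };
  }
  const raw = input as Record<string, unknown>;
  const errors: string[] = [];

  // Check every field and collect problems instead of bailing at the first one.
  const name = raw.name;
  if (typeof name !== "string" || name.trim() === "") {
    errors.push("name: must be a non-empty string");
  }
  const age = raw.age;
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    errors.push("age: must be a non-negative integer");
  }

  if (errors.length > 0) return { ok: false, errors };
  // Both checks passed, so the casts below are safe.
  return { ok: true, value: { name: name as string, age: age as number } };
}
```

Calling `parseSignup({ name: "", age: -1 })` reports both problems at once, and the caller never sees a half-built `Signup`.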
A trivial example: 1/0 should return DivisionByZero, not 0 or infinity or NaN or whatever else. You can then decide in your UI whether that is a case you want to handle as an error or as an edge case, but the parser knows that the result is not possible to represent.
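Roughly what that could look like in TypeScript, as a sketch. `DivResult`, `divide` and `render` are hypothetical names, and the wording in `render` is just one possible UI decision.

```typescript
// Hypothetical names; the result type makes the impossible case explicit.
type DivResult =
  | { kind: "ok"; value: number }
  | { kind: "divisionByZero" };

function divide(numerator: number, denominator: number): DivResult {
  // Refuse to produce Infinity or NaN; report the case by name instead.
  if (denominator === 0) return { kind: "divisionByZero" };
  return { kind: "ok", value: numerator / denominator };
}

// The UI layer decides how to present each case.
function render(result: DivResult): string {
  switch (result.kind) {
    case "ok":
      return String(result.value);
    case "divisionByZero":
      return "undefined (division by zero)";
  }
}
```

The compiler forces `render` to handle both cases, which is the difference from silently passing `Infinity` along.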
But that _is_ parsing, at least in the sense of "parse, don't validate". It's about turning inputs into real objects representing the domain code that you're about to be working with. The result is still going to be a DTO of some description, but it will be a DTO with guaranteed invariants that are useful to you. For example, a POST request shouldn't be parsed into a user object just because it shares a lot of fields with a user. Instead it should become a DTO that fulfills the invariants that make sense for a DTO. Some of those invariants are simple (like "dates should be valid" -> the DTO contains Date objects, not strings), and some will be more complex, like the "if the server is active, then the port also needs to be provided" restriction from the article.
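To make the two invariants concrete, here is a hedged sketch in TypeScript. The field names (`active`, `port`, `startsAt`) and the type names are assumptions for illustration; the article's actual types may differ. The "active implies port" rule lives in the type itself, and the date arrives as a `Date` or not at all.

```typescript
// Hypothetical DTO: an active server cannot exist without a port.
type ServerConfig =
  | { active: false }
  | { active: true; port: number };

interface ConfigDto {
  server: ServerConfig;
  startsAt: Date; // a real Date, never an unchecked string
}

type ConfigResult =
  | { ok: true; value: ConfigDto }
  | { ok: false; errors: string[] };

function parseConfig(input: { active?: unknown; port?: unknown; startsAt?: unknown }): ConfigResult {
  const errors: string[] = [];

  let server: ServerConfig | undefined;
  const port = input.port;
  if (input.active === true) {
    if (typeof port === "number" && Number.isInteger(port) && port >= 1 && port <= 65535) {
      server = { active: true, port };
    } else {
      errors.push("port: required as an integer in 1..65535 when the server is active");
    }
  } else if (input.active === false) {
    server = { active: false };
  } else {
    errors.push("active: must be a boolean");
  }

  let startsAt: Date | undefined;
  const rawDate = input.startsAt;
  if (typeof rawDate === "string") {
    const candidate = new Date(rawDate);
    // An unparseable string yields an Invalid Date whose time value is NaN.
    if (Number.isNaN(candidate.getTime())) errors.push("startsAt: not a valid date");
    else startsAt = candidate;
  } else {
    errors.push("startsAt: must be a date string");
  }

  if (server === undefined || startsAt === undefined) return { ok: false, errors };
  return { ok: true, value: { server, startsAt } };
}
```

Code that receives a `ConfigDto` can read `server.port` only after checking `server.active`, so the restriction is enforced by the compiler rather than by convention.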
This is one of the key ideas behind Zod - it isn't just trying to validate whether an object matches a certain schema, but it converts the result into a type that accurately expresses the invariants that must be in place if the object is valid.
Zod also allows invalid state as input, then attempts to shoehorn it into the desired schema, which still runs the validations the author was complaining about - just not in the code he wrote.
Zod does take in invalid state as input, but that is what a parser does. In this case, the parser is `any -> T` as opposed to `string -> T`, but that's still a parsing operation.
So, having used this thread to rubber-duck about how the principle of "parse-don't-validate" works with the principle of "provide good error messages", I'm arriving at these rules, which are really more about encapsulation than parsing:
1. Encapsulate both parsing and validation in a single function: `parse(RawInput) -> Result<ValidDomainObject,ListOfErrors>`
2. Ideally, `parse` is implemented by a robust parsing/validation library for the type of input that you're dealing with. It will create some intermediate representations that you need not concern yourself with.
3. If there isn't a good parser library for your use case, your implementation of `parse` will necessarily contain intermediate representations of potentially illegal state. This is both fine and unavoidable, just don't let them leak out of your parser.
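A sketch of rule 3 for a hand-rolled case, in TypeScript. The CLI shape (`--host <name> --port <number>`) and every name here (`ServeOptions`, `parseArgs`) are hypothetical. The `flags` map is the intermediate representation: it can hold a flag with no value, which is illegal state, but it never leaves the function.

```typescript
// Hypothetical CLI: --host <name> --port <number>
interface ServeOptions {
  host: string;
  port: number;
}

type ParseOutcome =
  | { ok: true; value: ServeOptions }
  | { ok: false; errors: string[] };

function parseArgs(argv: string[]): ParseOutcome {
  const errors: string[] = [];
  // Intermediate representation: `undefined` means "flag given without a value".
  const flags = new Map<string, string | undefined>();

  let i = 0;
  while (i < argv.length) {
    const key = argv[i] ?? "";
    if (!key.startsWith("--")) {
      errors.push(`unexpected argument: ${key}`);
      i += 1;
      continue;
    }
    const next = argv[i + 1];
    const hasValue = next !== undefined && !next.startsWith("--");
    flags.set(key.slice(2), hasValue ? next : undefined);
    i += hasValue ? 2 : 1;
  }

  const host = flags.get("host");
  if (!flags.has("host")) errors.push("--host: required");
  else if (host === undefined) errors.push("--host: missing value");

  const portText = flags.get("port");
  const port = Number(portText);
  if (!flags.has("port")) errors.push("--port: required");
  else if (portText === undefined) errors.push("--port: missing value");
  else if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push(`--port: not a valid port: ${portText}`);
  }

  // Only a fully valid domain object (or the full error list) escapes.
  if (errors.length > 0 || host === undefined) return { ok: false, errors };
  return { ok: true, value: { host, port } };
}
```

Callers only ever see `ServeOptions` or the complete error list, so the messy map is an implementation detail you can later swap for a proper library without touching the rest of the program.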