But this is wrong. Programmers should be writing parsers all the time!
This project looks neat, I've never thought to use parser combinators for something other than left-to-right string/token stream parsing.
And I like how it uses TypeScript's metaprogramming to generate types from the parser code. I think that would be much harder (or impossible) in other languages, making the idiomatic design of a similar library very different.
Don't get me wrong, I actually love writing parsers. It's just not required all that often in my day-to-day work. 99% of the time when I need to write a parser myself it's for an Advent of Code problem; usually I just import whatever JSON or YAML parser is provided for the platform and go from there.
Isn’t writing code and using zod the same thing? The difference being who wrote the code.
Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.
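For what it's worth, the "parse, don't validate" use of zod looks roughly like this (a sketch from memory of zod v3's API, not the article's library):

import { z } from "zod";

// The schema is effectively a parser: unknown -> Config, or a typed error.
const Config = z.object({
  port: z.number().int().min(1).max(65535),
  host: z.string().default("localhost"),
});
type Config = z.infer<typeof Config>; // { port: number; host: string }

const result = Config.safeParse({ port: 3000 });
if (result.success) {
  // result.data is a Config; nothing downstream needs to re-check it.
  console.log(result.data.host, result.data.port);
} else {
  console.error(result.error.issues);
}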
I was recently thinking about how type safety and validation strategies are particularly thorny in languages where the typings are just annotations, e.g. the TypeScript/Zod or Python/Pydantic universes. Especially in IO cases where the data doesn't originate in the same type system.
In a language like Go (just an example, not endorsing), if you parse something into, say, a struct, you know that worst case you're getting that struct with all the fields set to zero, and you just have to handle the zero values. In TypeScript-likes you can get a totally different structure and run into all sorts of errors.
All that is to say, the runtime validation is always somewhere (perhaps in the library, as they often are?), and the feature here isn't no runtime validation but typed cli arguments. Which is cool and great.
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va... (2019, using Haskell)
https://www.lelanthran.com/chap13/content.html (April 2025, using C)
Sometimes it's going down to machine code, or rolling your own hash table, or writing your own recursive-descent parser from first principles. But most of the time you don't have to reach that low, and things like parsing are but a minor detail in the grand scheme. The engineer should not spend time on building them, but should be able to competently choose a ready-made part.
I mean, creating your own bolts and nuts may be fun, but most of the time, if you want to build something, you just pick a few from an appropriate box, and this is exactly right.
In the field I work in, zero values are valid, and doing it in Go would be a nightmare.
>> const port = option("--port", integer());
I don't understand. Why is this a parser? Isn't it just a way of enforcing a type in a language that doesn't have types?
I was expecting something like a state machine that takes the command line text and parses it to validate the syntax and values.
> Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.
Yes, judgement is required to make depending on zod (or any library) worthwhile. This is not different in principle from trusting those same things hold for TypeScript, or Node, or V8, or the C++ compiler V8 was compiled with, or the x86_64 chip it's running on, or the laws of physics.
The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?
In short, a great article.
Make the usage string be the specification!
A criminally underused library.
"options that depend on options" should not be a thing. Every option should be optional. Even if you have working code that can handle some complex situation, this doesn't make the situation any less unintuitive for the users.
If you need more complex relationships, consider using arguments as well. Top level, or under an option. Yes, they are not named, but since they are mandatory anyway, you are likely to remember their meaning (spaced repetition and all that). They can still be optional (if they come last). Sometimes an argument may need to have multiple parts, like user@host:port. You can still parse it instead of validating, if you want.
> mutually exclusive --json, --xml, --yaml.
Use something like -t TYPE instead, where TYPE can be one of json, xml, or yaml. (Make illegal states unrepresentable.)
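In TypeScript terms (a sketch, not any particular library's API), a closed union then carries that guarantee for you:

type OutputFormat = "json" | "xml" | "yaml";

// Parse the -t value once; downstream code can only ever see a legal format.
function parseFormat(raw: string): OutputFormat {
  if (raw === "json" || raw === "xml" || raw === "yaml") return raw;
  throw new Error(`-t must be one of json, xml, yaml (got "${raw}")`);
}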
> debug: optional(option("--debug")),
Again, I believe it's called "option" because it's meant to be optional already.
optional(optional(option("--common-sense")))
That might sound messy but to the author's point about parser combinators not being complicated, they really don't take much time to get used to, and they're quite simple if you wanted to build such a library yourself. There's not much code (and certainly no magic) going on under the hood.
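For instance, the core idea fits in a few lines of TypeScript (a toy sketch, not the internals of the library in the article):

// A parser consumes some of argv and either yields a value plus the rest, or fails.
type Parser<T> = (argv: string[]) => { value: T; rest: string[] } | null;

const option = (name: string): Parser<string> => (argv) =>
  argv[0] === name && argv.length > 1
    ? { value: argv[1], rest: argv.slice(2) }
    : null;

// Combinators build bigger parsers out of smaller ones without any magic.
const map = <A, B>(p: Parser<A>, f: (a: A) => B): Parser<B> => (argv) => {
  const r = p(argv);
  return r && { value: f(r.value), rest: r.rest };
};

const integer = (name: string): Parser<number> =>
  map(option(name), (s) => {
    const n = Number(s);
    if (!Number.isInteger(n)) throw new Error(`${name} expects an integer`);
    return n;
  });

// integer("--port")(["--port", "3000"]) -> { value: 3000, rest: [] }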
The advantage of that parsing approach:
It's reasonably declarative. This seems like the author's core point. Parser-combinator code largely looks like just writing out the object you want as a parse result, using your favorite combinator library as the building blocks, and everything automagically works, with amazing type-checking if your language has such features.
The disadvantages:
1. Like any parsing approach, you have to actually consider all the nuances of what you really want parsed (e.g., conditional rules around whitespace handling). It looks a little to me (just from the blog post, not having examined the inner workings yet) like this project side-stepped that by working with the `Stream` type as just the `argv` list, allowing you to say things like "parse the next blob as a string" without also having to encode whitespace and blob boundaries.
2. It's definitely slower (and more memory-intensive) than a hand-rolled parser, and usually also worse in that regard than other sorts of "auto-generated" parsing code.
For CLI arguments, especially if they picked argv as their base stream type, those disadvantages mostly don't exist. I could see it performing poorly for argv parsing for something like `cp` though (maybe not -- maybe something like `git cp`, which has more potential parse failures from delimiters like `--`?), which has both options and potentially ginormous lists of files; if you're not very careful in your argument specification then you might have exponential backtracking issues, and where that would be blatantly obvious in a hand-rolled parser it'll probably get swept under the rug with parser combinators.
TFA links to Alexis King’s Parse, Don’t Validate article, which explains this well. Did you not read it?
It's a genuine pleasure to use, and I use it often.
If you dig a little deeper into it, it does all the type and value validation, file validation, it does required and mutually exclusive args, it does subargs. And it lets you do special cases of just about anything.
And of course it does the "normal" stuff like short + long args, boolean args, args that are lists, default values, and help strings.
Then instead of validating a loose type & still using the loose type, you're parsing it from a loose type into a strict type.
The key point is you never need to look at a loose type and think "I don't need to check this is valid, because it was checked before"; the type system tracks that for you.
What would you do for "top level option, which can be modified in two other ways"?
(--option | --option-with-flag1 | --option-with-flag2 | --option-with-flag1-and-flag2)
would solve invalid representation, but is unwieldy.
Something that results in the usage string
[--option [--flag1 --flag2]]
doesn't seem so bad at that point.

In Python this was a motivating factor for letting functions demand their arguments be passed as named keywords. Something like send("foo", "bar") is easier to understand and call correctly when you have to say send(channel="foo", message="bar").
Whether you do that with Zod or manually or whatever isn't important, the important thing is having a preprocessing step that transforms the data and doesn't just validate it.
The result is that you often still see this kind of defensive programming, where argparse ensures that an invariant holds, but other functions still check the same invariant later on because they might have been called a different way or just because the developer isn't sure whether everything was checked where they are in the program.
What I think the author is looking for is a combination of argparse and Pydantic, such that when you define a parser using argparse, it automatically creates the relevant Pydantic classes that define the type of the parsed arguments.
--option flag1,flag2
(Maybe with another separator, as long as it doesn't need to be escaped.)

Another possibility is to make the main option an argument, like the subcommands in git, systemctl, and others:
command option --flag1 --flag2
This depends on the specifics, though.

So maybe the reason why they were able to reduce the code is because they lost the ability to do good error reporting.
He even gives the example of zod, which is a validation library he defines to be a parser.
What he wants to say : "I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema"
"Invalid data? The parser rejects it. Done."
"That validation logic that used to be 30% of my CLI code? Gone."
"Mutually exclusive groups? Sure. Context-dependent options? Why not."
For me this really piled on at the end of the blog post. But maybe it's just personal style too.
For parsing specifically, there's literature on error recovery to try to make progress past the error.
That said, I fully agree with the article content itself. It basically just boils down to:
When you create a program, eventually you'll need to process input data and check whether it is valid or not. In a C-like language, you have two options:

void validate(struct Data d);

or

struct ValidatedData;
ValidatedData validate(struct Data d);
"Parse, don't validate" is just trying to say don't do `void validate(struct Data d)` (procedure with `void`), but do `ValidatedData validate(struct Data d)` (function returning `ValidatedData`) instead.It doesn't mean you need to explicitly create or name everything as a "parser". It also doesn't mean "don't validate" either; in `ValidatedData validate(struct Data d)` you'll eventually have "validation" logic similar to the procedure `void` counterpart.
Specifically, the article tries to teach folks to utilize the type system to their advantage. Rather than praying to never forget invoking `validate(d)` on every single call site, make the type signature only accept `ValidatedData` type so the compiler will complain loudly if future maintainers try to shove `Data` type to it. This strategy offloads the mental burden of remembering things from the dev to the compiler.
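In TypeScript the same trick looks roughly like this (a sketch; the names are made up):

type Data = { email: string };

class ValidatedData {
  // Private constructor: the only way to get a ValidatedData is through validate().
  private constructor(readonly email: string) {}

  static validate(d: Data): ValidatedData | null {
    return d.email.includes("@") ? new ValidatedData(d.email) : null;
  }
}

// Downstream code demands the proof instead of re-checking the invariant.
function sendWelcomeMail(to: ValidatedData): void { /* ... */ }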
I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to the non-FP community what the distinction between "parse" and "validate" is, let alone what a "parser combinator" is. Yet somehow other articles keep reusing this same catchphrase.
For instance if validating parameter values requires multiple trips to a DB or other external system, weaving the calls in the logic can spare duplicating these round trips. Light "surface" validation can still be applied, but that's not what we're talking about here I think.
I still wouldn't need to check the inputs again because I know it's already been processed, even if the type system can't help me.
The library in the original post is essentially a Javascript library, but it's one designed so that if you use it with Typescript, it provides that type safety.
Even better, that conversion from interface type to internal type should ideally happen at one explicit point in the program - a function call which rejects all invalid inputs and returns a type that enforces the invariants we're interested in. That way, we have a clean boundary point between the outside world and the inside one.
This isn't a performance issue at all, it's closer to the "imperative shell, functional core" ideas about structuring your application and data.
Not quite that, but https://typer.tiangolo.com/ is fully type driven.
The point is you don’t check that your string only contains valid characters and then continue passing that string through your system. You parse your string into a narrower type, and none of the rest of your system needs to be programmed defensively.
To describe this advice as “vacuous” says more about you than it does about the author.
I'm hung up on the type system because it's a great way to convey the validity of the data; it follows the data around as it flows through your program.
I don't (yet) use TypeScript, but jsdoc and linting give me enough type checking for my needs.
Embedding a second parse step that the first parser doesn't deal with is done, but it's a rough compromise.
It feels like the difficulty in dealing with
[--option [--flag1 --flag2]]
is more to do with its expression in the language parsed into than with CLI elegance.

":3000" -> use port 3000 with a default host.
"some-host" -> use host with a default port.
"some-host:3000" -> you guess it.
It also allows to extend it to other sources/destinations like unix domain sockets and other stuff without cluttering your CLI options.
Also please consider to use DSN or URI to define database configurations. Host, port, dbname, credentials as separate options or environment variables are quite painful to use.
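A sketch of that parse in TypeScript (helper name and defaults made up for illustration):

type Endpoint = { host: string; port: number };

// Accepts ":3000", "some-host", or "some-host:3000" and fills in the defaults.
function parseEndpoint(raw: string, defaults: Endpoint): Endpoint {
  const [hostPart, portPart] = raw.split(":");
  const host = hostPart !== "" ? hostPart : defaults.host;
  const port = portPart !== undefined ? Number(portPart) : defaults.port;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in "${raw}"`);
  }
  return { host, port };
}

// parseEndpoint(":3000", { host: "localhost", port: 5432 }) -> { host: "localhost", port: 3000 }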
You need a boundary to convert nice opts into nice types. Like pydantic models could take argparse namespace and convert it to something manageable.
Makes sense, I think a lot of developers would want to complect this problem with their runtime type system of choice without considering the set of downsides for the users
You either get the correctly parsed data or you get an error array. The incorrect input was never represented in code, vs a 0 value being returned or even worse random gibberish.
A trivial example: 1/0 should return DivisionByZero not 0 or infinity or NaN or whatever else. You can then decide in your UI whether that is a case you want to handle as an error or as an edge case but the parser knows that is not possible to represent.
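Something like this, as a sketch:

type DivisionResult =
  | { ok: true; value: number }
  | { ok: false; error: "DivisionByZero" };

function divide(a: number, b: number): DivisionResult {
  return b === 0
    ? { ok: false, error: "DivisionByZero" }
    : { ok: true, value: a / b };
}

// The caller must decide what DivisionByZero means for the UI; a bogus
// 0 / Infinity / NaN never flows into the rest of the program.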
Also - don't write CLI programs in languages that don't compile to native binaries. I don't want to have to drag around your runtime just to execute a command line tool.
One of the things I love about clap is that you can configure it to automatically spit out --help info, and you can even get it to generate shell autocompletions for you!
I think there are some other libraries that are challenging it now (fewer dependencies or something?) but clap sets the standard to beat.
My most comfortable tool is Java, but I'm not going to persuade most of the HN crowd to install a JVM unless the software I'm offering is unbearably compelling.
An internal tool at work? Yeah, Java's going to be an easy sell.
I don't think OP necessarily meant it as a political statement.
Although in practice, I find clap's approach works pretty well: define an object that represents the parsed arguments as you want them, with annotations for details that can't be represented in the type system, and then derive a parser from that. Because Rust has ADTs and other tools for building meaningful types, and because the derive process can do so much, this creates an arguments object that you can quite easily pass to a function which runs the command.
In fact, I think something like this already exists. I just can't recollect the project.
A CLI and an API should indeed occupy the same layer of a program architecture, namely they are entry points that live on the periphery. But really all you should be doing there is lifting the raw byte stream you are getting from users to something higher level you can use to call your internals.
So "CLI validation" should be limited to just "I need an int here, one of these strings here, optionally" etc. Stuff like "is this port out of range" or "if you give me this I need this too" should be handled by your internals by e.g. throwing an exception. Your CLI can then display that as an error message in a nice way.
$ ldd /usr/bin/rg
linux-vdso.so.1 (0x00007fff45dd7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000070764e7b1000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000070764e6ca000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000070764de00000)
/lib64/ld-linux-x86-64.so.2 (0x000070764e7e6000)
The worst is compiling a C program with a compiler that uses a more recent libc than is installed on the installation host.

Sure, but probably at the cost of leaving everything in a horribly inconsistent state when you error out partway through. Which is almost always not worth it.
This approach IMNSHO is much cleaner than the intrication of cmdline parser libraries with application logic and application-domain-related types.
Then one can specify validation logic declaratively, and apply it generically.
This has the added benefit - for a compiled rather than an interpreted library - of not having to recompile the CLI parsing library for each different app and each different definition of options.
But that _is_ parsing, at least in the sense of "parse, don't validate". It's about turning inputs into real objects representing the domain code that you're about to be working with. The result is still going to be a DTO of some description, but it will be a DTO with guaranteed invariants that are useful to you. For example, a post request shouldn't be parsed into a user object just because it shares a lot of fields in common with a user. Instead it should become a DTO with the invariants fulfilled that makes sense for a DTO. Some of those invariants are simple (like "dates should be valid" -> the DTO contains Date objects not strings), and some will be more complex like the "if the server is active, then the port also needs to be provided" restriction from the article.
This is one of the key ideas behind Zod - it isn't just trying to validate whether an object matches a certain schema, but it converts the result into a type that accurately expresses the invariants that must be in place if the object is valid.
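Roughly (zod v3 from memory; the shapes are made up):

import { z } from "zod";

// Dates arrive as strings but leave as Date objects; "active implies port"
// is encoded as a discriminated union instead of an after-the-fact check.
const Server = z.discriminatedUnion("active", [
  z.object({ active: z.literal(false) }),
  z.object({ active: z.literal(true), port: z.number().int().min(1).max(65535) }),
]);

const PostRequest = z.object({
  createdAt: z.string().transform((s) => new Date(s)),
  server: Server,
});

type PostRequest = z.infer<typeof PostRequest>;
// { createdAt: Date; server: { active: false } | { active: true; port: number } }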
zod also allows invalid state as input, then attempts to shoehorn them into the desired schema, which still runs these validations the author was complaining about - just not in the code he wrote.
This clause is abstracting away a ton of work. If you want to compile the latest LLVM and get 'portable C++26', you need to bootstrap everything, including CMake from that old-hat libc on some ancient distro like CentOS 6 or Ubuntu 12.04.
I've said it before, I'll say it again: the Linux kernel may maintain ABI compatibility, but the fact that GNU libc breaks it anyway makes it a moot point. It is a pain to target older Linux with a newer distro, which is by far the most common development use case.
"Well, I already know this is a valid uuid, so I don't really need to worry about sql injection at this point."
Sure, this is a dumb thing to do in any case, but I've seen this exact thing happen.
Type safety isn't safety.
Are you stuck in write-only mode or something? How does this make any sense to you?
$ wget 'https://github.com/BurntSushi/ripgrep/releases/download/14.1.1/ripgrep-14.1.1-x86_64-unknown-linux-musl.tar.gz'
$ tar -xvf 'ripgrep-14.1.1-x86_64-unknown-linux-musl.tar.gz'
$ ldd ripgrep-14.1.1-x86_64-unknown-linux-musl/rg
ldd (0x7f1dcb927000)
$ file ripgrep-14.1.1-x86_64-unknown-linux-musl/rg
ripgrep-14.1.1-x86_64-unknown-linux-musl/rg: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, stripped
The quote here — which I suspect is a straw man — is such a weird non sequitur. What would logically follow from “I already know this is a valid UUID” is “so I don’t need to worry about this not being a UUID at this point”.
Pretty much agreed - once any sort of complicated logic enters a shell script it's probably better off written in C/Rust/Go or something akin to that.
Write your code such that you can load it onto (for example) the oldest supported Ubuntu and compile cleanly and you’ll have virtually zero problems. Again, I know that if your goal is to truly ship something written in e.g. C++26 portably then it’s a huge pain. But as someone who writes plain C and very much enjoys it, I think it’s better to skip this class of problem.
some_cli <some args> --some-option --no-some-option
Before parsing, the argument array contains both the flags to enable and disable the option. Validation would either throw an error or accept it as either enabled or disabled. But importantly, it wouldn't change the arguments. If the assumption is that the last option overwrites anything before it then the cli command is valid with the option disabled.
And now, correct behaviour relies on all the code using that option to always make the same assumption.
Parsing, on the other hand, would create a new config where `option` is an enum - either enabled, disabled, or not given. No confusion about multiple flags or anything. It provides a single view for the rest of the program of what the input config was.
Whether that parsing is done by a third-party library or first-party code, declaratively or imperatively, is beside the point.
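Roughly, as a sketch (assuming the last-flag-wins convention):

type Toggle = "enabled" | "disabled" | "unset";

// Fold a --foo / --no-foo pair into a single tri-state value, last one wins.
function parseToggle(argv: string[], name: string): Toggle {
  let state: Toggle = "unset";
  for (const arg of argv) {
    if (arg === `--${name}`) state = "enabled";
    else if (arg === `--no-${name}`) state = "disabled";
  }
  return state;
}

// parseToggle(["--some-option", "--no-some-option"], "some-option") -> "disabled"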
I use Effect CLI https://github.com/Effect-TS/effect/tree/main/packages/cli for the same reasons. It has the advantage of fitting within the ecosystem. For example, I can reuse existing schemas.
Well that's confused me. I write a lot of scripts in BASH specifically to make it easy to move them to different architectures etc. and not require a custom runtime. Interpreted scripts also have the advantage that they're human readable/editable.
Even in languages like Haskell, "safety" is an illusion. You might create a NumberGreaterThanFive type with smart constructors but that doesn't stop another dev from exporting and abusing the plain constructor somewhere else.
For the most part it's fine to assume the names of types are accurate, but for safety critical operations it absolutely makes sense to revalidate inputs.
That seems like a pretty unfair constraint. Yes, you can deliberately circumvent safeguards and you can deliberately write bad code. That doesn't mean those language features are bad.
I'll keep my templates, smart pointers, concepts, RAII, and now reflection, thanks. C and its macros are good for compile times but nothing much else. Programming in C feels like banging rocks together.
Go programs compile to native executables, but they're still rather slow to start, especially if you just want to do --help.
Zod does take in invalid state as input, but that is what a parser does. In this case, the parser is `any -> T` as opposed to `string -> T`, but that's still a parsing operation.
And don't write programs with languages that depend on CMake and random tarballs to build and/or shared libraries to run.
I usually have a lot fewer issues with dragging a runtime around than with fighting builds.
This is only a problem when the program USES a symbol that was only introduced in the newer libc. In other words, when the program made a choice to deliberately need that newer symbol.
$ pkg install git rust
$ git clone https://github.com/BurntSushi/ripgrep.git
$ cd ripgrep
$ RUSTFLAGS='-C target-feature=+crt-static' cargo build --release
$ ldd target/release/rg
ldd: target/release/rg: not a dynamic ELF executable
$ file target/release/rg
target/release/rg: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), statically linked, for FreeBSD 14.3, FreeBSD-style, with debug_info, not stripped
> What is ValidatedData? A subset of the Data that is valid?
Usually, but not necessarily. `validate()` might add some additional information too, for example: `validationTime`.

More often than not, in a real case of applying algebraic data types & "Parse, don't validate", it's something like `Option<ValidatedData>` or `Result<ValidatedData, PossibleValidationError>`, borrowing Rust's names. `Option` & `Result` expand the possible return values that the function can return to cover the possibility of failure in the validation process, but that is independent from the possible values that `ValidatedData` itself can contain.
> The way I see it is you use ‘validate’ when the format of the data you are validating is the exact same format you are gonna be working with right after, meaning the return type doesn’t matter.
The main point of "Parse, don't validate" is to distinguish between "machine-level data representation" vs "possible set of values" of a type and utilize this "possible set of values" property.Your "the exact same format" point is correct; oftentimes, the underlying data representation of a type is exactly the same between pre- & post-validation. But more often than not "possible set of values" of `ValidatedData` is a subset of `Data`. These 2 different "possible set of values" are given their own names in the form of a type `Data` and `ValidatedData`.
This distinction is actually very handy because types can be checked automatically by the (nominal) type system. If you make the `ValidatedData` constructor private & the only way to produce one is the function `ValidatedData validate(Data)`, then in any part of the codebase, there's no way any `ValidatedData` instance is malformed (assuming `validate` doesn't have bugs).
Extra note: I forgot to mention that the "Parse, don't validate" article implicitly assumes a nominal type system, where 2 objects with an equivalent "data representation" don't necessarily have the same type. This differs from Typescript's structural type system, where as long as the "data representation" is the same, both objects are considered to have the same type.
Typescript will happily accept something like this because of structural typing:
type T1 = { x: string };
type T2 = { x: string };
function f(arg: T1): void { /* ... */ }
const t2: T2 = { x: "foo" };
f(t2); // OK: T2 is structurally identical to T1
While nominal type systems like Haskell or Java will reject such expressions:
class T1 { String x; }
class T2 { String x; }
void f(T1 arg) { /* ... */ }
// f(new T2()); // Compile error: type mismatch
Because of this, the idea of using a type as a "possible set of values" probably felt unintuitive to Typescript folks, as everything is structurally typed and a different type felt synonymous with a different "underlying data representation" there.

You can simulate this "same structure, but different meaning" concept from nominal type systems in Typescript with a somewhat hacky workaround using Symbol.
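The usual shape of that workaround (a sketch):

// The brand exists only at the type level; at runtime this is still a plain string.
declare const validated: unique symbol;
type ValidatedEmail = string & { [validated]: true };

function parseEmail(raw: string): ValidatedEmail | null {
  return raw.includes("@") ? (raw as ValidatedEmail) : null;
}

// A function taking ValidatedEmail will reject a plain string at compile time,
// even though both are structurally "just strings".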
> The return type implies transformation – a write operation per se, whereas validation is always a read operation only
Why does the return type need to imply transformation and why is "validation" here always read-only? No-op function will return the exact same value you give it (in other words, identity transformation), and Java & Javascript procedures never guarantee a read-only operation.So, having used this thread to rubber-duck about how the principle of "parse-don't-validate" works with the principle of "provide good error messages", I'm arriving at these rules, which are really more about encapsulation than parsing:
1. Encapsulate both parsing and validation in a single function: `parse(RawInput) -> Result<ValidDomainObject,ListOfErrors>`
2. Ideally, `parse` is implemented by a robust parsing/validation library for the type of input that you're dealing with. It will create some intermediate representations that you need not concern yourself with.
3. If there isn't a good parser library for your use case, your implementation of `parse` will necessarily contain intermediate representations of potentially illegal state. This is both fine and unavoidable, just don't let them leak out of your parser.
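A sketch of rule 1's shape, accumulating errors instead of bailing on the first one (names invented):

type ParseResult<T> = { ok: true; value: T } | { ok: false; errors: string[] };

type ServerConfig = { host: string; port: number };

function parseServerConfig(raw: Record<string, string>): ParseResult<ServerConfig> {
  const errors: string[] = [];
  const host = raw.host ?? "";
  if (host === "") errors.push("host is required");
  const port = Number(raw.port);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push(`port must be an integer between 1 and 65535 (got "${raw.port}")`);
  }
  // Either a fully valid domain object or the complete list of problems - never both.
  return errors.length === 0 ? { ok: true, value: { host, port } } : { ok: false, errors };
}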
Right - and one thing that keeps coming up for me is that, if you want to maintain complex invariants, it's quite natural to express them in terms of the domain object itself (or maybe, ugh, a DTO with the same fields), rather than in terms of input constraints.
Why CLIs in particular? Because they usually are smaller tools. For a big, important tool, you might be willing to jump through more hoops (installing the right runtime), but for a smaller, less important tool, it's just not worth it.
This function parses a number in 6502 asm. So `255` in dec or `$ff` in hex: https://github.com/geon/dumbasm/blob/main/src/parsers/parseN...
I looked at several typescript libraries but they all felt off. Writing my own at least ensured I know how it works.
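For a flavour of the shape (this is not the linked code, just an illustrative sketch):

// `$ff` is hex, `255` is decimal; return the value plus whatever input remains.
type NumResult = { value: number; rest: string } | null;

function parseNumber(input: string): NumResult {
  const hex = /^\$([0-9a-fA-F]+)/.exec(input);
  if (hex) return { value: parseInt(hex[1], 16), rest: input.slice(hex[0].length) };
  const dec = /^[0-9]+/.exec(input);
  if (dec) return { value: parseInt(dec[0], 10), rest: input.slice(dec[0].length) };
  return null;
}

// parseNumber("$ff,x") -> { value: 255, rest: ",x" }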
args:
username str # Required string
password str? # Optional string
token str? # Optional auth token
age int # Required integer
status str # Required string
username requires password // If username is provided, password must also be provided
token excludes password // Token and password cannot be used together
age range [18, 99] // Inclusive range from 18 to 99
status enum ["active", "inactive", "pending"]
Rad will handle all the validation for you; you can just write the rest of your script assuming the constraints you declared are met.

C feels a little like survival mode in Minecraft; you have a set of very simple abstractions, a relatively simple language, with which one can build the world (and in many cases, we have).
C++ feels like a complex city builder, with lots of tools, designs, and paradigms available, but also allows one to screw up in bigger ways.
The difference is (a) where and how validation happens, and (b) the type of the final result.
A parser is a function producing structured values - values of some type, usually different from the input type. In contrast, a validator is a predicate that only checks constraints on existing values.
For example, a parser can parse an email address into a variable of type EmailAddress. If the parser succeeds at doing that, assuming you're using a language with a decent type system, you now have a variable which is statically guaranteed to be an email address - not a string which you have to trust has passed validation at some point in the past.
This is part of the "Make illegal states unrepresentable" approach which allows for static debugging - debugging your code at compile time. It's a very powerful way to produce reliable systems with robust, statically proven guarantees.
But as Alexis King (who coined the phrase "Parse, don't validate") wrote, "Unless you already know what type-driven design is, my catchy slogan probably doesn’t mean all that much to you."
I'm just saying that TypeScript and jsdoc don't actually do any runtime enforcement. It's important that the library does that part, with or without types.