Most active commenters
  • lexi-lambda(17)
  • crabbone(16)
  • PartiallyTyped(10)
  • jameshart(5)
  • ParetoOptimal(5)
  • dwohnitmok(5)
  • mrkeen(4)
  • epolanski(4)
  • jakelazaroff(4)
  • lmm(4)

Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 224 comments
1. bruce343434 ◴[] No.35053912[source]
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.
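
A minimal TypeScript sketch of the idea (names are illustrative, not from the article):

    // A list the type system knows is non-empty.
    type NonEmpty<T> = [T, ...T[]];

    // The single place where emptiness is checked.
    function parseNonEmpty<T>(xs: T[]): NonEmpty<T> | undefined {
        return xs.length > 0 ? (xs as NonEmpty<T>) : undefined;
    }

    // Downstream code takes NonEmpty<T> and never re-checks.
    function head<T>(xs: NonEmpty<T>): T {
        return xs[0];
    }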

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #
2. not_knuth ◴[] No.35053945[source]
Previously on HN:

2 years ago: https://news.ycombinator.com/item?id=27639890

3 years ago: https://news.ycombinator.com/item?id=21476261

replies(2): >>35055892 #>>35062866 #
3. artagnon ◴[] No.35054029[source]
The example described in this post is JSON parsing in Haskell, but in the past I implemented a complicated compiler transform in C++ that lifts loops to static control parts (SCoPs). Each inner function in the lift would switch on valid constructs, either returning a lifted integer set or throwing an exception on match failure. Although exceptions have a non-trivial cost in C++, it was the cleanest design I could come up with at the time.
4. mtlynch ◴[] No.35054046[source]
I get a lot of value from this rule even without those language features.

I follow "Parse, Don't Validate" consistently in Go. For example, if I need to parse a JSON payload from an end-user for Foo, I define a struct called FooRequest, and I have exactly one function that creates a FooRequest instance, given a JSON stream.

Anywhere else in my application, if I have a FooRequest instance, I know that it's validated and well-formed because it had to have come from my FooRequest parsing function. I don't need sum types or any special language features beyond typing.
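
The Go itself isn't shown, but a rough TypeScript rendering of the same pattern might look like this (a sketch; the fields are invented):

    interface FooRequest {
        readonly name: string;
        readonly count: number;
    }

    // The only function in the codebase that produces a FooRequest.
    function parseFooRequest(json: string): FooRequest | Error {
        let raw: unknown;
        try { raw = JSON.parse(json); } catch { return new Error("malformed JSON"); }
        if (typeof raw !== "object" || raw === null) return new Error("expected an object");
        const { name, count } = raw as { name?: unknown; count?: unknown };
        if (typeof name !== "string" || name === "") return new Error("missing name");
        if (typeof count !== "number" || !Number.isInteger(count)) return new Error("bad count");
        return { name, count };
    }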

replies(1): >>35054157 #
5. jim-jim-jim ◴[] No.35054070[source]
I think we'll eventually come to regard `if` as we do `goto`.
replies(4): >>35054298 #>>35054351 #>>35054456 #>>35054814 #
6. kellysutton ◴[] No.35054083[source]
This post resonates with a lesson I’ve learned in my career so far: It is always easier to relax constraints than tighten them.
replies(1): >>35054715 #
7. tizzy ◴[] No.35054119[source]
I always favourite this when it comes up on Hacker News. It's a great principle to follow, or at least to have in mind, all the time.
8. jotaen ◴[] No.35054157{3}[source]
My main take-away is the same. I wonder, though, whether “parse, don’t validate” is the right term for it. To me, “parse, don’t validate” somehow suggests that you should do parsing instead of validation, but the real point for me is that I still validate (as before), plus I “capture”/preserve validation success by means of a type.
replies(8): >>35054350 #>>35054377 #>>35054626 #>>35054751 #>>35055151 #>>35055232 #>>35055382 #>>35056979 #
9. conaclos ◴[] No.35054228[source]
While I agree with all the points in theory, in practice it is sometimes too bloated to add so many types that are barely distinct. It is sometimes better to trade safety for simplicity.

The trade-off is always hard to make. For instance: should I introduce a branded type for unsigned 32bit integer in TypeScript?

   declare const U32_BRAND: unique symbol
   type u32 = number & { [U32_BRAND]: never }
   const u32 = (n: number): u32 => (n >>> 0) as u32
And then every use of this type becomes harder:

   declare let x: u32, y: u32
   y = u32(x + 1)
replies(1): >>35054364 #
10. leetrout ◴[] No.35054298{3}[source]
Using pattern matching instead, or something else?
replies(3): >>35054371 #>>35054472 #>>35063455 #
11. qsort ◴[] No.35054350{4}[source]
It's in the same sense as "whitelist, don't blacklist", or "for the love of god, it's 2023, do not escape SQL".

Don't define reasons why the input is invalid, instead have a target struct/object, and parse the input into that object.

replies(1): >>35055225 #
12. quchen ◴[] No.35054351{3}[source]
`if` is semantically the only way to deconstruct a Boolean in any language, so as long as you have bools, you're going to have `if`. Sure, you can give `if` different syntax and write it with match/case/?:/whatever, but that's not what we did to goto: there, we introduced different language constructs to capture common useful use cases, like try/catch, loops, and else-less ifs.
replies(3): >>35054547 #>>35054918 #>>35055288 #
13. Hackbraten ◴[] No.35054364[source]
I just learned that there’s an open issue [0], apparently for introducing a similar feature.

[0]: https://github.com/microsoft/TypeScript/issues/43505

14. Joeri ◴[] No.35054371{4}[source]
By using reactive programming techniques the program can be approached as a set of data streams mapping input to output, and conditional behavior becomes the application of different filters and combiners on the streams. This dovetails nicely with functional programming, which allows generic expression and reuse of those stream operations.
15. lkitching ◴[] No.35054377{4}[source]
The post is suggesting that parsing and validation are different things, since the output of a parser captures the properties being checked in the type, and validation does not. Downstream consumers of validated input cannot rely on the properties that were validated since the representation type doesn't encode them e.g. the non-emptiness of a list.
16. ocharles ◴[] No.35054386[source]
This isn't strictly true; an alternative is to have a language with enough encapsulation that you can parse into something that can only be observed as correct. The underlying parsing doesn't have to parse into sum types, provided your observation functions always preserve the parsed invariants.
17. oslac ◴[] No.35054390[source]
One of the best articles about programming tbh.
18. conaclos ◴[] No.35054432[source]
The same author wrote a follow-up article [0] "Names are not type safety".

[0] https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

19. ckdot2 ◴[] No.35054435[source]
Please, don't write your own JSON parser/validator. There's JSON Schema https://json-schema.org which has implementations in most languages. You can validate your JSON against a given, standardized JSON Schema file - and you're basically done. After the validation, it's probably good practice to map the JSON to some DTO and maybe do some further validation which doesn't check the structure of the data but its meaning.
replies(2): >>35054491 #>>35055093 #
20. raincole ◴[] No.35054456{3}[source]
I don't know about this. We all have seen this kind of code:

    if(!needToDoTheThing()) return;
    
    DoTheThing();

We could have written it this way:

    if(needToDoTheThing()) {
        DoTheThing();
    }
    else {
        return;
    }
The latter is closer to what pattern matching looks like. But in my experience, the majority of programmers prefer early return. I regularly see people "refactor" if-else into if-early-return, but I've never seen the opposite.
replies(4): >>35054651 #>>35054833 #>>35065147 #>>35065313 #
21. oslac ◴[] No.35054472{4}[source]
Not a perfect example, but this can be seen (pattern match replacing if) with Kotlin's when.
22. mirekrusin ◴[] No.35054491[source]
JSON Schema has no relation to the static type system; i.e., in TypeScript it's much better to use composable, functional combinators at I/O boundaries only, and not do any extra checks anywhere the type system already provides guarantees.
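
For example, with a combinator library like io-ts (a sketch; `input` is assumed to hold raw JSON, and error handling for malformed JSON is elided):

    import * as t from "io-ts";
    import { isRight } from "fp-ts/Either";

    declare const input: string;

    const User = t.type({ name: t.string, age: t.number });
    type User = t.TypeOf<typeof User>;  // the static type is derived from the runtime codec

    const decoded = User.decode(JSON.parse(input));  // Either<t.Errors, User>
    if (isRight(decoded)) {
        const user: User = decoded.right;  // fully typed from here on; no extra checks
    }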
replies(2): >>35054548 #>>35054555 #
23. crabbone ◴[] No.35054514[source]
It's not just about these limitations.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation is that a value in the data is a prime number? -- How are they going to encode this in their type system? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, if we want to ensure that all ids in XML nodes are unique. Or that the last digit of an SSN is a checksum of the previous digits using some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and it's really preferable to validate emails by calling a number of predicates on them.

And, of course, these aren't the only examples: password validation (the annoying part that asks for a capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to a password field, while also giving helpful error messages such as "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.

replies(10): >>35054557 #>>35054562 #>>35054640 #>>35054916 #>>35054920 #>>35055046 #>>35055734 #>>35055902 #>>35056302 #>>35057473 #
24. palotasb ◴[] No.35054547{4}[source]
To nitpick and to show a cool lambda calculus thing, you can deconstruct booleans if you define booleans and if statements the following way, using only pure functions.

  def TRUE(a, b):
    return a

  def FALSE(a, b):
    return b

  def IF(cond, a, b):
    return cond(a, b)

  assert IF(TRUE, 1, 2) == 1
  assert IF(FALSE, 1, 2) == 2
This gives you the conditional statement in most languages ("cond ? a : b" or "a if cond else b").
replies(2): >>35054736 #>>35054747 #
25. ckdot2 ◴[] No.35054548{3}[source]
I think it's good enough. Besides JSON Schema being a standard instead of a custom solution, you also get nice error messages in case there's a validation issue. If your JSON Schema file is properly defined, it should be safe enough to just map your JSON into some static-type DTO afterwards and trust your data and its types to be valid. In JSON Schema you can validate strings, numbers, integers, and custom objects. It's quite powerful and - personally - I wouldn't want to implement that kind of stuff on my own.
replies(2): >>35054745 #>>35058179 #
26. bertrand-caron ◴[] No.35054555{3}[source]
For anyone using both TypeScript and JSON schemas, but wanting to use TypeScript as the source of truth, I highly recommend the following library: [ts-json-schema-generator](https://github.com/YousefED/typescript-json-schema).

It does exactly what it says in the box: turns your TypeScript `types` / `interface` into machine-readable JSON schemas.

The library has a few open issues (does not deal well with some edge cases of composing Omit<> on sum types, and does not support dynamic (const) keys), but compared to manually writing JSON schemas, it's been amazing!

EDIT: I should add that the library supports adding further type constraints that are supported by JSON Schema but not by TS by using JSDoc (for instance, pattern matching on strings, ranges on numbers, etc.).

replies(2): >>35054767 #>>35054787 #
27. flupe ◴[] No.35054557{3}[source]
Both your examples (is my number prime, are my XML nodes unique) are easily expressed in a dependently-typed language.

Dependent type checkers may be hard to implement, but the typing rules are fairly simple, and people have been using this correct-by-construction philosophy in dependently-typed languages for a while now.

There's nothing delusional about that.

replies(1): >>35056718 #
28. ollysb ◴[] No.35054562{3}[source]
You can use opaque types to encode constraints that the type system isn't able to express. That way you can have factory functions that apply any logic that's required before allowing construction of the opaque type. Now whenever that opaque type is referred to there's a guarantee that the data it contains satisfies your desired constraint.
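
In TypeScript, for example, that might look like this (a sketch, using the prime example from upthread, with a brand standing in for a truly opaque type):

    // prime.ts -- the brand symbol is not exported, so the cast to PrimeNumber
    // can only happen inside this module.
    declare const brand: unique symbol;
    export type PrimeNumber = number & { readonly [brand]: true };

    // The factory: the only way to obtain a PrimeNumber.
    export function parsePrime(n: number): PrimeNumber | undefined {
        if (!Number.isInteger(n) || n < 2) return undefined;
        for (let d = 2; d * d <= n; d++) {
            if (n % d === 0) return undefined;
        }
        return n as PrimeNumber;
    }

    export const toInt = (p: PrimeNumber): number => p;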
replies(1): >>35056632 #
29. agumonkey ◴[] No.35054614[source]
I don't know who else sees everything as language problems. Even REST seems like a concrete syntax over HTTP - mostly relational resources.
30. autophagian ◴[] No.35054626{4}[source]
I suppose that every successful parse also has an implicit validation step in it. Like you said, for me the principle is more about embedding that validation step into an actual type rather than crossing my fingers and hoping that whatever's coming into my function is what I expect it to be.
31. ocharles ◴[] No.35054640{3}[source]
> OP is delusional if they think that their approach can be made practical. I mean, what if the expectation is that a value in the data is a prime number? -- How are they going to encode this in their type system? And this is just a trivial example.

People get too caught up in thinking that the type _has_ to express intricate properties; it doesn't. How am I going to express the expectation that something is prime? With the following closed API:

  module Prime where

  data PrimeNumber

  parsePrime :: Int -> Maybe PrimeNumber
  toInt :: PrimeNumber -> Int
Now the problem is that _leaving_ this API forgets information. Whether or not that is a problem is a different question, and very dependent on the context.

The same applies to your comment about passwords. One can quite easily create a closed module that encapsulates a ValidPassword type that simply performs runtime character tests on a string.

I want to stress that this approach is making a trade off (as I mentioned earlier about leaving the API forgetting information, forcing you to re-parse). However, this puts this design somewhere in the middle of the spectrum. At one extreme end we have primitive obsession and shotgun parsing everywhere; with this we push the parsing into a sane place and try to hold on to these parsed values as long as possible; and at the other extreme end we need dependent types or sophisticated encodings where the value carries a lot more information (and here we get towards propositions-as-types).

replies(2): >>35056482 #>>35064605 #
32. RHSeeger ◴[] No.35054651{4}[source]
I prefer the former. It separates the pre-conditions from the algorithm/logic, using guard clauses. I find this makes it easier to reason about the algorithm.
replies(1): >>35055047 #
33. aeonik ◴[] No.35054715[source]
What you say makes theoretical sense, but many bank systems still enforce weak password constraints because someone enforced those weak constraints 30 years ago in mainframe code that nobody seems to want to update.
34. quchen ◴[] No.35054736{5}[source]
Church encoding is pretty cool, yes! It encodes Booleans such that »if == id«. Likewise, natural numbers are essentially counting for-loops: 3 f x = f (f (f x)), so »for == id«.

I had to work with this for a while because I wanted to implement Hello World in Javascript. https://github.com/quchen/lambda-ski/blob/master/helloworld/...

35. mirekrusin ◴[] No.35054745{4}[source]
You don't need to implement it on your own; you can use a library.

Nice error messages exist there as well.

If you're casting untyped results, you can change one side and not the other and only find out about the problem in production. Or any mistake will simply go unnoticed.

Using a TypeScript-first library allows you to do much more - it supports opaque types, custom constructors, and any imaginable validation that can't be expressed in JSON Schema.

36. Joker_vD ◴[] No.35054747{5}[source]
You can do this same trick with any algebraic type, honestly (modulo laziness):

    ## type Silly = Foo | Bar Int | Qux String Silly

    ## Constructors

    def Foo(onFoo, onBar, onQux):
        return onFoo()

    def Bar(arg0):
        return lambda onFoo, onBar, onQux: onBar(arg0)

    def Qux(arg0, arg1):
        return lambda onFoo, onBar, onQux: onQux(arg0, arg1)

    ## Values of Silly type are Foo, Bar(x) and Qux(x, y)

    ## Destructor

    def match_Silly(silly, onFoo, onBar, onQux):
        return silly(onFoo, onBar, onQux)
You can make a whole language on top of that if you don't mind effectively disabling your CPU's branch predictor.
37. masklinn ◴[] No.35054751{4}[source]
TFA does explain what they mean:

> in my mind, the difference between validation and parsing lies almost entirely in how information is preserved

“parse don’t validate” is a pithy and easy to remember maxim for this preservation.

Because validation is implicitly necessary for parsing to a representation which captures your invariants anyway, by banning validation as a separate concept you ensure sole validation doesn’t get reintroduced, because any validation step outside of a wider parsing process is considered incorrect.

38. mirekrusin ◴[] No.35054767{4}[source]
Adding an extra transpilation step doesn't sound like a great solution.

It also doesn't support inlined assertions, referring to existing classes, custom validations, opaque types, etc.

39. kristiandupont ◴[] No.35054787{4}[source]
I generally prefer Zod, but in cases where I for one reason or another have to rely on JSON Schema, I use this package: https://www.npmjs.com/package/as-typed which infers TS types directly from such a schema. No extra build steps required. I then use AJV for runtime validation.
40. bob1029 ◴[] No.35054805[source]
This is something I've become fairly passionate about lately.

Any time I see some regex, I start asking probing questions about the nature of the underlying abstraction.

Being able to deterministically convert something into an AST is the ultimate test of that thing's stability at any scale.

replies(1): >>35054926 #
41. randomdata ◴[] No.35054814{3}[source]
Bridled, so that it doesn’t suffer the problems Dijkstra spoke of? Wouldn’t you say they both already are in modern languages?
42. pjc50 ◴[] No.35054833{4}[source]
It keeps the code closer to the left. It also keeps it conceptually simpler if you can discard a bunch of "obvious" cases early on.
replies(1): >>35055215 #
43. arminsergiony ◴[] No.35054835[source]
I believe that JSON Schema is a great solution because it is a standardized format and provides helpful error messages for validation issues. If the schema file is well-defined, it should be safe to map the JSON data to a static type DTO and trust that the data types are valid. JSON Schema's ability to validate strings, numbers, integers, and custom objects makes it a powerful tool, and I personally wouldn't want to attempt to implement something similar on my own.
44. Thaxll ◴[] No.35054901[source]
You're describing deserialization in a strongly typed language. Sometimes that's not enough: OK, your email deserialized to an empty string, which is useless.
replies(2): >>35054943 #>>35055012 #
45. PartiallyTyped ◴[] No.35054916{3}[source]
> I mean, what if the expectation is that a value in the data is a prime number? -- How are they going to encode this in their type system? And this is just a trivial example.

In TypeScript we can define

    type Prime = number

    function isPrime(value: number): value is Prime {
        // run a primality sieve, returning true if value is prime
    }
From here, you may have e.g.

    function foo(value: Prime, ...) {

    }
And it will be typed checked.

    function fooOrFail(v: number) {
        if (isPrime(v))
            foo(v)
        else 
            throw new TypeError()
    }
replies(2): >>35055148 #>>35056821 #
46. chriswarbo ◴[] No.35054918{4}[source]
I agree; although there's a related problem of "boolean blindness" https://existentialtype.wordpress.com/2011/03/15/boolean-bli...

I'd summarise boolean blindness as: implicit (often unsafe) coupling/dependencies of method results; which could instead be explicit data dependencies. That article's example is 'plus x y = if x=Z then y else S(plus (pred x) y)', which uses an unsafe 'pred' call that crashes when x is 'Z'. It avoids the crash by branching on an 'x=Z' comparison. The alternative is to pattern-match on x, to get 'Z' or 'S x2'; hence avoiding the need for 'pred'.

Another alternative is to have 'pred' return 'Maybe Nat'; although that's less useful when we have more constructors and more data (e.g. the 'NonEmptyList' in this "parse, don't validate" article!)
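
In TypeScript terms, the pattern-matching alternative might look like this (a sketch with a unary-number type; names invented):

    type Nat = { tag: "Z" } | { tag: "S"; pred: Nat };

    // Matching on the structure gives us the predecessor directly,
    // so there is no separate, partial pred() to call unsafely.
    function plus(x: Nat, y: Nat): Nat {
        return x.tag === "Z" ? y : { tag: "S", pred: plus(x.pred, y) };
    }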

47. thanatropism ◴[] No.35054920{3}[source]
The spirit of "parse, don't validate" is -- do all those things (SSN checksums, whatever) at the point where data enters the system, not at the point where it's used.
replies(1): >>35056790 #
48. redbar0n ◴[] No.35054922[source]
Several earlier HN threads about this article: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
49. thanatropism ◴[] No.35054926[source]
That's also kind of an IQ test.
50. Kinrany ◴[] No.35054943{3}[source]
You can have a ValidEmail type that performs all the checks on construction.
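
For instance (a sketch; the regex is deliberately naive):

    class ValidEmail {
        private constructor(readonly value: string) {}

        // The private constructor makes parse() the only way to obtain a ValidEmail.
        static parse(s: string): ValidEmail | undefined {
            return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(s) ? new ValidEmail(s) : undefined;
        }
    }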
replies(1): >>35055657 #
51. PartiallyTyped ◴[] No.35054993[source]
I don't think sum types are necessary. TypeScript's gradual typing should suffice to capture this.
replies(1): >>35060447 #
52. nicky0 ◴[] No.35055012{3}[source]
With Zod: const info = z.object({ name: z.string().min(1), email: z.string().email() })
replies(1): >>35058737 #
53. jameshart ◴[] No.35055031[source]
This post absolutely captures an essential truth of good programming.

Unfortunately, it conceals it behind some examples that - while they do a good job of illustrating the generality of its applicability - don’t show as well how to use this in your own code.

Most developers are not writing their own head method on the list primitive - they are trying to build a type that encapsulates a meaningful entity in their own domain. And, let’s be honest, most developers are also not using Haskell.

As a result I have not found this article a good one to share with junior developers to help them understand how to design types to capture the notion of validity, and to replace validation with narrowing type conversions (which amount to ‘parsing’ when the original type is something very loose like a string, a JSON blob, or a dictionary).

Even though absolutely those practices follow from what is described here.

Does anyone know of a good resource that better anchors these concepts in practical examples?

replies(3): >>35056114 #>>35058281 #>>35059886 #
54. cjfd ◴[] No.35055046{3}[source]

  #include <stdexcept>

  // Assumes an is_prime(int) predicate is available.
  class Prime
  {
  public:
     Prime(int p): p(p)
     {
        if (!is_prime(p))
           throw std::runtime_error("Number was not a prime!");
     }

     int get_value() const
     {
        return p;
     }

  private:
     int p;
  };
replies(2): >>35056161 #>>35056800 #
55. Timon3 ◴[] No.35055047{5}[source]
It's much nicer, especially since it keeps the complexity down.

If you nest if/else, you'll quickly approach a point where you have to keep a complex logic tree in your head to determine which states the system could be in inside of any given branch. If you use guard clauses and return early, you'll keep this complexity down to a minimum, since the list of possible states changes linearly with your code instead of exponentially.

I know not everybody likes it, but I think this makes cyclomatic complexity an extremely valuable metric for measuring "ease-of-reading".

56. chriswarbo ◴[] No.35055093[source]
They're not saying you should write your own JSON parser/validator. They're saying that your existing parsing/validation/checking logic (using whatever libraries, standards, etc.) should not have a type signature like this:

  checkAgainstMySchema: JSON -> Boolean
Or this:

  checkedAgainstMySchema: JSON -> JSON
Instead, it's better to use a type signature like this;

  checkAgainstMySchema: JSON -> Either Error MyJSON
(Where MyJSON is some type which wraps-up your data; which could be the raw JSON, or perhaps some domain objects, or whatever)

The reason this is better, is that it's required for your program to work: if your processing functions take a 'MyJSON' as argument, then (a) your program must call the 'checkAgainstMySchema' function; and (b) you can only run your data processing in the successful branch (since that's the only way to get the 'MyJSON' argument you need).

In contrast, the functions which return 'Boolean' and 'JSON' are not required; in the sense that, we could completely forget to do any validation, and still end up with a runnable program. That's dangerous!

replies(1): >>35095875 #
57. jameshart ◴[] No.35055124[source]
Those aren’t exactly rare features these days though. Beyond the functional world, TypeScript, C# and Java all have them to some extent, so it’s basically conquered all the mainstream object oriented languages. There’s even a proposal to add pattern matching to C++.
58. teo_zero ◴[] No.35055148{4}[source]
If foo() happily accepts a number instead of a Prime, then this is not robust enough: you can always forget the check!

It's the compiler that should warn you about a wrong type.

replies(1): >>35056440 #
59. friendzis ◴[] No.35055151{4}[source]
> suggests that you should do parsing instead of validation

Kind of yes, but this discussion depends greatly on the definitions of `parse` and `validate`, which the article does not explicitly elaborate on. The chapter "The power of parsing" captures the difference implicitly: "validateNonEmpty always returns (), the type that contains no information". Validation, in the context of all of this, can be defined as "checking conformance to a set of rules", while parsing is mostly synonymous with deserialization.

In most practical applications you explicitly do not want to only validate inputs, as you have no need to perform any computation on invalid input anyway. Sometimes you explicitly want to analyze invalid inputs, maybe try to recover some information or do some other magic. Sure, then go and validate input and do that magic on invalid input. In most cases you want to simply reject invalid inputs.

However, when you think about it, that is what parsing does. Validation happens during parsing implicitly: the parser will either return a valid object or throw an error, but parsing has the added benefit that the end result is a datum of a known datatype. Of course, this only really works in statically typed languages.

The thing is that it is rather easy to conflate the two. Take for example the following JSON `{"foo": "bar", "baz": 3}`. A "parser" can return 1) a list of `(String, String, String)` 3-tuples for (type, key, value) that downstream has to process again 2) full blown `FoobarizerParams` object or something in between.

60. roenxi ◴[] No.35055165[source]

   parseNonEmpty [] = throwIO $ userError "list cannot be empty"
How would that interact with a scenario where we want a specific error message if a specific list is empty? E.g., "you want to build the list using listBuilder()". Making illegal states unrepresentable is good advice, but I don't think it removes the value of good validation.

It is a mistake to do ad-hoc validation. But it makes a lot of sense to have a validation phase, a parse phase then an execution phase when dealing with untrusted data. The validation phase gives context-aware feedback, the parse phase catches what is left and then execution happens.

A type system doesn't seem like a good defence against end-user error. The error messages in practice are cryptic. I think the complaint here is about people trying to implement a type system using ad-hoc validation, which is a bad idea.

replies(2): >>35055315 #>>35059621 #
61. kybernetikos ◴[] No.35055198[source]
This is obviously good advice almost all of the time.

However, I have had to deal occasionally with http libraries that tried to parse everything and would not give you access to anything that they could not parse. This was incredibly frustrating for corner cases that the library authors hadn't considered.

If you are the one who is going to take action on the data, parse don't validate is the correct approach. If you are writing a library that deals with data that it doesn't fully understand, and you're handing that data to someone else to take action with, then it may not always be the right approach.

replies(2): >>35056991 #>>35059019 #
62. jakelazaroff ◴[] No.35055215{5}[source]
Yup, this is my exact rationale for preferring this too. Branches are a significant source of complexity and early returns are one way to tame it — have the “meat” of the function deal with as few invariants as possible.
63. blincoln ◴[] No.35055225{5}[source]
I like this explanation and approach, but how does it solve the first problem described in the article - the case where there's an array being processed that might be empty?

There are plenty of cases in real-world code where an array that's part of a struct or object may or may not contain any elements. If you're just parsing input into that, it seems like you'd still end up doing the equivalent of checking whether the array is empty everywhere the array might be used later, even if that check is looking at an "array has elements" flag in the struct/object, and so you're still maintaining a description of ways that the input may be invalid. But I'm not a world-class programmer, so maybe I'm missing something. Maybe you mean something like: for branches of the code that require a non-empty array, you have a second struct/object and a parser that's more strict and errors out if the array is empty?

replies(4): >>35055647 #>>35057974 #>>35058114 #>>35063412 #
64. armchairhacker ◴[] No.35055230[source]
I wish more languages had some equivalent of records, tagged unions, and pattern matching.

Don't have to be 100% immutable or perfect ADTs: see Rust, Swift, Kotlin. Even TypeScript can do this, albeit it's uglier with untagged unions and flow typing instead of pattern matching.

65. ghusbands ◴[] No.35055232{4}[source]
It's not just about the validation success, but about having only one bit of code consuming the looser input and producing a definitely-correct output. If you simply validate and preserve success, you still later need to produce the output you need, and it's hard to be sure that the earlier validation and the later parsing actually agree on what is valid.

If you're talking about consuming the looser input and producing a definitely-correct output, already, then you're talking about parsing, not validation. Most validation occurs naturally during parsing.

66. jmull ◴[] No.35055254[source]
I get the point, but I wonder why people find this particular article compelling. To me it's weak...

It's built on a particular technical distinction between parsing and validating that (1) is not all that commonly understood or consistently accepted and (2) is not actually explicitly stated in the article!

(Validation: check data assumptions, fail if not met. Parse: check data assumptions, fail if not met, and on success return the data as a new type reflecting the additional constraints, which can therefore be checked at compile time. Notice parsing includes validation, which makes the title of the article quite poor.)

That's important to know because the distinction is only meaningful in the context of certain language features, which may or may not apply.

Also, this is not great general advice:

> Push the burden of proof upward as far as possible, but no further

For one, it's mostly meaningless, since it really just says to put the burden of proof in the right place. But it implies that upward is preferable. You really want to push it upward if it's a high-level concern, and downward if it's a low-level concern. E.g., suppose you're working on an app or service that accesses the database, so the database is lower-level. You'll want to push your database-specific type transformations closer to the code that accesses the database.

Honestly, I find this whole thing kind of muddled.

(Also, in my experience, the fundamental limit here isn't on validation strategies, but on the human ability to break down a problem and logically organize the solution. You can just as easily end up with an unmaintainable mess of spaghetti types as with any other useful abstraction.)

replies(6): >>35055366 #>>35055866 #>>35055895 #>>35056075 #>>35057758 #>>35061557 #
67. jakelazaroff ◴[] No.35055288{4}[source]
There are concepts like filtering that let you operate on booleans without branching:

   const published = posts.filter(post => !post.draft);
68. jameshart ◴[] No.35055315[source]
When you’re building a ‘parser’ (in this broad, type-narrowing sense) to handle user-supplied data, the result type really needs to be a rich mix of

- successfully parsed data objects

- error objects

- warning objects

That way your consumers can themselves decide what to do in the face of errors and warnings.

(Of course one ugly old fashioned way to add optional ‘error’ types to your return signature is checked exceptions, but we don’t talk about that model any more.)

69. jakelazaroff ◴[] No.35055366[source]
> You really want to push it upward if it's a high-level concern, and downward if it's a low-level concern. E.g., suppose you're working on an app or service that accesses the database, so the database is lower-level. You'll want to push your database-specific type transformations closer to the code that accesses the database.

IMO, database code is at exactly the same level of concern as network code or filesystem code. By “upward”, she means push parsing to the boundaries of your program — as close to the point of ingress as possible.

replies(1): >>35062429 #
70. b0afc375b5 ◴[] No.35055382{4}[source]
How about "Parsing IS validation"?
71. strgcmc ◴[] No.35055647{6}[source]
Remember, the author of the article constructed a scenario where the "main" function was expected to treat an empty "CONFIG_DIRS" input as an uncatchable IOError; in other words, an empty array was invalid/not allowed, per the rules of this program. Depending on the context in which you are operating, you may or may not have similar rules or requirements to follow.

Empty lists are actually generally not a big deal - they are just lists of size 0, and they support all the same operations as non-empty lists. The fact that a "head" function throws an error on an empty list is really just a specific form of the more general observation that any array will throw an index-out-of-bounds exception when given an index that's... out of bounds. So any time you are dealing with arrays, you probably need to think about: "What happens if I try to index something that's out of bounds? Is that possible?"

In this particular contrived example, all that mattered was the head of the array. But what if you wanted to pick out the 3rd argument in a list of command line arguments, and instead the user only gave you 2 inputs? If 3 arguments are required, then throw an IOError as early as possible after failing to parse out 3 arguments; but once you pass the point of parsing the input into a valid object/struct/whatever, from that point forward you no longer care about checking whether the 3rd input is empty or not.

So again, it depends on your scenario. Actually the more interesting variant of this issue (in OO languages at least) is probably handling nulls, as empty lists are valid lists, but nulls are not lists, and requires some different logic usually (and hence why NullPointerExceptions aka NPEs are such a common failure mode).

replies(1): >>35059448 #
72. Thaxll ◴[] No.35055657{4}[source]
OP said to not validate.
replies(3): >>35055815 #>>35056190 #>>35060605 #
73. piaste ◴[] No.35055734{3}[source]
Password validation is a degenerate case of parsing, where your parsed type does not contain any more information than your unparsed type - both are just opaque strings.

(In fact you could use an invalid password just fine: unless you're doing something really weird, your code would not misbehave because it's too short or missing digits and symbols. It's only due to security reasons that you choose to reject that string.)

But that doesn't mean that `string -> Password` isn't parsing! As long as you're outputting distinct types for ValidPassword and InvalidPassword, you are still following the advice of this article, because you can make all your internal code use ValidPassword as a type and you will not need to ever check for password validity again.*

Compare that to e.g. adding a { IsValid = true } field to the object, which would require you to defensively sprinkle `if (user.Password.IsValid)` every time you try to actually use the password field.

* One weakness arising from the fact that this is degenerate parsing, i.e. ValidPassword is just a string, is that a very stubborn fool could build a ValidPassword from any arbitrary string instead of using the proper parse function. Depending on your language, this can be prevented by e.g. hiding the constructor so that only parsePassword has access to it.

replies(1): >>35056585 #
74. lkitching ◴[] No.35055815{5}[source]
The OP is contrasting a 'validation' function, with type e.g.

    validateEmail :: String -> IO ()

with a 'parsing' function

    parseEmail :: String -> Either EmailError ValidEmail
The property encoded by the ValidEmail type is available throughout the rest of the program, which is not the case if you only validate.
75. naasking ◴[] No.35055866[source]
> It's built on a particular technical distinction between paring and validating that (1) is not all that commonly understood or consistently accepted and (2) not actually explicitly stated in the article!

If this isn't clear to you, ask yourself why programming languages are parsed and not merely validated. Validation is a subset of parsing, so clearly there's something important added.

76. Pulz ◴[] No.35055892[source]
It's at least once per month.
77. dkarl ◴[] No.35055895[source]
I figured from the title I'd get a better explanation in the comments, and I was right, but I think the article is not nearly as bad as the clickbaity title suggests. It's a decent introduction to how to use types to simplify code, and the basic idea that your types should reflect what you know about the data is extremely powerful. If you go to the trouble of checking that your data meets some constraints, you should be able to represent it with a more constrained type afterwards, and that is the essence of parsing. It all makes sense! Even the title makes sense, as a quick way to reference and remember the idea after you've learned it.

But, yeah, the clickbait title put me off, and you're right that the terminology is unhelpful, since the distinction between parsing and validation isn't consistently made, especially in practical work. Virtually all of the "validation" code I've seen in statically typed languages, in the codebases I've worked in, would be "parsing" by this definition.

78. PhilipRoman ◴[] No.35055902{3}[source]
I think your comment was unfairly downvoted without objective reasons. This is a real issue with advanced type systems and the current solutions are not very good (although they can be practical in some cases) - you can either automatically decorate constructors with assertion code (slow) or trust external input (unsafe, something like __builtin_unreachable in C). And after you're done with that, good luck getting a deterministic and fast type checker which can verify proofs (which you have to write yourself) about arbitrary theorems in your program. Yes, I'm aware there exist languages that can do this to a degree but there is a good reason why they aren't used in mainstream software.

I genuinely wonder how one would write a proof in something like Agda, that

    parseJson("{foo:"+encodeJson(someObject)+"}") 
always succeeds
replies(1): >>35060965 #
79. Octokiddie ◴[] No.35055969[source]
I like how the author boils the idea down into a simple comparison between two alternative approaches to a simple task: getting the first element of a list. Two alternatives are presented: parseNonEmpty and validateNonEmpty. From the article:

> The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.

This might not seem like much of a distinction, but it has far-reaching implications downstream:

> These two functions elegantly illustrate two different perspectives on the role of a static type system: validateNonEmpty obeys the typechecker well enough, but only parseNonEmpty takes full advantage of it. If you see why parseNonEmpty is preferable, you understand what I mean by the mantra “parse, don’t validate.”

parseNonEmpty is better because after a caller gets a NonEmpty it never has to check the boundary condition of empty again. The first element will always be available, and this is enforced by the compiler. Not only that, but functions the caller later calls never need to worry about the boundary condition, either.

The entire concern over the first element of an empty list (and handling the runtime errors that result from failure to meet the boundary condition) disappear as a developer concern.

replies(3): >>35056624 #>>35056955 #>>35058253 #
80. swsieber ◴[] No.35056047[source]
> The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.

While the article is titled "parse, don't validate", I like its first point - make illegal states unrepresentable - much better.

81. mrkeen ◴[] No.35056075[source]
> (2) not actually explicitly stated in the article!

    the difference between validation and parsing lies almost entirely in how information is preserved. Consider the following pair of functions:

    validateNonEmpty :: [a] -> IO ()

    parseNonEmpty :: [a] -> IO (NonEmpty a)

    Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.
replies(1): >>35056504 #
82. asimpletune ◴[] No.35056114[source]
I think it's really hard to learn from reading unfortunately. It's one of those things where if you get it, you get it, but it kind of takes personal experience to fully grok it. I guess because there are a lot of subtle differences.
83. asimpletune ◴[] No.35056161{4}[source]
Just wanted to add that in some languages you could have a makePrime function that takes an int and returns a maybe[Prime]. If you don't make the constructor public this works perfectly, as there is essentially no way to get a Prime without going through the pathways the library author relies upon. This is a pattern that's used in Scala a lot anyway.
84. mrkeen ◴[] No.35056190{5}[source]
That's fine.

Validation would be:

   Email email = new Email(anyString);
   email.validate();
Parsing (in OP's context) would be:

   Either<Error, ValidEmail> eEmail = Email.from(anyString);
85. mrkeen ◴[] No.35056302{3}[source]
> password validation (the annoying part that asks for a capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to a password field, while also giving helpful error messages such as "you forgot to use a digit").

This is what Applicative Functors were born to do. Here's a good article on it: https://www.baeldung.com/vavr-validation-api

Check the types:

    public Validation<Seq<String>, User> validateUser(...)
Even though it's called "validation", it's still the approach the OP recommends.

It reads as "If you have a Seq of Strings, you might be able to construct a User, or get back validation errors instead".

Contrast this with the wrong way of doing things:

    User user = new User(seqOfStrings);
    user.validate();
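
A rough TypeScript sketch of the same accumulate-errors-or-construct shape, for readers outside the JVM (names invented):

    declare const valid: unique symbol;
    type ValidPassword = string & { readonly [valid]: true };

    // Either a ValidPassword or the full list of problems -- never a half-checked string.
    function parsePassword(s: string): ValidPassword | string[] {
        const errors: string[] = [];
        if (s.length < 8) errors.push("must be at least 8 characters");
        if (!/[A-Z]/.test(s)) errors.push("you forgot to use a capital letter");
        if (!/\d/.test(s)) errors.push("you forgot to use a digit");
        if (!/[^A-Za-z0-9]/.test(s)) errors.push("you forgot to use a special character");
        return errors.length === 0 ? (s as ValidPassword) : errors;
    }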
replies(1): >>35056528 #
86. harveywi ◴[] No.35056391[source]
Based on Conal Elliott's new formulation of two dual parsing implementations [1] using inductive regular expressions and coinductive tries, where the former corresponds to symbolic differentiation and the latter corresponds to automatic differentiation, the dual statement of "parse, don't validate" might be: "Do or do not. There is no trie."

[1] http://conal.net/papers/language-derivatives/

87. asimpletune ◴[] No.35056413[source]
One way I like to understand and explain what the author is talking about is IKEA furniture that can only go together one way, the "right" way. People are inevitably going to skip the manual and just start fiddling, so you design the pieces themselves to reject the wrong combinations.

I don't know if IKEA actually does this, but as a concept it's one way you can imagine it. There are so many examples of this in the wild for important things, e.g. you can't use the washing machine unless the lid is actually closed all the way.

replies(1): >>35063989 #
88. PartiallyTyped ◴[] No.35056440{5}[source]
Ah, you are right, I was under the impression that the compiler made it more specific.

My bad.

89. crabbone ◴[] No.35056482{4}[source]
Your type doesn't describe prime numbers. You just named it "prime number", but there's no proof or any other guarantee that it's a prime number.

> People get too caught up in thinking that the type _has_ to express intricate properties

Where do you get this from? Did you even read what you are replying to? I never said anything like that... What I'm saying is that the approach taken by OP is worthless when it comes to real-life uses of validation.

So, continuing with your example: you will either end up doing validation instead of parsing (i.e. you will implement a parsePrime validator function), or you will not actually validate that your input is a prime number... The whole point OP was trying to make is that they wanted to capture the constraints on data in a type describing those constraints, but outside of trivial examples such as the non-empty list OP uses, that leads to programs that are either impossible or extremely complex.

> One can quite easily create a closed module that encapsulates a ValidPassword

And, again, that would be doing _validation_ not parsing. I'm not sure if you even understand what the conflict here is, or are you somehow agreeing with me w/o saying so?

replies(1): >>35058705 #
90. jmull ◴[] No.35056504{3}[source]
I know we can infer the point from the information buried in the middle of the article. But your quote is significantly edited for clarity, and, after all, is a code example, not a statement of definitions.
replies(1): >>35060900 #
91. crabbone ◴[] No.35056528{4}[source]
No, it's not the approach OP recommends, and that's why it's called validation. I have no idea why you would question that. OP wants to capture constraints on data as ML-style types. But ML-style types have very limited expressive power and, when it comes to real-life situations, are practically useless outside of the most trivial cases.
replies(1): >>35058426 #
92. crabbone ◴[] No.35056585{4}[source]
> Password validation is a degenerate case of parsing

Really? How is that degenerate? Compared to what?

My guess is that you just decided to use a dictionary word you don't fully understand.

> In fact you could use an invalid password

Where does this nonsense come from? No. I cannot use invalid password. That's the whole point of validation: making sure it doesn't happen. What kind of bs is this?

> But that doesn't mean that `string -> Password` isn't parsing!

It's doing nothing useful, and that's the whole point. You just patted yourself on the head in front of the mirror for using some programming technique that you believe makes you special, but accomplished nothing of value. That was the original point: if you want results, you will end up doing validation, there's no way around it. Your renaming of types is only interesting to you and a group of people who are interested in types, but doesn't advance the cause of someone who wants their password validated.

93. waynesonfire ◴[] No.35056624[source]
Does this scale? What if you have 10 other conditions?
replies(6): >>35056897 #>>35056901 #>>35057510 #>>35057732 #>>35059327 #>>35061385 #
94. cratermoon ◴[] No.35056629[source]
Currently working on a project with some inexperienced developers where I was brought on to consult. Lots of shotgun parsing. Stringly-typed code everywhere. Even the dates and timestamps are passed around as strings, resulting in code to go back and forth between proper time types and strings all over the place.
95. crabbone ◴[] No.35056632{4}[source]
> You can use opaque types to encode constraints that the type system isn't able to express.

You just admitted in this sentence that the use of opaque types achieves nothing of value. Which was my point all along: why use them if they are useless? Just to feel smart because I pulled out an academia-flavored ninety-pound dictionary word to describe it?

replies(1): >>35058732 #
96. crabbone ◴[] No.35056718{4}[source]
> dependently-typed language.

Now, are there really tools to make type systems with dependent types simple to prove? In reasonable time? How about the effort developers would have to put into just trying to express such types and into verifying that such an expression is indeed accomplishing its stated goals?

Just for a moment, imagine filing a PR in a typical Web shop for the login form validation procedure, and sending a couple of screenfuls of Coq code or similar as a proof of your password validation procedure. How do you think your team will react to this?

Again, I didn't say it's impossible. Quite the opposite, I said that it is possible, but will be so bad in practice that nobody will want to use it, unless they are batshit crazy.

97. crabbone ◴[] No.35056790{4}[source]
That's not how OP used it. OP wants to express constraints on data through an ML-style type system. They didn't even consider the obvious competition, such as SQL constraints or XSL.

I have no problems with the way you want to interpret this claim. But, really, I'm responding to the article linked in this thread, which isn't about the point in an application at which to perform the said validation or parsing.

replies(1): >>35059337 #
98. crabbone ◴[] No.35056800{4}[source]
That's validation, my man. Which was the whole point of this.
replies(2): >>35056828 #>>35059297 #
99. crabbone ◴[] No.35056821{4}[source]
You are doing validation! Your type has none of the properties of a prime-number type. You either didn't read the article in OP, or didn't understand what OP was arguing for.

Yes, it's fine, if you want to validate your input in this way -- I have no problems with it. It's just that you are doing validation, not parsing, at least not in the terms OP used them.

replies(1): >>35057766 #
100. cjfd ◴[] No.35056828{5}[source]
It is not validation if you do it at parse time. Then you can pass a Prime around the whole time and never do the is_prime check again.
replies(1): >>35060838 #
101. ElevenLathe ◴[] No.35056897{3}[source]
I think what you're getting at is that it seems ponderous to have types named things like NonEmptyListWhereTheThirdElementIsTheIntegerFourAndTheOtherElementsAreStringsOfLengthSixOrUnder and the answer is that you shouldn't do that, but instead name it something in the problem domain (of whatever the program is about) like WidgetDescription or whatever.
replies(2): >>35058402 #>>35062502 #
102. dangets ◴[] No.35056901{3}[source]
I would think stacking 10 generic conditions wouldn't scale if you are trying to mix and match arbitrarily. If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

In practice, I find it is usually something like 'UnvalidatedCustomerInfo' being parsed into a 'CustomerInfo' object, where you can validate all of the fields at the same time (phone number, age, non-null name, etc.). Once you have parsed it into the internal 'CustomerInfo' type, you don't need to keep re-validating the expected invariants everywhere this type is used. There was a good video on this that I wish I could find again, where the presenter gave the example of using String for a telephoneNumber field instead of a dedicated type. Any code that used the telephoneNumber field would have to re-parse and validate it "to be safe" that it was in the expected format.

The topic of having untrusted external types and internal trusted types is also explained in the book `Domain Modeling Made Functional` which I highly recommend.

replies(4): >>35057468 #>>35058092 #>>35059831 #>>35063117 #
103. ◴[] No.35056955[source]
104. dangets ◴[] No.35056979{4}[source]
A similar saying and popular blog post title is "Make illegal states unrepresentable"

https://fsharpforfunandprofit.com/posts/designing-with-types...

https://ybogomolov.me/making-illegal-states-unrepresentable/

105. tizzy ◴[] No.35056991[source]
This seems like good library design. As annoying as it is, it means the things you can use are well tested and supported.

What was your solution to this? Parse the things the library didn't?

replies(1): >>35057146 #
106. kybernetikos ◴[] No.35057146{3}[source]
The library didn't allow you to see the things (e.g. particular headers or options for those headers) that it didn't know to parse. Ultimately we had to migrate to a different library that didn't restrict us to just what the library knew. The decision not to let us even see things that the library didn't know about is particularly egregious where best practices are changing over time.

In my view it's a very bad design for an http library, although it would have been a lot less frustrating if it had at least provided an escape hatch.

replies(2): >>35057263 #>>35063426 #
107. jimbokun ◴[] No.35057263{4}[source]
Sounds like its model is not the HTTP RFC, but something more specific to some domain.

Which I agree, is a poor design choice. A type modeling an HTTP request should model the RFC definition as closely as possible.

replies(1): >>35058994 #
108. frankreyes ◴[] No.35057318[source]
Many years ago I had to write a code transformation for a legacy programming language. It was orders of magnitude easier to assume that the code was syntactically valid. And we could make that assumption because the legacy compiler was still being used to compile it.
109. hansvm ◴[] No.35057468{4}[source]
In Zig at least, the types-as-values framework means it'd be pretty easy to support arbitrary mixing and matching. If nobody beats me to it, I'll make a library to remove the boilerplate for that idea this weekend.
replies(1): >>35105217 #
110. andrewflnr ◴[] No.35057473{3}[source]
Your fallacy is: https://blog.jaibot.com/the-copenhagen-interpretation-of-eth...

Meanwhile, people with more than one bit in their worldview RAM can fall back to validation when it's the only option that makes sense for their domain, and use parsing when it's appropriate, which is, notwithstanding your handful of frankly niche examples compared to the vast bulk of CRUD code, most of the time in practice.

111. crvdgc ◴[] No.35057510{3}[source]
At least in Haskell this can be expressed as type classes. For each condition, you can create a (possibly empty) type class to guarantee that the condition is met. Then the call site type class constraints will be checked at compile time.
replies(1): >>35062416 #
112. ◴[] No.35057671[source]
113. layer8 ◴[] No.35057732{3}[source]
The code size of the “parse” version of a program is at worst linear in the code size of the corresponding “validate” version, so I’d say yes.
114. lolinder ◴[] No.35057758[source]
> You really want to push it upward if it's a high-level concern, and downward if it's a low-level concern. E.g., suppose you're working on an app or service that accesses the database, so the database is lower-level. You'll want to push your database-specific type transformations closer to the code that accesses the database.

This confusion is, I think, just a question of different conceptions of the system architecture.

Your terminology is drawing from a three-tier architecture [0] with a presentation layer, logic layer, and data layer. Under this model, input (data) is the bottom layer and output (HTTP/GUI) is the top layer, with your application logic in the middle.

On the other hand, she is viewing the system through an inside-outside lens similar to the hexagonal architecture [1]. All input (data) and output (HTTP/GUI) is considered to be up and out of your application logic. Rather than being the middle of a sandwich, the application logic is the kernel of a seed.

This is a common way to view the system when programming in functional languages like Haskell because the goal is usually to push all I/O to the start of the call stack so as to minimize the amount of code that has to account for side effects. The three-tier architecture isn't concerned about isolating effects, so treating the data layer as the bottom layer of the code is reasonable.

In either model, the point is to push validation to the boundaries of your code and rely on the type checker to prove you're using things right within the logic layer.

[0] https://en.wikipedia.org/wiki/Multitier_architecture

[1] https://en.wikipedia.org/wiki/Hexagonal_architecture_%28soft...

115. PartiallyTyped ◴[] No.35057766{5}[source]
The argument of OP is that you should first construct your input such that it adheres to a very specific type that contains all the information that you require, e.g. nonEmpty, and then allow that to go through the rest of your code.

Am I mistaken?

My mistake in the above snippets is precisely that TypeScript cannot make the type more specific, i.e. Number to Prime, because `type Prime = number` only creates an alias. I am not creating a type that is a more specific version of number, just another name for it.

Had I actually created a proper type, the parsing would have been correct. The parsing component is happening in the outer function because at some point I need to make the generic input more specific, and then allow it to flow through the rest of the program. Am I mistaken?

replies(2): >>35059519 #>>35060995 #
116. ◴[] No.35057866[source]
117. ◴[] No.35057974{6}[source]
118. JamesSwift ◴[] No.35058092{4}[source]
You would also run into a form of "type erasure" in most cases and would need a combinatorial explosion of methods unless the type system was able to 'decorate' the generic parameter with additional information.

e.g. imagine a `parseNonEmpty` and a `parseAllEven` method. Both take a list and return either a `NonEmpty` or `AllEven` type. If I call `parseNonEmpty` and get a `NonEmpty`, then pass to `parseAllEven`, I now have erased the `NonEmpty` as I'm left with an `AllEven`. I would need a `parseAllEvenFromNonEmpty` to take a `NonEmpty` and return a `AllEven & NonEmpty`.
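
(In Haskell you can approximate that kind of 'decoration' with a type-level list of properties that checks append to rather than replace. A rough sketch, all names mine:)

    {-# LANGUAGE DataKinds, KindSignatures, TypeOperators #-}

    data Prop = IsNonEmpty | IsAllEven

    -- the props list is phantom: it records which checks have passed
    newtype Tagged (props :: [Prop]) a = Tagged a

    parseNonEmpty :: [Int] -> Maybe (Tagged '[ 'IsNonEmpty ] [Int])
    parseNonEmpty [] = Nothing
    parseNonEmpty xs = Just (Tagged xs)

    -- adds 'IsAllEven to whatever evidence is already present
    parseAllEven :: Tagged ps [Int] -> Maybe (Tagged ('IsAllEven ': ps) [Int])
    parseAllEven (Tagged xs)
      | all even xs = Just (Tagged xs)
      | otherwise   = Nothing

A consumer can then demand a property via a small type-level membership check rather than a concrete order of parses, which avoids the parseAllEvenFromNonEmpty explosion.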

119. bcrosby95 ◴[] No.35058114{6}[source]
Depending upon language, and what you're using to hold the array, inheritance.

A 'NotEmpty a' is just a subclass of a potentially empty 'a'. You also get the desirable behavior, in this scenario, of automatic upcasting of a 'NotEmpty a' into a regular old 'a'.

replies(1): >>35060098 #
120. lexi-lambda ◴[] No.35058179{4}[source]
Amusingly, the tweet that inspired this blog post—which is linked in the second paragraph of the article—is specifically about how automatically generating a JSON parser from your datatypes means you don’t have to implement that kind of stuff on your own, and there is no possibility of some separate “schema” going out of sync with your application logic.

Of course, if you want to share the schema with downstream clients so that other programs can use it, that is a great use case for something like JSON Schema. It is a common interface that allows two different programs—quite possibly written in completely different languages—to communicate using the same format. That’s great! But it’s only half the story, because just having the schema doesn’t help you in any way to make sure the code actually respects that schema. That’s where integration with the language’s type system can help, perhaps by automatically generating types from the schema and then generating parsing/serialization functions that use those generated types.
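
(For the curious, in Haskell that derivation can be as small as this sketch using aeson's Generic support; the Widget type here is just an example:)

    {-# LANGUAGE DeriveGeneric #-}
    import Data.Aeson (FromJSON, ToJSON)
    import GHC.Generics (Generic)

    data Widget = Widget { name :: String, size :: Int }
      deriving (Show, Generic)

    instance FromJSON Widget  -- parser derived from the datatype itself
    instance ToJSON Widget    -- serializer derived from the same datatype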

replies(1): >>35095782 #
121. lolinder ◴[] No.35058185[source]
Sum types are used here for error handling, but if your language has a different error handling convention you can and should just use that.

In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails. Since the only way to get an instance of the class is through the factory method, you've made illegal states unrepresentable and know that the class always holds to its invariants. No methods on instances of that class will throw exceptions from then on, so you've successfully applied "Parse, Don't Validate" without needing sum types.

The point of the article isn't the particular implementation in Haskell, it's the concept of pushing all data error states to the boundaries of your code, which applies anywhere as long as you translate it into the idioms of your language.

replies(2): >>35060043 #>>35079571 #
122. epolanski ◴[] No.35058253[source]
Also, parsing composes; validating doesn't.

I once had to implement a feature on a real estate website.

For a given location I would get a list of stats (demographics, cost per square meter, average salary in the area, etc.). Some of those stats themselves contained lists.

At the beginning I modeled everything with arrays in react. This led to passing down lists and having to check at multiple steps whether they were non-empty and handle that case.

Then I modeled everything with a NonEmptyArray guard in TypeScript:

    interface NonEmptyArray<A> extends Array<A> {
      0: A // tells the type system that index 0 exists and is of type A
    }

    function isNonEmpty<A>(as: A[]): as is NonEmptyArray<A> {
      return as.length > 0
    }

Then, after receiving the response with those lists, I could parse them into NonEmptyArrays and remove all of the emptiness checks inside the React components; handling the fact that some of these lists could be empty trickled up to the outermost component, and everything became very clean and simple to understand/maintain.

123. epolanski ◴[] No.35058281[source]
> And, let’s be honest, most developers are also not using Haskell.

Everything in that post applies to the most common programming language out there: TypeScript.

And several popular others such as Rust, Kotlin or Scala.

replies(4): >>35058565 #>>35058974 #>>35059010 #>>35059347 #
124. tialaramex ◴[] No.35058402{4}[source]
And naming is actually a valuable activity. Knowing this is not merely NonEmptyListWhereTheThirdElementIsTheIntegerFourAndTheOtherElementsAreStringsOfLengthSixOrUnder but actually WidgetDescription is a valuable insight.

Deciding this thing is specifically a WidgetDescription, not a Widget or a WidgetLabel, or a WidgetAssociatedText and definitely not a ThingyDescription, can help both users and other developers produce a mental model of what's going on that results in a better experience for everyone.

125. mrkeen ◴[] No.35058426{5}[source]
> No. it's not the approach OP recommends.

It absolutely is.

> I have no idea why would you question that.

I did not question [that they were different approaches], I explained, through example and counter-example, why they were the same approach. I will try again.

Alexis wrote both 'validate' and 'parse' examples in ML-style types:

    validateNonEmpty :: [a] -> IO ()            // ML-typed 'validate'

    parseNonEmpty :: [a] -> IO (NonEmpty a)     // ML-typed 'parse'
More from the article:

    The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.
I chose OO-style types for my samples, because there's a large fraction of HN users who dismiss ML-ish stuff as academic, or "practically useless outside of the most trivial cases".

    // OO-typed 'validate' (my straw man)
    class User {
        // returns void aka '()' aka "the type that contains no information"
        void validateUser() throws InvalidUserEx {...}          
    }

    /* OO-typed 'parse' (as per my baeldung link)
     * "gives the caller access to the information it learned"
     * In this case it gives back MORE than just the User,
     * it also gives back 'why it went wrong', per your request above for password validation
     * (In contrast with parseNonEmpty which just throws an exception.)
     */
    class UserValidator {
        Validation<Seq<String>, User> validateUser(...) {...}   
    }
> But, ML-style types have very limited expressive power

Hindley-Milner types are a goddamned crown jewel of computer science.

126. elfprince13 ◴[] No.35058565{3}[source]
It also applies to C++ and Java!
127. chowells ◴[] No.35058705{5}[source]
If you can't create a value of type PrimeNumber that doesn't contain a prime number, there's a bit more to it than naming. Not all type-level guarantees need to come from structural properties of the type. They can also come from structural properties of the environment of the type. Providing no public constructor is such a property.

The example was written rather badly, though. It should have pointed out that the module was exporting the type and a couple helper functions, but not the data constructor.

But despite that, the key point was correct. Validating is examining a piece of data and returning "good" or "bad". Parsing is returning a new piece of data which encodes the goodness property at the type level, or failing to return anything. It's a better paradigm because the language prevents you from forgetting what situation you're in.
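
Something like this sketch (my names), where the export list is the load-bearing part:

    module Prime (PrimeNumber, mkPrime, primeToInt) where

    -- the PrimeNumber data constructor is not exported
    newtype PrimeNumber = PrimeNumber Int

    -- the only way to obtain a PrimeNumber (trial division: slow but correct)
    mkPrime :: Int -> Maybe PrimeNumber
    mkPrime n
      | n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1] = Just (PrimeNumber n)
      | otherwise                                        = Nothing

    primeToInt :: PrimeNumber -> Int
    primeToInt (PrimeNumber n) = n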

replies(2): >>35059113 #>>35060307 #
128. marcelr ◴[] No.35058718[source]
Yes, and this is not tied to statically typed languages. If anything this is simpler to do in dynamic languages, but the culture isn’t there in my experience.
129. chowells ◴[] No.35058732{5}[source]
Opaque types absolutely provide something of value. They're different types. You can't pass an Integer to a function that requires a PrimeNumber. It's a compile error.
replies(1): >>35060355 #
130. ssalbdivad ◴[] No.35058737{4}[source]
Have you seen ArkType (https://github.com/arktypeio/arktype)? Similar parse-based approach to validation with a significantly more concise definition syntax:

const info = type({ name: "string>0", email: "email" })

replies(1): >>35154724 #
131. tialaramex ◴[] No.35058974{3}[source]
And parse-don't-validate is often very nice to work with in Rust; I can describe how to turn some UTF-8 text into my type Foo in a function:

  impl std::str::FromStr for Foo {
    type Err = ReasonsItIsNotAFoo;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        /* etc. */
    }
  }
And then whenever I've got a string which I know ought to be a Foo, I can:

  let foo: Foo = string.parse().unwrap_or_else(|_| panic!("This {string:?} ought to be a Foo but it isn't"));
Since we said foo is a Foo, by inference the parsing of string needs to either succeed with a Foo, or fail while trying, so it calls that FromStr implementation we wrote earlier to achieve that.
132. aidenn0 ◴[] No.35058994{5}[source]
> Which I agree, is a poor design choice. A type modeling an HTTP request should model the RFC definition as closely as possible.

I couldn't disagree more. A type modeling an HTTP request should model HTTP requests. Not some theoretical description of an HTTP request.

replies(2): >>35059787 #>>35068654 #
133. jakear ◴[] No.35059010{3}[source]
Not quite. TypeScript provides options beyond what the author of this article details that are, IMO, superior, at least in some cases. Instead of just "throw an error or return ()" or "throw an error or return NonEmpty<T>", you can declare a function's return type as "throws iff the argument isn't NonEmpty" or "true iff the argument is NonEmpty".

Compare:

    function validateNonEmpty<T>(list: T[]): void {
      if (list[0] === undefined) 
        throw Error("list cannot be empty")
    }

    function parseNonEmpty<T>(list: T[]): [T, ...T[]] {
      if (list[0] !== undefined) {
        return list as [T, ...T[]]
      } else {
        throw Error("list cannot be empty")
      }
    }

    function assertNonEmpty<T>(list: T[]): asserts list is [T, ...T[]] {
      if (list[0] === undefined) throw Error("list cannot be empty")
    }

    function checkEmptiness<T>(list: T[]): list is [T, ...T[]] {
      return list[0] !== undefined
    }

    declare const arr: number[]

    // Error: Object is possibly undefined
    console.log(arr[0].toLocaleString())

    const parsed = parseNonEmpty(arr)
    // No error
    console.log(parsed[0].toLocaleString())

    if (checkEmptiness(arr)) {
      // No error
      console.log(arr[0].toLocaleString())
    }

    assertNonEmpty(arr)
    // No error
    console.log(arr[0].toLocaleString())
For me the `${arg} is ${type}` approach is superior: you write the validation once and leave the precise error-handling mechanism to the caller, who tends to have a better idea of what to do in degenerate cases (sometimes throwing a full-on Exception is appropriate, but sometimes a different form of recovery is better).
replies(2): >>35059948 #>>35070563 #
134. lexi-lambda ◴[] No.35059019[source]
I discussed how/why the point of this article is very much not to “parse everything” in this followup: https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-typ... In particular, it articulates precisely why it is fine to use relatively wide types for any input which the program treats opaquely, and it gives an example of how the same techniques can still be useful even in that context.
135. lexi-lambda ◴[] No.35059113{6}[source]
> If you can't create a value of type PrimeNumber that doesn't contain a prime number, there's a bit more to it than naming.

Yes, indeed. This is quite useful! But crabbone isn’t entirely wrong that it isn’t quite what the original article was about.

I’ve written quite a bit of code where constructive data modeling (which is what the original article is really about) was both practical and useful. Obviously it is not a practical approach everywhere, and there are lots of examples where other techniques are necessary. But it would be quite silly to write it off as useless outside of toy examples. A pretty massive number of domain concepts really can be modeled constructively!

But when they can’t, using encapsulation to accomplish similar things is absolutely a helpful approach. It’s just important to be thoughtful about what guarantees you’re actually getting. I wrote more about that in a followup post here: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

136. philsnow ◴[] No.35059271[source]
> this basically requires your language to have ergonomic support for sum types, immutable "data classes", pattern matching

Not at all, the article is about pushing complexity to the "edges" of your code so that the gooey center doesn't have to faff around with (re-)checking the same invariants over and over... but its examples are also in Haskell, in which it would be weird to do this without the type system.

In Python or Java or whatever you'd just parse your received_api_request_could_be_sketchy or fetched_db_records_but_are_they_really into a BlessedApiRequest or DefinitelyForRealDBRecords in their constructors or builder methods or whatever, disallow any other ways of creating those types, and then exclusively use those types.

edit: wait, actually no we agree, I must have glossed over your second sentence, sorry

137. chowells ◴[] No.35059297{5}[source]
Nope. If it was validation, it would return a boolean indicating if the value was... Valid.

Instead it's parsing. It takes in a value of one type and returns a value of a different type that is known good. Or it fails. But what it never does is let you continue forward with an invalid value as if it was valid. This is because it's doing more than just validation.

replies(1): >>35060815 #
138. dwohnitmok ◴[] No.35059327{3}[source]
This is where dependent types would shine.

  function processUserInput(input: String, requirement: InputReqs[input]): Unit = ...

  type InputReqs[input] = {
    // We'll say that a Proposition takes Boolean expressions and turn them into types
    notTooLong: Proposition[length(input) < 128],
    authorIsNotBlocked: AuthorIsNotBlocked[input],
    sanitized: Sanitized[input],
    ...
  }
where you might have the following functions (which are all examples of parse don't validate)

  function checkIfAuthorIsBlocked(author: String, input: String): Maybe[AuthorIsNotBlocked[input]] = ...

  // Create a pair that contains both the sanitized String and a tag that it has been sanitized
  function sanitizeString(input: String): (output: String, Sanitized[output]) = ...
where just by types alone I know that e.g. length checking must occur after sanitization (because sanitizeString generates a new `output` that is distinct from `input`) and don't have to write down in docs somewhere that you might cause a bug if you check lengths before sanitization because maybe sanitization changes the length of the input.

Note that this is also strictly stronger than a simple precondition/postcondition system or some sort of assertion system because properties of the input that we care about may not be observable at runtime/from the input alone (e.g. AuthorIsNotBlocked can't be asserted based only on input: you'd have to change the runtime representation of input to include that information).

139. lexi-lambda ◴[] No.35059337{5}[source]
There is a fairly obvious difference between a dynamic enforcement mechanism like contracts or SQL constraints and a static one like types. Though I think it is a bit silly to suggest that I have “never even considered” such things, given the blog post itself is rendered using Racket, a dynamically-typed language with a fairly sophisticated higher-order contract system.

SQL constraints are certainly useful. But they don’t really solve the same problem. SQL constraints ensure integrity of your data store, which is swell, but they don’t provide the same guarantees about your program that static types do, nor do they say much at all about how to structure your code that interacts with the database. I also think it is sort of laughable to claim that XSL is a good tool for solving just about any data processing problem in 2023, but even if you disagree, the same points apply.

Obviously, constructive data modeling is hardly a panacea. There are lots of problems it does not solve or that are more usefully solved in other ways. But I really have applied it to very good effect on many, many real engineering problems, not just toys, and I think the technique provides a nice framework for reasoning about data modeling in many scenarios. Your comments here seem almost bafflingly uncharitable given the article in question doesn’t make any absolutist claims and in fact discusses at some length that the technique isn’t always applicable.

See also: my other comment about using encapsulation instead of constructive modeling (https://news.ycombinator.com/item?id=35059113) and my followup blog post about how more things can be encoded using constructive data modeling than perhaps you think (https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).

replies(1): >>35060678 #
140. jameshart ◴[] No.35059347{3}[source]
Absolutely - the advice is highly applicable in most modern widely used languages.

My point was merely that the examples being presented in Haskell - and in the context of talking about lists in a very functional, lispy cons-ish kind of way, makes it less accessible for programmers who are using more object-oriented type systems.

141. blincoln ◴[] No.35059448{7}[source]
I see what you're saying, but I'm still not understanding how it becomes a generalizable rule for real-world code without adding a lot of exceptions to the rule, or doing something over-engineered like parsing into an ever-more-specific series of customized structs/objects in different branches of the code.

Just to be clear, I actually really like the idea of parsing the input into a structure. I do the same thing in a lot of my code. I just don't see how it removes the need to also perform validation in many (maybe most) cases as soon as one gets beyond contrived examples.

The empty array example seems to be a can of worms. Maybe it's specific to the kinds of software that I've written, but in most of the cases I can think of, I wouldn't know if it was OK for a particular array within a structure to be empty until after the code had made some other determinations and branched based on them. And yet, like the example, once it got to the real handling for that case, it would be a problem if the array were empty. So the image in my mind is many layers of parsing that are much more complicated and confusing to read than validating the length of the array.

I still think it's a great idea for a lot of things, just that the "parse, don't validate" name seems really misleading. I might go with something like "parse first, validate where necessary".

142. romankolpak ◴[] No.35059491[source]
this generalizes more broadly in fact and applies to not just parsing and validating data. very often you want to reject all problematic states so you can conveniently code the happy path assuming all preconditions are met (inputs are correct, permissions are granted, etc) using meaningful data structures free from the messiness of the real world.

it's often a matter of experience to get this nuance of programming. you just learn with time that it's very inconvenient to test for emptiness multiple levels deep in the callstack again and again and you go "why can't i just assume good data here?". and then you figure out a way to write the code so you can.

143. lexi-lambda ◴[] No.35059519{6}[source]
You are sort of mistaken. I wrote a followup blog post that discusses what you are describing at some length: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

However, TypeScript does not really provide any facility for nominal types, which in my opinion is something of a failure of the language, especially considering that it is at odds with the semantics of `class` and `instanceof` in dynamically-typed JavaScript (which have generative semantics). Other statically typed languages generally provide some form of nominal typing, even gradually typed ones. Flow even provided nominal types in JavaScript! But TypeScript is generally also quite unsound (https://twitter.com/lexi_lambda/status/1621973087192236038), so the type system doesn’t really provide any guarantees, anyway.

That said, TypeScript programmers have developed a way to emulate nominal typing using “brands”, which does allow you to obtain some of these benefits within the limitations of TS’s type system. You can search for “TypeScript branded types” to find some explanations.

replies(1): >>35060390 #
144. lexi-lambda ◴[] No.35059621[source]
Certainly I don’t think `parseNonEmpty` would be especially useful in a real program; it’s only there to provide a particularly simple contrast with `validateNonEmpty`. The example earlier in the blog post using the `nonEmpty` function (which returns an optional result) is a more realistic illustration of how such things are actually used in practice, since it allows you to raise a domain-appropriate error message.

Tangentially, in Haskell specifically, I have actually written a library specifically designed for checking the structure of input data and raising useful error messages, which is somewhat ironically named `monad-validate` (https://hackage.haskell.org/package/monad-validate). But it has that name because similar types have historically been named `Validation` within the Haskell community; using the library properly involves doing “parsing” in the way this blog post advocates.

145. recursive ◴[] No.35059787{6}[source]
HTTP requests are not things that humans discovered in nature. They are abstractions, created entirely by specification. In some sense, an HTTP request is exactly that which conforms to the specification.
replies(3): >>35060430 #>>35062064 #>>35063121 #
146. lexi-lambda ◴[] No.35059831{4}[source]
> If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

This is overthinking it. Usually, when people are not used to doing constructive data modeling, they get caught up on this idea that they need to have a datatype that represents their data in some canonical representation. If you need a type that represents an even number, then clearly you must define a type that is an ordinary integer, but rules out all odd numbers, right?

Except you don’t have to do that! If you need a number to always be even (for some reason), that suggests you are storing the wrong thing. Instead, store half that number (e.g. store a radius instead of a diameter). Now all integers are legal values, and you don’t need a separate type. Similarly, if you want to store an even number greater than 100, then use a natural number type (i.e. a type that only allows non-negative integers; Haskell calls this type `Natural`) and store half of the difference between that number and 102. This means that, for example, 0 represents 102, 1 represents 104, 2 represents 106, 3 represents 108, etc.

If you think this way, then there is no need to introduce a million new types for every little concept. You’re just distilling out the information you actually need. Of course, if this turns out to be a really important concept in your domain, then you can always add a wrapper type to make the distinction more formal:

    newtype EvenGreaterThan100 = EvenGreaterThan100 Natural

    evenGreaterThan100ToInteger :: EvenGreaterThan100 -> Integer
    evenGreaterThan100ToInteger (EvenGreaterThan100 n) = (toInteger n * 2) + 102

    integerToEvenGreaterThan100 :: Integer -> Maybe EvenGreaterThan100
    integerToEvenGreaterThan100 n
      | n <= 100 = Nothing
      | otherwise = case (n - 102) `quotRem` 2 of
          (q, 0) -> Just (EvenGreaterThan100 (fromInteger q))
          (_, _) -> Nothing
Of course, this type seems completely ridiculous like this, and it is. But that’s because no real program needs “an even number greater than one hundred”. That’s just a random bag of arbitrary constraints! A real type would correspond to a domain concept, which would have a more useful name and a more useful API, anyway.

I wrote a followup blog post here that goes into more detail about this style of data modeling, with a few more examples: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...

replies(1): >>35062414 #
147. lexi-lambda ◴[] No.35059886[source]
> As a result I have not found this article a good one to share with junior developers to help them understand how to design types to capture the notion of validity, and to replace validation with narrowing type conversions (which amount to ‘parsing’ when the original type is something very loose like a string, a JSON blob, or a dictionary).

This is sort of true. It is a good technique, but it is a different technique. I went into how it is different in quite some detail in this followup blog post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

I think a common belief among programmers is that the true constructive modeling approach presented in the first blog post is not practical in languages that aren’t Haskell, so they do the “smart constructor” approach discussed in the link above instead. However, I think that isn’t actually true, it’s just a difference in how respective communities think about their type systems. In fact, you can definitely do constructive data modeling in other type systems, and I gave some examples using TypeScript in this blog post: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...

replies(1): >>35061090 #
148. lexi-lambda ◴[] No.35059948{4}[source]
There is really no difference between doing this and returning a `Maybe`, which is the standard Haskell pattern, except that the `Maybe` result also allows the result to be structurally different rather than simply a refinement of the input type. In a sense, the TypeScript approach is a convenience feature that allows you to write a validation function that returns `Bool`, which normally erases the gained information, yet still preserve the information in the type system.

This is quite nice in situations where the type system already supports the refinement in question (which is true for this NonEmpty example), but it stops working as soon as you need to do something more complicated. I think sometimes programmers using languages where the TS-style approach is idiomatic can get a little hung up on that, since in those cases, they are more likely to blame the type system for being “insufficiently powerful” when in fact it’s just that the convenience feature isn’t sufficient in that particular case. I presented an example of one such situation in this followup blog post: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...
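
(Concretely, the Maybe-returning shape is just this; it mirrors `nonEmpty` from `Data.List.NonEmpty` in base:)

    import Data.List.NonEmpty (NonEmpty ((:|)))

    nonEmpty :: [a] -> Maybe (NonEmpty a)
    nonEmpty []       = Nothing
    nonEmpty (x : xs) = Just (x :| xs)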

replies(1): >>35070550 #
149. Joel_Mckay ◴[] No.35059995[source]
In general, parsers that do not limit recursion-depth and order can be a problem.

Marshalling the data for platform traversal is also very wise. A library like Xalan/xerces using XSLT is very powerful, or something lightweight like the JSON/BSON parser in libbson.

Accordingly, one must assume the data is _always_ malformed, and assign a scoring system to the expected format at each stage of decoding. i.e. each service/function does a sanity check, then scores which data is critical, optional, and prohibited.

This way your infrastructure handles the case when (not if) someone tries to put Coffee Grounds in your garbage disposal unit. =)

150. lexi-lambda ◴[] No.35060043{3}[source]
> In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails.

This is similar, and is indeed quite useful in many cases, but it’s not quite the same. I explained why in this comment: https://news.ycombinator.com/item?id=35059886 (The comment is talking about TypeScript, but really everything there also applies to Java.)

replies(1): >>35060531 #
151. secdeal ◴[] No.35060098{7}[source]
Not quite, 'a' is the type of the elements 'NonEmpty a' contains.

It is rather a subclass of some kind of 'Iterable a'.

152. crabbone ◴[] No.35060307{6}[source]
> If you can't create a value of type PrimeNumber that doesn't contain a prime number

Because you wrote a validation function, the exact thing OP told you not to do. Hooray?!

The goal of OP was to create a type that incorporates constraints on data, just like in their example about the non-empty list they created a type that in the type itself contains the constraints s.t. it's impossible to implement this type in a way that it will have an empty list.

You did the opposite. You created a type w/o any constraints whatsoever, and then added a validation function to it to make sure you only create values validated by that function. So... you kind of proved my point: it's nigh impossible to create a program intelligible to human beings that has a "prime number" type, and that's why we use validation -- it's easy to write, easy to understand.

Your type isn't even a natural number, let alone a prime number.

replies(1): >>35061011 #
153. crabbone ◴[] No.35060355{6}[source]
Not in this context they don't. They are useless if you want to ensure that a given number is a prime number.
replies(2): >>35063518 #>>35063646 #
154. PartiallyTyped ◴[] No.35060390{7}[source]
This is fantastic, thank you very much!

When I wrote GP I had in mind branding as the "right" way to get those benefits, though I was unaware of the name. However, I see that it is still limited by the TS compiler's limitations.

So then, going back to the initial snippets: my issue is that the Prime type is essentially behaving like a type synonym rather than a newtype, thus the inner calls cannot actually rely on the value being prime, yes?

I have to admit that quite a few of the things in the blog are beyond my current understanding. Do you have any recommended reading for post grads with rudimentary understanding of Haskell who would like to get deeper into type systems?

replies(1): >>35060621 #
155. aidenn0 ◴[] No.35060430{7}[source]
To an extent, that sounds like saying the thing I am sitting in is not a chair since it has 5 legs.
replies(1): >>35061204 #
156. PartiallyTyped ◴[] No.35060447{3}[source]
I was wrong.

See this fantastic reply by the author:

https://news.ycombinator.com/reply?id=35059519

157. lolinder ◴[] No.35060531{4}[source]
Thanks for the reply! I wasn't at all expecting one from you.

If I'm understanding the difference correctly, it's that the constructive data modeling approach can be proven entirely in the type system without any trust in the library code, while the Java approach I recommended depends on there being no other way to construct an instance of the class, which can be tricky to guarantee. Is that accurate?

replies(1): >>35061617 #
158. gnulinux ◴[] No.35060605{5}[source]
You're misunderstanding. Validation looks like

  validateEmail : String -> String -- post-condition: String contains valid email
whereas parse looks like:

  parseEmail : String -> Either EmailError ValidEmail
There is no problem using the `ValidEmail` abstraction. The problem is type stability: when your program enters a stronger state at runtime (i.e. certain validations have been performed), it's best to enter a correspondingly stronger state at compile time (stronger types) so that the compiler can verify these conditions. If you stay at String, these validations (that a string is a valid email) have no compile-time counterpart, so there is no way for the compiler to verify them. So use `ValidEmail` instead.
159. lexi-lambda ◴[] No.35060621{8}[source]
Haskell’s `newtype` keyword defines a genuinely fresh (nominal) type that is distinct from all other types. There is no direct analogue in TypeScript, but using branded types would be the closest you could get. That’s still not quite the same because TypeScript doesn’t really allow the same strong encapsulation guarantees that Haskell provides (which, to be clear, many other languages provide as well!), but it’s a decent approximation.

The problem with your `Prime` type is that it is just a type alias: a new way to refer to the exact same type. It’s totally interchangeable with `number`, so any `number` is necessarily also a `Prime`… which is obviously not very helpful. (As it happens, the Haskell equivalent of that would be basically identical, since Haskell also uses the `type` keyword to declare a type alias.)
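
To make the distinction concrete (a quick sketch, not from the original post):

    type    PrimeAlias = Int        -- alias: interchangeable with Int, nothing checked
    newtype Prime      = Prime Int  -- fresh nominal type: hide the constructor and
                                    -- export only a checked smart constructor, and
                                    -- every Prime in the program passed that check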

As for recommended reading, it depends on what you’d like to know, really. There are lots of different perspectives on type systems, and there’s certainly a lot of stuff you can learn if you want to! But I think most working programmers probably don’t benefit terribly much from the theory (though it can certainly be interesting if you’re into that sort of thing). Perhaps you could tell me which things you specifically find difficult to understand? That would make it easier for me to provide suggestions, and it would also be useful to me, as I do try to make my blog posts as accessible as possible!

replies(1): >>35061206 #
160. crabbone ◴[] No.35060678{6}[source]
> There is a fairly obvious difference between a dynamic enforcement mechanism like contracts or SQL constraints.

What on Earth are you talking about? What dynamic enforcement?

> Though I think it is a bit silly to suggest that I have “never even considered”

In the context of this conversation you showed no signs of such concerns. Had you had such concerns previously, you wouldn't have arrived at the conclusions you apparently have.

> a dynamically-typed language

There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. Dynamically-typed is just a word combo that a lot of wannabe computer scientists are using, but there's no real meaning behind it. When "dynamic" is used in the context of types, it refers to the concrete type obtained during program execution, whereas "static" refers to the type that can be deduced w/o executing the program. For example, union types cannot be dynamic. Similarly, it's not possible to have generic dynamic types. Every language thus has dynamic and static types, except, in some cases, the static analysis of types isn't very useful because the types aren't expressive enough, or the verification is too difficult. Conversely, in some languages there's no mechanism to find out exact runtime types because the information about types is considered to be extraneous to the program and is removed from the runtime.

The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without a useful static analysis method" and "languages that may be able to erase types from the runtime in most cases". Where "useful" and "most cases" are a matter of subjective opinion. Oftentimes such boundaries lead those who draw them to ironically confusing conclusions, such as admitting that languages like Java aren't statically typed or that languages like Bash are statically typed, and so on.

Note that "wannabee computer scientist" applies also to people with degrees in CS (I've met more than a dozen), some had even published books on this subject. This only underscores the ridiculous state in which this field is.

> discusses at some length that the technique isn’t always applicable.

This technique is not applicable to the overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach, and at the moment there's no hope of making it useful.

Validation, on the other hand, is a very difficult subject, but, I think that if we really want to deal with this problem, then TLA+ is a good approach for example. But it's still too difficult to the average programmer. Datalog would be my second choice, which also seems appropriate for general public. Maybe even something like XSL, which, in my view lacks a small core that would allow one to construct it from first principles, but it's still able to solve a lot of practical tasks when it comes to input validation.

ML-style types aren't competitive in this domain. They are very clumsy tools when it comes to expressing problems that programmers have to solve every day. We, as a community, keep praising them because they are associated with the languages for the "enlightened" and thus must be the next best thing after sliced bread.

replies(1): >>35060984 #
161. crabbone ◴[] No.35060815{6}[source]
> If it was validation, it would return a boolean

On what grounds did you decide that this is the requirement for validation? That's truly bizarre... Sometimes validating functions return booleans... but there's no general rule that they do.

Anyways, you completely missed the point OP was trying to make. Their idea was to include constraints on data (i.e. to ensure data validity) in the type associated with the data. You've done nothing of the kind: you created a random atomic type with a validation method. Your type isn't even a natural number; you definitely cannot add another natural number to it, or multiply it, etc...

Worse yet, you decided to go into a language with subtyping, which completely undermines all of your efforts, even if you were able to construct all of those overloads to make this type behave like a natural number: any other type that you create by inheriting from this class has the liberty to violate all the contracts you might have created in this class, but, through the definition of your language, it would still be valid to say that the subtype thus created is a prime number, even if it implements == in a way that it returns "true" when compared to 8 (only) :D

162. crabbone ◴[] No.35060838{6}[source]
> It is not validation if you do it at parse time

Who told you so? Definitely not OP. OP doesn't believe what you just wrote.

replies(1): >>35064220 #
163. psychoslave ◴[] No.35060873[source]
>it’s only three words long: Parse, don’t validate.

My own English parser is telling me it's actually four words, however.

Your mileage may validate that differently. ;)

164. cratermoon ◴[] No.35060900{4}[source]
What is code, though, but a syntactically precise and logical way of expressing ideas?
165. crabbone ◴[] No.35060965{4}[source]
I wish this was the first such case. But, what I see happen way too often is this:

Some dude comes up with another data definition language (DDL) that uses ML-style types. Everyone jumps from their seats in standing ovation. And in the end we get another useless configuration language that cannot come anywhere close to the needs of application developers, and so they pedal away on their squared-wheel bicycles of hand-rolled very custom data validation procedures.

This is even more disheartening because we have already created tools that made some very good progress toward systematic input validation. And they have been with us since the dawn of programming (well, almost: we've had SQL since the early 70's, then Prolog, then various XML schema languages, and finally TLA+). It's amazing how people keep ignoring solutions that achieved so much compared to ensuring that a list isn't empty... and yet present it as the way forward...

166. lexi-lambda ◴[] No.35060984{7}[source]
You come off as a crank.

Perhaps you are one, perhaps you are not, I don’t know, but either way, you certainly write like one. If you want people to take you seriously, I think it would behoove you to adopt a more leveled writing style.

Many of the claims in your comment are absurd. I will not pick them apart one by one because I suspect it will do little to convince you. But for the benefit of other passing readers, I will discuss a couple points.

> What on Earth are you talking about? What dynamic enforcement?

SQL constraints are enforced at runtime, which is to say, dynamically. Static types are enforced without running the program. This is a real advantage.

> There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. […] The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without useful static analysis method" and "languages that may be able to erase types from the runtime in most cases".

I agree that the distinction is not black and white, and in fact I am on the record in various places as saying so myself (e.g. https://twitter.com/lexi_lambda/status/1219486514905862146). Java is a good example of a language with a very significant dynamic type system while also sporting a static type system. But it is certainly still useful to use the phrase “dynamically-typed language,” because normal people know what that phrase generally refers to. It is hardly some gotcha to point out that some languages have some of both, and there is certainly no need to insult my character.

> This technique is not applicable to overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach and at the moment, there's no hope of making it useful.

This is simply not true. I know because I have done a great deal of real software engineering in which I have applied constructive data modeling extensively, to good effect. It would be silly to list them because it would simply be listing every single software project I have worked on for the past 5+ years. Perhaps you have not worked on problems where it has been useful. Perhaps you do not like the tradeoffs of the technique. Fine. But in this discussion, it’s ultimately just your word against mine, and many other people seem to have found the techniques quite useful—and not just in Haskell. Just look at Rust!

> Datalog would be my second choice, which also seems appropriate for general public.

The idea that datalog, a first-order relational query language, solves data validation problems (without further clarification) is so laughable that merely mentioning it reveals that you are either fundamentally unserious or wildly uninformed. It is okay to be either or both of those things, of course, but most people in that position do not have the arrogance and the foolishness to leave blustering comments making an ass of themselves on the subject on an internet forum.

Please be better.

replies(1): >>35070695 #
167. crabbone ◴[] No.35060995{6}[source]
> first construct your input

My man... if I, the author of my program, was constructing the input, I wouldn't need no validation. Input isn't meant to be constructed by the program's author, it's supposed to be processed...

replies(1): >>35061690 #
168. chowells ◴[] No.35061011{7}[source]
Are you aware that it's impossible to do any kind of parsing without validating the data? Saying "you have a validation function" is not some sort of disproof of parsing.

Parsing is an additional job on top of validation - providing type-level evidence that the data is good. That's what makes it valuable. It's not some theoretical difference in power. It's better software engineering.

169. jameshart ◴[] No.35061090{3}[source]
Thanks for responding - just to reiterate, I am a big fan of this original post, and indeed your other writing - my only critique here is that I'm looking for ways to make the insights in them more transparent to, particularly, people who aren't well-positioned to analogize how to apply Haskell concepts to other languages.

I see you read 'narrowing type conversions' rather literally in my statement - that might be my making my own analogy that doesn't go over very well. I literally mean that using 'constructively modeled types' is a way to create true type-narrowing conversions, in the sense that a 'nonempty list' is a narrower type than 'list', or 'one to five' is a narrower type than 'int'.

replies(1): >>35067979 #
170. recursive ◴[] No.35061204{8}[source]
I mean, if chairs were things with formal specifications, and that specification said so, yeah.

But in this universe, no.

replies(1): >>35063081 #
171. PartiallyTyped ◴[] No.35061206{9}[source]
I'd be interested in more about Generic, as that section went over my head; that's not an issue with your blog but rather a gap in my knowledge.

I find it quite interesting though I never had the time to study it further until now, so any recommendations are appreciated!

replies(1): >>35061524 #
172. cdaringe ◴[] No.35061385{3}[source]
Somebody somewhere in the system will be checking. Checking at the boundary likely scales best because you don't burn cycles that would otherwise be distributed, and likely redundant, throughout the system. If you have a massive model, it's feasible that it makes sense to defer partial/sub-parsing.

Parser combinators seem to be pretty rippin' fast for the most part, at least those I've used in OCaml and Rust.

173. lexi-lambda ◴[] No.35061524{10}[source]
Generic is quite specific to Haskell, so it is probably difficult to explain without a little more understanding of Haskell-like type systems. (Rust has some similar capabilities, so that would help, too.) I wouldn’t worry about it too much, though; it doesn’t contain any particularly deep knowledge about type systems in general.
replies(1): >>35061742 #
174. Vosporos ◴[] No.35061557[source]
It's okay to say that you didn't understand the article, you know.
175. lexi-lambda ◴[] No.35061617{5}[source]
Yes, that’s about right. But really do read the followup blog post (https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...), as it explains that in much more depth! In particular, it says:

> To some readers, these pitfalls may seem obvious, but safety holes of this sort are remarkably common in practice. This is especially true for datatypes with more sophisticated invariants, as it may not be easy to determine whether the invariants are actually upheld by the module’s implementation. Proper use of this technique demands caution and care:

> * All invariants must be made clear to maintainers of the trusted module. For simple types, such as NonEmpty, the invariant is self-evident, but for more sophisticated types, comments are not optional.

> * Every change to the trusted module must be carefully audited to ensure it does not somehow weaken the desired invariants.

> * Discipline is needed to resist the temptation to add unsafe trapdoors that allow compromising the invariants if used incorrectly.

> * Periodic refactoring may be needed to ensure the trusted surface area remains small. It is all too easy for the responsibility of the trusted module to accumulate over time, dramatically increasing the likelihood of some subtle interaction causing an invariant violation.

> In contrast, datatypes that are correct by construction suffer none of these problems. The invariant cannot be violated without changing the datatype definition itself, which has rippling effects throughout the rest of the program to make the consequences immediately clear. Discipline on the part of the programmer is unnecessary, as the typechecker enforces the invariants automatically. There is no “trusted code” for such datatypes, since all parts of the program are equally beholden to the datatype-mandated constraints.

They are both quite useful techniques, but it’s important to understand what you’re getting (and, perhaps more importantly, what you’re not).

176. PartiallyTyped ◴[] No.35061690{7}[source]
Construct was not the correct word. The intention was to express that you do need to parse the object into something more specific that captures the properties that you require.

Take for example APIGatewayProxyEvent [1], which has a property `queryStringParameters` with type:

    export interface APIGatewayProxyEventQueryStringParameters {
        [name: string]: string | undefined;
    }
You can then create a branded type like

    type AuthCodeEvent = APIGatewayProxyEvent & {
         queryStringParameters: {
             code: string;
             state: string;
         };
    };
The branded type here means that as soon as you verify that the event has the structure above, you can assume it is correct in the code that handles these specific cases.

Though as the blog author mentioned in the other chain, the TS compiler is not particularly sound, so it's probably entirely possible to mess up the structure and break the type without the compiler knowing about it.

[1] https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...

177. PartiallyTyped ◴[] No.35061742{11}[source]
Okay, is there like a book or some other resource besides your awesome blog that you'd recommend for people looking to get into this some more?
replies(1): >>35068057 #
178. kybernetikos ◴[] No.35062064{7}[source]
You don't have to run an HTTP server very long on the internet to start discovering HTTP requests (including malformed HTTP requests, which are also a kind of HTTP request), 'in nature'.
179. esrauch ◴[] No.35062414{5}[source]
What if you want it to only be less than 100, though? Not everything is so easily expressible the way you're saying.
replies(2): >>35062811 #>>35068165 #
180. dwohnitmok ◴[] No.35062416{4}[source]
This doesn't really work because you still end up needing specific types for the output of every "parser" you have and then you still need a way of combining those types together.

Or you get the ability to forge evidence (e.g. you use the evidence provided by a parser for one integer as evidence for another).

This works better for dependency injection scenarios (the Has* pattern).

181. jmull ◴[] No.35062429{3}[source]
The db access is just an example. I used upward and downward working off the terminology of the article. But I can put it like this:

For a given call or request, there's input, some work done with that input, and the result. (This is true, whether we're talking about a functional or imperative style.) Your code will have some structure that reflects the work to be done. You want to push your parsing toward the input if it's concerned with the input, and toward the result if it's concerned with the result.

Whether you want to call the processing closer to the input "upward", or "earlier" or whatever, that's fine with me. If you call the processing closer to the input and closer to the result both "upward" then I think it's not a useful metaphor and you should choose a different one.

replies(1): >>35063332 #
182. dwohnitmok ◴[] No.35062502{4}[source]
No, the trickier problem is that without dependent types you are forced into a very specific, linear chain of validation, or else deal with a combinatorial explosion of functions and types.

To take your type as an example, you could imagine a function

  validation : String -> Maybe FinalWidget
but maybe `validation` is really big and unwieldy and you want to reuse parts of it elsewhere so you break it down into a pipeline of

  -- Let's say a RawWidget is, operationally, a non-empty string
  validation0 : String -> Maybe RawWidget
  -- Let's say a RefinedWidget is a string consisting only of capital letters
  validation1 : RawWidget -> Maybe RefinedWidget
  -- A FinalWidget is a non-empty string of capital letters that has no whitespace
  validation2 : RefinedWidget -> Maybe FinalWidget
This is over-constrained. You don't really want to force yourself into a scenario where you must call validation0, then validation1, and finally validation2 because maybe in another code path it's more expedient to do it in another order. But the types don't line up if you do it in another order. And maybe you don't really care about `RawWidget` and `RefinedWidget`, but you're forced to create them just to make sure that you can build up to a `FinalWidget`.

This is where dependent types would really help relax those constraints.

replies(1): >>35062924 #
183. Quekid5 ◴[] No.35062811{6}[source]
You can go as far (or as short) as the application warrants. The more static evidence you want, the more cumbersome it's going to become. That seems like a natural progression, but the point is that you get to choose the trade-off. It's often easier to prove a special case than a more general case.

(Certainly, Haskell is probably not the most concise language for this kind of thing. LiquidHaskell adds interesting proof capabilities wrt. arithmetic.)

Regardless, even just parsing at the boundary and using an opaque type

    MySuperSpecialInt = Int @Even Min(2) Max(100)
(or whatever syntax you want) is still better than just using Int. At least you'll know that you'll always be handed a value that's in range (post-parsing).
184. dang ◴[] No.35062866[source]
Thanks! Macroexpanded:

Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)

Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)

Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)

185. Quekid5 ◴[] No.35062924{5}[source]
I don't disagree that dependent types would help (and be really cool for lots of other uses!), but let's consider what the usual validation rules we really need are: non-empty, basic interval constraints (non-negative/positive), only contains a certain set of characters... simple stuff like that, usually. If we're going wild, an interesting case would be effectful validation and how that fits in. In practice, what happens with any non-basic validation is that the server says 4xx, try again.

Anyway, validation/parsing is mostly pretty simple stuff where the "validate" bit is a simple function... and function composition works just fine.

(Assuming you can name the result type of your parse/validate individually according to your domain.)
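
E.g., with the parent's hypothetical widget types stubbed out (a sketch, names from the parent comment), Kleisli composition gives you the whole pipeline:

    import Control.Monad ((>=>))
    import Data.Char (isSpace, isUpper)

    newtype RawWidget     = RawWidget String      -- non-empty
    newtype RefinedWidget = RefinedWidget String  -- capitals (and spaces) only
    newtype FinalWidget   = FinalWidget String    -- no whitespace left

    validation0 :: String -> Maybe RawWidget
    validation0 "" = Nothing
    validation0 s  = Just (RawWidget s)

    validation1 :: RawWidget -> Maybe RefinedWidget
    validation1 (RawWidget s)
      | all (\c -> isUpper c || isSpace c) s = Just (RefinedWidget s)
      | otherwise                            = Nothing

    validation2 :: RefinedWidget -> Maybe FinalWidget
    validation2 (RefinedWidget s)
      | not (any isSpace s) = Just (FinalWidget s)
      | otherwise           = Nothing

    parseFinalWidget :: String -> Maybe FinalWidget
    parseFinalWidget = validation0 >=> validation1 >=> validation2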

replies(1): >>35064351 #
186. girvo ◴[] No.35063081{9}[source]
If you can completely ignore HTTP request data that happens to not perfectly meet the RFC at your work (or, more specifically for us, Modbus RTU responses), I salute you. Sadly, I can’t, we get some wild stuff that we still need to attempt to handle. Both HTTP and Modbus!
187. ambicapter ◴[] No.35063117{4}[source]
> In practice, I find it is usually something like 'UnvalidatedCustomerInfo' being parsed into a 'CustomerInfo' object, where you can validate all of the fields at the same time

In my experience you usually can't validate them all at the same time. For example, address. You usually don't validate that until after the customer has selected items, and then you find out that some items won't deliver to their area, so whereas you previously had a Valid basket, now it's an Invalid state.

replies(2): >>35063216 #>>35155316 #
188. lmm ◴[] No.35063121{7}[source]
> HTTP requests are not things that humans discovered in nature. They are abstractions, created entirely by specification.

They're created by something, but that something has more to do with a million blog posts and hallway conversations than it does with the formal RFC process. Certainly for most specifications of this kind, the working code came first and the specification was based largely on discovering what existing implementations did. If what the specification says is different from what HTTP clients send and HTTP servers understand, so much the worse for the specification.

189. initplus ◴[] No.35063216{5}[source]
Whether an item can be delivered to a particular address seems like a separate concern to whether the address is correct/real.
190. jakelazaroff ◴[] No.35063332{4}[source]
Any given callee is going to deal with a mix of inputs and results. And it’s not clear to me what those terms mean — e.g. is the response from the database an “input” or a “result”?

I think your point of view would make more sense looking at the call stack — database access happens deeper than the code that handles the response, so you can’t push it “up” from there. And I mean, sure? But I don’t think that’s an inherently better frame than the one in which external sources are “upward” and your own application code is “downward”.

191. lmm ◴[] No.35063412{6}[source]
> There are plenty of cases in real-world code where an array that's part of a struct or object may or may not contain any elements. If you're just parsing input into that, it seems like you'd either still end up doing an equivalent of checking whether the array is empty or not everywhere the array might be used later, even if that check is looking at an "array has elements" type flag in the struct/object, and so you're still maintaining a description of ways that the input may be invalid.

You only check it if it makes a difference to validity or not. There's no scenario where you keep the array and a parallel flag - either an empty array is invalid in which case you refuse to construct if it's empty, or an empty array is valid in which case you don't even check. Same thing for if you're checking whether it's got an even number of elements, got less than five elements, etc. - you don't keep the flag around, you refuse to construct your validated structure (even if that "validated structure" is actually just a marker wrapper around the raw array, if your type system isn't good enough to express the real constraint directly).
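
A sketch of that last option, the marker wrapper: emptiness is checked exactly once, at construction, and no parallel flag ever exists.

    newtype NonEmptyArray a = NonEmptyArray [a]

    mkNonEmptyArray :: [a] -> Maybe (NonEmptyArray a)
    mkNonEmptyArray [] = Nothing
    mkNonEmptyArray xs = Just (NonEmptyArray xs)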

192. ParetoOptimal ◴[] No.35063426{4}[source]
> The library didn't allow you to see the things (e.g. particular headers or options for those headers) that it didn't know to parse.

I typically design code around things like this with a sum type like:

    data Header = KnownHeader1 | KnownHeader2 | UnknownHeader String String
Then I typically don't offer any extra support or extended functionality for the cases where the type is `UnknownHeader`.
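
For instance, a consumer might look like this sketch: the known constructors get real handling, while the catch-all case just carries the raw pair through.

    render :: Header -> String
    render KnownHeader1        = "..."  -- whatever the known handling is
    render KnownHeader2        = "..."
    render (UnknownHeader k v) = k ++ ": " ++ v  -- passed through verbatim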
193. lmm ◴[] No.35063455{4}[source]
Most of the time you avoid having booleans in the first case, in favour of polymorphism (e.g. rather than having an "addOrMultiply" flag, you have separate "Add" and "Multiply" classes with a polymorphic method that does the addition or multiplication). You probably need some conditional logic in your "parser" (and whether that's "if" or pattern matching isn't so important IMO), but you should push booleans out of your core business logic and over to the edges of your program.
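
A minimal Haskell rendering of that idea (names hypothetical): the parser turns the flag into an operation once, at the edge, and the core logic never sees a boolean.

    newtype Operation = Operation (Int -> Int -> Int)

    parseOp :: String -> Maybe Operation
    parseOp "add"      = Just (Operation (+))
    parseOp "multiply" = Just (Operation (*))
    parseOp _          = Nothing

    -- core business logic: no addOrMultiply flag in sight
    apply :: Operation -> Int -> Int -> Int
    apply (Operation f) = f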
replies(1): >>35071493 #
194. ParetoOptimal ◴[] No.35063518{7}[source]
> Not in this context they don't.

What context is it exactly where they don't matter?

I can tell you in practice, in the real world, they very much do.

> They are useless if you want to ensure that a given number is a prime number.

It's not useless. The point is that once you have a type `PrimeNumber` that can only be constructed after being validated, you can then write functions that exist in a reality where only PrimeNumber exists.

195. ParetoOptimal ◴[] No.35063646{7}[source]
I wrote an example with prime numbers that you can run in the Haskell playground:

https://play.haskell.org/saved/gRsNcCGo

> They are useless if you want to ensure that a given number is a prime number.

This is wrong. In the example above `addPrimes` will only take prime numbers.

As such, if I make a Jira story that says "add multiply/subtract functions using the PrimeNumber type" I'll know that the implementation is simplified by only being able to concern itself with prime numbers.
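
The playground code isn't reproduced here, but the shape is presumably something like this sketch: the constructor is private, so addPrimes can only ever receive validated primes.

    module PrimeNumber (PrimeNumber, mkPrime, addPrimes) where

    newtype PrimeNumber = PrimeNumber Int
      deriving Show

    -- trial division; fine for illustration
    mkPrime :: Int -> Maybe PrimeNumber
    mkPrime n
      | n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1] = Just (PrimeNumber n)
      | otherwise                                        = Nothing

    addPrimes :: PrimeNumber -> PrimeNumber -> Int
    addPrimes (PrimeNumber a) (PrimeNumber b) = a + b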

196. dgunay ◴[] No.35063989[source]
Sometimes they do, but not always - my roommate once assembled an Ikea bookshelf with one of the panels on backwards.
197. rovolo ◴[] No.35064220{7}[source]
From the OP:

> Still, perhaps you are skeptical of parseNonEmpty’s name. Is it really parsing anything, or is it merely validating its input and returning a result? While the precise definition of what it means to parse or validate something is debatable, I believe parseNonEmpty is a bona-fide parser (albeit a particularly simple one).

> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.

The OP is saying that a validator is a function which doesn't return anything, whereas a parser is a function which returns data. (Or in other words, validation is when you keep passing around the data in the old type, and parsing is when you pass around a new type). It is true that there is code inside the parser which you can call "validation", but the OP is labeling the function based on its signature. This is made more obvious towards the end of the article:

> Use abstract datatypes to make validators "look like" parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to "fake" a parser from a validator.

They are talking about the interface, not the implementation. They are saying that you should pass around a parsed type, even if it's only wrapping a raw value, because it carries proof that this data has been validated. They are saying that you shouldn't be validating this data in lots of different places.
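
In miniature, the signature distinction being described (Email is a hypothetical wrapper):

    newtype Email = Email String

    -- validator: checks, but hands the caller nothing new to pass around
    validateEmail :: String -> Bool
    validateEmail s = '@' `elem` s

    -- parser: same check inside, but returns a new type that
    -- carries the proof forward
    parseEmail :: String -> Maybe Email
    parseEmail s = if '@' `elem` s then Just (Email s) else Nothing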

> It may not be immediately apparent what shotgun parsing has to do with validation—after all, if you do all your validation up front, you mitigate the risk of shotgun parsing. The problem is that validation-based approaches make it extremely difficult or impossible to determine if everything was actually validated up front or if some of those so-called “impossible” cases might actually happen. The entire program must assume that raising an exception anywhere is not only possible, it’s regularly necessary.

198. dwohnitmok ◴[] No.35064351{6}[source]
Without dependent types you can't do your common constraints in an order independent way.

You end up with four choices:

1. Have a single function that does all the constraint checking at once (see the sketch after this list)

2. Have a single linear order where each constraint check feeds into the next but only in that order

3. Acquiesce to a combinatorial explosion of functions that check every possible combination of those constraints

4. Give up keeping track of the constraints at a type level.
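
A sketch of choice 1, using the widget constraints from upthread (names hypothetical): every check lives in one function, and only the final type escapes.

    import Data.Char (isSpace, isUpper)

    newtype FinalWidget = FinalWidget String

    parseFinalWidget :: String -> Either String FinalWidget
    parseFinalWidget s
      | null s              = Left "empty"
      | not (all isUpper s) = Left "not all capital letters"
      | any isSpace s       = Left "contains whitespace"
      | otherwise           = Right (FinalWidget s)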

replies(1): >>35075015 #
199. ParetoOptimal ◴[] No.35064605{4}[source]
> How am I going to express the expectation that something is prime? With the following closed API:

Obviously not a closed API since the playground only gives you one module, but I wrote an example on the Haskell playground:

https://play.haskell.org/saved/gRsNcCGo

200. quickthrower2 ◴[] No.35065147{4}[source]
The second looks a lot more elegant in Haskell though. Funny how syntax can influence choice of semantics!
201. ParetoOptimal ◴[] No.35065313{4}[source]
I prefer using early return in monads with guard like:

    safeDiv :: (Monad m, Alternative m) => Int -> Int -> m Int
    safeDiv x y = do
      guard (y /= 0)
      pure (x `div` y)

    main :: IO ()
    main = do
      print $ safeDiv @Maybe 1 0
      print $ safeDiv @[] 1 0
      -- print =<< safeDiv @IO 1 0 -- guard throws an error in IO
Try it out at https://play.haskell.org/saved/a6VsE3uQ
202. lexi-lambda ◴[] No.35067979{4}[source]
You know what, you’re right—I misread your original comment. I was just going through this thread and replying to a number of comments making that particular misconception, since it is particularly common, but upon taking a closer look, you were saying something else. I apologize!

As for the difficulty in applying these ideas in other languages, I am sympathetic. The problem I always run into is that there is necessarily a tension between (a) presentations that are accessible to working programmers, (b) explanations that distill the essential ideas so they aren’t coupled to particular languages or language features, and (c) examples small enough to be clarifying and to fit in a blog post. Haskell is certainly not the best choice along that first axis, but it is quite exceptionally good along the second two.

For a somewhat concrete example of what I mean, see this comment I wrote a few years ago that translates the NonEmpty example into Java: https://news.ycombinator.com/item?id=21478322 I think the added verbosity and added machinery really does detract significantly from understanding. Meanwhile, a TypeScript translation would make a definition like this one quite tempting:

    type NonEmpty<T> = [T, ...T[]]
However, I find this actually obscures application of the technique because it doesn’t scale to more complex examples (for the reasons I discussed at quite some length in https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).

There are probably ways to thread this needle, but I don’t think any one “solution” is by any means obviously the best. I think the ways that other people have adapted the ideas to their respective ecosystems is probably a decent compromise.

203. lexi-lambda ◴[] No.35068057{12}[source]
Well, like I said, the subject is extremely broad, so it is difficult to give concrete suggestions without knowing what specifically you’d like to get into. But I can give some potential options.

If you’d like to learn Haskell, I think https://www.cis.upenn.edu/~cis1940/spring13/ is still a pretty nice resource. It is quick and to the point, and it provides some exercises to work through. There are lots of things in the Haskell ecosystem that you could explore if you wanted to after getting a handle on the basics.

If you want to learn about programming languages and type systems, you could read Programming Languages: Application and Interpretation (https://cs.brown.edu/courses/cs173/2012/book/), which has a chapter on type systems. Alternatively, if you want a more thorough treatment of type systems, you could read Types and Programming Languages by Benjamin Pierce. However, both PLAI and TAPL are textbooks, and they are primarily intended to be used as supporting material in a university course with an instructor. I think PLAI is relatively accessible, but TAPL is more likely to be a challenge without some existing background in programming languages.

replies(1): >>35069344 #
204. lexi-lambda ◴[] No.35068165{6}[source]
Yes, certainly. Constructive data modeling is useful for many things, but it’s not a panacea. Other techniques are useful in cases where it’s not practical; I discuss some of the tradeoffs in this followup post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
205. jimbokun ◴[] No.35068654{6}[source]
I’m confused: if the RFC does not accurately model HTTP requests, what does?
replies(1): >>35071251 #
206. PartiallyTyped ◴[] No.35069344{13}[source]
The textbooks are exactly what I needed! Thank you!
207. epolanski ◴[] No.35070550{5}[source]
Hi lexi.

Just wanted to say that fp-ts (now effect-ts, a ZIO port to TypeScript) author Giulio Canti is a great fan of your "parse don't validate" article. He's linked it many times in the TypeScript and functional programming channels (such as the fp slack).

Needless to say, both the fp-ts-derived io-ts library and the effect-ts schema library[1] are obviously quite advanced parsers (and in the case of schema, there's decoding, encoding, APIs, guard, arbitrary and many other nice things I haven't seen in any functional language).

[1]https://github.com/Effect-TS/schema

208. epolanski ◴[] No.35070563{4}[source]
You can also simply parse with a type guard in typescript.

Or do something more advanced like implement Decoders/Encoders.

209. thanatropism ◴[] No.35070695{8}[source]
Since by now we're arguing style, you would achieve your purposes (whatever they are) if you completed certain aggressive sentences, for example:

> You come off as a crank. [... because of X, Y ,Z ]

..

> Please be better. [in the following manner: ... even if takes summarizing what was said]

210. aidenn0 ◴[] No.35071251{7}[source]
Your customers define what an HTTP request is.

To be less snarky, the RFC defines what a well-formed HTTP request is. In the wild there are a lot of malformed HTTP requests that business cases may require handling.

replies(1): >>35090904 #
211. leetrout ◴[] No.35071493{5}[source]
That sounds miserable. Is there blog post or something with more details that supports this? I might be having a knee jerk reaction because I can't imagine something like this being easy to work with and maintain but I recognize you were just giving a trivial example.
replies(1): >>35074824 #
212. lmm ◴[] No.35074824{6}[source]
This was a blog example I saw a few years ago; it went through making a calculator program without using ifs. Looked pretty nice. I can't find it now though.
213. Quekid5 ◴[] No.35075015{7}[source]
I do think you can... just via phantom type parameters and type-level programming. In Scala you'd probably use Refined.

(But I'm no expert, admittedly, and it isn't an actual problem of much consequence in practical programming in Haskell or Scala. Opaque types do the 80% bit of 80-20 just fine.)

replies(1): >>35075984 #
214. dwohnitmok ◴[] No.35075984{8}[source]
You can't with phantom type parameters and type-level programming alone, although you can get close. Scala's and Haskell's Refined both don't let you do what I'm thinking of.

You can get very close with type-level sets, although compile times probably go through the roof; you're basically emulating row types at that point.

  def wrapIntoRefined(str: String): Refined[String, Unit]

  def validate0[A](str: Refined[String, A]): Either[Error, Refined[String, And[Condition0, A]]]

  def validate1[A](str: Refined[String, A]): Either[Error, Refined[String, And[Condition1, A]]]

  // This requires ordering Condition0 before Condition1 but if we resorted 
  // to a type-level set we could get around that problem
  def process(input: Refined[String, And[Condition1, And[Condition0, Unit]]]): Unit

  // But linearity is still required in some sense. We can't e.g. do our checks
  // in a parallel fashion. You still need to pipe one function right after another
The central problem is if you have two validation functions

  def validate0(str: String): Refined[String, Condition0]

  def validate1(str: String): Refined[String, Condition1]
if you try to recombine them downstream, you don't know that `Refined[String, Condition0]` and `Refined[String, Condition1]` actually refer to the same underlying `String`. They could be refined on two completely separate strings. To tie them to a single runtime String requires dependent types.

You can approximate this in Scala with path-dependent types, but it's very brittle and breaks in all sorts of ways.

> isn't an actual problem of much consequence in practical programming in Haskell or Scala. Opaque types do the 80% bit of 80-20 just fine.

I think this is only true because there isn't a production-ready dependently typed language to show how to use these patterns effectively. In much the same way that "parse don't validate" isn't really much of a problem of consequence in older style Java code because sum types aren't really a thing, if there was an ergonomic way of taking advantage of it, I firmly believe these sorts of dependently typed tagged types would show up all over the place.

replies(1): >>35103371 #
215. bruce343434 ◴[] No.35079571{3}[source]
Sum types are much more than just error handling though. They can be used to describe any structure made of substructures, where you have _multiple kinds_ of substructures. The type of such a structure is the _sum_ of all the substructure types.
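
For example, a tiny (hypothetical) expression tree: the type is the sum of its substructure kinds, each carrying different fields.

    data Expr
      = Lit Int
      | Neg Expr
      | Add Expr Expr
      | Mul Expr Expr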
216. jimbokun ◴[] No.35090904{8}[source]
I suppose the “parse don’t validate” philosophy would recommend first transforming the ill-formed request into a data structure that only models well-formed requests, before it’s processed by any other part of the program.
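
A sketch of that recommendation (types hypothetical): a lenient reader tolerates messy wire input, but the rest of the program only ever sees the well-formed representation.

    data Method = GET | POST deriving Show

    data Request = Request { method :: Method, path :: String } deriving Show

    readLenient :: String -> Maybe Request
    readLenient raw = case words raw of
      (m : p : _) -> case m of
        "GET"  -> Just (Request GET  p)
        "get"  -> Just (Request GET  p)   -- tolerate a common malformation
        "POST" -> Just (Request POST p)
        _      -> Nothing
      _ -> Nothing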
217. ckdot2 ◴[] No.35095782{5}[source]
I think there's a misunderstanding here. I'm not only saying "use JSON schema files to define your expected payloads", I'm also saying "use one of the existing JSON schema validators for your programming language". For most programming languages there's a library for that. So, since you no longer need to write any code that "respects that schema" yourself, the whole discussion becomes kind of obsolete.
218. ckdot2 ◴[] No.35095875{3}[source]
OK, there's a lot of discussions about if booleans, Eithers/Optionals or exceptions should be used. And I'd say it's probably best to use what's most common in the programming language to be used. What I'm saying here is, I don't want to implement that method "checkAgainstMySchema". Because I know there's already a library for that.
replies(1): >>35109184 #
219. Quekid5 ◴[] No.35103371{9}[source]
> I think this is only true because there isn't a production-ready dependently typed language [...]

Now this I definitely agree with. I want to see what's possible!

220. hansvm ◴[] No.35105217{5}[source]
https://github.com/hmusgrave/pdv
221. chriswarbo ◴[] No.35109184{4}[source]
> I'd say it's probably best to use what's most common in the programming language to be used

Sure, I agree (or perhaps: what's considered "best practice"; or whatever our existing codebase is doing)

> there's a lot of discussions about if booleans, Eithers/Optionals or exceptions should be used

That's just an implementation detail, and misses the point. For example, all of those can be used to 'validate'; e.g.

- A function/method 'v1: JSON -> Boolean'

- A function/method 'v2: JSON -> JSON', which may throw exceptions

- A function/method 'v3: JSON -> Optional JSON'

- A function/method 'v4: JSON -> Either Error JSON'

The reason these are all bad has nothing to do with the language features or error-handling mechanisms employed. The reason they are bad is that they are all completely unnecessary.

For example, here are a bunch of programs which use the above validators. They're all essentially equivalent, and hence have the same fundamental flaw:

  function trigger1(userInput: JSON) {
    if (v1(userInput)) {
      print "UNAUTHORISED, ABORTING"
      sys.exit(1)
    }
    else {
      launchMissiles(authorisation=userInput)
    }
  }

  function trigger2(userInput: JSON) {
    try {
      launchMissiles(authorisation=v2(userInput))
    }
    catch {
      print "UNAUTHORISED, ABORTING"
      sys.exit(1)
    }
  }

  function trigger3(userInput: JSON) {
    v3(userInput) match {
      case None => {
        print "UNAUTHORISED, ABORTING"
        sys.exit(1)
      }
      case Some(validated) => {
        launchMissiles(authorisation=validated)
      }
    )
  }

  function trigger4(userInput: JSON) {
    v4(userInput) match {
      case Left(error) => {
        print ("UNAUTHORISED, ABORTING: " + error)
        sys.exit(1)
      }
      case Right(validated) => {
        launchMissiles(authorisation=validated)
      }
    )
  }
The reason they're all flawed is that validation can be skipped. In other words, you can write any validation logic, implemented with any mechanism you like, in any language, but your colleague's code might never call it! All of the above 'trigger' functions could be replaced by this, and it will still work:

  function trigger(userInput: JSON) {
    launchMissiles(authorisation=userInput)
  }
In contrast, the 'parse' approach cannot be skipped. Here are some examples:

- A function/method 'p1: JSON -> Either Error MyJSON'

- A function/method 'p2: JSON -> Optional MyJSON'

- A function/method 'p3: JSON -> MyJSON', which may throw exceptions

Here are their corresponding 'trigger' functions:

  function trigger5(userInput: JSON) {
    p1(userInput) match {
      case Left(error) => {
        print ("UNAUTHORISED, ABORTING: " + error)
        sys.exit(1)
      }
      case Right(parsed) => {
        launchMissiles(authorisation=parsed)
      }
    }
  }

  function trigger6(userInput: JSON) {
    p2(userInput) match {
      case None => {
        print "UNAUTHORISED, ABORTING"
        sys.exit(1)
      }
      case Some(parsed) => {
        launchMissiles(authorisation=parsed)
      }
    }
  }

  function trigger7(userInput: JSON) {
    try {
      launchMissiles(authorisation=p3(userInput))
    }
    catch {
      print ("UNAUTHORISED, ABORTING: " + error)
      sys.exit(1)
    }
  }
These alternatives are much safer, since the 'launchMissiles' function now takes a 'MyJSON' value as argument; so we can't do `launchMissiles(authorisation=userInput)` (since 'userInput' has the type JSON, which isn't a valid input). Our colleagues cannot skip or forget to call these p1/p2/p3 functions, since that's the only way they can turn the 'userInput' value they have into the 'MyJSON' value they need.

> I don't want to implement that method "checkAgainstMySchema". Because I know there's already a library for that.

No, there isn't. I think you may be confused about how such 'parser' functions should be implemented. Nobody is saying to ignore existing libraries, or roll our own JSON grammars, or whatever. It's purely about how your project's datatypes are constructed. For example, something like this:

  function parseMyThing(json: JSON) {
    if (SomeExistingJSONSchemaLibrary.validate(json, SomeParticularSchemaMyApplicationIsUsing)) {
      return Right(SomeDatatypeIHaveWritten(...))
    }
    else {
      // Or use exceptions, or Optional, or whatever; it doesn't matter
      return Left("Invalid")
    }
  }
(If all of your project's datatypes, schemas, class, etc. were already provided by some existing library, then that project would be a bit pointless!)
222. nicky0 ◴[] No.35154724{5}[source]
No, but I'm a heavy Zod user, so ArkType looks interesting. Thanks for the tip!

Are there any compelling reasons to switch apart from the difference in syntax?

223. przemo_li ◴[] No.35155316{5}[source]
Valid basket does not imply valid order.

Sometimes just splitting stuff into more types that are appropriate to specific portion of pipeline is a win.