The point is to parse the input into a structure that always upholds the predicates you care about, so you don't end up continuously programming defensively with ifs and asserts.
I follow "Parse, Don't Validate" consistently in Go. For example, if I need to parse a JSON payload from an end-user for Foo, I define a struct called FooRequest, and I have exactly one function that creates a FooRequest instance, given a JSON stream.
Anywhere else in my application, if I have a FooRequest instance, I know that it's validated and well-formed because it had to have come from my FooRequest parsing function. I don't need sum types or any special language features beyond typing.
Don't enumerate the reasons why the input might be invalid; instead, define a target struct/object and parse the input into it.
if(!needToDoTheThing()) return;
DoTheThing();
We could have written it this way:
if(needToDoTheThing()) {
DoTheThing();
}
else {
return;
}
The latter is closer to how a pattern match looks. But in my experience, the majority of programmers prefer early return. I regularly see people "refactor" if-else to if-early-return, but I've never seen the opposite.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.
OP is delusional if they think that their approach can be made practical. I mean, what if the expectation is that a value is a prime number? How are they going to encode this in their type system? And this is just a trivial example.
There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems: for example, ensuring that all ids in XML nodes are unique, or that the last digit of an SSN is a checksum of the previous digits computed by some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and that it's really preferable to validate emails by calling a number of predicates on them.
And, of course, these aren't the only examples. Take password validation (the annoying part that asks for a capital letter, a digit, a special character): I want to see the author implement a parser for the possible inputs to a password field that also gives helpful error messages such as "you forgot to use a digit". Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to code that does the usual stuff, i.e. just checks whether a character is in a set of characters.
def TRUE(a, b):
    return a

def FALSE(a, b):
    return b

def IF(cond, a, b):
    return cond(a, b)

assert IF(TRUE, 1, 2) == 1
assert IF(FALSE, 1, 2) == 2
This gives you the conditional statement in most languages ("cond ? a : b" or "a if cond else b").

Dependent type checkers may be hard to implement, but the typing rules are fairly simple, and people have been using this correct-by-construction philosophy in dependently-typed languages for a while now.
There's nothing delusional about that.
People get too caught up in thinking that the type _has_ to express intricate properties; it doesn't. How am I going to express the expectation that something is prime? With the following closed API:
module Prime where
data PrimeNumber
parsePrime :: Int -> Maybe PrimeNumber
toInt :: PrimeNumber -> Int
Now the problem is that _leaving_ this API forgets information. Whether or not that is a problem is a different question, and very dependent on the context.

The same applies to your comment about passwords. One can quite easily create a closed module that encapsulates a ValidPassword type that simply performs runtime character tests on a string.
I want to stress that this approach is making a trade-off (as I mentioned earlier, leaving the API forgets information, forcing you to re-parse). Still, that puts this design somewhere in the middle of the spectrum: at one extreme we have primitive obsession and shotgun parsing everywhere; with this we push the parsing into a sane place and try to hold on to the parsed values as long as possible; and at the other extreme we need dependent types or sophisticated encodings where the value carries a lot more information (and here we get towards propositions-as-types).
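For what it's worth, that closed module is only a few lines of Haskell. A minimal sketch (mine, not code from the article; the module and function names are made up), which even produces the "you forgot to use a digit"-style messages mentioned upthread:

module Password (ValidPassword, parsePassword, getPassword) where

import Data.Char (isDigit, isUpper)

-- The constructor is not exported: parsePassword is the only way
-- to obtain a ValidPassword.
newtype ValidPassword = ValidPassword String

parsePassword :: String -> Either String ValidPassword
parsePassword s
  | length s < 8        = Left "too short"
  | not (any isDigit s) = Left "you forgot to use a digit"
  | not (any isUpper s) = Left "you forgot a capital letter"
  | otherwise           = Right (ValidPassword s)

getPassword :: ValidPassword -> String
getPassword (ValidPassword s) = s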
I had to work with this for a while because I wanted to implement Hello World in JavaScript. https://github.com/quchen/lambda-ski/blob/master/helloworld/...
## type Silly = Foo | Bar Int | Qux String Silly

## Constructors
def Foo(onFoo, onBar, onQux):
    return onFoo()

def Bar(arg0):
    return lambda onFoo, onBar, onQux: onBar(arg0)

def Qux(arg0, arg1):
    return lambda onFoo, onBar, onQux: onQux(arg0, arg1)

## Values of Silly type are Foo, Bar(x) and Qux(x, y)

## Destructor
def match_Silly(silly, onFoo, onBar, onQux):
    return silly(onFoo, onBar, onQux)
You can make a whole language on top of that if you don't mind effectively disabling your CPU's branch predictor.

> in my mind, the difference between validation and parsing lies almost entirely in how information is preserved
“parse don’t validate” is a pithy and easy to remember maxim for this preservation.
Because validation is implicitly necessary anyway when parsing to a representation that captures your invariants, banning validation as a separate concept ensures standalone validation doesn't get reintroduced: any validation step outside of a wider parsing process is considered incorrect.
In TypeScript we can define
type Prime = number

function isPrime(value: number): value is Prime {
  // run sieve
}

From here, you may have e.g.
function foo(value: Prime, ...) {
}

And it will be type checked.
function fooOrFail(v: number) {
  if (isPrime(v))
    foo(v)
  else
    throw new TypeError()
}
I'd summarise boolean blindness as: implicit (often unsafe) coupling/dependencies of method results; which could instead be explicit data dependencies. That article's example is 'plus x y = if x=Z then y else S(plus (pred x) y)', which uses an unsafe 'pred' call that crashes when x is 'Z'. It avoids the crash by branching on an 'x=Z' comparison. The alternative is to pattern-match on x, to get 'Z' or 'S x2'; hence avoiding the need for 'pred'.
Another alternative is to have 'pred' return 'Maybe Nat'; although that's less useful when we have more constructors and more data (e.g. the 'NonEmptyList' in this "parse, don't validate" article!)
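In Haskell-ish notation, the contrast looks like this (a sketch of the linked article's point, not code taken from it):

data Nat = Z | S Nat

-- Pattern matching binds the predecessor directly, so there is no
-- boolean test and no partial 'pred' call that could crash on Z.
plus :: Nat -> Nat -> Nat
plus Z     y = y
plus (S x) y = S (plus x y)

-- The other alternative mentioned above: a total 'pred'.
predMaybe :: Nat -> Maybe Nat
predMaybe Z     = Nothing
predMaybe (S x) = Just x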
If you nest if/else, you'll quickly approach a point where you have to keep a complex logic tree in your head to determine which states the system could be in inside of any given branch. If you use guard clauses and return early, you'll keep this complexity down to a minimum, since the list of possible states changes linearly with your code instead of exponentially.
I know not everybody likes it, but I think this makes cyclomatic complexity an extremely valuable metric for measuring "ease-of-reading".
Kind of yes, but this discussion is much dependent on definitions of `parse` and `validate`, which the article does not explicitly elaborate on. The chapter "The power of parsing" captures this difference implicitly "validateNonEmpty always returns (), the type that contains no information". Validation, in the context of all of this, can be defined as "checking conformance to a set of rules" while parsing is mostly synonymous with deserialization.
In most practical applications you explicitly do not want to only validate inputs, as you have no need to perform any computation on invalid input anyway. Sometimes you explicitly want to analyze invalid inputs, maybe to recover some information or do some other magic; sure, then go validate the input and do that magic on the invalid parts. But in most cases you simply want to reject invalid inputs.
However, when you think about it, that is exactly what parsing does. Validation happens implicitly during parsing: the parser will either return a valid object or throw an error, but parsing has the added benefit that the end result is a datum of a known datatype. Of course, this only really works in statically typed languages.
The thing is that it is rather easy to conflate the two. Take for example the following JSON: `{"foo": "bar", "baz": 3}`. A "parser" can return 1) a list of `(String, String, String)` 3-tuples for (type, key, value) that downstream has to process again, 2) a full-blown `FoobarizerParams` object, or something in between.
There are plenty of cases in real-world code where an array that's part of a struct or object may or may not contain any elements. If you're just parsing input into that, it seems like you'd either still end up doing an equivalent of checking whether the array is empty or not everywhere the array might be used later, even if that check is looking at an "array has elements" type flag in the struct/object, and so you're still maintaining a description of ways that the input may be invalid. But I'm not a world-class programmer, so maybe I'm missing something. Maybe you mean something like for branches of the code that require a non-empty array, you have a second struct/object and parser that's more strict and errors out if the array is empty?
Don't have to be 100% immutable or perfect ADTs: see Rust, Swift, Kotlin. Even TypeScript can do this, albeit it's uglier with untagged unions and flow typing instead of pattern matching.
If you're talking about consuming the looser input and producing a definitely-correct output, already, then you're talking about parsing, not validation. Most validation occurs naturally during parsing.
const published = posts.filter(post => !post.draft);
Empty lists are actually generally not a big deal - they are just lists of size 0, and they support all the same things you can do with non-empty lists. The fact that a "head" function throws an error on an empty list is really just a specific form of the more general observation that any array will throw an index-out-of-bounds exception when given an index that's... out of bounds. So any time you are dealing with arrays, you probably need to think about: "what happens if I try to index something that's out of bounds? is that possible?"
In this particular contrived example, all that mattered was the head of the array. But what if you wanted to pick out the 3rd argument in a list of command line arguments, and instead the user only gave you 2 inputs? If 3 arguments are required, then throw an IOError as early as possible after failing to parse out 3 arguments; but once you pass the point of parsing the input into a valid object/struct/whatever, from that point forward you no longer care about checking whether the 3rd input is empty or not.
So again, it depends on your scenario. Actually the more interesting variant of this issue (in OO languages at least) is probably handling nulls: empty lists are valid lists, but nulls are not lists, and they usually require different logic (hence why NullPointerExceptions, aka NPEs, are such a common failure mode).
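For the three-required-arguments case a couple of paragraphs up, a sketch in the article's Haskell style (Args and its field names are mine, purely for illustration):

-- Parse the three required arguments once, up front. Code that
-- receives an Args value never re-checks the argument count.
data Args = Args { inputPath :: String, outputPath :: String, mode :: String }

parseArgs :: [String] -> Either String Args
parseArgs [a, b, c] = Right (Args a b c)
parseArgs xs        = Left ("expected 3 arguments, got " ++ show (length xs))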
(In fact you could use an invalid password just fine: unless you're doing something really weird, your code would not misbehave because it's too short or missing digits and symbols. It's only due to security reasons that you choose to reject that string.)
But that doesn't mean that `string -> Password` isn't parsing! As long as you're outputting distinct types for ValidPassword and InvalidPassword, you are still following the advice of this article, because you can make all your internal code use ValidPassword as a type and you will not need to ever check for password validity again.*
Compare that to e.g. adding a { IsValid = true } field to the object, which would require you to defensively sprinkle `if (user.Password.IsValid)` every time you try to actually use the password field.
* One weakness arising from the fact that this is degenerate parsing, i.e. ValidPassword is just a string, is that a very stubborn fool could build a ValidPassword from any arbitrary string instead of using the proper parse function. Depending on your language, this can be prevented by e.g. hiding the constructor so that only parsePassword has access to it.
validateEmail :: String -> IO ()

and a 'parsing' function

parseEmail :: String -> Either EmailError ValidEmail

The property encoded by the ValidEmail type is available throughout the rest of the program, which is not the case if you only validate.

I genuinely wonder how one would write a proof in something like Agda, that
parseJson("{foo:"+encodeJson(someObject)+"}")
always succeedsWhile the article is titled "parse don't validate" I like it's first point of make illegal states unrepresentable much better.
This is what Applicative Functors were born to do. Here's a good article on it: https://www.baeldung.com/vavr-validation-api
Check the types:
public Validation<Seq<String>, User> validateUser(...)
Even though it's called "validation", it's still the approach the OP recommends.

It reads as "If you have a Seq of Strings, you might be able to construct a User, or get back validation errors instead".
Contrast this with the wrong way of doing things:
User user = new User(seqOfStrings);
user.validate();
My bad.
> People get too caught up in thinking that the type _has_ to express intricate properties
Where do you get this from? Did you even read what you are replying to? I never said anything like that... What I'm saying is that the approach taken by OP is worthless when it comes to real-life uses of validation.
So, continuing with your example: you will either end up doing validation instead of parsing (i.e. you will implement the parsePrime validator function), or you will not actually validate that your input is a prime number... The whole point OP was trying to make is that they wanted to capture the constraints on data in a type describing those constraints, but outside of trivial examples such as the non-empty list used by OP, that leads to programs that are either impossible or extremely complex.
> One can quite easily create a closed module that encapsulates a ValidPassword
And, again, that would be doing _validation_ not parsing. I'm not sure if you even understand what the conflict here is, or are you somehow agreeing with me w/o saying so?
Really? How is that degenerate? Compared to what?
My guess is that you just decided to use a dictionary word you don't fully understand.
> In fact you could use an invalid password
Where does this nonsense come from? No. I cannot use invalid password. That's the whole point of validation: making sure it doesn't happen. What kind of bs is this?
> But that doesn't mean that `string -> Password` isn't parsing!
It's doing nothing useful, and that's the whole point. You just patted yourself on the head in front of the mirror for using some programming technique that you believe makes you special, but accomplished nothing of value. That was the original point: if you want results, you will end up doing validation, there's no way around it. Your renaming of types is only interesting to you and a group of people who are interested in types, but it doesn't advance the cause of someone who wants their password validated.
You just admitted in this sentence that the use of opaque types achieves nothing of value. Which was my point all along: why use them if they are useless? Just to feel smart because I pulled out an academia-flavored ninety-pound dictionary word to describe it?
Now, are there really tools to make type systems with dependent types simple to prove? In reasonable time? How about the effort developers would have to put into just trying to express such types and into verifying that such an expression is indeed accomplishing its stated goals?
Just for a moment, imagine filing a PR in a typical Web shop for the login-form validation procedure, and sending a couple of screenfuls of Coq code or similar as a proof of your password validation procedure. How do you think your team would react?
Again, I didn't say it's impossible. Quite the opposite, I said that it is possible, but will be so bad in practice that nobody will want to use it, unless they are batshit crazy.
I have no problems with the way you want to interpret this claim. But, really, I'm responding to the article linked in this thread, which isn't about the point in the application at which to perform said validation or parsing.
Yes, it's fine, if you want to validate your input in this way -- I have no problems with it. It's just that you are doing validation, not parsing, at least not in the terms OP used them.
https://fsharpforfunandprofit.com/posts/designing-with-types...
https://ybogomolov.me/making-illegal-states-unrepresentable/
Meanwhile, people with more than one bit in their worldview RAM can fall back to validation when it's the only option that makes sense for their domain, and use parsing when it's appropriate, which is, notwithstanding your handful of frankly niche examples compared to the vast bulk of CRUD code, most of the time in practice.
Am I mistaken?
My mistake in the above snippets is precisely that TypeScript cannot make the type more specific, i.e. Number to Prime, because `type Prime = number` only creates an alias. I am not creating a type that is a more specific version of number, just an alias.
Had I actually created a proper type, the parsing would have been correct. The parsing component is happening in the outer function because at some point I need to make the generic input more specific, and then allow it to flow through the rest of the program. Am I mistaken?
A 'NotEmpty a' is just a subclass of a potentially empty 'a'. You also get the desirable behavior, in this scenario, of automatic upcasting of a 'NotEmpty a' into a regular old 'a'.
In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails. Since the only way to get an instance of the class is through the factory method, you've made illegal states unrepresentable and know that the class always holds to its invariants. No methods on instances of that class will throw exceptions from then on, so you've successfully applied "Parse, Don't Validate" without needing sum types.
The point of the article isn't the particular implementation in Haskell, it's the concept of pushing all data error states to the boundaries of your code, which applies anywhere as long as you translate it into the idioms of your language.
It absolutely is.
> I have no idea why would you question that.
I did not question [that they were different approaches], I explained, through example and counter-example, why they were the same approach. I will try again.
Alexis wrote both 'validate' and 'parse' examples in ML-style types:
validateNonEmpty :: [a] -> IO () // ML-typed 'validate'
parseNonEmpty :: [a] -> IO (NonEmpty a) // ML-typed 'parse'
More from the article:

> The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.
I chose OO-style types for my samples, because there's a large fraction of HN users who dismiss ML-ish stuff as academic, or "practically useless outside of the most trivial cases".

// OO-typed 'validate' (my straw man)
class User {
  // returns void aka '()' aka "the type that contains no information"
  void validateUser() throws InvalidUserEx {...}
}

/* OO-typed 'parse' (as per my baeldung link)
 * "gives the caller access to the information it learned"
 * In this case it gives back MORE than just the User,
 * it also gives back 'why it went wrong', per your request above for password validation
 * (In contrast with parseNonEmpty which just throws an exception.)
 */
class UserValidator {
  Validation<Seq<String>, User> validateUser(...) {...}
}
> But, ML-style types have very limited expressive power

Hindley-Milner types are a goddamned crown jewel of computer science.
The example was written rather badly, though. It should have pointed out that the module was exporting the type and a couple helper functions, but not the data constructor.
But despite that, the key point was correct. Validating is examining a piece of data and returning "good" or "bad". Parsing is returning a new piece of data which encodes the goodness property at the type level, or failing to return anything. It's a better paradigm because the language prevents you from forgetting what situation you're in.
const info = type({ name: "string>0", email: "email" })
Yes, indeed. This is quite useful! But crabbone isn’t entirely wrong that it isn’t quite what the original article was about.
I’ve written quite a bit of code where constructive data modeling (which is what the original article is really about) was both practical and useful. Obviously it is not a practical approach everywhere, and there are lots of examples where other techniques are necessary. But it would be quite silly to write it off as useless outside of toy examples. A pretty massive number of domain concepts really can be modeled constructively!
But when they can’t, using encapsulation to accomplish similar things is absolutely a helpful approach. It’s just important to be thoughtful about what guarantees you’re actually getting. I wrote more about that in a followup post here: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
Not at all, the article is about pushing complexity to the "edges" of your code so that the gooey center doesn't have to faff around with (re-)checking the same invariants over and over... but its examples are also in Haskell, in which it would be weird to do this without the type system.
In Python or Java or whatever, you'd just parse your received_api_request_could_be_sketchy or fetched_db_records_but_are_they_really into a BlessedApiRequest or DefinitelyForRealDBRecords in their constructors or builder methods or whatever, disallow any other ways of creating those types, and then exclusively use those types.
edit: wait, actually no we agree, I must have glossed over your second sentence, sorry
Instead it's parsing. It takes in a value of one type and returns a value of a different type that is known good. Or it fails. But what it never does is let you continue forward with an invalid value as if it was valid. This is because it's doing more than just validation.
SQL constraints are certainly useful. But they don’t really solve the same problem. SQL constraints ensure integrity of your data store, which is swell, but they don’t provide the same guarantees about your program that static types do, nor do they say much at all about how to structure your code that interacts with the database. I also think it is sort of laughable to claim that XSL is a good tool for solving just about any data processing problem in 2023, but even if you disagree, the same points apply.
Obviously, constructive data modeling is hardly a panacea. There are lots of problems it does not solve or that are more usefully solved in other ways. But I really have applied it to very good effect on many, many real engineering problems, not just toys, and I think the technique provides a nice framework for reasoning about data modeling in many scenarios. Your comments here seem almost bafflingly uncharitable given the article in question doesn’t make any absolutist claims and in fact discusses at some length that the technique isn’t always applicable.
See also: my other comment about using encapsulation instead of constructive modeling (https://news.ycombinator.com/item?id=35059113) and my followup blog post about how more things can be encoded using constructive data modeling than perhaps you think (https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).
Just to be clear, I actually really like the idea of parsing the input into a structure. I do the same thing in a lot of my code. I just don't see how it removes the need to also perform validation in many (maybe most) cases as soon as one gets beyond contrived examples.
The empty array example seems to be a can of worms. Maybe it's specific to the kinds of software that I've written, but in most of the cases I can think of, I wouldn't know if it was OK for a particular array within a structure to be empty until after the code had made some other determinations and branched based on them. And yet, like the example, once it got to the real handling for that case, it would be a problem if the array were empty. So the image in my mind is many layers of parsing that are much more complicated and confusing to read than validating the length of the array.
I still think it's a great idea for a lot of things, just that the "parse, don't validate" name seems really misleading. I might go with something like "parse first, validate where necessary".
However, TypeScript does not really provide any facility for nominal types, which in my opinion is something of a failure of the language, especially considering that it is at odds with the semantics of `class` and `instanceof` in dynamically-typed JavaScript (which have generative semantics). Other statically typed languages generally provide some form of nominal typing, even gradually typed ones. Flow even provided nominal types in JavaScript! But TypeScript is generally also quite unsound (https://twitter.com/lexi_lambda/status/1621973087192236038), so the type system doesn’t really provide any guarantees, anyway.
That said, TypeScript programmers have developed a way to emulate nominal typing using “brands”, which does allow you to obtain some of these benefits within the limitations of TS’s type system. You can search for “TypeScript branded types” to find some explanations.
This is similar, and is indeed quite useful in many cases, but it’s not quite the same. I explained why in this comment: https://news.ycombinator.com/item?id=35059886 (The comment is talking about TypeScript, but really everything there also applies to Java.)
Because you wrote a validation function, the exact thing OP told you not to do. Hooray?!
The goal of OP was to create a type that incorporates the constraints on the data, just as in their non-empty list example, where the type itself contains the constraint, such that it's impossible to implement the type in a way that lets it hold an empty list.
You did the opposite. You created a type without any constraints whatsoever, and then added a validation function to it to make sure you only create values validated by that function. So... you kind of proved my point: it's nigh impossible to create a program intelligible to human beings that has a "prime number" type, and that's why we use validation -- it's easy to write, easy to understand.
Your type isn't even a natural number, let alone a prime number.
When I wrote GP I had in mind branding as the "right" way to get those benefits - though I was unaware of the name - but I see that it is still subject to the TS compiler's limitations.
So then going back to the initial snippets, my issue is that the Prime type is essentially behaving like newtype, thus the inner calls can not actually rely on the value actually being prime, yes?
I have to admit that quite a few of the things in the blog are beyond my current understanding. Do you have any recommended reading for post grads with rudimentary understanding of Haskell who would like to get deeper into type systems?
See this fantastic reply by the author:
If I'm understanding the difference correctly, it's that the constructive data modeling approach can be proven entirely in the type system without any trust in the library code, while the Java approach I recommended depends on there being no other way to construct an instance of the class, which can be tricky to guarantee. Is that accurate?
validateEmail : String -> String  -- post-condition: String contains valid email

whereas parse looks like:

parseEmail : String -> Either EmailError ValidEmail

There is no problem using the `ValidEmail` abstraction. The problem is type stability: when your program enters a stronger state at runtime (i.e. certain validations have been performed), it's best to also enter a stronger state at compile time (stronger types), so that the compiler can verify these conditions. If you stay with String, those validations (that a string is a valid email) have no compile-time counterpart, so there is no way for the compiler to verify them. So use `ValidEmail` instead.

The problem with your `Prime` type is that it is just a type alias: a new way to refer to the exact same type. It's totally interchangeable with `number`, so any `number` is necessarily also a `Prime`… which is obviously not very helpful. (As it happens, the Haskell equivalent of that would be basically identical, since Haskell also uses the `type` keyword to declare a type alias.)
As for recommended reading, it depends on what you’d like to know, really. There are lots of different perspectives on type systems, and there’s certainly a lot of stuff you can learn if you want to! But I think most working programmers probably don’t benefit terribly much from the theory (though it can certainly be interesting if you’re into that sort of thing). Perhaps you could tell me which things you specifically find difficult to understand? That would make it easier for me to provide suggestions, and it would also be useful to me, as I do try to make my blog posts as accessible as possible!
What on Earth are you talking about? What dynamic enforcement?
> Though I think it is a bit silly to suggest that I have “never even considered”
In the context of this conversation you showed no signs of such concerns. Had you had such concerns previously, you wouldn't have arrived at the conclusions you apparently have.
> a dynamically-typed language
There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. Dynamically-typed is just a word combo that a lot of wannabe computer scientists use, but there's no real meaning behind it. When "dynamic" is used in the context of types, it refers to the concrete type obtained during program execution, whereas "static" refers to the type that can be deduced without executing the program. For example, union types cannot be dynamic. Similarly, it's not possible to have generic dynamic types. Every language thus has both dynamic and static types, except that in some cases the static analysis of types isn't very useful, because the types aren't expressive enough or the verification is too difficult. Conversely, in some languages there's no mechanism to find out exact runtime types, because information about types is considered extraneous to the program and is removed from the runtime.
The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without a useful static analysis method" and "languages that may be able to erase types from the runtime in most cases", where "useful" and "most cases" are matters of subjective opinion. Oftentimes such boundaries lead their proponents to ironically confusing conclusions, such as admitting that languages like Java aren't statically typed, or that languages like Bash are statically typed, and so on.
Note that "wannabe computer scientist" also applies to people with degrees in CS (I've met more than a dozen); some have even published books on the subject. This only underscores the ridiculous state this field is in.
> discusses at some length that the technique isn’t always applicable.
This technique is not applicable to the overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach, and at the moment there's no hope of making it useful.
Validation, on the other hand, is a very difficult subject, but I think that if we really want to deal with this problem, then TLA+ is a good approach, for example. But it's still too difficult for the average programmer. Datalog would be my second choice, which also seems appropriate for the general public. Maybe even something like XSL, which, in my view, lacks a small core that would allow one to construct it from first principles, but which is still able to solve a lot of practical tasks when it comes to input validation.
ML-style types aren't competitive in this domain. They are very clumsy tools for expressing the problems programmers have to solve every day. We, as a community, keep praising them because they are associated with the languages for the "enlightened" and thus must be the next best thing since sliced bread.
On what grounds did you decide that this is the requirement for validation? That's truly bizarre... Sometimes validating functions return booleans... but there's no general rule that they do.
Anyways, you completely missed the point OP was trying to make. Their idea was to include the constraints on the data (i.e. to ensure data validity) in the type associated with the data. You've done nothing of the kind: you created a random atomic type with a validation method. Your type isn't even a natural number; you definitely cannot add another natural number to it, or multiply it, etc.
Worse yet, you decided to go with a language with subtyping, which completely undermines all of your efforts. Even if you were able to write all the overloads to make this type behave like a natural number, any type you create by inheriting from this class is at liberty to violate every contract you might have established in it; yet, by the definition of your language, it would still be valid to say that the subtype thus created is a prime number, even if it implements == in a way that returns "true" (only) when compared to 8 :D
Some dude comes up with another data definition language (DDL) that uses ML-style types. Everyone jumps from their seats in standing ovation. And in the end we get another useless configuration language that cannot come anywhere close to the needs of application developers, and so they pedal away on their squared-wheel bicycles of hand-rolled very custom data validation procedures.
This is even more disheartening because we have already created tools that made very good progress toward systematic input validation. And they have been with us since the dawn of programming (well, almost: we've had SQL since the early 70's, then Prolog, then various XML schema languages, and finally TLA+). It's amazing how people keep ignoring solutions that achieved so much compared to ensuring that a list isn't empty... and yet present this as the way forward...
Perhaps you are one, perhaps you are not, I don’t know, but either way, you certainly write like one. If you want people to take you seriously, I think it would behoove you to adopt a more leveled writing style.
Many of the claims in your comment are absurd. I will not pick them apart one by one because I suspect it will do little to convince you. But for the benefit of other passing readers, I will discuss a couple points.
> What on Earth are you talking about? What dynamic enforcement?
SQL constraints are enforced at runtime, which is to say, dynamically. Static types are enforced without running the program. This is a real advantage.
> There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. […] The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without useful static analysis method" and "languages that may be able to erase types from the runtime in most cases".
I agree that the distinction is not black and white, and in fact I am on the record in various places as saying so myself (e.g. https://twitter.com/lexi_lambda/status/1219486514905862146). Java is a good example of a language with a very significant dynamic type system while also sporting a static type system. But it is certainly still useful to use the phrase “dynamically-typed language,” because normal people know what that phrase generally refers to. It is hardly some gotcha to point out that some languages have some of both, and there is certainly no need to insult my character.
> This technique is not applicable to overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach and at the moment, there's no hope of making it useful.
This is simply not true. I know because I have done a great deal of real software engineering in which I have applied constructive data modeling extensively, to good effect. It would be silly to list them because it would simply be listing every single software project I have worked on for the past 5+ years. Perhaps you have not worked on problems where it has been useful. Perhaps you do not like the tradeoffs of the technique. Fine. But in this discussion, it’s ultimately just your word against mine, and many other people seem to have found the techniques quite useful—and not just in Haskell. Just look at Rust!
> Datalog would be my second choice, which also seems appropriate for general public.
The idea that datalog, a first-order relational query language, solves data validation problems (without further clarification) is so laughable that merely mentioning it reveals that you are either fundamentally unserious or wildly uninformed. It is okay to be either or both of those things, of course, but most people in that position do not have the arrogance and the foolishness to leave blustering comments making an ass of themselves on the subject on an internet forum.
Please be better.
My man... if I, the author of my program, was constructing the input, I wouldn't need no validation. Input isn't meant to be constructed by the program's author, it's supposed to be processed...
Parsing is an additional job on top of validation - providing type-level evidence that the data is good. That's what makes it valuable. It's not some theoretical difference in power. It's better software engineering.
I find it quite interesting though I never had the time to study it further until now, so any recommendations are appreciated!
> To some readers, these pitfalls may seem obvious, but safety holes of this sort are remarkably common in practice. This is especially true for datatypes with more sophisticated invariants, as it may not be easy to determine whether the invariants are actually upheld by the module’s implementation. Proper use of this technique demands caution and care:
> * All invariants must be made clear to maintainers of the trusted module. For simple types, such as NonEmpty, the invariant is self-evident, but for more sophisticated types, comments are not optional.
> * Every change to the trusted module must be carefully audited to ensure it does not somehow weaken the desired invariants.
> * Discipline is needed to resist the temptation to add unsafe trapdoors that allow compromising the invariants if used incorrectly.
> * Periodic refactoring may be needed to ensure the trusted surface area remains small. It is all too easy for the responsibility of the trusted module to accumulate over time, dramatically increasing the likelihood of some subtle interaction causing an invariant violation.
> In contrast, datatypes that are correct by construction suffer none of these problems. The invariant cannot be violated without changing the datatype definition itself, which has rippling effects throughout the rest of the program to make the consequences immediately clear. Discipline on the part of the programmer is unnecessary, as the typechecker enforces the invariants automatically. There is no “trusted code” for such datatypes, since all parts of the program are equally beholden to the datatype-mandated constraints.
They are both quite useful techniques, but it’s important to understand what you’re getting (and, perhaps more importantly, what you’re not).
Take for example APIGatewayProxyEvent [1], which has a property `queryStringParameters` with type:
export interface APIGatewayProxyEventQueryStringParameters {
  [name: string]: string | undefined;
}

You can then create a branded type like

type AuthCodeEvent = APIGatewayProxyEvent & {
  queryStringParameters: {
    code: string;
    state: string;
  };
};
The branded type here means that as soon as you verify that the event has the structure above, you can assume it is correct in the code that handles these specific cases.

Though as the blog author mentioned in the other chain, the TS compiler is not particularly sound, so it's probably entirely possible to mess up the structure and break the type without the compiler knowing about it.
[1] https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...
You only check it if it makes a difference to validity or not. There's no scenario where you keep the array and a parallel flag - either an empty array is invalid in which case you refuse to construct if it's empty, or an empty array is valid in which case you don't even check. Same thing for if you're checking whether it's got an even number of elements, got less than five elements, etc. - you don't keep the flag around, you refuse to construct your validated structure (even if that "validated structure" is actually just a marker wrapper around the raw array, if your type system isn't good enough to express the real constraint directly).
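A minimal sketch of that marker-wrapper idea, in the thread's Haskell flavor (the names are mine):

-- The wrapper records only that the check succeeded; in a real module
-- you'd hide the constructor so passing the check at construction time
-- is the only way to obtain one.
newtype NonEmptyList a = NonEmptyList [a]

mkNonEmptyList :: [a] -> Maybe (NonEmptyList a)
mkNonEmptyList [] = Nothing
mkNonEmptyList xs = Just (NonEmptyList xs)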
What context is it exactly where they don't matter?
I can tell you in practice, in the real world, they very much do.
> They are useless if you want to ensure that a given number is a prime number.
It's not useless. The point is that once you have a type `PrimeNumber` that can only be constructed after being validated, you can then write functions that live in a reality where only prime numbers exist.
https://play.haskell.org/saved/gRsNcCGo
> They are useless if you want to ensure that a given number is a prime number.
This is wrong. In the example above `addPrimes` will only take prime numbers.
As such if I make a Jira story that says "add multiply/subtract functions using the PrimeNumber type" I'll know that implementation is simplified by only being able to concern itself with prime numbers.
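The linked playground is along these lines (a reconstruction of the idea, not necessarily its exact code):

module PrimeNumber (PrimeNumber, parsePrime, addPrimes) where

newtype PrimeNumber = PrimeNumber Int

-- Naive trial division; fine for a sketch.
parsePrime :: Int -> Maybe PrimeNumber
parsePrime n
  | n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1] = Just (PrimeNumber n)
  | otherwise = Nothing

-- Both arguments are guaranteed prime; the sum is returned as a
-- plain Int, since a sum of primes need not be prime.
addPrimes :: PrimeNumber -> PrimeNumber -> Int
addPrimes (PrimeNumber a) (PrimeNumber b) = a + b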
> Still, perhaps you are skeptical of parseNonEmpty’s name. Is it really parsing anything, or is it merely validating its input and returning a result? While the precise definition of what it means to parse or validate something is debatable, I believe parseNonEmpty is a bona-fide parser (albeit a particularly simple one).
> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.
The OP is saying that a validator is a function which doesn't return anything, whereas parsing is a function which returns data. (Or in other words, validation is when you keep passing around the data in the old type, and parsing is when you pass around a new type). It is true that there is code inside the parser which you can call "validation", but the OP is labeling the function based on its signature. This is made more obvious towards the end of the article:
> Use abstract datatypes to make validators "look like" parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to "fake" a parser from a validator.
They are talking about the interface, not the implementation. They are saying that you should pass around a parsed type, even if it's only wrapping a raw value, because it carries proof that this data has been validated. They are saying that you shouldn't be validating this data in lots of different places.
> It may not be immediately apparent what shotgun parsing has to do with validation—after all, if you do all your validation up front, you mitigate the risk of shotgun parsing. The problem is that validation-based approaches make it extremely difficult or impossible to determine if everything was actually validated up front or if some of those so-called “impossible” cases might actually happen. The entire program must assume that raising an exception anywhere is not only possible, it’s regularly necessary.
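Concretely, the "abstract newtype with a smart constructor" from the quote above looks something like this (my sketch; the range-checked Percentage example is made up):

module Percentage (Percentage, mkPercentage, getPercentage) where

-- Abstract newtype: the constructor stays inside the module.
newtype Percentage = Percentage Int

-- The smart constructor "fakes" a parser from a validator: it checks
-- the range once, and the abstract type carries the proof from then on.
mkPercentage :: Int -> Maybe Percentage
mkPercentage n
  | 0 <= n && n <= 100 = Just (Percentage n)
  | otherwise          = Nothing

getPercentage :: Percentage -> Int
getPercentage (Percentage n) = n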
Obviously not a closed API since the playground only gives you one module, but I wrote an example on the Haskell playground:
{-# LANGUAGE TypeApplications #-}

import Control.Applicative (Alternative)
import Control.Monad (guard)

safeDiv :: (Monad m, Alternative m) => Int -> Int -> m Int
safeDiv x y = do
  guard (y /= 0)
  pure (x `div` y)

main :: IO ()
main = do
  print $ safeDiv @Maybe 1 0
  print $ safeDiv @[] 1 0
  -- print =<< safeDiv @IO 1 0 -- guard throws an error in IO

Try it out at https://play.haskell.org/saved/a6VsE3uQ

If you'd like to learn Haskell, I think https://www.cis.upenn.edu/~cis1940/spring13/ is still a pretty nice resource. It is quick and to the point, and it provides some exercises to work through. There are lots of things in the Haskell ecosystem that you could explore if you wanted to after getting a handle on the basics.
If you want to learn about programming languages and type systems, you could read Programming Languages: Application and Interpretation (https://cs.brown.edu/courses/cs173/2012/book/), which has a chapter on type systems. Alternatively, if you want a more thorough treatment of type systems, you could read Types and Programming Languages by Benjamin Pierce. However, both PLAI and TAPL are textbooks, and they are primarily intended to be used as supporting material in a university course with an instructor. I think PLAI is relatively accessible, but TAPL is more likely to be a challenge without some existing background in programming languages.
> You come off as a crank. [... because of X, Y, Z]

...

> Please be better. [in the following manner: ... even if it takes summarizing what was said]