Parse, don't validate (2019)

(lexi-lambda.github.io)


bruce343434 ◴[07 Mar 23 10:54 UTC] No.35053912[source]▶

Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", and pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #

crabbone ◴[07 Mar 23 12:18 UTC] No.35054514[source]▶

>>35053912 #

It's not just about these limitations.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation for the data is that a value is a prime number? -- How are they going to encode this in their type system? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, if we want to ensure that all ids in XML nodes are unique. Or that the last digit of SSN is a checksum of the previous digits using some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and it's really preferable to validate emails by calling a number of predicates on them.

And, of course, these aren't the only examples: password validation (the annoying part that asks for a capital letter, a digit, a special character? -- I want to see the author implement a parser to parse possible inputs to a password field, while also giving helpful error messages such as "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.

replies(10): >>35054557 #>>35054562 #>>35054640 #>>35054916 #>>35054920 #>>35055046 #>>35055734 #>>35055902 #>>35056302 #>>35057473 #

1. cjfd ◴[07 Mar 23 13:26 UTC] No.35055046[source]▶

>>35054514 #

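  #include <stdexcept>

  // Assumed helper for illustration: simple trial division.
  bool is_prime(int n)
  {
     if (n < 2)
        return false;
     for (int d = 2; d <= n / d; ++d)
        if (n % d == 0)
           return false;
     return true;
  }

  class Prime
  {
  public:
     // Throws for non-primes, so every Prime that exists holds a prime.
     explicit Prime(int p): p(p)
     {
        if (!is_prime(p))
           throw std::runtime_error("Number was not a prime!");
     }

     int get_value() const
     {
        return p;
     }

  private:
     int p;
  };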

replies(2): >>35056161 #>>35056800 #

2. asimpletune ◴[07 Mar 23 15:10 UTC] No.35056161[source]▶

>>35055046 (TP) #

Just wanted to add that in some languages you could have a makePrime function that takes an int and returns a maybe[Prime]. If you don't make the constructor public this works perfectly, as there is essentially no way to get a Prime without going through the pathways that the library author relies upon. This is a pattern that's used in Scala a lot anyways.
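
For what it's worth, here is a minimal sketch of that pattern in C++, assuming C++17 for std::optional. The names make_prime and is_prime are illustrative only (trial division stands in for whatever primality test you would really use); none of this comes from a particular library.

```cpp
#include <optional>

// Assumed helper for illustration: simple trial division.
inline bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d <= n / d; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

class Prime {
public:
    // The only way to obtain a Prime: yields std::nullopt for non-primes.
    static std::optional<Prime> make_prime(int n) {
        if (!is_prime(n)) return std::nullopt;
        return Prime(n);
    }

    int get_value() const { return value_; }

private:
    // Private, so outside code cannot bypass make_prime.
    explicit Prime(int n) : value_(n) {}
    int value_;
};
```

A caller would write something like `if (auto p = Prime::make_prime(n)) { use(*p); }`, where `use` is a hypothetical function taking a Prime. The failure case has to be handled at the point of construction, and nothing downstream needs to ask the question again.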

3. crabbone ◴[07 Mar 23 15:57 UTC] No.35056800[source]▶

>>35055046 (TP) #

That's validation, my man. Which was the whole point of this.

replies(2): >>35056828 #>>35059297 #

4. cjfd ◴[07 Mar 23 15:59 UTC] No.35056828[source]▶

>>35056800 #

It is not validation if you do it at parse time. Then you can pass a Prime around the whole time and never do the is_prime check again.

replies(1): >>35060838 #

5. chowells ◴[07 Mar 23 18:42 UTC] No.35059297[source]▶

>>35056800 #

Nope. If it was validation, it would return a boolean indicating if the value was... Valid.

Instead it's parsing. It takes in a value of one type and returns a value of a different type that is known good. Or it fails. But what it never does is let you continue forward with an invalid value as if it was valid. This is because it's doing more than just validation.
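
To make the difference in signatures concrete, here is a rough sketch in C++ (C++17 assumed). The Port type, is_valid_port, and is_privileged are hypothetical names chosen for illustration, not anything from the article.

```cpp
#include <optional>

// Validation style: answers yes or no; the caller still holds a plain int
// and is free to ignore the answer.
inline bool is_valid_port(int n) { return n >= 1 && n <= 65535; }

// Parsing style: the result is a different type that only exists for good input.
class Port {
public:
    static std::optional<Port> parse(int n) {
        if (!is_valid_port(n)) return std::nullopt;  // fail instead of carrying on
        return Port(n);
    }

    int number() const { return number_; }

private:
    explicit Port(int n) : number_(n) {}
    int number_;
};

// Accepts only Port, so an unchecked int cannot reach this function.
inline bool is_privileged(Port p) { return p.number() < 1024; }
```

The same range check runs in both styles. What changes is that `is_privileged(70000)` does not compile, whereas a function taking `int` would accept it whether or not anyone called `is_valid_port` first.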

replies(1): >>35060815 #

6. crabbone ◴[07 Mar 23 20:34 UTC] No.35060815{3}[source]▶

>>35059297 #

> If it was validation, it would return a boolean

On what grounds did you decide that this is the requirement for validation? That's truly bizarre... Sometimes validating functions return booleans... but there's no general rule that they do.

Anyways, you completely missed the point OP was trying to make. Their idea was to include constraints on data (i.e. to ensure data validity) in the type associated with the data. You've done nothing of the kind: you created a random atomic type with a validation method. Your type isn't even a natural number, you definitely cannot add another natural number to it, multiply it, etc...

Worse yet, you decided to go into a language with subtyping, which completely undermines all of your efforts, even if you were able to construct all of those overloads to make this type behave like a natural number: any other type that you create by inheriting from this class has the liberty to violate all the contracts you might have created in this class, but, through the definition of your language, it would still be valid to say that the subtype thus created is a prime number, even if it implements == in a way that it returns "true" when compared to 8 (only) :D

7. crabbone ◴[07 Mar 23 20:36 UTC] No.35060838{3}[source]▶

>>35056828 #

> It is not validation if you do it at parse time

Who told you so? Definitely not OP. OP doesn't believe what you just wrote.

replies(1): >>35064220 #

8. rovolo ◴[08 Mar 23 01:59 UTC] No.35064220{4}[source]▶

>>35060838 #

From the OP:

> Still, perhaps you are skeptical of parseNonEmpty’s name. Is it really parsing anything, or is it merely validating its input and returning a result? While the precise definition of what it means to parse or validate something is debatable, I believe parseNonEmpty is a bona-fide parser (albeit a particularly simple one).

> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.

The OP is saying that a validator is a function which doesn't return anything, whereas parsing is a function which returns data. (Or in other words, validation is when you keep passing around the data in the old type, and parsing is when you pass around a new type). It is true that there is code inside the parser which you can call "validation", but the OP is labeling the function based on its signature. This is made more obvious towards the end of the article:

> Use abstract datatypes to make validators "look like" parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to "fake" a parser from a validator.

They are talking about the interface, not the implementation. They are saying that you should pass around a parsed type, even if it's only wrapping a raw value, because it carries proof that this data has been validated. They are saying that you shouldn't be validating this data in lots of different places.

> It may not be immediately apparent what shotgun parsing has to do with validation—after all, if you do all your validation up front, you mitigate the risk of shotgun parsing. The problem is that validation-based approaches make it extremely difficult or impossible to determine if everything was actually validated up front or if some of those so-called “impossible” cases might actually happen. The entire program must assume that raising an exception anywhere is not only possible, it’s regularly necessary.
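
As a rough illustration of the "abstract type with a smart constructor" advice quoted above, here is a C++ sketch (C++17 assumed). NonEmptyInts and largest are hypothetical names; the article's own example is parseNonEmpty in Haskell, and this is only an approximation of the same idea.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Abstract type: wraps a vector that was checked to be non-empty.
class NonEmptyInts {
public:
    // Smart constructor: the emptiness check lives here and nowhere else.
    static std::optional<NonEmptyInts> parse(std::vector<int> v) {
        if (v.empty()) return std::nullopt;
        return NonEmptyInts(std::move(v));
    }

    // No emptiness check needed: holding a NonEmptyInts is the evidence.
    int head() const { return items_.front(); }
    const std::vector<int>& items() const { return items_; }

private:
    explicit NonEmptyInts(std::vector<int> v) : items_(std::move(v)) {}
    std::vector<int> items_;
};

// Downstream code takes the parsed type and never re-validates.
inline int largest(const NonEmptyInts& xs) {
    int best = xs.head();
    for (int x : xs.items()) {
        if (x > best) best = x;
    }
    return best;
}
```

One caveat specific to C++: a moved-from NonEmptyInts is left in a valid but unspecified state, so the guarantee is weaker than what Haskell gives you. The interface still does the job the article describes, which is to keep the check in one place.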
