←back to thread

Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 3 comments | | HN request time: 0.001s | source
Show context
bruce343434 ◴[] No.35053912[source]
Note that this basically requires your language to have ergonomic support for sum types, immutable "data classes", pattern matching.

The point is to parse the input into a structure which always upholds the predicates you care about so you don't end up continuously defensively programming in ifs and asserts.

replies(12): >>35054046 #>>35054070 #>>35054386 #>>35054514 #>>35054901 #>>35054993 #>>35055124 #>>35055230 #>>35056047 #>>35057866 #>>35058185 #>>35059271 #
crabbone ◴[] No.35054514[source]
It's not just about these limitations.

In order to be useful, type systems need to be simple, but there's no such restrictions on rules that govern our expectations of data correctness.

OP is delusional if they think that their approach can be made practical. I mean, what if the expectation from the data that an value is a prime number? -- How are they going to encode this in their type systems? And this is just a trivial example.

There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems. For example, if we want to ensure that all ids in XML nodes are unique. Or that the last digit of SSN is a checksum of the previous digits using some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and it's really preferable to validate emails by calling a number of predicates on them.

And, of course, these aren't the only examples: password validation (the annoying part that asks for capital letter, digit, special character? -- I want to see the author implement a parser to parse possible inputs to password field, while also giving helpful error messages s.a. "you forgot to use a digit"). Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to the code that does the usual stuff, i.e. just checks if a character is in a set of characters.

replies(10): >>35054557 #>>35054562 #>>35054640 #>>35054916 #>>35054920 #>>35055046 #>>35055734 #>>35055902 #>>35056302 #>>35057473 #
cjfd ◴[] No.35055046[source]

  class Prime
  {
  public:
     Prime(int p): p(p)
     {
        if (!is_prime(p))
           throw std::runtime_error("Number was not a prime!");
     }

     int get_value() const
     {
        return p;
     }

   private:
      int p;
   }
replies(2): >>35056161 #>>35056800 #
crabbone ◴[] No.35056800[source]
That's validation, my man. Which was the whole point of this.
replies(2): >>35056828 #>>35059297 #
1. cjfd ◴[] No.35056828[source]
It is not validation if you do it at parse time. Then you can pass a Prime around the whole time and never do the is_prime check again.
replies(1): >>35060838 #
2. crabbone ◴[] No.35060838[source]
> It is not validation if you do it at parse time

Who told you so? Definitely not OP. OP doesn't believe what you just wrote.

replies(1): >>35064220 #
3. rovolo ◴[] No.35064220[source]
From the OP:

> Still, perhaps you are skeptical of parseNonEmpty’s name. Is it really parsing anything, or is it merely validating its input and returning a result? While the precise definition of what it means to parse or validate something is debatable, I believe parseNonEmpty is a bona-fide parser (albeit a particularly simple one).

> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.

The OP is saying that a validator is a function which doesn't return anything, whereas parsing is a function which returns data. (Or in other words, validation is when you keep passing around the data in the old type, and parsing is when you pass around a new type). It is true that there is code inside the parser which you can call "validation", but the OP is labeling the function based on its signature. This is made more obvious towards the end of the article:

> Use abstract datatypes to make validators "look like" parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to "fake" a parser from a validator.

They are talking about the interface, not the implementation. They are saying that you should pass around a parsed type, even if it's only wrapping a raw value, because it carries proof that this data has been validated. They are saying that you shouldn't be validating this data in lots of different places.

> It may not be immediately apparent what shotgun parsing has to do with validation—after all, if you do all your validation up front, you mitigate the risk of shotgun parsing. The problem is that validation-based approaches make it extremely difficult or impossible to determine if everything was actually validated up front or if some of those so-called “impossible” cases might actually happen. The entire program must assume that raising an exception anywhere is not only possible, it’s regularly necessary.