Parse, don't validate (2019)

1. jmull ◴[07 Mar 23 13:48 UTC] No.35055254[source]▶

I get the point, but I wonder at why people find this particular article compelling. To me it's weak...

It's built on a particular technical distinction between parsing and validating that (1) is not all that commonly understood or consistently accepted and (2) is not actually explicitly stated in the article!

(validation: check data assumptions, fail if not met; parse: check data assumptions, fail if not met, and on success return data as a new type reflecting the additional constraints of the data, which can therefore be checked at compile time. Notice parsing includes validation, which makes the title of the article quite poor.)
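
To make the two definitions concrete, here is a minimal Haskell sketch. The `Port` type and every function name in it are hypothetical, chosen only for illustration; nothing here comes from the article.

```haskell
-- Hypothetical example type: a TCP port known to be in range.
newtype Port = Port Int deriving (Show, Eq)

-- Validation: answers yes or no; the caller still holds a plain Int.
isValidPort :: Int -> Bool
isValidPort n = n >= 1 && n <= 65535

-- Parsing: the same check, but success yields a value of the new type.
parsePort :: Int -> Maybe Port
parsePort n
  | isValidPort n = Just (Port n)
  | otherwise     = Nothing

-- Accepts only Port, so the compiler rejects a call with an unchecked Int.
describe :: Port -> String
describe (Port n) = "listening on port " ++ show n
```

In a real module the `Port` constructor would presumably not be exported, so that `parsePort` is the only way to obtain one.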

That's important to know because the distinction is only meaningful in the context of certain language features, which may or may not apply.

Also, this is not great general advice:

> Push the burden of proof upward as far as possible, but no further

For one, it's mostly meaningless, since it really just says put the burden of proof in the right place. But it implies that upward is preferable. You really want to push it upward if it's a high-level concern, and downward if it's a low-level concern. E.g., suppose you're working on an app or service that accesses the database, so the database is lower-level. You'll want to push your database-specific type transformations closer to the code that accesses the database.

Honestly, I find this whole thing kind of muddled.

(Also, in my experience, the fundamental limit here isn't on validation strategies, but the human ability to break down a problem and logically organize the solution. You can just as easily end up with an unmaintainable mess of spaghetti types as with any other useful abstraction.)

replies(6): >>35055366 #>>35055866 #>>35055895 #>>35056075 #>>35057758 #>>35061557 #

2. jakelazaroff ◴[07 Mar 23 13:59 UTC] No.35055366[source]▶

>>35055254 (TP) #

> You really want to push it upward if it's a high-level concern, and downward if it's a low-level concern. E.g., suppose you're working on an app or service that accesses the database, so the database is lower-level. You'll want to push your database-specific type transformations closer to the code that accesses the database.

IMO, database code is at exactly the same level of concern as network code or filesystem code. By “upward”, she means push parsing to the boundaries of your program — as close to the point of ingress as possible.

replies(1): >>35062429 #

3. naasking ◴[07 Mar 23 14:44 UTC] No.35055866[source]▶

>>35055254 (TP) #

> It's built on a particular technical distinction between parsing and validating that (1) is not all that commonly understood or consistently accepted and (2) not actually explicitly stated in the article!

If this isn't clear to you, ask yourself why programming languages are parsed and not merely validated. Validation is a subset of parsing, so clearly there's something important added.

4. dkarl ◴[07 Mar 23 14:46 UTC] No.35055895[source]▶

>>35055254 (TP) #

I figured from the title I'd get a better explanation in the comments, and I was right, but I think the article is not nearly as bad as the clickbaity title suggests. It's a decent introduction to how to use types to simplify code, and the basic idea that your types should reflect what you know about the data is extremely powerful. If you go to the trouble of checking that your data meets some constraints, you should be able to represent it with a more constrained type afterwards, and that is the essence of parsing. It all makes sense! Even the title makes sense, as a quick way to reference and remember the idea after you've learned it.

But, yeah, the clickbait title put me off, and you're right that the terminology is unhelpful, since the distinction between parsing and validation isn't consistently made, especially in practical work. Virtually all of the "validation" code I've seen in statically typed languages, in the codebases I've worked in, would be "parsing" by this definition.

5. mrkeen ◴[07 Mar 23 15:01 UTC] No.35056075[source]▶

>>35055254 (TP) #

> (2) not actually explicitly stated in the article!

    the difference between validation and parsing lies almost entirely in how information is preserved. Consider the following pair of functions:

    validateNonEmpty :: [a] -> IO ()

    parseNonEmpty :: [a] -> IO (NonEmpty a)

    Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.
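
The quote gives only the type signatures. A minimal runnable sketch of the pair might look like the following; the function bodies, the error message, and the `firstItem` helper are assumptions added for illustration, built on `Data.List.NonEmpty` and `throwIO` from base.

```haskell
import Control.Exception (throwIO)
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Validation: checks that the list is non-empty but returns only ().
-- The caller still holds a plain [a]; the type records nothing about the check.
validateNonEmpty :: [a] -> IO ()
validateNonEmpty (_:_) = pure ()
validateNonEmpty []    = throwIO (userError "list cannot be empty")

-- Parsing: performs the same check, but hands the result back as a NonEmpty a.
parseNonEmpty :: [a] -> IO (NonEmpty a)
parseNonEmpty xs = case nonEmpty xs of
  Just ne -> pure ne
  Nothing -> throwIO (userError "list cannot be empty")

-- Hypothetical downstream code: it demands the refined type, so taking the
-- head is total and needs no further emptiness check.
firstItem :: NonEmpty a -> a
firstItem = NE.head
```

After `validateNonEmpty xs`, calling `head xs` is still a partial function as far as the compiler knows; after `parseNonEmpty xs`, `firstItem` cannot fail.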

replies(1): >>35056504 #

6. jmull ◴[07 Mar 23 15:38 UTC] No.35056504[source]▶

>>35056075 #

I know we can infer the point from the information buried in the middle of the article. But your quote is significantly edited for clarity, and, after all, is a code example, not a statement of definitions.

replies(1): >>35060900 #

7. lolinder ◴[07 Mar 23 17:01 UTC] No.35057758[source]▶

>>35055254 (TP) #

This confusion is, I think, just a question of different conceptions of the system architecture.

Your terminology is drawing from a three-tier architecture [0] with a presentation layer, logic layer, and data layer. Under this model, input (data) is the bottom layer and output (HTTP/GUI) is the top layer, with your application logic in the middle.

On the other hand, she is viewing the system through an inside-outside lens similar to the hexagonal architecture [1]. All input (data) and output (HTTP/GUI) is considered to be up and out of your application logic. Rather than being the middle of a sandwich, the application logic is the kernel of a seed.

This is a common way to view the system when programming in functional languages like Haskell because the goal is usually to push all I/O to the start of the call stack so as to minimize the amount of code that has to account for side effects. The three-tier architecture isn't concerned about isolating effects, so treating the data layer as the bottom layer of the code is reasonable.

In either model, the point is to push validation to the boundaries of your code and rely on the type checker to prove you're using things right within the logic layer.
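
As a rough sketch of what that can look like in Haskell: raw input is parsed once at the edge, and the logic layer only ever sees the refined type. Every name here (`RawOrder`, `Order`, `Quantity`, `parseOrder`, `totalUnits`) is hypothetical and stands in for whatever the application's real boundary and domain types would be.

```haskell
import Data.Char (isDigit)
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Hypothetical raw input as it arrives at the edge (database row, HTTP body, ...).
data RawOrder = RawOrder { rawCustomer :: String, rawQuantities :: [String] }

-- Hypothetical refined types used by the logic layer.
newtype Quantity = Quantity Int deriving (Show, Eq)
data Order = Order { customer :: String, quantities :: NonEmpty Quantity }

-- Accepts only non-empty strings of digits that denote a positive number.
parseQuantity :: String -> Either String Quantity
parseQuantity s
  | not (null s) && all isDigit s && n > 0 = Right (Quantity n)
  | otherwise = Left ("invalid quantity: " ++ show s)
  where n = read s

-- Boundary: the only place that has to deal with malformed input.
parseOrder :: RawOrder -> Either String Order
parseOrder (RawOrder c qs) = do
  qs' <- traverse parseQuantity qs
  ne  <- maybe (Left "order has no items") Right (nonEmpty qs')
  if null c then Left "missing customer" else Right (Order c ne)

-- Logic layer: takes an Order, so it never re-checks emptiness or number format.
totalUnits :: Order -> Int
totalUnits o = sum [n | Quantity n <- NE.toList (quantities o)]
```

Whether the boundary counts as "up" or "down" depends on which of the two architectural pictures you have in mind; the code is the same either way.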

[0] https://en.wikipedia.org/wiki/Multitier_architecture

[1] https://en.wikipedia.org/wiki/Hexagonal_architecture_%28soft...

8. cratermoon ◴[07 Mar 23 20:40 UTC] No.35060900{3}[source]▶

>>35056504 #

What is code, though, but a syntactically precise and logical way of expressing ideas?

9. Vosporos ◴[07 Mar 23 21:33 UTC] No.35061557[source]▶

>>35055254 (TP) #

It's okay to say that you didn't understand the article, you know.

10. jmull ◴[07 Mar 23 22:49 UTC] No.35062429[source]▶

>>35055366 #

The db access is just an example. I used upward and downward working off the terminology of the article. But I can put it like this:

For a given call or request, there's input, some work done with that input, and the result. (This is true, whether we're talking about a functional or imperative style.) Your code will have some structure that reflects the work to be done. You want to push your parsing toward the input if it's concerned with the input, and toward the result if it's concerned with the result.

Whether you want to call the processing closer to the input "upward", or "earlier" or whatever, that's fine with me. If you call the processing closer to the input and closer to the result both "upward" then I think it's not a useful metaphor and you should choose a different one.

replies(1): >>35063332 #

11. jakelazaroff ◴[08 Mar 23 00:10 UTC] No.35063332{3}[source]▶

>>35062429 #

Any given callee is going to deal with a bunch of both inputs and results. And it’s not clear to me what those terms mean — e.g. is the response from the database an “input” or a “result”?

I think your point of view would make more sense looking at the call stack — database access happens deeper than the code that handles the response, so you can’t push it “up” from there. And I mean, sure? But I don’t think that’s an inherently better frame than the one in which external sources are “upward” and your own application code is “downward”.