
Parse, Don't Validate (2019)

(lexi-lambda.github.io)
389 points | melse | 7 comments
ukj ◴[] No.27639995[source]
Software Engineers: Parse, don't validate.

Mathematicians: Parsing is validation

https://gallais.github.io/pdf/draft_sigbovik21.pdf

replies(3): >>27640078 #>>27640121 #>>27640235 #
pwdisswordfish8 ◴[] No.27640078[source]
The point being, the converse of ‘parsing is validation’ is not true.
replies(2): >>27640111 #>>27641094 #
ukj ◴[] No.27640111[source]
The word "is" implies an isomorphism.

If you see it differently you are implicitly assuming a non-formalist perspective on what "validation" means. Tell us about it.

replies(4): >>27640132 #>>27640147 #>>27640155 #>>27640177 #
1. codetrotter ◴[] No.27640132[source]
The word “is” is also often used informally to mean “is a kind of”.
replies(1): >>27640142 #
2. ukj ◴[] No.27640142[source]
"A kind of" is precisely its formal use from the PoV of a type theorist.

Two things are the same type of thing if they share all of their extensional properties.

That is what it means for two things to be identical/equal.

replies(2): >>27640206 #>>27640854 #
3. codetrotter ◴[] No.27640854[source]
But what I am saying is that parsing is a kind of validation. But not all validation is parsing.

For example let's say that I have written an HTTP API that accepts application/x-www-form-urlencoded data to one of its endpoints. Let's say `POST /users`, and this is where the client-side application posts data to.

Now I can implement this in many ways. I can for example define

    pub struct Person {
        name: String,
        phone_number: String,
    }
But how I populate this struct can determine whether I am actually parsing or not, even if most of the code aside from that is the same.

And of course I could go further and define types for the name and the phone number, but let's say that I have decided that strings are the proper representation in this case.

If the fields of my struct were directly accessible

    pub struct Person {
        pub name: String,
        pub phone_number: String,
    }
And in my HTTP API endpoint for `POST /users` I do the following:

    // ...
    
    let name = post_data.name;
    let phone_number = post_data.phone_number;

    let norwegian_phone_number_format = Regex::new(r"^(\+47|0047)?\d{8}$").unwrap();

    // ...
And I didn't bother to write out the rest of the code here for this example but you get the gist.

The point is that here I am doing some rudimentary validation on the phone number, requiring it to be in Norwegian format. But I am enforcing this in the implementation of the handler for the HTTP API endpoint, rather than in the data type itself.

Whereas if I was instead doing

    pub struct Person {
        name: String,
        phone_number: String,
    }

    impl Person {
        pub fn try_new(name: String, phone_number: String) -> std::result::Result<Self, PersonDataError> {
            // ...

            let norwegian_phone_number_format = Regex::new(r"^(\+47|0047)?\d{8}$").unwrap();

            // ...
        }
    }
Now I've moved the validation into an associated function of the type itself, and I've made the fields of the struct inaccessible from the outside.

And in this manner, even though my validation is still rudimentary, and a type purist might find the type insufficiently constrained, I have in my book gone from mere validation to actual parsing, because constructing the type now enforces the constraints on the data.
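
For anyone who wants something that compiles, here is a minimal sketch of how the elided `try_new` body could be filled in. It is only one possible reading: the `PersonDataError` variant, the helper `is_norwegian_phone_number`, and the accessor methods are hypothetical names introduced for illustration. The helper is a hand-written, standard-library-only stand-in for the regex above, and it assumes ASCII digits only.

```rust
/// Hypothetical error type. The comment above names `PersonDataError`
/// but never defines it, so this single variant is an assumption.
#[derive(Debug, PartialEq)]
pub enum PersonDataError {
    InvalidPhoneNumber,
}

#[derive(Debug)]
pub struct Person {
    // Private fields: code outside this module cannot build a `Person`
    // without going through `try_new`.
    name: String,
    phone_number: String,
}

/// Stand-in for the regex `^(\+47|0047)?\d{8}$`, written without the
/// `regex` crate. Assumption: only ASCII digits count as digits.
fn is_norwegian_phone_number(s: &str) -> bool {
    fn eight_digits(t: &str) -> bool {
        t.len() == 8 && t.bytes().all(|b| b.is_ascii_digit())
    }
    // The country prefix is optional, so try all three shapes.
    eight_digits(s)
        || s.strip_prefix("+47").map_or(false, eight_digits)
        || s.strip_prefix("0047").map_or(false, eight_digits)
}

impl Person {
    /// The only way to obtain a `Person`; the check runs here, once.
    pub fn try_new(name: String, phone_number: String) -> Result<Self, PersonDataError> {
        if !is_norwegian_phone_number(&phone_number) {
            return Err(PersonDataError::InvalidPhoneNumber);
        }
        Ok(Person { name, phone_number })
    }

    // Read-only accessors, since the fields themselves are private.
    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn phone_number(&self) -> &str {
        &self.phone_number
    }
}
```

With this shape, a handler for `POST /users` would call `Person::try_new(post_data.name, post_data.phone_number)` and reject the request on `Err`. Any function that later receives a `Person` can rely on the phone number having passed the check, as long as the struct lives in its own module so the private fields really are out of reach.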

replies(1): >>27640870 #
4. ukj ◴[] No.27640870{3}[source]
You are over-complicating this into obscurity.

General case: Validating random data as input into some program.

Particular case: Validating random source code (data) as input into some compiler (program).

Do compilers parse or validate?

If "parsing is validation, but validation is not parsing" were true, then you should be able to give an example of a compiler doing some sort of validation on the random source code (data) that is not parsing.

The very thing which determines the validity of random source code is the compiler's ability to parse it.

replies(1): >>27641937 #
5. codetrotter ◴[] No.27641937{4}[source]
But I’m not talking about compilers here
replies(1): >>27644920 #
6. ukj ◴[] No.27644920{5}[source]
Why not?

Compilers are computable functions.

If “Parsing is validation, but validation is not parsing” is true then it is also true about compilers.

replies(1): >>27696150 #
7. jhgb ◴[] No.27696150{6}[source]
> If “Parsing is validation, but validation is not parsing” is true then it is also true about compilers.

This is a false statement, considering that the cases where something is validation but not parsing may very well lie in the complement of the set of compilers within the set of all computable functions. The converse statement, that if this were true of the smaller set (of compilers) it would also be true of the larger set (of all programs), would on the other hand be correct.
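
One way to spell this out, as a sketch under one particular reading: let $F$ be the set of all computable functions and $C \subseteq F$ the set of compilers, and write $V(f)$ for "$f$ performs validation" and $P(f)$ for "$f$ performs parsing". These symbols are introduced here for illustration only, and "validation is not parsing" is read as an existence claim.

```latex
% "Validation is not parsing", read as an existence claim over a set S:
%   N(S) := \exists f \in S.\; V(f) \land \lnot P(f)
\[
  C \subseteq F
  \;\Longrightarrow\;
  \bigl( N(C) \Rightarrow N(F) \bigr),
  \qquad\text{but in general}\qquad
  N(F) \not\Rightarrow N(C),
\]
% because the witness f may lie in F \setminus C.
```

A witness found among compilers is automatically a witness among all computable functions, while a witness found in $F \setminus C$ says nothing about compilers.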