Parse, Don't Validate (2019)

(lexi-lambda.github.io)
389 points melse | 23 comments
ukj ◴[] No.27639995[source]
Software Engineers: Parse, don't validate.

Mathematicians: Parsing is validation

https://gallais.github.io/pdf/draft_sigbovik21.pdf

replies(3): >>27640078 #>>27640121 #>>27640235 #
1. pwdisswordfish8 ◴[] No.27640078[source]
The point being, the converse of ‘parsing is validation’ is not true.
replies(2): >>27640111 #>>27641094 #
2. ukj ◴[] No.27640111[source]
The word "is" implies an isomorphism.

If you see it differently you are implicitly assuming a non-formalist perspective on what "validation" means. Tell us about it.

replies(4): >>27640132 #>>27640147 #>>27640155 #>>27640177 #
3. codetrotter ◴[] No.27640132[source]
The word “is” is also often used informally to mean “is a kind of”.
replies(1): >>27640142 #
4. ukj ◴[] No.27640142{3}[source]
"A kind of" is precisely its formal use from the PoV of a type theorist.

Two things are the same type of thing if they share all of their extensional properties.

That is what it means for two things to be identical/equal.

replies(2): >>27640206 #>>27640854 #
5. thereare5lights ◴[] No.27640147[source]
> The word "is" implies an isomorphism.

Are you talking about a bijective mapping or are you saying it's a synonym for identical?

Because the former doesn't make any sense here and the latter is not true.

Red is a color does not imply that all colors are red.

replies(1): >>27640203 #
6. ◴[] No.27640155[source]
7. pwdisswordfish8 ◴[] No.27640177[source]
‘A square is a rectangle’ means squares are isomorphic to rectangles?
replies(1): >>27640269 #
8. ukj ◴[] No.27640203{3}[source]
I am talking about the polymorphic use of the verb "is" during the process of formalization.

"Red is a color" can be formalized as "Red is a type of color" or "Red is member of set Colors".

You can't formalize "Color is red" because it doesn't mean anything.

When I say "Parsing is validation" I am using the verb "is" to mean an isomorphism.

replies(1): >>27656103 #
9. ukj ◴[] No.27640269{3}[source]
You are tripping up over polymorphism. "Is" means many things - which meaning you infer is precisely parsing!

"A square is a rectangle" means "A square is a TYPE of rectangle" (at least, that is what I am parsing it as).

"Parsing is Validation" means Parsing is isomorphic to Validation.

How do I know? Because that is how I want you to parse it.

replies(2): >>27640298 #>>27641887 #
10. pwdisswordfish8 ◴[] No.27640298{4}[source]
> 'When I use a word,' Humpty Dumpty said in rather a scornful tone, 'it means just what I choose it to mean — neither more nor less.'
replies(1): >>27640310 #
11. ukj ◴[] No.27640310{5}[source]
+∞

parse verb. resolve (a sentence) into its component parts and describe their syntactic roles.

In computer science what we do is precisely syntax analysis. Determining the meaning of operators.

Mathematicians have the exact same problem with respect to the equality operator.

https://ncatlab.org/nlab/show/equality#DifferentKinds

12. codetrotter ◴[] No.27640854{4}[source]
But what I am saying is that parsing is a kind of validation. But not all validation is parsing.

For example let's say that I have written an HTTP API that accepts application/x-www-form-urlencoded data to one of its endpoints. Let's say `POST /users`, and this is where the client-side application posts data to.

Now I can implement this in many ways. I can for example define

    pub struct Person {
        name: String,
        phone_number: String,
    }
But how I populate this struct can determine whether I am actually parsing or not, even if most of the code aside from that is the same.

And of course I could go further and define types for the name and the phone number, but let's say that I have decided that strings are the proper representation in this case.

If the fields of my structs were directly accessible

    pub struct Person {
        pub name: String,
        pub phone_number: String,
    }
And in my HTTP API endpoint for `POST /users` I do the following:

    // ...
    
    let name = post_data.name;
    let phone_number = post_data.phone_number;

    let norwegian_phone_number_format = Regex::new(r"^(\+47|0047)?\d{8}$").unwrap();

    // ...
And I didn't bother to write out the rest of the code here for this example but you get the gist.

The point is that here I am doing some rudimentary validation on the phone number, requiring it to be in Norwegian format. But I am enforcing this in the implementation of the handler for the HTTP API endpoint, rather than in the data type itself.

Whereas if I was instead doing

    pub struct Person {
        name: String,
        phone_number: String,
    }

    impl Person {
        pub fn try_new(name: String, phone_number: String) -> std::result::Result<Self, PersonDataError> {
            // ...

            let norwegian_phone_number_format = Regex::new(r"^(\+47|0047)?\d{8}$").unwrap();

            // ...
        }
    }
Now I've moved the validation into an associated function of the type itself, and I've made the fields of the struct inaccessible from the outside.

And in this manner, even though my validation is still rudimentary, and a type purist might find the type insufficiently constrained, I have in my book gone from just validation to actual parsing, because I have made it so that the construction of the type enforces the constraints on the data.
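
For what it's worth, here is a minimal sketch of how the elided parts of `try_new` might be filled in. The `PersonDataError` variant, the accessor methods, and the hand-written check standing in for the regex (so that the sketch needs no external crate) are all assumptions for illustration, not something taken from the code above:

```rust
// Hypothetical error type: the single variant is an assumption.
#[derive(Debug, PartialEq)]
pub enum PersonDataError {
    InvalidPhoneNumber,
}

// Fields stay private, so the only way to obtain a Person from outside
// this module is through Person::try_new.
#[derive(Debug)]
pub struct Person {
    name: String,
    phone_number: String,
}

// True if `s` consists of exactly eight ASCII digits.
fn is_eight_digits(s: &str) -> bool {
    s.len() == 8 && s.bytes().all(|b| b.is_ascii_digit())
}

// Stand-in for the pattern ^(\+47|0047)?\d{8}$: an optional "+47" or
// "0047" prefix followed by exactly eight digits. Unlike the regex
// crate's default Unicode-aware \d, this accepts ASCII digits only.
fn is_norwegian_phone_number(s: &str) -> bool {
    is_eight_digits(s)
        || s.strip_prefix("+47").map_or(false, is_eight_digits)
        || s.strip_prefix("0047").map_or(false, is_eight_digits)
}

impl Person {
    // The constraint is enforced at construction time: a Person that
    // exists has already passed the phone number check.
    pub fn try_new(name: String, phone_number: String) -> Result<Self, PersonDataError> {
        if !is_norwegian_phone_number(&phone_number) {
            return Err(PersonDataError::InvalidPhoneNumber);
        }
        Ok(Person { name, phone_number })
    }

    // Read-only accessors (hypothetical), since the fields are private.
    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn phone_number(&self) -> &str {
        &self.phone_number
    }
}
```

With this shape, a handler would call something like `Person::try_new(post_data.name, post_data.phone_number)?` and from then on pass the `Person` around without re-checking the phone number.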

replies(1): >>27640870 #
13. ukj ◴[] No.27640870{5}[source]
You are over-complicating this into obscurity.

General case: Validating random data as input into some program.

Particular case: Validating random source code (data) as input into some compiler (program).

Do compilers parse or validate?

If "parsing is validation, but validation is not parsing" were true, then you should be able to give an example of a compiler doing some sort of validation on the random source code (data) that is not parsing.

The very thing which determines the validity of random source code is the compiler's ability to parse it.

replies(1): >>27641937 #
14. ukj ◴[] No.27641094[source]
Then you have some formally inexpressible/impredicative notion of "validation" in mind. For posterity (lifting from the depths of the threads):

General case: Validating random data as input into some program.

Particular case: Validating random source code (data) as input into some compiler (program).

Do compilers parse or validate?

> "the converse of ‘parsing is validation’ is not true."

If that were the case then you should be able to give an example of a compiler validating random source code (data) but not parsing it.

What determines the validity of random input is precisely a compiler's ability to parse it.

replies(1): >>27642284 #
15. jhgb ◴[] No.27641887{4}[source]
> "A square is a rectangle" means "A square is a TYPE of rectangle" (at least, that is what I am parsing it as).

In that case your former statement that 'The word "is" implies an isomorphism' seems to be wrong.

replies(1): >>27645179 #
16. codetrotter ◴[] No.27641937{6}[source]
But I’m not talking about compilers here
replies(1): >>27644920 #
17. nsajko ◴[] No.27642284[source]
I think that you actually agree with the comment you responded to, it's just that you misinterpreted what it was trying to say.
replies(1): >>27643928 #
18. ukj ◴[] No.27643928{3}[source]
I certainly don’t disagree (that doesn’t mean I agree).

The purpose of the conversation is to arrive at a mutually acceptable interpretation.

19. ukj ◴[] No.27644920{7}[source]
Why not?

Compilers are computable functions.

If “Parsing is validation, but validation is not parsing” is true then it is also true about compilers.

replies(1): >>27696150 #
20. ukj ◴[] No.27645179{5}[source]
It may be wrong in your model/interpretation of my words, but it's not wrong in my interpretation of my words.
replies(1): >>27655456 #
21. jhgb ◴[] No.27655456{6}[source]
In what interpretation is it consistent for '"A square is a rectangle" means "A square is a TYPE of rectangle"' and 'The word "is" implies an isomorphism' to be simultaneously true? No matter how I cut it, the latter seems to me to rule out the former.
22. thereare5lights ◴[] No.27656103{4}[source]
Judging by all the other disagreeing comments, you're in some sort of idiosyncratic context that only you understand.

Good luck with that.

23. jhgb ◴[] No.27696150{8}[source]
> If “Parsing is validation, but validation is not parsing” is true then it is also true about compilers.

This is a false statement: the cases where something is validation but not parsing may very well all lie outside the set of compilers, that is, in its complement within the set of all computable functions. The converse, on the other hand, would be correct: if this were true of the smaller set (of compilers), it would also be true of the larger set (of all programs).
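
Spelled out as a sketch, with notation that is assumed here and does not appear in the thread: let $F$ be the set of all computable functions, $C \subseteq F$ the set of compilers, and let $V(f)$ and $P(f)$ mean that $f$ validates and that $f$ parses, respectively.

```latex
\begin{align*}
\exists c \in C:\; V(c) \land \neg P(c) \;&\Longrightarrow\; \exists f \in F:\; V(f) \land \neg P(f) \\
\exists f \in F:\; V(f) \land \neg P(f) \;&\nRightarrow\; \exists c \in C:\; V(c) \land \neg P(c)
\end{align*}
```

The first line holds because every compiler is also a member of $F$; the second fails whenever every witness lies in $F \setminus C$.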