Most active commenters
  • ukj(16)
  • justinpombrio(4)

←back to thread

Parse, Don't Validate (2019)

(lexi-lambda.github.io)
389 points melse | 26 comments | | HN request time: 2.771s | source | bottom
Show context
seanwilson ◴[] No.27640953[source]
From the Twitter link:

> IME, people in dynamic languages almost never program this way, though—they prefer to use validation and some form of shotgun parsing. My guess as to why? Writing that kind of code in dynamically-typed languages is often a lot more boilerplate than it is in statically-typed ones!

I feel that once you've got experience working in (usually functional) programming languages with strong static type checking, flakey dynamic code that relies on runtime checks and just being careful to avoid runtime errors makes your skin crawl, and you'll intuitively gravitate towards designs that takes advantage of strong static type checks.

When all you know is dynamic languages, the design guidance you get from strong static type checking is lost so there's more bad design paths you can go down. Patching up flakey code with ad-hoc runtime checks and debugging runtime errors becomes the norm because you just don't know any better and the type system isn't going to teach you.

More general advice would be "prefer strong static type checking over runtime checks" as it makes a lot of design and robustness problems go away.

Even if you can't use e.g. Haskell or OCaml in your daily work, a few weeks or just of few days of trying to learn them will open your eyes and make you a better coder elsewhere. Map/filter/reduce, immutable data structures, non-nullable types etc. have been in other languages for over 30 years before these ideas became more mainstream best practices for example (I'm still waiting for pattern matching + algebraic data types).

It's weird how long it's taking for people to rediscover why strong static types were a good idea.

replies(10): >>27641187 #>>27641516 #>>27641651 #>>27641837 #>>27641858 #>>27641960 #>>27642032 #>>27643060 #>>27644651 #>>27657615 #
ukj ◴[] No.27641651[source]
Every programming paradigm is a good idea if the respective trade-offs are acceptable to you.

For example, one good reason why strong static types are a bad idea... they prevent you from implementing dynamic dispatch.

Routers. You can't have routers.

replies(3): >>27641741 #>>27642043 #>>27642764 #
1. justinpombrio ◴[] No.27642764[source]
Are you sure you know what dynamic dispatch is? Java has dynamic dispatch, and it is a statically typed language. In Java, it's often called "runtime polymorphism".

https://www.geeksforgeeks.org/dynamic-method-dispatch-runtim...

And using it doesn't give up any of Java's type safety guarantees. The arguments and return type of the method you call (which will be invoked with dynamic dispatch) are type checked.

replies(2): >>27643722 #>>27650975 #
2. ukj ◴[] No.27643722[source]
Are you sure you know when type-checking and when dynamic dispatching happens?

Compile time is not runtime.

The Java compiler is not the JVM.

The compiler does type checking. The JVM does the dynamic dispatching.

Neither does both.

replies(1): >>27643937 #
3. justinpombrio ◴[] No.27643937[source]
All those statements are correct. The people downvoting you know that too. I don't think anyone has figured out what point you're trying to make, though. Could you spell it out in more detail?

Consider addition. The compiler does type checking, and the JVM actually adds the numbers. Nonetheless, the addition is type checked, and does not represent a weakness of static type checking. How is dynamic dispatch different than this?

replies(1): >>27644085 #
4. ukj ◴[] No.27644085{3}[source]
The point is trivial.

You can’t have both static type safety AND dynamic dispatch at the same time and in the same context about the same data.

Choose one. Give up the other. Make a conscious trade off.

The language that you are using is making such trade offs for you - they are implicit in the language design. Best you know what they are because they are meaningful in principle and in practice.

replies(1): >>27644214 #
5. justinpombrio ◴[] No.27644214{4}[source]
Java has both static type safety AND dynamic dispatch at the same time and in the same context about the same data.
replies(1): >>27644270 #
6. ukj ◴[] No.27644270{5}[source]
No, it doesn’t.

The input-data to the compiler can’t be handled by the JVM and vice versa.

The JVM handles byte code as input. The compiler handles source code as input.

That is two different functions with two different data domains.

They literally have different types!

Which one of the two functions is the thing you call “Java”?

replies(2): >>27644442 #>>27644498 #
7. justinpombrio ◴[] No.27644442{6}[source]
A type system is sound when:

    for all expressions e:
      if e type checks with type t, then one of the following holds:
        - e evaluates to a value v of type t; or
        - e does not halt; or
        - e hits an "unavoidable error"
          like division by 0 or null deref
          (what counts as "unavoidable" varies from lang to lang)
Notice anything interesting about this definition? It uses the word "evaluate"! Type soundness is not just a statement about the type checker. It relates type checking to run time (and thus to compilation too). That is, if you muck with Java's runtime or compiler, you can break type soundness, even if you don't change its type system in the slightest.
replies(1): >>27644614 #
8. BoiledCabbage ◴[] No.27644498{6}[source]
It's pretty clear at this point you don't understand what dynamic dispatch means.

I don't think it's worthwhile for anyone else to argue with you further. C++ and Java both support dynamic dispatch although you deny it.

You've taken up almost a full page of HN arguing with everyone trying to explain it to you. People have pointed you to wikipedia showing you that you're wrong. [1]

ISOCPP of which Bjarne is a director [2] says that C++ supports dynamic dispatch. [3]

And you continue to attempt to argue that everyone on HN, Wikipedia and the creator of the C++ language are all wrong and don't know what dynamic dispatch is.

Your continued insistence is both wrong and a negative impact at this point on hn. Please stop arguing something that numerous people have taken lots of patience and charity in trying every way possible to explain to you and what is clearly factually wrong.

If you're going to reply, please explain why an organization that Bjarne Stroustrup is a director of believes that C++ supports dynamic dispatch.

1. https://en.wikipedia.org/wiki/Dynamic_dispatch#C++_implement...

2 https://isocpp.org/about

3. https://isocpp.org/wiki/faq/big-picture#why-use-oo

replies(3): >>27644551 #>>27645529 #>>27646404 #
9. ukj ◴[] No.27644614{7}[source]
Yes this is precisely what I am talking about.

Different implementations of eval() a.k.a different programming languages have different semantics.

Which implicit eval() implementation does the above hold?

What type system do you have in mind such that the above holds for ALL expressions. Type-checking itself is not always decidable.

10. ukj ◴[] No.27645529{7}[source]
>If you're going to reply, please explain why an organization that Bjarne Stroustrup is a director of believes that C++ supports dynamic dispatch.

Because the meaning of "dynamic" is ambigous!

Since you are pointing me to wikipedia I'll point you right back...

https://en.wikipedia.org/wiki/Virtual_method_table#Compariso...

"Virtual method tables also only work if dispatching is constrained to a known set of methods, so they can be placed in a simple array built at compile time."

If the vtable is generated at compile and is constrained to a known set of methods then that array is immutable! Calling that "dynamic" is an obvious misnomer!

You are neither charitable nor patient. You are committing the bandwagon fallacy as we speak. You and the other 120 (and counting) angry downvoters ;)

I am using the word "dynamic" to actually mean dynamic! I am not going to define it. Use your judgment. Dynamic is NOT static. I am not asking you to "educate me", or to tell me I am right; or wrong. I am asking you to understand the sort of programming language design I have in mind!

Either you understand; or you don't.

11. LAC-Tech ◴[] No.27646404{7}[source]
> It's pretty clear at this point you don't understand what dynamic dispatch means.

Terms in computing are so overloaded that these days I try[0] and never correct anyone on how they use a term. Instead I ask them to define it, and debate off of that definition.

So instead of downvoting this guy for using different terminology - we can ask him what he means and just have a discussion.

[0] alright I don't always succeed but it's an ideal to strive for

replies(2): >>27646463 #>>27646699 #
12. ukj ◴[] No.27646463{8}[source]
This is a commendable approach.

Computation is a general, abstract and incredibly useful idea disconnected from any particular model of computation (programming language).

Different languages are just different models of computation and have different (desirable, or undesirable) semantic properties independent from their (trivial) syntactic properties.

It's this sort of angry dogmatism which prevents people from talking about programming language design.

Not for a second do they pause to think their own understanding may be limited.

replies(1): >>27646601 #
13. Jtsummers ◴[] No.27646601{9}[source]
> It's this sort of angry dogmatism which prevents people from talking about programming language design.

The dogmatism demonstrated today was in your comments, ukj. Your inability to understand that your non-standard use of terms makes it impossible for others to communicate with you in any effective way made this a remarkable farce of a conversation or debate.

replies(3): >>27646642 #>>27647693 #>>27647802 #
14. ukj ◴[] No.27646642{10}[source]
There is no such thing as a "standard" model of computation, and therefore no such thing as "standard definition" or use of computation.

There is only the model you (and your tribe) believe is "standard" implicitly. Do you actually understand that?

I kinda thought navigating the inherent ambiguity of all language (programming or natural) is a fundamental skill for software engineers.

Communication is indeed impossible when you think you possess the "right" meaning of words.

Oh... by the way, somebody capable of understanding/listening has just informed me that what I am actually talking about is closer to multiple dispatch than dynamic dispatch.

Multiple dispatch is, in fact, more dynamic than "dynamic dispatch". Hilarity ensues.

15. Jtsummers ◴[] No.27646699{8}[source]
Dynamic dispatch is not terribly overloaded. It's dispatching based on run-time information instead of just compile-time information.

The problem in this discussion is that ukj has come to the belief (but communicated it poorly) that dynamic dispatch is somehow incompatible with static typing. And for some reason this also matters.

Static typing does not preclude dynamic dispatch, and despite being pointed to several mainstream languages that have both features, ukj decided to ignore reality or the common understanding of the phrase "dynamic dispatch" and produced this grotesque example of trying to communicate with an individual who is, apparently, just a troll. Feeding the troll, ukj, is probably the dumbest thing I did today, but I'll blame that on the insomnia reducing my ability to detect trolls.

replies(2): >>27646746 #>>27648312 #
16. ukj ◴[] No.27646746{9}[source]
>Dynamic dispatch is not terribly overloaded. It's dispatching based on run-time information instead of just compile-time information.

According to the above definition C++ does not have dynamic dispatch!

The vtable in C++ is generated at compile time and used at runtime. It's immutable at runtime. That means it is NOT dynamic!

https://en.wikipedia.org/wiki/Virtual_method_table#Compariso...

"Virtual method tables also only work if dispatching is constrained to a known set of methods, so they can be placed in a simple array built at compile time."

Like, I don't care how you use words, but you (and everyone) are using "dynamic" to speak about a statically-behaving system!

17. ◴[] No.27647693{10}[source]
18. ukj ◴[] No.27647802{10}[source]
There's also no such thing as "standard use" of terms. Your dogma has a name. Linguistic prescriptivism ( https://en.wikipedia.org/wiki/Linguistic_prescription )

If there was a "standard use" languages wouldn't evolve and we wouldn't have so many of them. 8 billion people on Earth would be speaking The Standard Language!

Perhaps the thing you don't understand (over and above how to communicate) is this comic?

https://xkcd.com/927/

19. ukj ◴[] No.27650975[source]
In English there seems to be the eternal confusion between what things are and what we call them.

When a router does lookups from a static table it's "static routing".

When Java does lookups from a static table it's "dynamic dispatch".

The same type of computation is being characterised as both "static" and "dynamic".

When a router does lookups from a dynamic table it's "dynamic routing" - there is no equivalent in Java because making the dispatch table reflexive/mutable is precisely what violates type-safety!

replies(1): >>27654120 #
20. wabain ◴[] No.27654120[source]
I don't think this analogy quite holds together. A router doing lookups from a table is implementing a static routing strategy from a control plane perspective; it's using statically configured values instead of using dynamic information about the network topology gleaned using a routing protocol like BGP. But an implementation of that strategy in terms of table lookups is dynamic—it's walking a data structure to retrieve values which were specified in the runtime configuration, not at compile time.

The reason that "dynamic dispatch" in Java, etc. is called that is that the instance of the dispatch table to use is chosen dynamically, rather than being fixed at the callsite. It's true that Java doesn't let the shape of the dispatch table change at runtime, but that's not what dynamic vs static refers to conventionally in this context. The ability to dynamically add and remove methods from a class is something which you typically only get in dynamically typed languages but dynamic dispatch and dynamic typing are not the same thing.

In particular, while a full-featured routing information base implementation will usually use some form of dynamic dispatch to customize the behavior of routes originated through different protocols, it's very uncommon for an implementation to rely on dynamic typing which adds or mutates the methods associated with different entities. That's simply a different kind of tool used for different purposes. It's something which can be helpful in object-relational mapping, for instance, because you can create methods based on a dynamic database schema. The RIB is not going to have a schema like that which changes at runtime.

replies(1): >>27654493 #
21. ukj ◴[] No.27654493{3}[source]
>But an implementation of that strategy in terms of table lookups is dynamic—it's walking a data structure to retrieve values which were specified in the runtime configuration, not at compile time.

That's precisely the point. You can specify part of the routing table at compile/configure time - the rest gets generated at runtime.

The data/control plane distinction is conceptual. It doesn't hold in memory when the router is handling its own network traffic - it has a single routing table/world-view.

My own routing table is shared by the data plane AND control plane.

At some point you will receive an external data (routing update) which requires runtime validation, you will do reflection and update your own routing table (ring 0 address space) based on external events.

>The RIB is not going to have a schema like that which changes at runtime.

The schema need not change. The entries/relations between objects changing is sufficient to violate type-safety.

Route add( *str1, *str2) to Number.add().

replies(1): >>27657552 #
22. wabain ◴[] No.27657552{4}[source]
I mean, there's ample evidence on this thread to suggest we're not going to reach a productive conclusion here but I guess I'll keep biting.

It's not clear to me if you are suggesting that a dynamic routing table with different kinds of routes cannot be implemented in a type-safe manner in a statically typed language, or if you're working with an analogy where a routing table is like a dynamic programming language at runtime, in that a static set of entities and relations are known ahead of time and those are modified by runtime input. If it's the latter I'm not really sure how the analogy works—if programming languages are to routers as types are to routing entries, what in a router is analogous to a value of a given type?

I can speak more to the former possibility; here's a rough sketch of how one could implement a routing table using the tools available in a statically typed environment (and in a type-safe way). One way to do it (I believe the common way, and certainly the only one I've seen in commercial router implementations) is to treat statically populated and dynamically learned routes more or less uniformly in the data structures used to perform data-plane lookups. Each such route entry has the same fields and gets inserted into a data structure with a predefined shape. Where special behavior is needed for routes of different kinds, that behavior can be implemented by using dynamic dispatch in the sense it's usually used in C++, Java, Rust, etc. to call a method associated with a route entry, or using other techniques common to statically compiled languages—there is a fixed set of such operations defined up front. Adding and removing entries from the routing table at runtime does not typically implicate type safety because the types used to describe the table describe all of its possible valid states. For instance, the type for a node in a radix trie might describe how it can either be leaf node or contain subnodes, etc.

> The schema need not change. The entries/relations between objects changing is sufficient to violate type-safety. > > Route add( str1, str2) to Number.add().

It's obviously not always true that entries or relations changing will validate type safety; any non-trivial system will let you perform some kinds of data manipulation at runtime. Conventional static type systems will allow some kinds of mutations (like changing around pointers in a radix trie to insert a new node) but will not have the flexibility to support some others (like changing the shape of a dispatch table at runtime).

One kind of call pattern which is incompatible with statically compiled dynamic dispatch is where the types of parameters change along with the base type which owns the dispatch table; I think this is what your add() example is getting at—you need the type of the second parameter to match the first, which you can't validate without runtime checks if you don't know what concrete implementations will be in use at runtime. In the case of a routing table I don't think this kind of polymorphism is needed though; I can't think of an instance where an operation would fundamentally require a fixed relation in the concrete types of different routes. For instance, when routes overlap you can derive a priority value for each one to decide which one to use, rather than directly implementing some kind of function whichIsBetter(a, b) which relies on knowing what concrete route kinds a and b are.

replies(5): >>27658074 #>>27658294 #>>27658798 #>>27659025 #>>27660074 #
23. ukj ◴[] No.27658074{5}[source]
The conclusion was my very first post really. Understand the limits of your tools; and your own limits - then make acceptable trade offs.

Programming is about explaining desires to computers. It is a complex, error-prone activity and we depend on our automated tooling to save us from ourselves.

Obviously, in theory you can build the exact same software in Assembly as you can with Java. But not in practice because the programming language is a human-computer interface and a symbiosis with useful feedback loops emerges at higher levels of abstraction.

The design choices of your language impose certain limits/discipline on your expressive power.

Some design patterns become easy to express; where others become difficult to express and your paradigm’s features become obstacles.

At which point it is your choice to switch off the guard rails.

Like the chap somewhere down below telling me he will use Rust “unsafe” so he can have direct control of his memory. Yes! You can! You can also read/write to /dev/mem directly!

Translated in English “I will turn off the compiler’s safety checks because it is getting in my way”. There are many ways to shoot yourself in the foot with this level of power.

He is agreeing with me, but this is HN and opposition must be maintained. Dung must be flung. For reasons.

Sure, for certain kinds of protocols this kind of polymorphism is not always needed. But I am merely pointing out the threshold where the type-safe paradigm begins to falter. The corner cases where the pros don’t outweigh the cons. Where the “compile time” and “runtime” distinction becomes a hindrance rather than a useful separation of concerns.

You can go further. Software Defined Networks. Application router.

In general, contexts in which it would be useful to know the shape of your data at runtime!

As a spontaneous aside… if your entire information-processing system is well-typed (top to bottom) - congratulations. You are well on your way to understanding Category Theory.

The entire data-schema of your stack is what Mathemaricians call a (small) “Category”.

24. ukj ◴[] No.27658294{5}[source]
> I can't think of an instance where an operation would fundamentally require a fixed relation in the concrete types of different routes. For instance, when routes overlap you can derive a priority value for each one to decide which one to use, rather than directly implementing some kind of function whichIsBetter(a, b) which relies on knowing what concrete route kinds a and b are.

Any routing logic which makes routing decisions based on information nested arbitrarily deep into your packets.

Is this an IP packet containing data; or an IP packet containing an HTTP GET request to www.google.com?

You have to do arbitrarily deep type inference in real time.

The the data has no semantic content until you infer some! Is just a bitstream!

25. ukj ◴[] No.27658798{5}[source]
>If it's the latter I'm not really sure how the analogy works—if programming languages are to routers as types are to routing entries, what in a router is analogous to a value of a given type?

It's not an analogy. Programming languages and routers are particular instances of computable functions. "dispatching" and "routing" are just another example of us using different English words to describe the exact same computation: M:N mapping function.

Whether the input is mapped to an IP address or a memory address - boring implementation details.

Nothing in a router is analogous to a value of a type because there is no such thing as "types" at runtime unless you infer them! Types exist only at compile time. Types are semantic annotations of data. You are helping the compiler help you by telling it what you know about the data you are handling.

This blob encodes a Number. That blob encodes a String.

If you don't want any help from your compiler, you don't have to tell it the types of anything - just manipulate the data directly!

That's precisely what an Assembly language do. Everything is untyped.

26. ukj ◴[] No.27659025{5}[source]
>Where special behavior is needed for routes of different kinds, that behavior can be implemented by using dynamic dispatch in the sense it's usually used.

How? A routing function is precisely a M:N mapping with untyped input.

It's just some data somebody sent you! Route it.

Unless you can infer more about the meaning of those bits the only strategies possible are static(M), round-robin(M) or random(M).