
Parse, Don't Validate (2019)

(lexi-lambda.github.io)
389 points melse | 1 comments
seanwilson ◴[] No.27640953[source]
From the Twitter link:

> IME, people in dynamic languages almost never program this way, though—they prefer to use validation and some form of shotgun parsing. My guess as to why? Writing that kind of code in dynamically-typed languages is often a lot more boilerplate than it is in statically-typed ones!

I feel that once you've got experience working in (usually functional) programming languages with strong static type checking, flaky dynamic code that relies on runtime checks and on just being careful to avoid runtime errors makes your skin crawl, and you'll intuitively gravitate towards designs that take advantage of strong static type checks.

When all you know is dynamic languages, the design guidance you get from strong static type checking is lost, so there are more bad design paths you can go down. Patching up flaky code with ad-hoc runtime checks and debugging runtime errors becomes the norm because you just don't know any better and the type system isn't going to teach you.

More general advice would be "prefer strong static type checking over runtime checks" as it makes a lot of design and robustness problems go away.

Even if you can't use e.g. Haskell or OCaml in your daily work, a few weeks or even just a few days of trying to learn them will open your eyes and make you a better coder elsewhere. Map/filter/reduce, immutable data structures, non-nullable types etc. had been in other languages for over 30 years before these ideas became mainstream best practices, for example (I'm still waiting for pattern matching + algebraic data types).

It's weird how long it's taking for people to rediscover why strong static types were a good idea.

replies(10): >>27641187 #>>27641516 #>>27641651 #>>27641837 #>>27641858 #>>27641960 #>>27642032 #>>27643060 #>>27644651 #>>27657615 #
ukj ◴[] No.27641651[source]
Every programming paradigm is a good idea if the respective trade-offs are acceptable to you.

For example, one good reason why strong static types are a bad idea... they prevent you from implementing dynamic dispatch.

Routers. You can't have routers.

replies(3): >>27641741 #>>27642043 #>>27642764 #
justinpombrio ◴[] No.27642764[source]
Are you sure you know what dynamic dispatch is? Java has dynamic dispatch, and it is a statically typed language. In Java, it's often called "runtime polymorphism".

https://www.geeksforgeeks.org/dynamic-method-dispatch-runtim...

And using it doesn't give up any of Java's type safety guarantees. The arguments and return type of the method you call (which will be invoked with dynamic dispatch) are type checked.
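A minimal sketch of what that looks like. The `Shape`, `Circle`, `Square`, and `DispatchDemo` names are made up for illustration and are not taken from the linked article:

```java
// Hypothetical types for illustration only.
interface Shape {
    // The parameter and return types are fixed here and checked by javac.
    double area(double scale);
}

final class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public double area(double scale) { return Math.PI * radius * radius * scale; }
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    @Override public double area(double scale) { return side * side * scale; }
}

final class DispatchDemo {
    // One call site for every Shape; the implementation that runs is
    // selected at runtime from the object's class.
    static double scaledArea(Shape s) {
        return s.area(2.0);
        // s.area("two");          // rejected at compile time: String is not a double
        // String a = s.area(2.0); // rejected at compile time: double is not a String
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Square(3.0) };
        for (Shape s : shapes) {
            System.out.println(s.getClass().getSimpleName() + " -> " + scaledArea(s));
        }
    }
}
```

Which `area` body runs is decided at runtime, but the shape of the call (one `double` in, one `double` out) is settled before the program ever starts.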

replies(2): >>27643722 #>>27650975 #
ukj ◴[] No.27650975[source]
In English there seems to be the eternal confusion between what things are and what we call them.

When a router does lookups from a static table it's "static routing".

When Java does lookups from a static table it's "dynamic dispatch".

The same type of computation is being characterised as both "static" and "dynamic".

When a router does lookups from a dynamic table it's "dynamic routing" - there is no equivalent in Java because making the dispatch table reflexive/mutable is precisely what violates type-safety!

replies(1): >>27654120 #
wabain ◴[] No.27654120[source]
I don't think this analogy quite holds together. A router doing lookups from a table is implementing a static routing strategy from a control plane perspective; it's using statically configured values instead of using dynamic information about the network topology gleaned using a routing protocol like BGP. But an implementation of that strategy in terms of table lookups is dynamic—it's walking a data structure to retrieve values which were specified in the runtime configuration, not at compile time.

The reason that "dynamic dispatch" in Java, etc. is called that is that the instance of the dispatch table to use is chosen dynamically, rather than being fixed at the callsite. It's true that Java doesn't let the shape of the dispatch table change at runtime, but that's not what dynamic vs static refers to conventionally in this context. The ability to dynamically add and remove methods from a class is something which you typically only get in dynamically typed languages but dynamic dispatch and dynamic typing are not the same thing.
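To make the static/dynamic distinction concrete, here is a small sketch with made-up `Animal`, `Dog`, and `Resolution` classes. It relies on how Java resolves overloads (from the declared type, at compile time) versus overrides (from the object's class, at runtime):

```java
// Hypothetical classes for illustration only.
class Animal {
    String name() { return "animal"; }
}

final class Dog extends Animal {
    @Override String name() { return "dog"; }
}

final class Resolution {
    // Overloads: the compiler picks one from the declared (static) type of the argument.
    static String describe(Animal a) { return "describe(Animal)"; }
    static String describe(Dog d)    { return "describe(Dog)"; }

    public static void main(String[] args) {
        Animal a = new Dog();            // declared type Animal, runtime class Dog
        System.out.println(describe(a)); // prints "describe(Animal)": fixed at the call site
        System.out.println(a.name());    // prints "dog": chosen via the object's dispatch table
    }
}
```

In both cases the set of methods is fixed at compile time; only the second call picks its dispatch table at runtime.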

In particular, while a full-featured routing information base implementation will usually use some form of dynamic dispatch to customize the behavior of routes originated through different protocols, it's very uncommon for an implementation to rely on dynamic typing which adds or mutates the methods associated with different entities. That's simply a different kind of tool used for different purposes. It's something which can be helpful in object-relational mapping, for instance, because you can create methods based on a dynamic database schema. The RIB is not going to have a schema like that which changes at runtime.

replies(1): >>27654493 #
ukj ◴[] No.27654493[source]
>But an implementation of that strategy in terms of table lookups is dynamic—it's walking a data structure to retrieve values which were specified in the runtime configuration, not at compile time.

That's precisely the point. You can specify part of the routing table at compile/configure time - the rest gets generated at runtime.

The data/control plane distinction is conceptual. It doesn't hold in memory when the router is handling its own network traffic - it has a single routing table/world-view.

My own routing table is shared by the data plane AND control plane.

At some point you will receive external data (a routing update) which requires runtime validation; you will do reflection and update your own routing table (ring 0 address space) based on external events.

>The RIB is not going to have a schema like that which changes at runtime.

The schema need not change. The entries/relations between objects changing is sufficient to violate type-safety.

Route add( *str1, *str2) to Number.add().

replies(1): >>27657552 #
wabain ◴[] No.27657552[source]
I mean, there's ample evidence on this thread to suggest we're not going to reach a productive conclusion here but I guess I'll keep biting.

It's not clear to me if you are suggesting that a dynamic routing table with different kinds of routes cannot be implemented in a type-safe manner in a statically typed language, or if you're working with an analogy where a routing table is like a dynamic programming language at runtime, in that a static set of entities and relations are known ahead of time and those are modified by runtime input. If it's the latter I'm not really sure how the analogy works—if programming languages are to routers as types are to routing entries, what in a router is analogous to a value of a given type?

I can speak more to the former possibility; here's a rough sketch of how one could implement a routing table using the tools available in a statically typed environment (and in a type-safe way). One way to do it (I believe the common way, and certainly the only one I've seen in commercial router implementations) is to treat statically populated and dynamically learned routes more or less uniformly in the data structures used to perform data-plane lookups. Each such route entry has the same fields and gets inserted into a data structure with a predefined shape. Where special behavior is needed for routes of different kinds, that behavior can be implemented by using dynamic dispatch in the sense it's usually used in C++, Java, Rust, etc. to call a method associated with a route entry, or using other techniques common to statically compiled languages; there is a fixed set of such operations defined up front. Adding and removing entries from the routing table at runtime does not typically implicate type safety because the types used to describe the table describe all of its possible valid states. For instance, the type for a node in a radix trie might describe how it can either be a leaf node or contain subnodes, etc.
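As a toy version of that design, here is a sketch in Java. Every name in it (`Route`, `StaticRoute`, `BgpRoute`, `RoutingTable`) is hypothetical, it handles IPv4 only, and it uses a plain binary trie rather than a compressed radix trie, so treat it as an illustration of the typing argument and not as how any real router is built:

```java
import java.util.Optional;

// Static and learned routes share the same fields.
abstract class Route {
    final int prefix;     // IPv4 prefix packed into 32 bits
    final int prefixLen;  // number of significant leading bits, 0..32
    final String nextHop;

    Route(int prefix, int prefixLen, String nextHop) {
        this.prefix = prefix;
        this.prefixLen = prefixLen;
        this.nextHop = nextHop;
    }

    abstract String origin(); // per-kind behavior via ordinary dynamic dispatch
}

final class StaticRoute extends Route {
    StaticRoute(int prefix, int prefixLen, String nextHop) { super(prefix, prefixLen, nextHop); }
    @Override String origin() { return "static"; }
}

final class BgpRoute extends Route {
    BgpRoute(int prefix, int prefixLen, String nextHop) { super(prefix, prefixLen, nextHop); }
    @Override String origin() { return "bgp"; }
}

final class RoutingTable {
    // The node type covers every valid state: either child and the route may be absent (null).
    private static final class Node {
        Node zero;
        Node one;
        Route route;
    }

    private final Node root = new Node();

    static int ip(int a, int b, int c, int d) {
        return (a << 24) | (b << 16) | (c << 8) | d;
    }

    private static int bit(int value, int index) {
        return (value >>> (31 - index)) & 1; // index 0 is the most significant bit
    }

    void add(Route r) {
        Node n = root;
        for (int i = 0; i < r.prefixLen; i++) {
            if (bit(r.prefix, i) == 0) {
                if (n.zero == null) n.zero = new Node();
                n = n.zero;
            } else {
                if (n.one == null) n.one = new Node();
                n = n.one;
            }
        }
        n.route = r; // replaces any existing route for the same prefix
    }

    boolean remove(int prefix, int prefixLen) {
        Node n = root;
        for (int i = 0; i < prefixLen && n != null; i++) {
            n = bit(prefix, i) == 0 ? n.zero : n.one;
        }
        if (n == null || n.route == null) return false;
        n.route = null; // a fuller version would also prune now-empty nodes
        return true;
    }

    // Longest-prefix match: remember the deepest route seen while walking the address bits.
    Optional<Route> lookup(int address) {
        Node n = root;
        Route best = n.route;
        for (int i = 0; i < 32 && n != null; i++) {
            n = bit(address, i) == 0 ? n.zero : n.one;
            if (n != null && n.route != null) best = n.route;
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        RoutingTable table = new RoutingTable();
        table.add(new StaticRoute(ip(10, 0, 0, 0), 8, "192.0.2.1"));
        table.add(new BgpRoute(ip(10, 1, 0, 0), 16, "192.0.2.2"));
        // 10.1.2.3 matches both entries; the /16 is more specific.
        System.out.println(table.lookup(ip(10, 1, 2, 3)).map(Route::origin).orElse("none"));
    }
}
```

Routes come and go at runtime through `add` and `remove`, yet nothing here steps outside what the compiler checked: every mutation just moves between states the `Node` type already allows.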

> The schema need not change. The entries/relations between objects changing is sufficient to violate type-safety.
>
> Route add( *str1, *str2) to Number.add().

It's obviously not always true that entries or relations changing will violate type safety; any non-trivial system will let you perform some kinds of data manipulation at runtime. Conventional static type systems will allow some kinds of mutations (like changing around pointers in a radix trie to insert a new node) but will not have the flexibility to support some others (like changing the shape of a dispatch table at runtime).

One kind of call pattern which is incompatible with statically compiled dynamic dispatch is where the types of parameters change along with the base type which owns the dispatch table; I think this is what your add() example is getting at—you need the type of the second parameter to match the first, which you can't validate without runtime checks if you don't know what concrete implementations will be in use at runtime. In the case of a routing table I don't think this kind of polymorphism is needed though; I can't think of an instance where an operation would fundamentally require a fixed relation in the concrete types of different routes. For instance, when routes overlap you can derive a priority value for each one to decide which one to use, rather than directly implementing some kind of function whichIsBetter(a, b) which relies on knowing what concrete route kinds a and b are.
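A sketch of the priority approach, again with hypothetical names (`RouteEntry`, `StaticEntry`, `LearnedEntry`, `BestRoute`) and made-up priority values, where lower means preferred:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

abstract class RouteEntry {
    final String nextHop;
    RouteEntry(String nextHop) { this.nextHop = nextHop; }

    // Each kind reduces itself to a plain int; lower is preferred.
    // The concrete numbers below are illustrative only.
    abstract int priority();
}

final class StaticEntry extends RouteEntry {
    StaticEntry(String nextHop) { super(nextHop); }
    @Override int priority() { return 1; }
}

final class LearnedEntry extends RouteEntry {
    private final int metric;
    LearnedEntry(String nextHop, int metric) { super(nextHop); this.metric = metric; }
    @Override int priority() { return 100 + metric; }
}

final class BestRoute {
    // Compares ints only, so it never needs to know which concrete kinds it was given.
    // Throws NoSuchElementException if the list is empty.
    static RouteEntry pick(List<RouteEntry> overlapping) {
        return Collections.min(overlapping, Comparator.comparingInt(RouteEntry::priority));
    }

    public static void main(String[] args) {
        List<RouteEntry> overlapping = Arrays.asList(
                new LearnedEntry("192.0.2.2", 5),
                new StaticEntry("192.0.2.1"));
        System.out.println(pick(overlapping).nextHop); // prints "192.0.2.1"
    }
}
```

The comparison happens on a common type (`int`), so there is no pairwise whichIsBetter(a, b) that has to inspect the concrete classes of both arguments.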

replies(5): >>27658074 #>>27658294 #>>27658798 #>>27659025 #>>27660074 #
1. ukj ◴[] No.27658798[source]
>If it's the latter I'm not really sure how the analogy works—if programming languages are to routers as types are to routing entries, what in a router is analogous to a value of a given type?

It's not an analogy. Programming languages and routers are particular instances of computable functions. "Dispatching" and "routing" are just another example of us using different English words to describe the exact same computation: an M:N mapping function.

Whether the input is mapped to an IP address or a memory address - boring implementation details.

Nothing in a router is analogous to a value of a type because there is no such thing as "types" at runtime unless you infer them! Types exist only at compile time. Types are semantic annotations of data. You are helping the compiler help you by telling it what you know about the data you are handling.

This blob encodes a Number. That blob encodes a String.

If you don't want any help from your compiler, you don't have to tell it the types of anything - just manipulate the data directly!

That's precisely what assembly language does. Everything is untyped.
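A rough sketch of the blob point, in Java only because that's the language already under discussion. The `Blob` class is made up; `ByteBuffer` and `StandardCharsets` are standard library:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Blob {
    // Read the first four bytes as a big-endian 32-bit integer (ByteBuffer's default order).
    static int asNumber(byte[] blob) {
        return ByteBuffer.wrap(blob).getInt();
    }

    // Read the same bytes as ASCII text.
    static String asText(byte[] blob) {
        return new String(blob, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] blob = { 0x4A, 0x61, 0x76, 0x61 };
        System.out.println(Integer.toHexString(asNumber(blob))); // prints "4a617661"
        System.out.println(asText(blob));                        // prints "Java"
    }
}
```

The four bytes never change; which reading applies is decided entirely by the function you hand them to.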