There has been so much ink spilled on the question of what kind of type systems help programmers be productive but there is not such controversy on the performance side.
There has been so much ink spilled on the question of what kind of type systems help programmers be productive but there is not such controversy on the performance side.
I’m mostly familiar with Haskell which does type erasure. I think the means that there is no type checking at run time.
I think dependently typed languages would the benefit of knowing structure at compile time enough to detect things like dead branches and impossible cases which can optimize by removing code. I’m not sure that all types are erased in such languages though.
You are correct that you can often engineer the performance later. But layout concerns and nominal types for performance are fairly non controversial.
If you have both nominal types and structural types in your language, you can already do this, while keeping the ability to be nominal only or structural only when you want.
In the following OCaml code variant inference in pattern matching is not automatic (although the compiler will warn you about the missing cases, which helps you figuring what to write), but the types do get refined in the corresponding branch.
type 'a tree =
Tree of
[ `Empty
| `Branch of ('a tree * 'a * 'a tree)
(* Unfortunately, inline records are not supported here,
so you have to use tuples, objects, or a mutually recursive record
type. *)
]
[@@unboxed]
(* You can give aliases to subsets of constructors *)
type 'a branch =
[ `Branch of ('a tree * 'a * 'a tree) ]
let f (x : 'a branch) = ()
let f x =
match x with
| Tree `Empty -> ()
(* You need to be explicit about the cases you want to handle. This pattern
could also be written `Tree (#branch as rest)`. *)
| Tree (`Branch _ as rest) -> f rest
The one feature I'd really like in this space would be the ability to refer to a subset of the constructors of a nominal types as if it was a (restricted) polymorphic variant that could only be recombined with another subset of constructors of the same nominal type. It would allow some of the power of polymorphic variants without losing the efficient representation allowed by nominal types knowing the possible variants ahead of time.If you're going to quote Knuth you better be damn sure you've fully understood him.
The quote in question is about micro-optimisations which were pointless on account of design issues. The situation you are commenting on is kind of the opposite.
[1] https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...
Consider two functions, say copySurname and copyStreetName. They do same exact thing, but in different context. No sane person would call copySurname for the street, even though the function is the same.
So there is this tension, should name of the function (or of the type) only reflect it's internal structure (structuralism), or should it also reflect the intended domain application (nominalism)?
There is a formalism school of mathematics that is pretty much the hardcore structuralist, i.e. names of the objects don't matter, only their relationships. But most mathematicians (and all programmers) reject that view.
That way, all nominalism is simply a descriptive wrapper over structuralism.
https://en.wikipedia.org/wiki/AoS_and_SoA
(which I think is weakened in the Wikipedia version) goes to the heart of where profiling can hide huge opportunities for optimization.
It is a good prop bet that rewriting something (that doesn't allocate lots of dynamic memory) that is SoA style in AoS style will speed it up significantly. The victim is emotionally attached to their code and the profiler would never show you the many small costs, often distributed through the memory system, that add up. In a case like that they might accuse you of cheating by applying a bunch of small optimizations which are often easier to express in SoA if not downright obvious. When they apply the same optimizations to their code they will speed it up but not as much.
Oddly it's a discussion that's been going on even since people were writing games in assembly language in the 1980s and probably since that; it would be interesting to see more tools which are AoS on the inside but look SoA on the outside.
The distinction is useful even when it’s structurally identical to another function.
Two identical functions in different contexts or domains often diverge. When they’re not arbitrarily bound by the fact they have the same contents it’s easy to extend one and not the other.
During compilation, they could both end up using the same actual implementation (though I’m not sure if any compiler does this).
At this point in the collective journey we are all on understanding programming languages and what they can do, the evidence is overwhelming that there are in fact plenty of useful semantics that are intrinsically slower than other useful semantics, relative to any particular chunk of hardware executing them. That is, what is slow on CPU or GPU may differ, but there is certainly some sort of difference that will exist, and there is no amount of (feasible) compiling around that problem.
Indeed, that's why we have CPUs and GPUs and soon NPUs and in the future perhaps other types of dedicated processors... precisely because not all semantics can be implemented equally no matter how smart you are.
What I find horrifying is that types are most useful when they are more than just a shape. Just because two things have a type X[A] {X[A], A, X[A]} doesn't mean it's the same binary tree. A trivial example is that one tree could be sorting by the "id" field and another could be sorting on a "username" field, and simply casting one to the other means you've now got a garbage data structure and none of the functions/methods operating on it are going to return good data now. It's a bit of a culture shock to see such Haskell-esque type notation combined with excitement over blowing away everything about a type other than its brute shape.
If you want to add two things in Python they could be totally different types so at runtime the program is likely to have pointers in those registers and it is going to have to look at the objects and figure out what it has to do to add them, possibly calling the __add__ method on the object. In the obvious implementation you are having to fetch the int out of memory when you could have just got it out of the register.
Now many languages play tricks with pointers, we are not filling out a 64-bit address space any time soon, so we could make 'pointers' with certain bit patterns host integers and other numbers inside. Still it is a lot harder to add two of those than it is to just add two values we know are integers.
With user types it is pretty much the same, particularly in single inheritance languages like Java where it is dead obvious that you can access field A of type X by adding a fixed offset to the object pointer. In a language like Python or Javascript you are looking everything up in a hashtable, even if it is a fast hashtable. (You don't consistently win using slots.)
A really advanced compiler/runtime could undo some of this stuff, for instance it might be clear that in a particular case you are adding x+y and those are both going to be 64-bit ints and you can specialize it. You could do this at build time or even at runtime see that function has been called in a particular context 10,000 and you get the chance to inline that function into the function that calls it and then inline that function and then recompile it with the fastest code but it is tricky to get right. If an x that isn't an int finds its way in there you need to despecialize the function without breaking anything.
PyPy shows that Python could be greatly improved over what it is, I think CPython is going to approach it in speed gradually.
- The overwhelming majority of new code is written in high-level languages
- High-level languages have continued to close what small performance gaps remain
- There have been no serious efforts to implement a true low-level language for post-Pentium (superscalar) CPUs, yet alone the CPUs of today
- Even GPUs and NPUs are largely programmed by using languages that express largely the same semantics as languages for CPUs, and relying heavily on compiler optimisation
You can easily observe in any cross-language shootout in 2024 that optimized code bases in the various languages still have gradients and we do not live in a world where you can just start slamming out Python code and expect no performance delta against C.
https://prog21.dadgum.com/40.html
Merely smart compilers are amazing; one of the defining characteristics of the software world is that you can be handed these things for free. The "sufficiently smart compiler", however, does not exist, and while there is no mathematical proof that I'm aware of that they are impossible, after at least two decades of people trying to produce them and totally failing, the rational expectation at this point must be that they do not exist.
The compiler can still know and optimize the data layout of any static structural type and tuples certainly are optimized in e.g. Rust. However, the flexibility of other structural types like polymorphic variants or set-theoretic types like unions mean that the data layout also needs to be more flexible and that comes with some overhead e.g. vs a nominal sum type (like in ML, Rust enums, etc) or struct/record.
Missing data layout optimizations comes mostly due to necessary uniform representation of dynamic types - the same as for nominal types - e.g. through runtime polymorphism language features as OCaml has by default, virtual methods or Rust's trait objects. Whether a structural type system allows such dynamic features (e.g. OCaml's row polymorphism) or not is a design question - I think there's still plenty of use for structural types in a language with only compile time polymorphism (monomorphization).
See also deredede's comment https://news.ycombinator.com/item?id=42191956
Python does not attempt to apply anything more complex than that peephole optimizer, whatever that means.
Judging from my experience with Python, that peephole optimizer cannot lift operations on "dictionary with default value" to a regular dictionary operations using conditionals. I had to manually rewrite my program to use conditionals to recover it's speed.
Microbenchmarks may show an advantage for C - but it's one that is shrinking all the time (and that goes doubly for Java, which was the go-to example in the original "sufficiently smart compiler" conversations - but no longer is, because you can't actually be confident that Java is going to perform worse than C any more). And the overwhelming majority of the time, for real-world business problems, people do just start slamming out Python code, and if anything it tends to perform better.
And conversely even those C programs now rely extremely heavily on compiler smartness to reorder instructions, autovectorise, etc., often producing something quite radically different from what a naive reading of the code would mean - and there is no real appetite for a language that doesn't do this, one with semantics designed to perform well on today's CPUs or GPUs. Which suggests that designing the language semantics for performance is not actually particularly important.
Best of luck in your engineering endeavors if you ever end up in a place you need high performance code. You're going to need a lot of it.