
42 points todsacerdoti | 20 comments
1. PaulHoule ◴[] No.42190446[source]
I think it is forgotten here that one of the benefits of nominal typing is that the compiler can know the data layout at run time, so there are performance benefits.

There has been so much ink spilled on the question of what kind of type system helps programmers be productive, but there is no such controversy on the performance side.

replies(3): >>42190752 #>>42191794 #>>42199051 #
2. agentultra ◴[] No.42190752[source]
Do you mean at compile time?

I’m mostly familiar with Haskell, which does type erasure. I think that means there is no type checking at run time.

I think dependently typed languages would have the benefit of knowing enough structure at compile time to detect things like dead branches and impossible cases, which can be optimized away by removing code. I’m not sure that all types are erased in such languages, though.

replies(1): >>42192089 #
3. lmm ◴[] No.42191794[source]
Premature optimisation is the root of all evil. If you get the semantics right then good performance will generally follow; if you build the wrong thing then it doesn't matter how fast it is.
replies(4): >>42191922 #>>42192250 #>>42192332 #>>42194078 #
4. taeric ◴[] No.42191922[source]
Performant code in the wild largely disagrees? This is the difference between a numpy array and a regular Python list. It is stark.

You are correct that you can often engineer the performance later. But layout concerns and nominal types for performance are fairly non-controversial.
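
The layout difference behind the numpy-vs-list comparison can be sketched with the stdlib alone, using `array.array` as a stand-in for numpy's contiguous storage (sizes in the comments are typical 64-bit CPython figures, not guarantees):

```python
import array
import sys

# A Python list stores pointers to boxed int objects; each box carries
# a type header and a reference count on top of the integer payload.
boxed = list(range(1000))
box_size = sys.getsizeof(boxed[1])   # typically 28 bytes on 64-bit CPython

# array.array('q') stores raw 8-byte machine integers contiguously,
# the same layout a numpy int64 array uses.
packed = array.array('q', range(1000))

assert packed.itemsize == 8
assert box_size > packed.itemsize    # per-element boxing overhead
```

The pointer-chasing through those boxes, not just the extra bytes, is where most of the list's slowdown comes from.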

replies(1): >>42192750 #
5. taeric ◴[] No.42192089[source]
I assume they mean the simple fact that nominal types can be used in optimizations. That is why float versus double or int can make a big difference.

For user-defined types specifically, it is less of a benefit. But it can still show up.

replies(1): >>42194387 #
6. biorach ◴[] No.42192250[source]
> Premature optimisation is the root of all evil

If you're going to quote Knuth, you had better be damn sure you've fully understood him.

The quote in question is about micro-optimisations which were pointless on account of design issues. The situation you are commenting on is kind of the opposite.

replies(1): >>42193643 #
7. fanf2 ◴[] No.42192332[source]
Good data layout is the root of all optimization.
replies(1): >>42192934 #
8. thesz ◴[] No.42192750{3}[source]
Python does not do optimizations at all.

In contrast, you can find Fortran compilers that can substitute a different, much more performant implementation for your inefficient implementation of some numerical algorithm.

replies(1): >>42193650 #
9. thesz ◴[] No.42192934{3}[source]
Worst-case optimal join algorithm [1] might have a different opinion.

[1] https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...

10. PaulHoule ◴[] No.42193643{3}[source]
I would say this debate

https://en.wikipedia.org/wiki/AoS_and_SoA

(which I think is weakened in the Wikipedia version) goes to the heart of where profiling can hide huge opportunities for optimization.

It is a good prop bet that rewriting something (that doesn't allocate lots of dynamic memory) from AoS style to SoA style will speed it up significantly. The victim is emotionally attached to their code, and the profiler would never show you the many small costs, often distributed through the memory system, that add up. In a case like that they might accuse you of cheating by applying a bunch of small optimizations, which are often easier to express in SoA if not downright obvious. When they apply the same optimizations to their code they will speed it up, but not as much.

Oddly, it's a discussion that's been going on ever since people were writing games in assembly language in the 1980s, and probably before that; it would be interesting to see more tools which are SoA on the inside but look AoS on the outside.
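
The two layouts can be sketched in a few lines of Python (a hypothetical particle example; the field names are illustrative):

```python
# AoS: one record per particle, fields interleaved in memory.
aos = [{"x": float(i), "y": 0.0, "z": 0.0} for i in range(4)]

# SoA: one sequence per field; each field is contiguous, which is
# what vectorised code (SIMD, numpy) wants to stream over.
soa = {
    "x": [float(i) for i in range(4)],
    "y": [0.0] * 4,
    "z": [0.0] * 4,
}

# Summing one field touches every whole record in the AoS layout...
aos_sum = sum(p["x"] for p in aos)
# ...but only one contiguous sequence in the SoA layout.
soa_sum = sum(soa["x"])
assert aos_sum == soa_sum == 6.0
```

In CPython both versions still chase pointers, so this only shows the shape of the transformation; the memory-system wins show up in languages (or numpy code) where the SoA fields really are flat arrays.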

11. PaulHoule ◴[] No.42193650{4}[source]
Python has a peephole optimizer: https://akaptur.com/blog/2014/08/02/the-cpython-peephole-opt...
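
That folding is easy to observe with the stdlib `compile` and `dis`; a small sketch:

```python
import dis

code = compile("2 * 3", "<expr>", "eval")
# The peephole pass folds 2 * 3 at compile time: the folded constant 6
# lands in co_consts and no multiply instruction survives in the bytecode.
assert 6 in code.co_consts
assert not any(
    instr.opname.startswith("BINARY")
    for instr in dis.get_instructions(code)
)
```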
replies(1): >>42194022 #
12. maccard ◴[] No.42194022{5}[source]
I think what OP meant was "Python's optimisations are not very fast", which (IMO) is a fair statement. Python is many, many things, but its best-case scenarios (without rewriting in C, at which point you're writing C, not Python) are slower than the naive implementations in many other languages.
replies(1): >>42199243 #
13. jerf ◴[] No.42194078[source]
This is basically a form of the "Sufficiently Smart Compiler" claim. It is generally false, barring a definition of "semantics right" that simply includes a presumption that the "semantics" will be chosen for performance, in which case the claim becomes circular.

At this point in our collective journey toward understanding programming languages and what they can do, the evidence is overwhelming that there are in fact plenty of useful semantics that are intrinsically slower than other useful semantics, relative to any particular chunk of hardware executing them. That is, what is slow on a CPU or GPU may differ, but there is certainly some sort of difference that will exist, and there is no amount of (feasible) compiling around that problem.

Indeed, that's why we have CPUs and GPUs and soon NPUs and in the future perhaps other types of dedicated processors... precisely because not all semantics can be implemented equally no matter how smart you are.

replies(1): >>42194415 #
14. PaulHoule ◴[] No.42194387{3}[source]
In the case of simple types, say a 64-bit int, if the compiler knows what the types are ahead of time it can store the ints in registers and add them with one assembly language instruction. Same if it is floating point.

If you want to add two things in Python they could be totally different types, so at runtime the program is likely to have pointers in those registers, and it is going to have to look at the objects and figure out what it has to do to add them, possibly calling the __add__ method on an object. In the obvious implementation you end up fetching the int out of memory when you could have just kept it in a register.
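
A small sketch of that run-time dispatch (the `Meters` class is a made-up example):

```python
class Meters:
    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        # `+` is resolved at run time: the interpreter inspects the
        # left operand's type and invokes its __add__ slot.
        return Meters(self.n + other.n)

# The same `+` token takes three different run-time code paths:
assert (1 + 2) == 3                     # int.__add__
assert ("a" + "b") == "ab"              # str.__add__
assert (Meters(1) + Meters(2)).n == 3   # user-defined __add__
```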

Now, many languages play tricks with pointers. We are not filling out a 64-bit address space any time soon, so we can make 'pointers' whose bit patterns hold integers and other numbers inline (tagged pointers). Still, it is a lot harder to add two of those than it is to just add two values we know are integers.

With user-defined types it is pretty much the same, particularly in single-inheritance languages like Java, where it is dead obvious that you can access field A of type X by adding a fixed offset to the object pointer. In a language like Python or JavaScript you are looking everything up in a hashtable, even if it is a fast hashtable. (You don't consistently win by using __slots__.)
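
The hashtable-versus-fixed-layout difference can be seen directly; `__slots__` trades the per-instance dict for a fixed layout (class names below are illustrative):

```python
class WithDict:
    def __init__(self):
        self.a = 1          # stored in the instance's __dict__ hashtable

class WithSlots:
    __slots__ = ("a",)      # fixed layout: no per-instance __dict__
    def __init__(self):
        self.a = 1          # stored at a fixed slot offset

d, s = WithDict(), WithSlots()
assert "a" in d.__dict__            # attribute lives in a hashtable
assert not hasattr(s, "__dict__")   # attribute lives at a fixed offset
```

Even so, attribute access on the slotted class still goes through a descriptor lookup, which is part of why, as the comment says, slots don't consistently win.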

A really advanced compiler/runtime could undo some of this. For instance, it might be clear that in a particular case you are adding x + y where both are going to be 64-bit ints, and you can specialize for that. You could do this at build time, or even at runtime: notice that a function has been called in a particular context 10,000 times, inline it into its caller, and recompile with the fastest code. But it is tricky to get right; if an x that isn't an int finds its way in there, you need to de-specialize the function without breaking anything.

PyPy shows that Python could be greatly improved over what it is, I think CPython is going to approach it in speed gradually.

15. lmm ◴[] No.42194415{3}[source]
From where I stand the sufficiently smart compiler paradigm is already dominant, and getting more so every day.

- The overwhelming majority of new code is written in high-level languages

- High-level languages have continued to close what small performance gaps remain

- There have been no serious efforts to implement a true low-level language for post-Pentium (superscalar) CPUs, let alone the CPUs of today

- Even GPUs and NPUs are largely programmed by using languages that express largely the same semantics as languages for CPUs, and relying heavily on compiler optimisation

replies(1): >>42196556 #
16. jerf ◴[] No.42196556{4}[source]
That's not what the "sufficiently smart compiler" is. The "sufficiently smart compiler" is one that allows you to write any amazingly capable language you want with whatever amazing semantics you want, and it compiles it into code that is competitive with C written for speed.

You can easily observe in any cross-language shootout in 2024 that optimized code bases in the various languages still have gradients and we do not live in a world where you can just start slamming out Python code and expect no performance delta against C.

https://prog21.dadgum.com/40.html

Merely smart compilers are amazing; one of the defining characteristics of the software world is that you can be handed these things for free. The "sufficiently smart compiler", however, does not exist, and while there is no mathematical proof that I'm aware of that they are impossible, after at least two decades of people trying to produce them and totally failing, the rational expectation at this point must be that they do not exist.

replies(1): >>42199816 #
17. k_g_b_ ◴[] No.42199051[source]
I think you're confusing nominal with static and structural with dynamic. This might be because there are few instances of more powerful static structural types implemented in common programming languages - OCaml's polymorphic variants, TypeScript (partially dynamic). Very common structural types, available in many statically typed programming languages, are tuples.

The compiler can still know and optimize the data layout of any static structural type, and tuples certainly are optimized in e.g. Rust. However, the flexibility of other structural types, like polymorphic variants or set-theoretic types like unions, means that the data layout also needs to be more flexible, and that comes with some overhead, e.g. vs a nominal sum type (like in ML, Rust enums, etc.) or a struct/record.

Missing data layout optimizations come mostly from the necessarily uniform representation of dynamic types - the same as for nominal types - e.g. through runtime polymorphism language features such as OCaml has by default, virtual methods, or Rust's trait objects. Whether a structural type system allows such dynamic features (e.g. OCaml's row polymorphism) or not is a design question - I think there's still plenty of use for structural types in a language with only compile-time polymorphism (monomorphization).
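
For readers more at home in Python than OCaml or TypeScript, a rough analogue of a structural check is `typing.Protocol` (the `HasArea` name below is made up for illustration):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasArea(Protocol):
    def area(self) -> float: ...

class Square:  # never mentions HasArea: it matches by shape, not by name
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side * self.side

# Structural: Square satisfies HasArea without declaring any relationship,
# unlike a nominal check, which would require explicit inheritance.
assert isinstance(Square(2.0), HasArea)
```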

See also deredede's comment https://news.ycombinator.com/item?id=42191956

18. thesz ◴[] No.42199243{6}[source]
Python optimizations are not deep and/or broad. I did not mean "very fast" as in "compilation speed."

Python does not attempt to apply anything more complex than that peephole optimizer, whatever that means.

Judging from my experience with Python, that peephole optimizer cannot lift operations on a "dictionary with default value" to regular dictionary operations using conditionals. I had to manually rewrite my program to use conditionals to recover its speed.
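
thesz's actual program isn't shown, but the rewrite described can be sketched with a hypothetical word-count loop:

```python
from collections import defaultdict

words = ["a", "b", "a", "c", "a"]

# "Dictionary with default value": concise, but every miss goes
# through the __missing__ machinery to call the default factory.
dd = defaultdict(int)
for w in words:
    dd[w] += 1

# The manual rewrite: a plain dict plus an explicit conditional.
plain = {}
for w in words:
    if w in plain:
        plain[w] += 1
    else:
        plain[w] = 1

assert dict(dd) == plain == {"a": 3, "b": 1, "c": 1}
```

Whether the conditional version is actually faster depends on the hit/miss ratio and the CPython version; the point here is only that the two forms are semantically equivalent, and CPython will not convert one into the other for you.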

19. lmm ◴[] No.42199816{5}[source]
> You can easily observe in any cross-language shootout in 2024 that optimized code bases in the various languages still have gradients and we do not live in a world where you can just start slamming out Python code and expect no performance delta against C.

Microbenchmarks may show an advantage for C - but it's one that is shrinking all the time (and that goes doubly for Java, which was the go-to example in the original "sufficiently smart compiler" conversations - but no longer is, because you can't actually be confident that Java is going to perform worse than C any more). And the overwhelming majority of the time, for real-world business problems, people do just start slamming out Python code, and if anything it tends to perform better.

And conversely even those C programs now rely extremely heavily on compiler smartness to reorder instructions, autovectorise, etc., often producing something quite radically different from what a naive reading of the code would mean - and there is no real appetite for a language that doesn't do this, one with semantics designed to perform well on today's CPUs or GPUs. Which suggests that designing the language semantics for performance is not actually particularly important.

replies(1): >>42204534 #
20. jerf ◴[] No.42204534{6}[source]
If you honestly believe that Python is already, right now, in practice, as fast as C on all tasks, and that nobody needs even to consider the matter of performance because the Sufficiently Smart Compiler is already a reality we can count on, then you and I are too far apart to ever come to any agreement on this matter.

Best of luck in your engineering endeavors if you ever end up in a place you need high performance code. You're going to need a lot of it.