
838 points turrini | 1 comments
cogman10 ◴[] No.43974007[source]
I'm going to be pretty blunt: Carmack gets worshiped when he shouldn't be. He has had several bad takes on software, and frankly he's behind the times on the current state of the software ecosystem.

I get it, he's legendary for the work he did at id Software. But this is the guy who, only like 5 years ago, finally came around to static analysis actually being a good thing for code.

He seems to have a view of the state of software that's stuck in the past: interpreted stuff is slow, networks are slow, databases are slow, and everyone is working with Pentium 1s and 2MB of RAM.

None of these are what he thinks they are. CPUs are wicked fast. Interpreted languages are now within a single-digit multiple of natively compiled languages. RAM is cheap and plentiful. Databases and networks are insanely fast.

Good on him for sharing his takes, but really, he shouldn't be considered a "thought leader". I've noticed his takes have been outdated for over a decade.

I'm sure he's a nice guy, but I believe he's fallen into a trap that many older devs fall into: overestimating what things cost because his mental model of computing is dated.

replies(2): >>43975300 #>>43982378 #
xondono ◴[] No.43975300[source]
> Interpreted languages are now within a single digit multiple of natively compiled languages.

You have to be either clueless or delusional if you really believe that.

replies(2): >>43976350 #>>43978131 #
cogman10 ◴[] No.43978131[source]
Let me specify that what I'm calling interpreted (and I'm sure Carmack agrees) is languages with a VM and a JIT.

The JVM and JavaScript both fall into this category.

The proof is in the pudding. [1]

The JS version that ran in 8.54 seconds [2] did not use any sort of fancy escape hatches to get there. It's effectively the naive solution.

But if you look at the winning C version, you'll note that it went all out, pulling every single SIMD trick in the book to win [3]. And even with all that, the JS version is still only ~4x slower (a single-digit multiple).

And if you look at the C++ version [4], a near-direct translation that isn't using all the SIMD tricks in the book, it ran in 5.15 seconds, bringing the multiple down to ~1.7x.

Perhaps you weren't thinking of these JIT languages as being interpreted. That's fair. But if you did, you need to adjust your mental model of what's slow. JITs have come a VERY long way in the last 20 years.

I will say that languages like python remain slow. That wasn't what I was thinking of when I said "interpreted". It's definitely more than fair to call it an interpreted language.
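To make the distinction concrete, here's a minimal sketch of the kind of code where a JIT closes most of the gap: a tight numeric loop in the spirit of the spectral-norm benchmark family. This is my own illustration, not the linked submission; the class and method names are mine.

```java
// Hypothetical sketch of a JIT-friendly numeric kernel, in the spirit of
// the spectral-norm benchmark. After warmup, HotSpot compiles this hot
// loop to machine code, inlining a() -- the kind of straight-line numeric
// work where JITed output lands close to a static compiler's.
public class SpectralNormSketch {
    // Matrix entry A(i,j) = 1 / ((i+j)(i+j+1)/2 + i + 1)
    static double a(int i, int j) {
        return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1);
    }

    // av = A * v
    static void multiplyAv(double[] v, double[] av) {
        int n = v.length;
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += a(i, j) * v[j];  // hot loop: JIT inlines a() here
            }
            av[i] = sum;
        }
    }

    public static void main(String[] args) {
        int n = 100;
        double[] u = new double[n];
        java.util.Arrays.fill(u, 1.0);
        double[] av = new double[n];
        multiplyAv(u, av);
        System.out.println(av[0]);
    }
}
```

Nothing here needs escape hatches or unsafe tricks; it's the naive translation, which is the point.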

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[2] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[3] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[4] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

replies(2): >>43979338 #>>43995658 #
xondono ◴[] No.43995658[source]
And here we get into the absurdity of microbenchmarks like these.

Yes, you can get the JVM to crunch some numbers relatively fast, particularly if the operations are repetitive enough that you can pull JIT tricks.

Now try to run something that looks a bit more like an actual application and less like a cherry picked example, and as soon as you start moving memory around the gap jumps to orders of magnitude.

replies(1): >>43996896 #
cogman10 ◴[] No.43996896[source]
> and as soon as you start moving memory around the gap jumps to orders of magnitude.

Depends on what you mean by "moving memory". In terms of heap allocation performance, the JVM (and, I suspect, the JS engines) will end up trouncing C/C++. Why? Because a GC allocator can outperform manual memory management in absolute allocation rate. Memory handed out by a C/C++ allocator is pinned (it can never be moved), so unless you do a bunch of extra work with things like bump allocators, you'll simply flounder against what the JVM does by default.

All(?) of the JVM GCs are moving collectors. That means a new allocation ends up being a simple pointer bump with a bounds check. Really hard to beat in terms of perf.
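A minimal, hypothetical sketch of what that fast path looks like from Java code (class and method names are mine):

```java
// Hypothetical allocation-rate sketch: a hot loop allocating one short-lived
// object per iteration. On HotSpot the fast path for each `new` is a pointer
// bump in the thread-local allocation buffer (TLAB) -- or the allocation is
// eliminated entirely by escape analysis. Either way, the dead objects are
// never visited again; a moving, generational collector reclaims them in bulk.
public class AllocRateSketch {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double churn(int iterations) {
        double acc = 0.0;
        for (int i = 0; i < iterations; i++) {
            Point p = new Point(i, i + 1);  // TLAB bump (or scalar-replaced)
            acc += p.x - p.y;               // object is dead after this line
        }
        return acc;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        double result = churn(10_000_000);
        long t1 = System.nanoTime();
        System.out.println("result=" + result
                + " elapsed_ms=" + (t1 - t0) / 1_000_000);
    }
}
```

The equivalent C++ loop calling a general-purpose `malloc`/`free` per iteration does strictly more bookkeeping per allocation, which is why arena/bump allocators exist there as a manual optimization.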

But it doesn't stop there. If we are talking real applications, then one real thing the JVM does better than C++ is concurrent programming. A GC works far better for managing concurrent data. So much so that when you look at high-performance multithreaded C++ code, you'll often find a garbage collector implementation. Something Java gets by default. If you aren't Google writing Chrome, you are either aggressively copying memory or using something like atomic reference counters. Both of which will absolutely nuke performance.
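A hypothetical sketch of the Java side of that trade-off (names are mine): worker threads read a shared structure directly, with no copies and no reference counting, because the GC keeps it alive as long as any thread can reach it.

```java
// Hypothetical sketch: threads share a read-only array directly. No copy,
// no atomic refcount on the hot path -- the GC owns the lifetime question.
// The C++ equivalents are a defensive copy per worker or a shared_ptr,
// whose atomic increments/decrements contend across cores.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedReadSketch {
    static long sumSlice(long[] data, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) s += data[i];
        return s;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> parts = new ArrayList<>();
        int chunk = data.length / 4;
        for (int t = 0; t < 4; t++) {
            final int from = t * chunk, to = (t + 1) * chunk;
            parts.add(pool.submit(() -> sumSlice(data, from, to)));  // shared, never copied
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();
        pool.shutdown();
        System.out.println(total);  // prints 499999500000
    }
}
```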

But that's not all. One other thing the JVM does that C++ can generally only hope to do is devirtualization. In real applications, you're likely either deep in template hell to get C++ performance, or you're dealing with some form of interfaces and inheritance, which creates code the compiler will struggle to optimize without going to an extreme like PGO. The JVM gets PGO for free because it doesn't have to optimize methods blindly.
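A minimal sketch of the devirtualization point (hypothetical names, mine): a static compiler must assume any implementation of the interface can show up at this call site, but HotSpot profiles it, sees a single receiver type, and can inline the call behind a cheap deoptimization guard.

```java
// Hypothetical devirtualization sketch: an interface call site that is
// monomorphic at runtime. HotSpot's profile shows only Square ever arrives,
// so the JIT devirtualizes and inlines area() -- with a guard that triggers
// recompilation if a second implementation is ever loaded.
public class DevirtSketch {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();  // monomorphic in practice: JIT inlines Square.area()
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Square(2.0);
        System.out.println(totalArea(shapes));  // prints 4000.0
    }
}
```

The C++ analogue with a `virtual double area()` stays an indirect call unless PGO or LTO can prove the same thing at build time.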

These simple microbenchmarks are a gift to C/C++, not the JVM.