codeflo:
It's easily imaginable that there are new CPU features that would help with building an efficient Java VM, if that's the CPU's primary purpose. Just off the top of my head, one might want a form of finer-grained memory virtualization that could enable very cheap concurrent garbage collection.

But having Java bytecode as the actual instruction set architecture doesn't sound too useful. It's true that any modern processor has a "compilation step" into microcode anyway, so in an abstract sense, that might as well be some kind of bytecode. But given the high-level nature of Java's bytecode instructions in particular, there are certainly some optimizations that are easy to do in a software JIT, and that just aren't practical to do in hardware during instruction decode.
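
To make that concrete with a made-up example (the Shape/total names below are purely illustrative): even a trivial virtual call is a single high-level bytecode instruction, and the useful work happens in the JIT after profiling, not during decode.

    // Hypothetical example: a loop over a virtual call.
    abstract class Shape { abstract double area(); }

    final class Shapes {
        static double total(Shape[] shapes) {
            double sum = 0;
            for (Shape s : shapes) {
                sum += s.area();   // in bytecode: a single invokevirtual
            }
            return sum;
        }
    }

    // A software JIT profiles the call site, sees that (say) only one Shape
    // subclass ever reaches it, and speculatively devirtualizes and inlines
    // area() into the loop, guarded by a cheap type check with deoptimization
    // as the fallback. A hardware decoder translating invokevirtual on the fly
    // has no profile, no cross-method view for inlining, and no deoptimization
    // mechanism to fall back on.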

What I can imagine is a purpose-built CPU that would make the JIT's job a lot easier and faster than compiling for x86 or ARM. Such a machine wouldn't execute raw Java bytecode; rather, it would run something a bit more low-level.

pron:
Running Java workloads is very important for most CPUs these days, and both ARM and Intel consult with the Java team on new features (although Java's needs aren't much different from those of C++). You're right that with modern JITs, executing Java bytecode directly isn't too helpful; as for GC, our concurrent collectors are already very efficient (though they could, perhaps, take advantage of new address-masking features).
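
For a rough sense of what "address masking" buys here, a simplified sketch in the spirit of colored pointers (not actual HotSpot code; the constants and bit positions are made up): GC metadata rides in the unused high bits of a 64-bit reference, and hardware that ignores those bits on loads saves the explicit mask on every dereference.

    // Simplified "colored pointer" sketch: metadata bits live in the unused
    // high bits of a 64-bit reference, so the collector can tell from the
    // pointer itself whether the object still needs marking or relocation.
    final class ColoredPointer {
        static final long ADDRESS_MASK = (1L << 44) - 1; // low 44 bits: real address
        static final long MARKED_BIT   = 1L << 44;
        static final long REMAPPED_BIT = 1L << 45;

        static long address(long ref)       { return ref & ADDRESS_MASK; }
        static boolean isMarked(long ref)   { return (ref & MARKED_BIT) != 0; }
        static boolean isRemapped(long ref) { return (ref & REMAPPED_BIT) != 0; }
        static long withMark(long ref)      { return ref | MARKED_BIT; }
    }

    // Without hardware help, every dereference must strip the metadata bits
    // (ref & ADDRESS_MASK); with an address-masking feature the CPU ignores
    // the high bits, so the "color" travels with the pointer for free.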

I think there's some disconnect between how people imagine GCs work and how the JVM's newest garbage collectors actually work. Rather than exacting a performance cost, they're more often a performance boost compared to more manual or eager memory-management techniques, especially for the workloads of large, concurrent servers. The only real cost is in memory footprint, but even that is often misunderstood, as covered beautifully in this recent ISMM talk (which I would recommend to anyone interested in memory management of any kind): https://youtu.be/mLNFVNXbw7I. The key is that moving-tracing collectors can turn available RAM into CPU cycles, while some memory-management techniques under-utilise available RAM.
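
A back-of-the-envelope way to see the "RAM into CPU cycles" point (a simplified textbook model, not a claim about any particular collector): each collection traces roughly the live set, and the program can allocate about the headroom above it between collections, so the amortized cost per byte allocated falls as the heap grows.

    // Simplified model of a tracing GC's amortized cost per byte allocated.
    // Each cycle traces ~L (live set); between cycles the program allocates
    // ~(H - L) bytes of headroom, so cost per allocated byte ~ L / (H - L).
    final class GcCostModel {
        static double costPerByte(double liveSetGiB, double heapGiB) {
            return liveSetGiB / (heapGiB - liveSetGiB);
        }
        // e.g. with a 4 GiB live set: 6 GiB heap  -> ~2.0 units of tracing work
        //                             16 GiB heap -> ~0.33 units per byte allocated
    }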

xmcqdpt2:
> The only real cost is in memory footprint

There are also load and store barriers, which add work when accessing objects on the heap. In many cases, adding work on the parallel path is good if it allows you to avoid single-threaded sections, but not in all cases. Single-threaded programs with a lot of reads can be significantly impacted by barriers; see, for example:

https://rodrigo-bruno.github.io/mentoring/77998-Carlos-Gonca...
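Roughly, a load barrier adds a check on each reference read from the heap. This is only a conceptual sketch of the kind of code a concurrent, moving collector's JIT emits; the helper names are placeholders, and the real thing is a couple of machine instructions with a rarely taken slow path.

    // Conceptual load barrier: every reference load checks whether the
    // reference is "good" for the current GC phase; if not, it takes a slow
    // path (e.g. to fix up a pointer to an object the GC has moved).
    final class LoadBarrierSketch {
        static Object loadReference(Object[] holder, int index) {
            Object ref = holder[index];
            if (isBadForCurrentPhase(ref)) {        // usually a single test-and-branch
                ref = slowPathHeal(holder, index);  // rare: mark/remap, update the field
            }
            return ref;
        }

        // Placeholders for the collector's phase check and fix-up logic.
        static boolean isBadForCurrentPhase(Object ref) { return false; }
        static Object slowPathHeal(Object[] holder, int index) { return holder[index]; }
    }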

The Parallel GC is still useful sometimes!

pron:
Sure, but other forms of memory management are costly, too. Even if you allocate everything from the OS upfront and then pool stuff, you still need to spend some computational work on the pool [1]. Working with bounded memory necessarily requires spending at least some CPU on memory management. It's not that the alternative to barriers is zero CPU spent on memory management.
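
To make the "pooling isn't free" point concrete, here is a deliberately minimal sketch (the class and policy choices are just for illustration): even with all memory obtained up front, every acquire and release still pays for bookkeeping, and in a concurrent pool for synchronization as well.

    import java.util.ArrayDeque;

    // Minimal object pool: the memory-management work didn't disappear,
    // it just moved out of the GC and into the pool's bookkeeping.
    final class BufferPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        private final int size;

        BufferPool(int count, int size) {
            this.size = size;
            for (int i = 0; i < count; i++) free.push(new byte[size]);
        }

        synchronized byte[] acquire() {
            byte[] b = free.poll();
            return (b != null) ? b : new byte[size]; // grow on exhaustion; the policy itself is a cost
        }

        synchronized void release(byte[] b) {
            free.push(b);
        }
    }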

> The Parallel GC is still useful sometimes!

Certainly for batch-processing programs.

BTW, the paper you linked is already at least somewhat out of date, as it's from 2021. The implementation of the GCs in the JDK changes very quickly. The newest GC in the JDK (and one that may be appropriate for a very large portion of programs) didn't even exist back then, and even G1 has changed a lot since. (Many performance evaluations of HotSpot implementation details may be out of date after two years.)

[1]: The cheapest such technique is arenas, which are similar in some ways to moving-tracing collectors, especially in how they can convert RAM into CPU, but they can have other kinds of costs.
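As an illustration of that footnote (a toy bump allocator, not anything Java exposes for ordinary objects): allocation is an offset bump, and "freeing" is resetting the whole region at once, so a bigger arena, like a bigger heap, trades spare RAM for fewer management operations; the cost shows up elsewhere, in that objects can't be freed individually and lifetimes must fit the arena's reset points.

    // Minimal arena / bump allocator sketch over a byte region.
    final class BumpArena {
        private final byte[] region;
        private int offset;

        BumpArena(int capacityBytes) { this.region = new byte[capacityBytes]; }

        // Returns the start offset of a block of `size` bytes, or -1 if full.
        int allocate(int size) {
            if (offset + size > region.length) return -1;
            int start = offset;
            offset += size;
            return start;
        }

        void reset() { offset = 0; } // frees everything allocated so far, at once
    }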