Also, an overlooked part here is that the global Erlang GC is easier to parallelize and/or keep incremental, since it won't have object cycles save for PIDs (which probably have special handling anyhow).
TL;DR: GCs become way harder as soon as you have cyclic objects. Erlang avoids them, so part of its GC being good is really about Erlang being "simple".
This may be true only for some implementations. Good GC implementations operate on the concept of object graph roots. Whether the graph has cyclic references or not is irrelevant, as the GC scans the relevant memory linearly. As long as a graph is unrooted, such GC implementations can still easily collect it (or, to be more precise, ignore it - in generational moving GCs the cost lies in the live objects that need to be relocated to an older/tenured generation).
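A minimal sketch of that point in .NET terms (class and method names are mine): two objects reference each other, but once nothing roots them, the tracing GC simply never visits them during marking and reclaims them like any other garbage. In a Release build this prints False.

    using System;
    using System.Runtime.CompilerServices;

    class Node
    {
        public Node Next;
    }

    class CycleDemo
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        static WeakReference MakeCycle()
        {
            var a = new Node();
            var b = new Node { Next = a };
            a.Next = b; // a <-> b: a cycle with no roots once this method returns
            return new WeakReference(a);
        }

        static void Main()
        {
            WeakReference weak = MakeCycle();

            // A tracing GC marks from roots; the unrooted cycle is never
            // reached, so both nodes are collected despite the cycle.
            GC.Collect();

            Console.WriteLine($"Cycle still alive: {weak.IsAlive}"); // False
        }
    }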
The Java GCs are doing some crazy stuff; my point, however, was that the acyclic nature of the Erlang object graph enables fairly "simple" optimizations to the GC that in practice should remove most of the need for pauses, without hardware or otherwise expensive read barriers.
It doesn't have to do a lot to be good; once you have cycles, you need much more machinery to achieve the same things.
When it comes to Java - it has multiple GC implementations with different tradeoffs and degree of sophistication. I'm not very well versed in their details besides the fact that pretty much all of them are quite liberal with the use of host memory. So the way I approach it is by assuming that at least some of them resemble the GC implementation in .NET, given extensive evidence that under allocation-heavy scenarios they have similar (throughput) performance characteristics.
As for .NET itself, in server scenarios it uses the SRV GC, which has per-core heaps (their count and sizing are now dynamically scaled to the workload profile, leading to a much smaller RAM footprint) and multi-threaded collection. This lends itself to very high throughput and linear scaling with cores even on very large hosts thanks to minimal contention (think 128C / 1 TiB RAM; sometimes you need to massage it with flags for this, but it's nowhere near the amount of ceremony required by Java).
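For context, the "flags" are mostly a couple of documented project or runtimeconfig.json knobs rather than JVM-style tuning. Something like this in the .csproj is usually all it takes (illustrative, not a recommended config):

    <PropertyGroup>
      <!-- Opt into the SRV GC: per-core heaps, parallel collection threads -->
      <ServerGarbageCollection>true</ServerGarbageCollection>
      <!-- Keep background (concurrent) collection enabled (the default) -->
      <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
    </PropertyGroup>

For the large-host cases there are further documented runtimeconfig.json settings such as System.GC.HeapCount and System.GC.HeapHardLimitPercent.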
Both SRV and WKS GC implementations use background collection for Gen2 and the large and pinned object heaps. Collection of Gen0 and Gen1 pauses by design, as this yields much better throughput, and the pause times are short enough anyway.
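You can watch this from inside a process with the standard GC APIs; a rough sketch (the exact counts will vary by runtime version and workload):

    using System;
    using System.Runtime;

    class GcStats
    {
        static void Main()
        {
            // Churn through short-lived allocations; nearly all die in Gen0.
            for (int i = 0; i < 1_000_000; i++)
            {
                _ = new byte[64];
            }

            Console.WriteLine($"Server GC:        {GCSettings.IsServerGC}");
            Console.WriteLine($"Gen0 collections: {GC.CollectionCount(0)}");
            Console.WriteLine($"Gen1 collections: {GC.CollectionCount(1)}");
            Console.WriteLine($"Gen2 collections: {GC.CollectionCount(2)}");

            // .NET 5+: the runtime reports how much wall time was spent paused.
            Console.WriteLine($"Paused in GC:     {GC.GetGCMemoryInfo().PauseTimePercentage}%");
        }
    }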
They are short enough that modern .NET versions end up having better p99 latency than Go on multi-core, throughput-saturated nodes. Given a decent enough codebase, you only ever need to worry about GC pause impact once you get into the territory of systems with hard realtime requirements. One of the better practical examples of this in open source is Osu!, which must run its game loop at 1000 Hz - only 1 ms of budget! This does pose challenges and requires much more hands-on interaction with the GC, like dynamically switching GC behavior depending on the scenario: https://github.com/dotnet/runtime/issues/96213#issuecomment-... This, however, would be true in any language with automatic memory management, assuming it's possible to implement such a system in it in the first place.
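The "dynamically switching GC behavior" part boils down to APIs like GCSettings.LatencyMode (and, stricter still, GC.TryStartNoGCRegion). A hypothetical sketch, not Osu!'s actual code - see the linked issue for what they really do:

    using System;
    using System.Runtime;

    class GameLoop
    {
        static void RunLatencyCritical(Action tick)
        {
            // Ask the GC to avoid blocking Gen2 collections while we're in
            // the latency-sensitive section (it can still collect under
            // sufficient memory pressure).
            GCLatencyMode previous = GCSettings.LatencyMode;
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            try
            {
                tick(); // the 1 ms-budget loop body would run here
            }
            finally
            {
                GCSettings.LatencyMode = previous; // restore normal behavior
            }
        }
    }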
Small things like tuples (combining multiple values in outputs) and out parameters relax the burden on the runtime, since programmers don't need to allocate objects just to return several things from a function.
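Concretely (a toy example, names are mine): both the tuple return and the out parameter hand multiple values back without a heap-allocated result object.

    using System;

    class TupleDemo
    {
        // ValueTuple is a struct: the two results travel back on the stack
        // or in registers, with no heap allocation for a "result object".
        static (int Min, int Max) MinMax(ReadOnlySpan<int> values)
        {
            int min = int.MaxValue, max = int.MinValue;
            foreach (int v in values)
            {
                if (v < min) min = v;
                if (v > max) max = v;
            }
            return (min, max);
        }

        static void Main()
        {
            var (min, max) = MinMax(stackalloc int[] { 3, 1, 4, 1, 5 });
            Console.WriteLine($"{min}..{max}");

            // Out parameters serve the same purpose for try-style APIs:
            // (success, value) comes back without allocating anything.
            if (int.TryParse("42", out int parsed))
                Console.WriteLine(parsed);
        }
    }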
But the real kicker probably comes from the low-level components, be it the HTTP server with ValueTasks or the C# dictionary type, getting memory savings just from struct types and proper generics. I remember reading an article from years ago about rewriting the C# dictionary class, where they reduced memory allocations by something like 90%.
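The generics half of that is easy to illustrate (toy example): with reified generics, value-type keys and values are stored inline instead of being boxed the way the pre-generics collections store them.

    using System;
    using System.Collections;
    using System.Collections.Generic;

    class GenericsDemo
    {
        static void Main()
        {
            // Pre-generics collections store object references, so every int
            // key and value gets boxed onto the heap: two allocations per entry.
            var table = new Hashtable();
            table[1] = 100;

            // Dictionary<int, int> is specialized for its type arguments; keys
            // and values live inline in its internal entry array, so inserting
            // value types causes no per-entry boxing at all.
            var dict = new Dictionary<int, int> { [1] = 100 };
            Console.WriteLine(dict[1]);
        }
    }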
On allocation traffic, I doubt average code in C# allocates less than Go - the latter puts quite a lot of emphasis on plain structs, and because Go has very poor GC throughput, the only way to explain tolerable performance in the common case is that Go still allocates less. Of course, this will change now that more teams adopt Go and start the classic interface spam, writing abstractions that box structs into interfaces to cope with the inexpressive and repetition-heavy nature of Go.
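Go's interface values behave much like C#'s interface boxing of structs, so the effect is easy to show in C# terms (illustrative sketch):

    using System;

    interface IShape
    {
        double Area();
    }

    struct Circle : IShape
    {
        public double Radius;
        public double Area() => Math.PI * Radius * Radius;
    }

    class BoxingDemo
    {
        // A generic constraint keeps the abstraction without the box: the JIT
        // specializes this method for Circle and calls Area() directly.
        static double AreaOf<T>(T shape) where T : IShape => shape.Area();

        static void Main()
        {
            var c = new Circle { Radius = 1.0 };

            double direct = c.Area(); // direct call on the struct: no allocation
            IShape boxed = c;         // boxing: the struct is copied to the heap
            double viaBox = boxed.Area();

            Console.WriteLine($"{direct} {viaBox} {AreaOf(c)}");
        }
    }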
Otherwise, both .NET and Java GC implementations are throughput-focused, even the ones that target smaller few-core applications, while the Go GC focuses on low to moderate allocation traffic on smaller hosts with consistent performance, and regresses severely when its capacity to reclaim memory in time is exceeded. You can expect anywhere from ~4x up to ~16-32x or more difference (SRV GC scales linearly with cores) in maximum allocation throughput between Go and .NET: https://gist.github.com/neon-sunset/c6c35230e75c89a8f6592cac...
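I can't speak for the gist's exact methodology, but the general shape of such a measurement is something like this (a hypothetical sketch, not the gist's code):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    class AllocBurn
    {
        static void Main()
        {
            const int Threads = 8;             // scale with cores to exercise per-core heaps
            const long PerThread = 50_000_000; // small, short-lived allocations

            var sw = Stopwatch.StartNew();
            Parallel.For(0, Threads, _ =>
            {
                for (long i = 0; i < PerThread; i++)
                {
                    var garbage = new byte[32]; // dies in Gen0 almost immediately
                    GC.KeepAlive(garbage);
                }
            });
            sw.Stop();

            Console.WriteLine($"{Threads * PerThread / sw.Elapsed.TotalSeconds:N0} allocs/s");
        }
    }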