Also, an overlooked part here is that the global Erlang GC is easier to parallelize and/or keep incremental, since it won't have object cycles apart from PIDs (which probably have special handling anyhow).
TL;DR: GCs become way harder as soon as you have cyclic objects. Erlang avoids them, so part of why its GC is good is really about Erlang being "simple".
But that's separate from per-process GC. Per-process GC is possible because processes don't share memory[1], so each process can compact its own memory without coordinating with other processes. GC becomes stop-the-process, not stop-the-world, and it's effectively preemptable, so one process doing a lot of GC will not block other processes from getting CPU time.
Also, per-process GC enables a pattern where a well-tuned, short-lived process is spawned to do some work and then dies, and all its garbage can be thrown away without a complex collection. With shared GC, it can be harder to avoid the impact of short-lived tasks on the overall system.
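A minimal sketch of that pattern, in case it helps. The module name short_lived and the heap size are made up for illustration; spawn_opt/2 and its monitor and min_heap_size options are standard, but the right size depends entirely on your workload. The idea is to start the worker with a heap big enough that it rarely or never collects, and let process exit free everything at once.

```erlang
-module(short_lived).
-export([run/1]).

%% Run Work (a zero-arity fun) in a throwaway process.
%% min_heap_size is in words; 100000 is an arbitrary value
%% chosen for illustration, not a recommendation.
run(Work) ->
    Parent = self(),
    {Pid, Ref} = spawn_opt(
        fun() -> Parent ! {self(), Work()} end,
        [monitor, {min_heap_size, 100000}]),
    receive
        {Pid, Result} ->
            %% Worker is done; drop the monitor and any pending 'DOWN'.
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Worker died before replying.
            {error, Reason}
    end.
```

Something like short_lived:run(fun() -> lists:sum([1, 2, 3]) end) hands back only the result; whatever intermediate garbage the worker built up goes away with its heap when it exits.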
[1] yes yes, shared refcounted binaries, which are allocated separately from process memory.