
480 points jedeusus | 2 comments
nopurpose ◴[] No.43540684[source]
Every perf guide recommends minimizing allocations to reduce GC times, but if you look at a pprof of a Go app, the GC mark phase is what takes time, not GC sweep. GC mark always starts from the known live roots (goroutine stacks, globals, etc.) and traverses references from there, colouring every pointer. To minimize GC time it is best to avoid _long-lived_ allocations. Short-lived allocations, those the GC mark phase will never reach, have an almost negligible effect on GC times.

Allocations of any kind affect how early GC triggers, but in real apps it is almost hopeless to avoid GC entirely, except for very carefully written programs with no dependencies. If GC does happen, reducing GC mark times gives a bigger bang for the buck.
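A rough, self-contained sketch of this effect (the 1M-node list and the use of forced collections are illustrative assumptions, not measurements from the comment): a large long-lived pointer-rich structure makes every mark phase traverse millions of references, while once those references are dropped the next mark has little to do.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

type node struct{ next *node }

// buildList allocates n long-lived linked-list nodes; every GC mark phase
// must traverse all of them for as long as the head stays reachable.
func buildList(n int) *node {
	var head *node
	for i := 0; i < n; i++ {
		head = &node{next: head}
	}
	return head
}

// markTime forces a collection and reports how long it took, as a crude
// proxy for mark cost (a real measurement would use GODEBUG=gctrace=1
// or execution traces rather than wall-clock time around runtime.GC).
func markTime() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	head := buildList(1_000_000) // long-lived: the mark phase walks 1M pointers
	withLive := markTime()

	runtime.KeepAlive(head)
	head = nil // drop the references; the next mark has far less to traverse
	afterDrop := markTime()

	fmt.Println("GC with 1M live nodes: ", withLive)
	fmt.Println("GC after dropping them:", afterDrop)
}
```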

replies(12): >>43540741 #>>43541092 #>>43541624 #>>43542081 #>>43542158 #>>43542596 #>>43543008 #>>43544950 #>>43545084 #>>43545500 #>>43551041 #>>43551691 #
Capricorn2481 ◴[] No.43541092[source]
Aren't allocations themselves pretty expensive regardless of GC?
replies(2): >>43541302 #>>43541882 #
nu11ptr ◴[] No.43541302[source]
Go allocations aren't that bad. A few years ago I benchmarked them at about 4x as expensive as a bump allocation. That is slow enough to make an arena beneficial in high-allocation situations, but fast enough that it isn't worth it most of the time.
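The kind of comparison described above can be sketched like this (the `point` type, counts, and the slice-backed "arena" are my own illustrative assumptions; Go's experimental `arena` package or a real allocator would behave differently):

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y, z float64 }

// heapAlloc allocates each point individually on the GC heap:
// n calls into the runtime allocator.
func heapAlloc(n int) []*point {
	out := make([]*point, n)
	for i := range out {
		out[i] = &point{x: float64(i)}
	}
	return out
}

// arenaAlloc "bump-allocates" points out of one pre-sized backing slice:
// a single large allocation instead of n small ones.
func arenaAlloc(n int) []*point {
	backing := make([]point, n) // the arena
	out := make([]*point, n)
	for i := range out {
		backing[i].x = float64(i)
		out[i] = &backing[i]
	}
	return out
}

func main() {
	n := 10_000
	heap := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			heapAlloc(n)
		}
	})
	arena := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			arenaAlloc(n)
		}
	})
	fmt.Println("per-object heap:   ", heap)
	fmt.Println("slice-backed arena:", arena)
}
```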
replies(1): >>43545083 #
aktau ◴[] No.43545083[source]
Comparing with a fairly optimized malloc at $COMPANY, the Go allocator is (both in terms of relative cycles and fraction of cycles of all Go programs) significantly more expensive than the C/C++ counterpart (3-4x IIRC). For one, it has to do more work, like setting up GC metadata, and zeroing.

There have recently been some optimizations to `runtime.mallocgc`, which may have decreased that 3-4x estimate a bit.

replies(1): >>43546160 #
1. nu11ptr ◴[] No.43546160[source]
How can that be true? If it is 3-4x more expensive than malloc, then per my measurements your malloc is a bump allocator, and that simply isn't true for any real-world malloc implementation (typically a modified free-list allocator, afaik). `mallocgc` may not be fast, but I simply did not find it as slow as you are saying. My guess is it is about as fast as most decent malloc implementations, but I have not measured, and it would be interesting to see a comparison (tough to do, as you'd need to call malloc via CGo, or write one loop in C and one in Go and trust that the looping cost is roughly the same).
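The Go half of such a comparison can be sketched as below (object size and count are arbitrary assumptions; a fair C counterpart would time the same loop over malloc/free with the same size, and keeping all pointers alive, as done here to stop the compiler eliding the allocations, also inflates GC pressure):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// allocLoop times n small heap allocations as a rough proxy for the
// per-allocation cost of runtime.mallocgc.
func allocLoop(n, size int) time.Duration {
	sink := make([][]byte, 0, n) // keep pointers so allocations aren't elided
	start := time.Now()
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, size))
	}
	elapsed := time.Since(start)
	runtime.KeepAlive(sink)
	return elapsed
}

func main() {
	n, size := 1_000_000, 64
	d := allocLoop(n, size)
	fmt.Printf("%d allocations of %d B: %v (%.1f ns/alloc)\n",
		n, size, d, float64(d.Nanoseconds())/float64(n))
}
```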
replies(1): >>43554728 #
2. aktau ◴[] No.43554728[source]
I should correct and clarify: I meant 3-4x more expensive in relative terms. Meaning:

  - For C++ programs, the allocator (allocating + freeing) consumes roughly 5% of cycles.
  - For Go programs, the allocator (runtime.mallocgc) used to consume ~20% of cycles (this is the data I referenced). I checked, and recently it's become closer to 15%, thanks to optimizations.
I have not tested the performance differential on a per-byte level (though that will also differ with object structure in Go).