Fil's Unbelievable Garbage Collector

1. gleenn ◴[05 Sep 25 04:20 UTC] No.45134958[source]▶

> The only "pause" threads experience is the callback executed in response to the soft handshake, which does work bounded by that thread's stack height.

So this is probably not great for functional/deeply-recursive code I guess?

replies(1): >>45134968 #

2. pizlonator ◴[05 Sep 25 04:22 UTC] No.45134968[source]▶

>>45134958 (TP) #

Meh.

The stack scan is really fast. There's not a lot of logic in there. If you max out the stack height limit (megabytes of stack?) then maybe that means milliseconds of work to scan that stack. That's still not bad.

replies(1): >>45134985 #

3. adastra22 ◴[05 Sep 25 04:28 UTC] No.45134985[source]▶

>>45134968 #

That's a very long time. Milliseconds of work is an entire frame update-render cycle in a modern game.

replies(3): >>45135001 #>>45135013 #>>45135161 #

4. pizlonator ◴[05 Sep 25 04:33 UTC] No.45135001{3}[source]▶

>>45134985 #

Would your modern game have a stack that is so deep that it brushes up against the stack height limit?

Probably not. If it did, your game would be inches of stack away from crashing.

replies(1): >>45135797 #

5. munificent ◴[05 Sep 25 04:38 UTC] No.45135013{3}[source]▶

>>45134985 #

Games don't tend to have very deep callstacks. And if a game that cared about performance also wanted to use GC, it would probably try to run the GC at the end of a frame when there is little on the stack.

replies(2): >>45135039 #>>45135228 #

6. pizlonator ◴[05 Sep 25 04:43 UTC] No.45135039{4}[source]▶

>>45135013 #

Yeah, UE GC safepoints at end of tick, where there is no stack. That’s a common trick in systems that have both GC and ticking.

To be fair, FUGC doesn’t currently let you do that. The GC runs in a separate thread and soft handshakes at various points, which cause your game thread to react at poll checks and exits that might not be at end of tick.

But I could add a feature that lets you force handshake responses to be at end of tick! That sounds like a good idea.
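A minimal sketch of the idea discussed here, for illustration only. This is not FUGC's API: `Mutator`, `pollcheck`, `end_of_tick`, and the `defer_to_end_of_tick` flag are all hypothetical names. It assumes the collector thread sets a per-thread request flag, and the game thread either answers at its next poll check or, with the opt-in set, only at the end of a tick.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-thread state; not taken from FUGC.
struct Mutator {
    std::atomic<bool> handshake_requested{false};
    bool defer_to_end_of_tick = false;  // hypothetical opt-in
    int handshakes_run = 0;

    // Called by the collector thread to ask this thread to respond.
    void request_handshake() {
        handshake_requested.store(true, std::memory_order_release);
    }

    // Runs the handshake callback (for example, a stack scan) if one is pending.
    bool respond() {
        if (handshake_requested.exchange(false, std::memory_order_acq_rel)) {
            ++handshakes_run;  // stand-in for the real callback work
            return true;
        }
        return false;
    }

    // Compiler-inserted poll check: responds unless deferral is enabled.
    bool pollcheck() {
        if (defer_to_end_of_tick) return false;
        return respond();
    }

    // Called by the game loop once per tick, when the stack is shallow.
    bool end_of_tick() { return respond(); }
};
```

The trade-off in this sketch is that a deferred response keeps the collector waiting until the tick finishes, so a long tick delays the whole handshake.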

7. kragen ◴[05 Sep 25 05:08 UTC] No.45135161{3}[source]▶

>>45134985 #

Latency-sensitive programs like games are usually careful to avoid deep recursion.

8. adastra22 ◴[05 Sep 25 05:26 UTC] No.45135228{4}[source]▶

>>45135013 #

FUGC runs the GC in a separate thread and you don’t have a lot of control over when it interrupts.

9. debugnik ◴[05 Sep 25 07:11 UTC] No.45135797{4}[source]▶

>>45135001 #

You're missing the point: they're giving an example of an entire workload that fits into your technique's worst-case overhead. It could be the right trade-off and rarely be hit, but that worst case does sound bad.

replies(4): >>45136238 #>>45136432 #>>45136439 #>>45139327 #

10. kristofferc ◴[05 Sep 25 08:20 UTC] No.45136238{5}[source]▶

>>45135797 #

Actually, it sounds quite ok.

11. adastra22 ◴[05 Sep 25 08:53 UTC] No.45136432{5}[source]▶

>>45135797 #

^ this was the intent of the example.

12. torginus ◴[05 Sep 25 08:54 UTC] No.45136439{5}[source]▶

>>45135797 #

From what he describes, he uses stack maps to tell which stack values are pointers. He can skip over everything that's not a pointer.

On x86_64 you need a stack about 10k function calls deep, each frame with all 14 general-purpose registers filled with pointers, to have a 1MB stack.
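A rough check of that estimate, as a sketch. It assumes 14 pointer-holding registers per frame (taken here to mean the 16 general-purpose registers minus `rsp` and `rbp`), 8 bytes each, and nothing else in the frame. Real frames also hold a return address and locals, so fewer frames would fill the same space. The names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions for illustration, not measured values.
constexpr std::size_t kPointerSlotsPerFrame = 14;  // spilled registers per frame
constexpr std::size_t kBytesPerPointer = 8;        // 64-bit pointers
constexpr std::size_t kBytesPerFrame =
    kPointerSlotsPerFrame * kBytesPerPointer;

static_assert(kBytesPerFrame == 112, "14 slots * 8 bytes");

// Number of whole frames of the given size that fit in stack_bytes.
inline std::size_t frames_to_fill(std::size_t stack_bytes,
                                  std::size_t bytes_per_frame = kBytesPerFrame) {
    assert(bytes_per_frame > 0);
    return stack_bytes / bytes_per_frame;
}

// frames_to_fill(1024 * 1024) == 9362, i.e. roughly 10k frames per MiB.
```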

replies(1): >>45139728 #

13. pizlonator ◴[05 Sep 25 14:57 UTC] No.45139327{5}[source]▶

>>45135797 #

Stacks tend to be small enough that the cost of scanning them is minuscule.

(I’m not trying to BS my way here - I’m explaining the reason why on-the-fly GC optimization almost never involves doing stuff about the time it takes to scan the stack. It’s just not worth it. If it was, we’d be seeing a lot of stacklet-type optimizations.)

14. pizlonator ◴[05 Sep 25 15:31 UTC] No.45139728{6}[source]▶

>>45136439 #

To play devil's advocate, the suckiest part about stack scanning is that it's a linked list walk. It's not a linear scan. So it's all pointer chasing. And it's very likely to find previously unmarked pointers, which involves CAS and other work.

(It would be a linear scan if I was just conservatively scanning, but then I'd have other problems.)

This is one of the most counterintuitive parts of GC perf! You'd think that the stack scan had to be a bottleneck, and it might even be one in some corner cases. But it's just not the longest pole in the tent most of the time, because you're so unlikely to actually have a 1MB stack, and programs that do have a 1MB stack tend to also have ginormous heaps (like many gigabytes), which then means that even if the stack scan is a problem it's not the problem.
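To make the "linked list walk plus CAS" point concrete, here is a simplified sketch. It is not Fil-C's actual frame layout: `Frame`, `Object`, and `scan_stack` are hypothetical, and it assumes each frame links to its caller and exposes only the slots known to hold pointers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap object with a mark bit.
struct Object {
    std::atomic<bool> marked{false};
};

// Hypothetical frame record: a link to the caller plus its pointer slots.
struct Frame {
    Frame* parent;          // next frame to visit (the pointer chase)
    std::size_t num_slots;  // number of pointer slots in this frame
    Object** slots;         // slots known to hold object pointers
};

// Walks the frame chain from the innermost frame outward. Each previously
// unmarked object is claimed with a CAS and queued for later tracing.
// Returns how many objects this scan newly marked.
std::size_t scan_stack(Frame* top, std::vector<Object*>& worklist) {
    std::size_t newly_marked = 0;
    for (Frame* f = top; f != nullptr; f = f->parent) {
        assert(f->slots != nullptr || f->num_slots == 0);
        for (std::size_t i = 0; i < f->num_slots; ++i) {
            Object* obj = f->slots[i];
            if (obj == nullptr) continue;
            bool expected = false;
            if (obj->marked.compare_exchange_strong(expected, true)) {
                worklist.push_back(obj);
                ++newly_marked;
            }
        }
    }
    return newly_marked;
}
```

The outer loop cannot start on the next frame until the current frame's `parent` has been loaded, which is the dependency a conservative linear scan would not have.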

replies(1): >>45143641 #

15. kragen ◴[05 Sep 25 21:08 UTC] No.45143641{7}[source]▶

>>45139728 #

You're writing the compiler, though, so you can define the stack layout. If the stack-scanning linked-list walk were the long pole, it wouldn't be hard to eliminate the pointer chasing: your procedure prologue could add a pointer to each newly pushed stack frame to something like a std::deque, then pop it off in the epilogue.

I don't know, maybe the fact that I'm disagreeing with someone who knows a lot more than I do about the issue should be a warning sign that I'm probably wrong?
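A sketch of what that proposal might look like, under stated assumptions. `FrameRecord`, `FrameGuard`, and `g_frame_index` are hypothetical names; a real compiler would emit the push and pop directly in the prologue and epilogue rather than use an RAII guard. Whether this would actually beat following parent links is the open question in this thread, and the sketch does not settle it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical description of one frame's pointer slots.
struct FrameRecord {
    void** slots;
    std::size_t num_slots;
};

// Per-thread side index of live frames, oldest first.
thread_local std::deque<FrameRecord*> g_frame_index;

// Stands in for compiler-emitted prologue (push) and epilogue (pop).
class FrameGuard {
public:
    explicit FrameGuard(FrameRecord* record) : record_(record) {
        g_frame_index.push_back(record);
    }
    ~FrameGuard() {
        assert(!g_frame_index.empty() && g_frame_index.back() == record_);
        g_frame_index.pop_back();
    }
    FrameGuard(const FrameGuard&) = delete;
    FrameGuard& operator=(const FrameGuard&) = delete;

private:
    FrameRecord* record_;
};

// The scanner iterates the index instead of following parent links, so
// the address of each frame is known before the previous one is read.
std::size_t count_live_pointers() {
    std::size_t count = 0;
    for (FrameRecord* record : g_frame_index) {
        for (std::size_t i = 0; i < record->num_slots; ++i) {
            if (record->slots[i] != nullptr) ++count;
        }
    }
    return count;
}
```

The cost is an extra push and pop on every call, paid whether or not a collection ever scans that frame.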