How We Found 7 TiB of Memory Just Sitting Around

(render.com)

205 points anurag | 1 comments | 30 Oct 25 18:25 UTC


shanemhansen ◴[30 Oct 25 21:05 UTC] No.45765342[source]▶

>>45763359 (OP) #

The unreasonable effectiveness of profiling and digging deep strikes again.

replies(1): >>45776616 #

hinkley ◴[31 Oct 25 20:58 UTC] No.45776616[source]▶

>>45765342 #

The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means there are better profilers that could be written, but in 20 years of having them I’ve only seen 2 that tried. Sadly I think flame graphs made profiling more accessible to the unmotivated but didn’t actually improve overall results.

replies(4): >>45777180 #>>45777265 #>>45777691 #>>45783146 #

zahlman ◴[31 Oct 25 22:01 UTC] No.45777180[source]▶

>>45776616 #

> The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

The sympathy is also needed. Problems aren't found when people don't care, or consider the current performance acceptable.

> There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means there are better profilers that could be written, but in 20 years of having them I’ve only seen 2 that tried.

It's hard for profilers to identify slowdowns that are due to the architecture. Making the function do less work to get its result feels different from determining that the function's result is unnecessary.

replies(1): >>45777886 #

hinkley ◴[31 Oct 25 23:36 UTC] No.45777886[source]▶

>>45777180 #

Architecture, cache eviction, memory bandwidth, thermal throttling.

All of which have gotten perhaps an order of magnitude worse in the time since I started on this theory.

replies(2): >>45780225 #>>45782637 #

hinkley ◴[01 Nov 25 08:58 UTC] No.45780225[source]▶

>>45777886 #

And Amdahl’s Law. Perf charts will complain about how much CPU you’re burning in the parallel parts of code and ignore that the bottleneck is down in 8% of the code that can’t be made concurrent.
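
To put rough numbers on the 8% figure, here is a minimal sketch of Amdahl's Law in Python. It is an illustration only, not anything from the article or the comment: the function names `amdahl_speedup` and `speedup_ceiling` are made up for this example, and it assumes the serial part stays a fixed fraction of the runtime and the rest scales perfectly with the number of workers.

```python
import math


def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a given number of workers.

    serial_fraction is the share of single-worker runtime that cannot be
    parallelized (0.08 for the 8% case). Hypothetical helper for illustration.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time; the parallel part is divided
    # evenly across the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(serial_fraction: float) -> float:
    """Limit of the speedup as the number of workers grows without bound."""
    if serial_fraction == 0.0:
        return math.inf
    return 1.0 / serial_fraction


if __name__ == "__main__":
    serial = 0.08  # the 8% that can't be made concurrent
    for n in (1, 2, 4, 8, 16, 64, 1024):
        print(f"{n:>5} workers: {amdahl_speedup(serial, n):.2f}x")
    print(f"ceiling: {speedup_ceiling(serial):.2f}x")
```

Under those assumptions the ceiling for an 8% serial share is 1 / 0.08 = 12.5x, no matter how many cores are added. That is why a chart dominated by the parallel 92% can point at the wrong place: once enough workers are in play, most of the remaining wall-clock time sits in the serial 8%.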
