
205 points | anurag | 1 comment
shanemhansen No.45765342
The unreasonable effectiveness of profiling and digging deep strikes again.
replies(1): >>45776616 #
hinkley No.45776616
The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means better profilers could be written, but in 20 years of using them I’ve only seen 2 that tried. Sadly I think flame graphs made profiling more accessible to the unmotivated but didn’t actually improve overall results.
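
To make that concrete, here is a rough Python sketch of the kind of win a flat profile tends to hide (parse, score, and format_row are made-up stand-ins, not anything from the article): an eager pipeline smears its allocation and copying cost across every stage, so no single frame looks hot, yet restructuring it as one lazy pass can claw back a big chunk of the time.

    from dataclasses import dataclass

    @dataclass
    class Parsed:
        value: int
        valid: bool

    # Hypothetical stand-ins for the real work at each stage.
    def parse(r):
        return Parsed(value=r, valid=r % 3 != 0)

    def score(p):
        return p.value * p.value

    def format_row(s):
        return f"score={s}"

    # Eager pipeline: every stage materializes a full intermediate list, so
    # the cost (allocation, copying) is smeared across all of them and no
    # single frame dominates a profile.
    def eager_pipeline(records):
        parsed = [parse(r) for r in records]
        filtered = [p for p in parsed if p.valid]
        scored = [score(p) for p in filtered]
        return [format_row(s) for s in scored]

    # The same logic as one lazy pass with no intermediate lists. A profile
    # of the eager version never points here, because the waste isn't in any
    # one function -- it's in the shape of the whole pipeline.
    def lazy_pipeline(records):
        scored = (score(p) for p in (parse(r) for r in records) if p.valid)
        return [format_row(s) for s in scored]

    if __name__ == "__main__":
        records = list(range(200_000))
        assert eager_pipeline(records) == lazy_pipeline(records)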

replies(4): >>45777180 #>>45777265 #>>45777691 #>>45783146 #
zahlman No.45777180
> The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

The sympathy is also needed. Problems aren't found when people don't care, or consider the current performance acceptable.

> There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means better profilers could be written, but in 20 years of using them I’ve only seen 2 that tried.

It's hard for profilers to identify slowdowns that are due to the architecture. Making the function do less work to get its result feels different from determining that the function's result is unnecessary.
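
A minimal sketch of that distinction, assuming a hypothetical expensive_summary in Python: the profiler points at the hot function and invites you to make it faster, while the architectural fix is noticing that most of its results are never shown.

    import time

    # Hypothetical stand-in for an expensive per-record summary.
    def expensive_summary(record):
        return sum(i * record for i in range(5_000))

    # What the profiler sees: expensive_summary dominates, so the obvious
    # move is to make it do less work per call.
    def build_report_eager(records, top_n=10):
        summaries = [(r, expensive_summary(r)) for r in records]
        summaries.sort(key=lambda pair: pair[0], reverse=True)
        return summaries[:top_n]

    # The architectural observation the profiler can't make for you: only
    # the top N records are ever displayed, so most of those results were
    # unnecessary in the first place.
    def build_report_lean(records, top_n=10):
        top = sorted(records, reverse=True)[:top_n]
        return [(r, expensive_summary(r)) for r in top]

    if __name__ == "__main__":
        data = list(range(2_000))
        for fn in (build_report_eager, build_report_lean):
            t0 = time.perf_counter()
            fn(data)
            print(f"{fn.__name__}: {time.perf_counter() - t0:.3f}s")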

replies(1): >>45777886 #
hinkley No.45777886
Architecture, cache eviction, memory bandwidth, thermal throttling.

All of which have gotten perhaps an order of magnitude worse in the time since I started on this theory.

replies(2): >>45780225 #>>45782637 #
zahlman No.45782637
I meant architecture of the codebase, to be clear. (I'm sure that the increasing complexity of hardware architecture makes it harder to figure out how to write optimal code, but it isn't really degrading the performance of naive attempts, is it?)
replies(1): >>45783585 #
hinkley No.45783585
The problem Windows had during its time of fame was that the developers always had the fastest machines money could buy. That shortened the code-build-test cycle for them, but it also made it difficult for the developers to visualize how their code would run on normal hardware. Add the general lack of empathy inspired by their toxic corporate culture of “we are the best in the world”, and it’s small wonder that Windows 95 and 98 ran more and more like dogshit on older hardware.

My first job out of college, I got handed the slowest machine they had. The app was already half done and was dogshit slow even with small data sets. I was embarrassed to think my name would be associated with it. The UI painted so slowly I could watch the individual lines paint on my screen.

In college, my friend and I had made homework into a game of seeing who could make the assignment run faster or use less memory, such as calculating the 100th or 1000th Fibonacci number. So I just started applying those skills and learning new ones.
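
For flavor, a small Python sketch of the kind of contrast that game turns on (my own illustration, not the actual homework): the naive recursive Fibonacci effectively never finishes for n=100, while the linear-time rewrite handles n=1000 instantly thanks to Python’s big integers.

    # Naive recursion: exponential in n -- fib_naive(100) would effectively
    # never finish, which is what made it a fun target.
    def fib_naive(n):
        if n < 2:
            return n
        return fib_naive(n - 1) + fib_naive(n - 2)

    # Iterative rewrite: O(n) time, O(1) extra memory.
    def fib_iter(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    if __name__ == "__main__":
        assert fib_naive(20) == fib_iter(20)
        print(fib_iter(100))              # 354224848179261915075
        print(len(str(fib_iter(1000))))   # 209 digits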

For weeks I evaluated improvements to the code by counting “one Mississippi, two Mississippi”, then by how many syllables I got through, then with the stopwatch function on my watch. No profilers, no benchmarking tools, just code review.
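
The modern equivalent of that stopwatch is a couple of lines around the call; a minimal sketch in Python (redraw_ui in the example comment is hypothetical):

    import time

    # A minimal stand-in for the stopwatch approach: wall-clock the whole
    # operation before and after a change and compare the numbers by hand.
    def stopwatch(label, fn, *args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{label}: {time.perf_counter() - t0:.2f}s")
        return result

    # e.g. stopwatch("repaint", redraw_ui)   # redraw_ui is hypothetical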

And that’s how my first specialization became optimization.