There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means better profilers could be written, but in 20 years of using them I’ve only seen two that tried. Sadly, I think flame graphs made profiling more accessible to the unmotivated but didn’t actually improve overall results.
If you see a database query that takes an hour to run and only touches a few GB of data, you should be thinking: "Well, NVMe bandwidth is multiple gigabytes per second, so why can't it run in 1 second or less?"
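As a back-of-envelope check (a sketch with assumed round numbers, not measurements from any real system):

```python
# Back-of-envelope: how long *should* scanning this data take?
# Both inputs are assumed round numbers, not measurements.
data_gb = 4            # assumed: "a few GB" of data touched
nvme_gb_per_s = 3.0    # assumed: sustained sequential read bandwidth

scan_seconds = data_gb / nvme_gb_per_s
print(f"raw scan time: {scan_seconds:.2f} s")   # ~1.33 s

# Compare the observed query time to that hardware floor:
observed_seconds = 3600
print(f"slowdown vs. hardware floor: {observed_seconds / scan_seconds:,.0f}x")  # ~2,700x
```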
The idea that anyone would accept a request to a website taking longer than 30ms (roughly the time it takes a game to render its entire world, including both the CPU and GPU parts, at 60fps) is insane. Nobody should really accept it, but we commonly do.
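That 30ms figure is just the standard 60fps arithmetic; a quick sketch:

```python
# Frame budget at 60 fps: each frame must complete in 1/60 of a second.
frame_ms = 1000 / 60
print(f"one frame: {frame_ms:.1f} ms")          # ~16.7 ms

# CPU simulation and GPU rendering are typically pipelined across
# consecutive frames, so end-to-end latency for one frame's worth of
# work is roughly two frame times.
print(f"CPU + GPU pipeline: {2 * frame_ms:.1f} ms")  # ~33.3 ms
```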
Uber could run the complete global rider/driver flow from a single server.
It doesn't, in part because each of those individual trips earns $1 or more, so it's perfectly acceptable to the business to be more inefficient and use hundreds of servers for this task.
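A rough sanity check on the single-server claim, where the trip volume and per-trip event count are assumed order-of-magnitude figures, not Uber's actual numbers:

```python
# Back-of-envelope: could one server handle global ride matching?
# All inputs below are assumed round numbers for illustration.
trips_per_day = 25_000_000   # assumed order of magnitude
events_per_trip = 100        # assumed: location updates, matching, state changes

events_per_second = trips_per_day * events_per_trip / 86_400
print(f"~{events_per_second:,.0f} events/s")    # ~29,000 events/s

# A single well-tuned modern server can handle on the order of hundreds
# of thousands of small requests per second, so the raw computational
# load alone would fit on one machine with headroom.
```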
Similarly, a small website taking 150ms to render a page only matters if the lost productivity costs more than the engineering time to fix it, and even then, fixing it only makes sense if that engineering time isn't more productively spent adding features or reliability.
The reality is that most of those requests (now) get mixed in with a firehose of traffic, and could be served much faster than 16ms if that were all that was going on. But it’s never all that is going on.