
420 points gnabgib | 5 comments | source
drewg123 ◴[] No.44000124[source]
I tend to be of the opinion that for modern general-purpose CPUs, such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.

But do you know what's not free? Memory accesses[1]. So when I'm optimizing things, I focus on making things more cache friendly.

[1] http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...
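
A minimal sketch of that point (sizes here are illustrative, not from the parent comment): both loops below do the same number of additions, but the second strides across memory instead of walking it sequentially, so on a typical cache hierarchy it runs several times slower.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N 4096  /* 4096x4096 doubles = 128 MiB, far larger than any cache */

  int main(void) {
      double *a = malloc(sizeof(double) * N * N);
      if (!a) return 1;
      for (long i = 0; i < (long)N * N; i++) a[i] = 1.0;

      double sum = 0;
      clock_t t0 = clock();
      for (int i = 0; i < N; i++)         /* row-major: sequential walk,  */
          for (int j = 0; j < N; j++)     /* cache lines and the hardware */
              sum += a[(long)i * N + j];  /* prefetcher both help         */
      clock_t t1 = clock();
      for (int j = 0; j < N; j++)         /* column-major: same additions,   */
          for (int i = 0; i < N; i++)     /* but each access jumps 32 KiB,   */
              sum += a[(long)i * N + j];  /* missing cache almost every time */
      clock_t t2 = clock();

      printf("row-major:    %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
      printf("column-major: %.2fs\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
      printf("(sum = %.0f)\n", sum);  /* keeps the loops from being optimized away */
      free(a);
      return 0;
  }

Both versions retire roughly the same instructions; only the memory access pattern differs.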

replies(14): >>44000191 #>>44000255 #>>44000266 #>>44000351 #>>44000378 #>>44000418 #>>44000430 #>>44000433 #>>44000478 #>>44000639 #>>44000687 #>>44001113 #>>44001140 #>>44001975 #
1. godelski ◴[] No.44001113[source]

  > modern CPUs are so fast that instructions are almost free.
Please don't.

These things compound. You especially need to consider that typical computer usage involves more than one application running at a time. There's a tragedy-of-the-commons issue that's often ignored. It can be ignored if you're optimizing your code (you're minimizing your share!), but it can't be if you're not.

I guarantee you we'd have a lot of faster things if people invested even a little time (these gains also compound :). Two great examples are Llama.cpp and FlashAttention. Both of these have had a huge impact on people (among a number of other works) but don't get nearly the same attention as other stuff. These are popular instances, but I promise you there are a million problems like these waiting to be solved. The work just isn't flashy, but hey, plumbers and garbagemen have pretty critical jobs too.

replies(1): >>44001211 #
2. EnPissant ◴[] No.44001211[source]
You haven't refuted the parent comment at all. They asserted that instructions are insignificant and that other things, such as memory accesses, dominate.
replies(2): >>44003142 #>>44010019 #
3. windward ◴[] No.44003142[source]
They do, until you have a tough problem that's still too slow after it's cache-efficient.
4. godelski ◴[] No.44010019[source]

  >> You especially need to consider that typical computer usage involves more than one application running at a time. There's a tragedy-of-the-commons issue
The shared resources in question include:

  - disk/ssd/long term memory
  - RAM/System memory
  - Cache
  
  BUT ALSO
  - Registers
  - CPU Cores
  - Buses/Lanes/Bandwidth
  - Locks
  - Network
My point is that I/O only dominates when you're already operating efficiently. It dominates when you measure a single program running in isolation.

You're forgetting that when multiple programs are running, there's a lot more going on. There's a lot more communication, too. The caches are super tiny and highly contended, and the OS has to interleave all those instructions. Even a program's niceness [0] can dramatically change total performance. This is especially true when we're talking about unoptimized programs, because all the little things the OS has to manage pile up.
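
As a rough sketch of the niceness point (assuming a Linux/POSIX box; the busy-loop workload is invented for illustration), the same instructions can get very different wall-clock treatment depending on the priority the scheduler assigns them relative to everything else running:

  #include <stdio.h>
  #include <sys/resource.h>

  /* A CPU-bound loop that voluntarily lowers its own priority.
     Run one copy with the setpriority() call and one without on
     a busy machine: same binary, same instructions, but the
     scheduler starves the "nicer" copy whenever anything else
     wants the core. */
  int main(void) {
      /* 19 is the weakest priority; values below 0 need privileges */
      if (setpriority(PRIO_PROCESS, 0, 19) != 0)
          perror("setpriority");

      volatile unsigned long counter = 0;
      for (unsigned long i = 0; i < 4000000000UL; i++)
          counter++;
      printf("done: %lu\n", counter);
      return 0;
  }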

Get out your computer architecture book and do a skim to refresh. Even Knuth's books [1] discuss much of this, because to write good programs you gotta understand the environment they run in. Otherwise it'd be like trying to build a car without knowing whether it's for the city, Antarctica, or even the moon. The environment is critical to the assumptions you can make.

[0] https://en.wikipedia.org/wiki/Nice_(Unix)

[1] https://www-cs-faculty.stanford.edu/~knuth/taocp.html

replies(1): >>44011985 #
5. godelski ◴[] No.44011985{3}[source]
Actually, I had a good real-world example today. My TV got an update a month ago, and since then most of the apps don't actually work: Netflix plays with like 3 pixels, Hulu just hangs. Luckily I use the TV as a monitor 99% of the time. But I think we all know how slow a lot of these systems get. Things getting slower over time... everything works fine when it ships, but like I suggested, issues build over time. One app here, another there, and before you know it you're buying a new TV, computer, phone, whatever.