drewg123 (No.44000124)
I tend to be of the opinion that for modern general-purpose CPUs, such micro-optimizations are totally unnecessary because the CPUs are so fast that instructions are almost free.

But do you know what's not free? Memory accesses[1]. So when I'm optimizing things, I focus on making things more cache friendly.

[1] http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...
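To make "cache friendly" concrete, here is a minimal sketch (a generic illustration, not code from the thread or from the linked chart): the same summation done in two traversal orders, where only the memory-access pattern differs.

    #include <stddef.h>

    #define N 4096
    static int grid[N][N];

    /* Walks memory sequentially; every cache line fetched is fully used. */
    long sum_row_major(void) {
        long s = 0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s += grid[i][j];
        return s;
    }

    /* Strides N * sizeof(int) bytes per step; for large N this touches a new
       cache line for almost every element and runs far slower, even though
       the instruction count is essentially the same. */
    long sum_col_major(void) {
        long s = 0;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s += grid[i][j];
        return s;
    }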

andrepd (No.44000266)
> I tend to be of the opinion that for modern general-purpose CPUs, such micro-optimizations are totally unnecessary because the CPUs are so fast that instructions are almost free.

What does this mean? Free? Optimisations are totally unnecessary because... instructions are free?

The implementation in TFA is probably on the order of 5x more efficient than a naive approach. This is time and energy as well. I don't understand what "free" means in this context.
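For a concrete sense of what is being compared, here is a sketch, assuming the check in question is a Gregorian leap-year test; the second function is a commonly cited branch-reduced form, not necessarily the exact code from TFA:

    #include <stdbool.h>
    #include <stdint.h>

    /* Straightforward Gregorian rule: divisible by 4, except century years,
       unless divisible by 400. */
    bool is_leap_naive(uint32_t y) {
        return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
    }

    /* Branch-reduced form: a year that is not a multiple of 25 cannot be a
       century year, so divisibility by 4 decides; if it is a multiple of 25,
       divisibility by 16 decides (25 * 16 = 400). */
    bool is_leap_fast(uint32_t y) {
        return (y % 25 != 0) ? (y & 3) == 0 : (y & 15) == 0;
    }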

Calendar operations are performed probably trillions of times every second across all types of computers. If you can make them more time- and energy-efficient, why wouldn't you?

If there's a problem with modern software it's too much bloat, not too much optimisation.

drewg123 (No.44000329)
If this is indeed done trillions of times a second, which I frankly have a hard time believing, then sure, it might be worth it. But on a modern CPU, focusing on an optimization like this is a poor use of developer resources. There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.

Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.

andrepd (No.44000448)
> There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.

How is cache / memory access relevant in a subroutine that performs a check on a 16-bit number?

> Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.

1: comments are your friend

2: a unit test can assert that this function is equivalent to the naive one in about half a millisecond.
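A sketch of that kind of test, assuming the two functions from the earlier sketch (is_leap_naive and is_leap_fast): it exhaustively checks every 16-bit input, which is only 65,536 cases.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Declarations for the two functions from the earlier sketch. */
    bool is_leap_naive(uint32_t y);
    bool is_leap_fast(uint32_t y);

    int main(void) {
        /* Exhaustive over every 16-bit year: 65,536 cases, effectively instant. */
        for (uint32_t y = 0; y <= UINT16_MAX; y++)
            assert(is_leap_fast(y) == is_leap_naive(y));
        puts("fast check agrees with the naive one for all 16-bit years");
        return 0;
    }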