
420 points gnabgib | 1 comment
drewg123 No.44000124
I tend to be of the opinion that for modern general-purpose CPUs, such micro-optimizations are totally unnecessary, because modern CPUs are so fast that instructions are almost free.

But do you know what's not free? Memory accesses[1]. So when I'm optimizing things, I focus on making things more cache friendly.

[1] http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...
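A minimal C sketch of what "cache friendly" can mean in practice (the function names are hypothetical, chosen only for illustration): both functions perform the same additions over a matrix stored in C's row-major layout, but only the first walks memory in storage order. No timings are claimed; the size of the gap depends on the matrix size and the machine's cache hierarchy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: same arithmetic, different memory access
 * patterns. m points to a rows x cols matrix stored row-major. */

/* Unit stride: consecutive iterations touch adjacent addresses, so every
 * byte of each cache line that gets loaded is used. */
uint64_t sum_row_major(const uint32_t *m, size_t rows, size_t cols)
{
    uint64_t total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Stride of `cols` elements: for large matrices, each access tends to
 * land on a different cache line, so far more data is fetched than used. */
uint64_t sum_col_major(const uint32_t *m, size_t rows, size_t cols)
{
    uint64_t total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both return the same sum; only the order in which memory is touched differs, which is the kind of change the comment above is arguing matters more than instruction count.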

PaulKeeble No.44000378
The thing about these optimisations (assuming they test as faster) is that they can be applied once in a library, and then everyone benefits from a speedup that took some hard graft to work out. Very few people bake their own date API nowadays if they can avoid it, since one already exists, and techniques like this speed up every programme that uses it, whether or not it's on the critical path.
codexb No.44000391
That's basically what compilers do these days. It used to be that you could try to optimize your code by inlining things here and there, but these days you're not going to beat the compiler's optimizer.
ryao No.44000584
Meanwhile, GCC will happily compile bsearch() without cmov instructions, and the result is slower than a custom implementation for which it does emit cmov instructions. I do not believe anyone has filed a bug report specifically about the inefficient bsearch(), but the bug report I filed a few years ago about inefficient code generation for binary search functions is still open, so I see no point in bothering:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001
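For readers unfamiliar with the cmov point, here is a sketch of the general branchless binary search technique in C. It is not the OpenZFS code, and `lower_bound_branchless` is a hypothetical name. The loop replaces the usual data-dependent if/else with a select on a pointer, which compilers often, but not always, lower to cmov on x86-64; inspecting the generated assembly for a given GCC version and set of flags is the only way to be sure.

```c
#include <stddef.h>

/* Returns the index of the first element of sorted a[0..n) that is
 * >= key, or n if every element is < key. */
size_t lower_bound_branchless(const int *a, size_t n, int key)
{
    if (n == 0)
        return 0;
    const int *base = a;
    size_t len = n;
    /* Invariant: the answer lies in [base, base + len]. */
    while (len > 1) {
        size_t half = len / 2;
        /* A select rather than a branch: there is no hard-to-predict
         * jump, only a conditional move of `base` (if the compiler
         * chooses cmov). */
        base = (base[half] < key) ? base + half : base;
        len -= half;
    }
    /* One element left to decide: the answer is base or base + 1. */
    return (size_t)(base - a) + (size_t)(*base < key);
}
```

The loop always runs about log2(n) iterations regardless of the data, trading a predictable amount of work for the elimination of branch mispredictions.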

Binary searches on OpenZFS B-Tree nodes are faster in part because we did not wait for the compiler:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

Eliminating comparator function-call overhead via inlining is also part of the improvement, and we would not have gotten that from the compiler, because OpenZFS is not built with LTO. So even if GCC fixes that bug, the patch will still have been useful.
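The comparator point can be sketched as follows, assuming a hypothetical `DEFINE_FIND` macro (not the actual OpenZFS code) that stamps out one search function per element type. bsearch() receives its comparator as a function pointer and calls it on every probe; the compiler can only inline that call when it can see both the search loop and the comparator's body, which without LTO generally rules out a comparator defined in another translation unit. Generating the search per type puts the comparison directly in the loop.

```c
#include <stddef.h>
#include <stdlib.h>

/* Generic path: bsearch() invokes this through a function pointer on
 * every probe of the search. */
static int cmp_u64(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical macro: generates a lower-bound search specialized for one
 * element type and one "less than" expression, so the comparison is
 * visible to the compiler and needs no indirect call. */
#define DEFINE_FIND(name, type, less)                          \
    static size_t name(const type *a, size_t n, type key)      \
    {                                                          \
        if (n == 0)                                            \
            return 0;                                          \
        const type *base = a;                                  \
        while (n > 1) {                                        \
            size_t half = n / 2;                               \
            base = less(base[half], key) ? base + half : base; \
            n -= half;                                         \
        }                                                      \
        return (size_t)(base - a) + (size_t)less(*base, key);  \
    }

#define U64_LESS(x, y) ((x) < (y))

/* Instantiates find_u64(): index of the first element >= key, or n. */
DEFINE_FIND(find_u64, unsigned long long, U64_LESS)
```

The trade-off is code size: each instantiation emits its own copy of the search loop, in exchange for removing the per-probe indirect call.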