drewg123:
I tend to be of the opinion that on modern general-purpose CPUs, micro-optimizations like these are largely unnecessary: the CPUs are so fast that instructions are almost free.

But do you know what's not free? Memory accesses[1]. So when I'm optimizing, I focus on making the code more cache-friendly.

[1] http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...
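(A toy illustration of what I mean, not something from the linked paper: the two functions below do identical work in a different order.)

    #include <stddef.h>

    #define N 4096

    /* Column-major walk: each access jumps N*sizeof(int) bytes,
       touching a new cache line every iteration, so the loop is
       dominated by memory stalls. */
    long sum_cols(const int m[N][N]) {
        long s = 0;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    /* Row-major walk: touches memory sequentially, so the hardware
       prefetcher keeps the core fed. Same instruction count, often
       several times faster. */
    long sum_rows(const int m[N][N]) {
        long s = 0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }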

PaulKeeble:
The thing about these optimisations (assuming they test as higher performance) is that they can be applied in a library, and then everyone benefits from a speedup that took some hard graft to work out. Very few people bake their own date API nowadays if they can avoid it, since one already exists, and techniques like this speed up every programme that uses the library, whether it's on the critical path or not.
codexb:
That's basically what compilers do for you these days. It used to be that you could try to optimize your code by hand, inlining things here and there, but now you're not going to beat the compiler's optimizer.
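(A toy case of what I mean; exact behaviour varies by compiler and version, so check the assembly on godbolt.org.)

    /* With gcc or clang at -O2, this loop is typically not compiled
       as a loop at all: the optimizer recognizes the induction
       pattern and emits the closed form n*(n-1)/2. */
    unsigned sum_to(unsigned n) {
        unsigned s = 0;
        for (unsigned i = 0; i < n; i++)
            s += i;
        return s;
    }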
kragen:
That is a meme that people repeat a lot, but it turns out to be wrong:

https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf (though see https://blog.regehr.org/archives/1515: "This piece (...) explains why Daniel J. Bernstein’s talk, The death of optimizing compilers (audio [http://cr.yp.to/talks/2015.04.16/audio.ogg]) is wrong", citing https://news.ycombinator.com/item?id=9397169)

https://blog.royalsloth.eu/posts/the-compiler-will-optimize-...

http://lua-users.org/lists/lua-l/2011-02/msg00742.html

https://web.archive.org/web/20150213004932/http://x264dev.mu...

kragen:
I guess I should mention that https://blog.regehr.org/archives/1515 doesn't dispute that people can pretty much always beat the shit out of optimizing compilers; Regehr explicitly says, "of course there’s plenty of hot code that wants to be optimized by hand." What he disputes is whether it's worthwhile to use optimizing compilers on the rest of your code.

Daniel Berlin's https://news.ycombinator.com/item?id=9397169 does kind of disagree, saying, "If GCC didn't beat an expert at optimizing interpreter loops, it was because they didn't file a bug and give us code to optimize." But his actual example is the CPython interpreter loop, which is light-years from the kind of hand-optimized assembly interpreter Mike Pall's post is talking about, and what it did wasn't feed an interpreter loop to GCC but replace interpretation with run-time compilation.

Mostly what Berlin disagrees about is the same thing Regehr disagrees about: whether there's enough code in the category of "not worth hand-optimizing but still runs often enough to matter", not whether you can beat a compiler by hand-optimizing your code. On the contrary, he brings up whole categories of code where compilers can't hope to compete with hand-optimization, such as numerical algorithms where optimization requires sacrificing numerical stability. mpweiher's comment in response discusses other scenarios where compilers can't hope to compete, like systems-level optimization.
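(A standard example of that numerical-stability point, mine rather than Berlin's: Kahan summation only survives because the compiler is forbidden from reassociating floating-point arithmetic; a flag like -ffast-math licenses exactly the transformation that deletes it.)

    #include <stddef.h>

    /* Kahan compensated summation. Algebraically, c is always zero,
       so a compiler allowed to reassociate floating-point math
       (e.g. under -ffast-math) may legally optimize the whole
       compensation away, trading numerical stability for speed. */
    double kahan_sum(const double *x, size_t n) {
        double s = 0.0, c = 0.0;
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;  /* apply the stored correction    */
            double t = s + y;     /* low bits of y are lost here... */
            c = (t - s) - y;      /* ...and captured back into c    */
            s = t;
        }
        return s;
    }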

It's worth reading the comments by haberman and Mike Pall in that HN thread, where they correct Berlin about LuaJIT; kjksf also points out a number of widely used libraries that got 2–4× speedups over optimized C by hand-optimizing the assembly: libjpeg-turbo, Skia, and ffmpeg. It'd be interesting to see whether the intervening 10 years have changed the situation, since GCC and LLVM have kept improving, but I doubt they've improved by even 50%, much less the 300% it would take to close that gap.
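(For a taste of what that kind of hand-optimization looks like, here's a toy SSE2 sketch in the same spirit; it's not code from any of those libraries.)

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>
    #include <stdint.h>

    /* Saturating-add delta to every byte, 16 bytes at a time.
       Autovectorizers handle a loop this simple, but real codecs
       chain dozens of such kernels with shuffles and widening math,
       and that's where the hand-written versions win their 2-4x. */
    void brighten(uint8_t *p, size_t n, uint8_t delta) {
        const __m128i d = _mm_set1_epi8((char)delta);
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
            _mm_storeu_si128((__m128i *)(p + i), _mm_adds_epu8(v, d));
        }
        for (; i < n; i++) {                    /* scalar tail */
            unsigned s = (unsigned)p[i] + delta;
            p[i] = s > 255 ? 255 : (uint8_t)s;
        }
    }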