←back to thread

1764 points fatihky | 1 comments | | HN request time: 0.247s | source
Show context
jblow ◴[] No.12701702[source]
#9 is especially stupid because it's so context-dependent. SSE4 gives you a popcount instruction, for example, which would be easily the fastest way to do this, if available.
replies(2): >>12702641 #>>12702679 #
greyman ◴[] No.12702679[source]
Yes, but without that instruction, the algorithm mentioned by recruiter is really the quickest way. I coded chess algorithm in the past and this was exactly the method top chess open-source engines used. But imho it is hard to figure that out without prior experience with this problem.
replies(3): >>12703390 #>>12703834 #>>12708561 #
1. Const-me ◴[] No.12708561[source]
No, the algorithm mentioned by recruiter is among the slowest ways.

I recently tested different approaches. I’ve been working on some code that downsamples large set of 1 bit voxels to get shades of gray on the edges. For that, I had to counts gigabytes of those bits as fast as possible.

Advanced manually-vectorized SIMD code worked several times faster, esp. on the hardware that supports SSSE3 or XOP instructions.

And even when the hardware doesn’t have SSE4, doesn’t have SSSE3, doesn’t have XOP — SSE2-only backup plan is still faster than lookup tables. Here’s the code: http://stackoverflow.com/a/17355341/126995