←back to thread

1764 points fatihky | 2 comments | | HN request time: 0.493s | source
Show context
greyman ◴[] No.12702663[source]
I once coded chess playing algorithm for fun, and can confirm that the recruiter was correct on #9: you count bits by using a lookup table and then sum the results. It's the quickest way. But I am not sure if this is possible to figure out immediately without such experience...
replies(3): >>12702773 #>>12703527 #>>12708596 #
1. gcp ◴[] No.12703527[source]
Ah, the appeal to authority.

It depends, but generally speaking, you are wrong and OP is right that you'd want to benchmark on the actual architecture.

a) First of all, you're probably basing your answer off of experience with 64-bit popcounts. But note the question was about popcounting multiple 16-bit words, not single 64-bit words. This isn't typically what you do in a chessprogram. b) The table has a cache footprint and can be pushed out of L1, which kills that approach in many real programs. c) Modern CPUs have a POPCOUNT instruction. It's slow and limited to one port on most Intel machines, though, so not necessarily always a win either. d) Lacking POPCOUNT, and with cache pressure, the SWAR approaches are good, especially if you can compute multiple results at once. With AVX2 it becomes especially interesting. f) If the the expectation is that many of the numbers are zero, a simple loop will win.

replies(1): >>12712776 #
2. bogomipz ◴[] No.12712776[source]
>"Lacking POPCOUNT, and with cache pressure, the SWAR approaches are good, especially if you can compute multiple results at once"

Can you explain what a SWAR approach is?