It depends, but generally speaking, you are wrong and OP is right that you'd want to benchmark on the actual architecture.
a) First of all, you're probably basing your answer on experience with 64-bit popcounts. But note the question was about popcounting multiple 16-bit words, not single 64-bit words. This isn't typically what you do in a chess program.
b) The table has a cache footprint and can be pushed out of L1, which kills that approach in many real programs.
c) Modern CPUs have a POPCNT instruction. It's slow and limited to one port on most Intel machines, though, so it's not always a win either.
d) Lacking POPCNT, and under cache pressure, the SWAR approaches are good, especially if you can compute multiple results at once. With AVX2 it becomes especially interesting.
e) If you expect many of the numbers to be zero, a simple loop will win.
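To make the SWAR and simple-loop points concrete, here is a rough sketch in C. It assumes four 16-bit words are packed into one `uint64_t`, and it takes "simple loop" to mean the clear-lowest-set-bit loop; the function names are made up for illustration, and none of this replaces benchmarking on the actual target.

```c
#include <stdint.h>

/* SWAR: popcount four 16-bit lanes packed into one 64-bit word.
   Returns the four counts (0..16), each in its own 16-bit lane. */
uint64_t popcount16x4_swar(uint64_t x)
{
    /* 2-bit partial sums */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* 4-bit partial sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* 8-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* add the two bytes of each lane; the mask drops cross-lane spill */
    x = (x + (x >> 8)) & 0x00FF00FF00FF00FFULL;
    return x;
}

/* Simple loop: one iteration per set bit, so a zero input costs
   only a single comparison. */
unsigned popcount16_loop(uint16_t v)
{
    unsigned n = 0;
    while (v) {
        v &= (uint16_t)(v - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

The SWAR version does a fixed amount of work for four results and touches no memory, which is why it holds up under cache pressure. The loop version only wins when most inputs are zero or nearly so.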