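#9 is especially stupid because the right answer is so context-dependent. On x86, for example, CPUs from the SSE4 era onward give you a POPCNT instruction (it shipped alongside SSE4.2 but has its own CPUID flag), which would easily be the fastest way to do this where it's available.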
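To make the "if available" part concrete, here is a rough sketch in C, not a definitive implementation. The names `popcount32` and `popcount32_portable` are made up for illustration, and it assumes GCC or Clang with a 32-bit `unsigned int`. `__builtin_popcount` only compiles down to a single POPCNT when the target is allowed to use it (e.g. `-mpopcnt` or `-march=native`); otherwise the compiler emits a software routine.

```c
#include <stdint.h>

/* Portable fallback: repeatedly clear the lowest set bit (Kernighan's method).
 * The loop runs once per set bit. */
static unsigned popcount32_portable(uint32_t x)
{
    unsigned count = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical wrapper: use the compiler builtin where it exists.
 * Assumes unsigned int is 32 bits wide. */
static unsigned popcount32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount32_portable(x);
#endif
}
```

Which path is actually fastest still depends on the target and the compiler flags, which is the whole point about context.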
replies(2):
https://gist.github.com/monocasa/1d44a03cbd0170bfffc6a4a5c37...