#9 is especially stupid because it's so context-dependent. x86 CPUs from the SSE4.2 generation onward give you a POPCNT instruction, for example, which would easily be the fastest way to do this when it's available.
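To make that concrete, here's a rough sketch, assuming #9 is the usual "count the set bits in an integer" question. The function names are mine, purely for illustration. `__builtin_popcount` is a GCC/Clang builtin; as far as I know it only becomes a single POPCNT instruction when you build with something like `-mpopcnt` or `-msse4.2`, otherwise the compiler emits its own fallback. The second function is the classic portable bit trick you'd reach for on hardware without the instruction.

```c
#include <stdint.h>

/* Portable fallback: SWAR bit count, no special instructions required.
 * (Hypothetical helper name, for illustration only.) */
static unsigned popcount32_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (uint32_t)(x * 0x01010101u) >> 24;         /* add the four bytes */
}

/* Use the compiler builtin where it exists, otherwise the fallback.
 * Whether the builtin turns into POPCNT depends on the target flags. */
static unsigned popcount32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount32_swar(x);
#endif
}
```

Which one is "right" depends entirely on what you're targeting, which is the whole problem with the question.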
https://gist.github.com/monocasa/1d44a03cbd0170bfffc6a4a5c37...