How fast is it if you just write a for loop that keeps track of the index and value of the smallest element (possibly treating the values as ints)?
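For reference, here is a minimal sketch of the plain loop being asked about. It assumes the values are already available as u32; the name `argmin_u32` is made up for illustration. If the data were non-negative, non-NaN floats, their bit patterns would compare in the same order as unsigned integers, which is presumably what "treating them as ints" refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the smallest element.
 * Ties resolve to the first occurrence because the compare is strict. */
size_t argmin_u32(const uint32_t *xs, size_t n) {
    assert(n > 0);                 /* an empty array has no minimum */
    size_t best_idx = 0;
    uint32_t best = xs[0];
    for (size_t i = 1; i < n; i++) {
        if (xs[i] < best) {        /* track both the value and its index */
            best = xs[i];
            best_idx = i;
        }
    }
    return best_idx;
}
```

Calling `argmin_u32(xs, n)` on an array of `n` values returns the position of the first minimum. Whether a compiler vectorizes this loop on its own depends on the compiler and flags.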
replies(1):
I wonder whether that could be made faster by using AVX instructions; they let you find the minimum value among several u32 values, but not immediately its index.
// (initialize ns and idxs by reading from the array
// and adding the appropriate constant to the old value of idxs.)
n_acc = min(n_acc, ns);
const is_new_min = eq(n_acc, ns);
idx_acc = blend(idx_acc, idxs, is_new_min);
Edit: I wrote this with min, eq, blend, but you can actually use cmpgt, min, blend to avoid having a dependency chain through all three instructions. I am just used to min, eq, blend from working on unsigned values, which don't have cmpgt.
You can consult the list of toys here: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
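To make the lane-wise idea concrete, here is a rough portable C model of the cmpgt, min, blend variant. It is a sketch, not real intrinsics code: plain arrays stand in for vector registers, `LANES` and `argmin_u32_lanes` are made-up names, 8 lanes is assumed because a 256-bit register holds 8 u32 values, and indices are assumed to fit in 32 bits. Each lane keeps its own running minimum and index, and a final pass over the lanes recovers the overall index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* assumption: models one 256-bit register of u32 lanes */

/* Hypothetical helper: lane-wise argmin, first occurrence wins on ties. */
size_t argmin_u32_lanes(const uint32_t *xs, size_t n) {
    assert(n > 0);
    uint32_t best = xs[0];
    size_t best_idx = 0;
    size_t i = 0;
    if (n >= LANES) {
        uint32_t n_acc[LANES], idx_acc[LANES];
        /* Seed the accumulators with the first full block. */
        for (size_t l = 0; l < LANES; l++) {
            n_acc[l] = xs[l];
            idx_acc[l] = (uint32_t)l;
        }
        for (i = LANES; i + LANES <= n; i += LANES) {
            for (size_t l = 0; l < LANES; l++) {
                uint32_t ns = xs[i + l];
                uint32_t idx = (uint32_t)(i + l);
                /* cmpgt and min both read the OLD n_acc, so neither
                 * waits on the other; only blend needs the mask. */
                int is_new_min = n_acc[l] > ns;              /* cmpgt */
                n_acc[l] = n_acc[l] < ns ? n_acc[l] : ns;    /* min   */
                idx_acc[l] = is_new_min ? idx : idx_acc[l];  /* blend */
            }
        }
        /* Horizontal step: pick the smallest lane, lowest index on ties. */
        best = n_acc[0];
        best_idx = idx_acc[0];
        for (size_t l = 1; l < LANES; l++) {
            if (n_acc[l] < best ||
                (n_acc[l] == best && idx_acc[l] < best_idx)) {
                best = n_acc[l];
                best_idx = idx_acc[l];
            }
        }
    }
    /* Scalar tail for the elements that do not fill a whole block. */
    for (; i < n; i++) {
        if (xs[i] < best) {
            best = xs[i];
            best_idx = i;
        }
    }
    return best_idx;
}
```

The strict compare keeps the earlier index within a lane, whereas the min, eq, blend version updates the index on ties as well, so the two can return different (equally valid) positions when the minimum repeats. A real AVX2 version would replace the inner lane loop with the corresponding intrinsics; I have not benchmarked either form.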