Faster Argmin on Floats

(algorithmiker.github.io)

19 points return_to_monke | 1 comments | 18 Sep 25 16:20 UTC | HN request time: 0.251s | source

Show context

TheDudeMan ◴[20 Sep 25 04:36 UTC] No.45310426[source]▶

How fast if you write a for loop and keep track of the index and value of the smallest (possibly treating them as ints)?

replies(1): >>45310547 #

nine_k ◴[20 Sep 25 05:04 UTC] No.45310547[source]▶

>>45310426 #

I hazard to guess that it would be the same, because the compiler would produce a loop out of .iter(), would expose the loop index via .enumerate(), and would keep track of that index in .min_by(). I suppose the lambda would be inlined, maybe even along with comparisons.

I wonder could that be made faster by using AVX instructions; they allow to find the minimum value among several u32 values, but not immediately its index.

replies(3): >>45311141 #>>45311388 #>>45311983 #

1. anonymoushn ◴[20 Sep 25 07:05 UTC] No.45311141[source]▶

>>45310547 #

you can have some vector registers n_acc, ns, idx_acc, idxs, then you can do

  // (initialize ns and idxs by reading from the array
  //  and adding the apropriate constant to the old value of idxs.)
  n_acc = min(n_acc, ns);
  const is_new_min = eq(n_acc, ns);
  idx_acc = blend(idx_acc, idxs, is_new_min);

Edit: I wrote this with min, eq, blend but you can actually use cmpgt, min, blend to avoid having a dependency chain through all three instructions. I am just used to using min, eq, blend because of working on unsigned values that don't have cmpgt

you can consult the list of toys here: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

↑