
170 points judicious | 1 comments | | HN request time: 0.201s | source
sfilmeyer ◴[] No.45406379[source]
I enjoyed reading the article, but I'm pretty thrown by the benchmarks and conclusion. All of the times are reported to a single digit of precision, but then the summary claims that one function shows an improvement while the other two are described as negligible. When all the numbers presented are "~5ms" or "~6ms", it doesn't leave me confident that small changes to the benchmarking wouldn't have substantially changed that conclusion.
replies(2): >>45406733 #>>45407659 #
gizmo686 ◴[] No.45407659[source]
Yeah. When your timing results are a single-digit multiple of your timing precision, that is a good indication you need either a longer test or a more precise clock.

At a 5ms baseline with millisecond precision, the smallest improvement you can measure is 20%. And you cannot distinguish a 20% speedup from a 20% slowdown that happened to get lucky with clock ticks.
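
To make that arithmetic concrete, here is a minimal sketch in C. It is not the article's benchmark code; `min_detectable_change`, `now_ns`, `time_reps`, and the `bump_sink` workload are hypothetical names made up for illustration. It assumes a C11 library that provides `timespec_get`, which reads the wall clock; on POSIX systems a monotonic clock is usually the safer choice for benchmarks.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Smallest relative change a clock can resolve when it only reports
   whole ticks: one tick divided by the baseline duration. */
static double min_detectable_change(double baseline_ms, double tick_ms) {
    assert(baseline_ms > 0.0);
    return tick_ms / baseline_ms;
}

/* Current time in nanoseconds via C11 timespec_get. This is wall-clock
   time (assumption: good enough for a sketch), not a monotonic clock. */
static uint64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in workload; volatile keeps the increments
   from being optimized away. */
static volatile long sink = 0;
static void bump_sink(void) { sink++; }

/* Run fn reps times and return the elapsed time in seconds. Raising
   reps lengthens the run, so one clock tick becomes a smaller
   fraction of the total. */
static double time_reps(void (*fn)(void), long reps) {
    uint64_t start = now_ns();
    for (long i = 0; i < reps; i++) {
        fn();
    }
    return (double)(now_ns() - start) / 1e9;
}
```

With these definitions, `min_detectable_change(5.0, 1.0)` gives 0.2, the 20% figure above, and a 100x longer run, `min_detectable_change(500.0, 1.0)`, gives 0.002. Both fixes have the same effect: more iterations or a finer clock shrink the tick relative to the measured duration.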

For what it is worth, I ran the provided test code on my machine with a 100x increase in iterations and got the following:

  == Benchmarking ABS ==
  ABS (branch):     0.260 sec
  ABS (branchless): 0.264 sec

  == Benchmarking CLAMP ==
  CLAMP (branch):     0.332 sec 
  CLAMP (branchless): 0.538 sec

  == Benchmarking PARTITION ==
  PARTITION (branch):     0.043 sec
  PARTITION (branchless): 0.091 sec
Which is not exactly encouraging (gcc 13.3.0, -ffast-math -march=native. I did not use the -fomit-this-entire-function flag, which my compiler does not understand).

I had to drop down to -O0 to see branchless be faster in any case:

  == Benchmarking ABS ==
  ABS (branch):     0.743 sec
  ABS (branchless): 0.948 sec

  == Benchmarking CLAMP ==
  CLAMP (branch):     4.275 sec
  CLAMP (branchless): 1.429 sec

  == Benchmarking PARTITION ==
  PARTITION (branch):     0.156 sec
  PARTITION (branchless): 0.164 sec
replies(2): >>45408287 #>>45418236 #
1. Roxxik ◴[] No.45408287[source]
I also tried it myself, on different array sizes, with more iterations. The branchy version is not strictly worse.

https://gist.github.com/Stefan-JLU/3925c6a73836ce841860b55c8...
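
For readers without the article's source at hand, the kind of comparison being discussed can be sketched roughly as below. This is an illustration only, not the code from the article or from the gist; the function names are hypothetical. The branchless form assumes 32-bit two's-complement integers and an arithmetic right shift of negative values, which is implementation-defined in C but is what gcc does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form. At higher optimization levels the compiler may turn
   this into a conditional move or a vector op, so "branchy" source
   does not guarantee a branch in the generated code. */
static int32_t abs_branch(int32_t x) {
    return x < 0 ? -x : x;
}

/* Branchless form: mask is all ones for negative x and zero otherwise,
   so (x ^ mask) - mask negates x only when it is negative.
   Like the branchy form, it overflows for INT32_MIN. */
static int32_t abs_branchless(int32_t x) {
    int32_t mask = x >> 31; /* assumption: arithmetic shift */
    return (x ^ mask) - mask;
}

/* Sum of absolute values over an array, one loop per variant, so each
   can be timed separately at different array sizes. */
static int64_t sum_abs_branch(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += abs_branch(a[i]);
    }
    return total;
}

static int64_t sum_abs_branchless(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += abs_branchless(a[i]);
    }
    return total;
}
```

Both loops must return the same sum for the same input, which is a cheap sanity check to run before timing anything. Which one is faster depends on the compiler flags, the array size, and how predictable the signs in the data are, consistent with the mixed results reported above.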