66 points melenaboija | 2 comments
1. rrhjm53270 No.41863308
Thank you for sharing such an interesting work.

A small suggestion: try compiling the SIMD C++ code with more aggressive optimization options to see the performance difference.

On my AMD Ryzen 9 7900X3D, I get:

- 0.0592569 ms with `-O3 -march=native`
- 1.7741e-05 ms with `-funsafe-math-optimizations -Ofast -flto=auto -pipe -march=native`

replies(1): >>41863840
2. Const-me No.41863840
OP’s benchmark doesn’t use the return value from the function being benchmarked. When C++ compilers are asked to optimize such code, they often optimize away the whole function.

Pretty sure your 17.7 ns result (1.7741e-05 ms) means the whole function was optimized away. Workarounds are tricky and compiler-specific.