49 points melenaboija | 2 comments
QuadmasterXLII No.41852483
I’m really surprised by the performance of the plain C++ version. Is automatic vectorization turned off? Frankly, this task is so common that I would half expect compilers to have a hard-coded special case just for fast dot products.

Edit: Yeah, when I compile the “plain C++” version with clang, the main loop is 8 vmovups, 16 vfmadd231ps, and an add/cmp/jne. OP forgot some flags.
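For reference, a “plain C++” dot product is presumably something like the loop below. This is a hypothetical reconstruction (the name `dot_plain` is made up), not OP’s code. Under strict IEEE semantics the additions must happen in exactly this order, which is what keeps the compiler from splitting the sum across vector lanes on typical x86 targets unless reassociation is allowed.

```cpp
#include <cstddef>

// Hypothetical "plain C++" dot product (not the original post's code).
// Strict floating-point semantics require sum to be updated left to right,
// so without reassociation the compiler cannot use several vector accumulators.
float dot_plain(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```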

mshockwave No.41853866
which flags did you use and which compiler version?
QuadmasterXLII No.41853882
clang 19, -O3 -ffast-math -march=native
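Roughly speaking, the relevant thing -ffast-math permits here is reassociation (GCC and clang also expose it on its own as -fassociative-math): the single running sum can be split into independent partial sums that are combined at the end. The sketch below does that by hand with four scalar accumulators. The name `dot_reassociated` and the accumulator count are illustrative assumptions, not clang’s actual codegen, and writing the loop this way does not by itself guarantee vectorization.

```cpp
#include <cstddef>

// Hand-written sketch of reassociation: four independent partial sums
// (standing in for SIMD lanes / unrolled registers), combined at the end.
// Results can differ in the last bits from a strict left-to-right loop
// because the additions happen in a different order.
float dot_reassociated(const float* a, const float* b, std::size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            acc[lane] += a[i + lane] * b[i + lane];
        }
    }
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        sum += a[i] * b[i];
    }
    return sum;
}
```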
mshockwave No.41853962
can confirm fast math makes the biggest difference
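The reason that permission matters is that floating-point addition is not associative, so regrouping a sum can change its result. A minimal demonstration, with made-up helper names:

```cpp
// Same three operands, different grouping.
float sum_left(float x, float y, float z) { return (x + y) + z; }
float sum_right(float x, float y, float z) { return x + (y + z); }
```

With x = 1e20f, y = -1e20f, z = 1.0f, `sum_left` returns 1 while `sum_right` returns 0, because -1e20f + 1.0f rounds back to -1e20f. A compiler honoring strict semantics therefore may not regroup the dot-product sum on its own.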
QuadmasterXLII No.41854399
I feel like I’m kinda being the bad aunt by encouraging -ffast-math. It can definitely break some things (e.g. https://pspdfkit.com/blog/2021/understanding-fast-math/ ), but I use it habitually and I’m fine, so clearly it’s safe.
magicalhippo No.41854704
> It can definitely break some things

I recall it totally fudged up the ray/axis-aligned bounding box intersection routine in the raytracer I worked on. The routine relied on infinities being handled correctly, and -ffast-math broke that.
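For context, a common way to write such a routine is the “slab” test sketched below. It is a generic version, not the poster’s code; the `Vec3` type and `ray_hits_box` name are hypothetical, and it assumes IEEE 754 division semantics. An axis-parallel ray has a zero direction component, so `1.0f / 0.0f` becomes an infinity and the slab distances become ±infinity. The min/max comparisons then give the right answer only if infinities are honored, and -ffast-math implies -ffinite-math-only, which lets the compiler assume they never occur.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };

// Slab-method ray vs. axis-aligned box test (hypothetical sketch).
// Relies on 1/0 producing +/-infinity for axis-parallel rays.
// The degenerate case where the origin lies exactly on a slab plane
// (0 * inf = NaN) is not handled specially.
bool ray_hits_box(Vec3 origin, Vec3 dir, Vec3 box_min, Vec3 box_max) {
    const float o[3]  = {origin.x, origin.y, origin.z};
    const float d[3]  = {dir.x, dir.y, dir.z};
    const float lo[3] = {box_min.x, box_min.y, box_min.z};
    const float hi[3] = {box_max.x, box_max.y, box_max.z};

    float t_enter = 0.0f;  // the ray starts at t = 0
    float t_exit = std::numeric_limits<float>::infinity();

    for (int axis = 0; axis < 3; ++axis) {
        float inv = 1.0f / d[axis];  // +/-inf when d[axis] == 0
        float t1 = (lo[axis] - o[axis]) * inv;
        float t2 = (hi[axis] - o[axis]) * inv;
        if (t1 > t2) std::swap(t1, t2);
        t_enter = std::max(t_enter, t1);
        t_exit = std::min(t_exit, t2);
    }
    return t_enter <= t_exit;
}
```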

I see the linked article goes into that aspect in detail; I wish I’d had it back then.

IIRC we ended up disabling it for just that file, as it did speed up the rest by a fair bit.
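Besides building that one file with -fno-fast-math (both GCC and clang accept it), the file can refuse to compile under these flags at all. GCC and clang predefine `__FAST_MATH__` when -ffast-math is active and set `__FINITE_MATH_ONLY__` to 1 under -ffinite-math-only, so a guard like the hypothetical one below turns a silent miscompile into a build error. The `infinities_honored` helper is likewise illustrative: under -ffinite-math-only the compiler may fold its `std::isinf` check to false.

```cpp
// Hypothetical guard for a source file that relies on IEEE infinities.
#if defined(__FAST_MATH__) || __FINITE_MATH_ONLY__
#error "This file relies on IEEE infinities; compile it without -ffast-math"
#endif

#include <cmath>
#include <limits>

// Returns true when infinities behave as IEEE 754 specifies.
bool infinities_honored() {
    volatile float inf = std::numeric_limits<float>::infinity();  // volatile: read at run time
    return std::isinf(inf) && inf > std::numeric_limits<float>::max();
}
```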