
173 points daviducolo | 5 comments
1. johnisgood ◴[] No.43335889[source]
However, those CPU features (AVX2 and so on) also need to be detected at runtime.

Those ifdefs only detect whether the compiler supports them (more precisely, whether they are enabled for the build target), i.e. at build time only.

So your program is only compiled with the AVX2 (and other) code paths if the compiler supports them. You should therefore build with a compiler that has all those features, because you want every implementation compiled into one executable, of course. Then use runtime checks to make sure the CPU the program runs on actually supports AVX2, for example, so that it can select the best implementation based on the available CPU features.
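To illustrate the build-time versus runtime split, here is a minimal sketch, not the project's actual code. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available; the names `count_byte`, `count_byte_scalar`, `count_byte_avx2` and `HAVE_X86_DISPATCH` are hypothetical. The macro decides what gets compiled in, the runtime check decides what actually runs.

```c
#include <assert.h>
#include <stddef.h>

/* Build-time question: can this compiler/target emit an AVX2 path at all? */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1 /* hypothetical macro name */
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable fallback: count occurrences of byte c in buf. */
static size_t count_byte_scalar(const unsigned char *buf, size_t len,
                                unsigned char c) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) n += (buf[i] == c);
    return n;
}

#if HAVE_X86_DISPATCH
/* The target attribute lets this one function use AVX2 even when the
 * rest of the file is built without -mavx2. */
__attribute__((target("avx2")))
static size_t count_byte_avx2(const unsigned char *buf, size_t len,
                              unsigned char c) {
    const __m256i needle = _mm256_set1_epi8((char)c);
    size_t n = 0, i = 0;
    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        unsigned mask =
            (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle));
        n += (size_t)__builtin_popcount(mask);
    }
    /* Handle the remaining (< 32) bytes with the scalar loop. */
    return n + count_byte_scalar(buf + i, len - i, c);
}
#endif

/* Runtime question: does the CPU we are running on support AVX2? */
size_t count_byte(const unsigned char *buf, size_t len, unsigned char c) {
    assert(buf != NULL || len == 0);
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) return count_byte_avx2(buf, len, c);
#endif
    return count_byte_scalar(buf, len, c);
}
```

With this layout a single binary contains both paths, and a call such as `count_byte(data, n, '\n')` should return the same count on a machine with or without AVX2.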

To make things a bit more complicated, let me quote from one of the projects by the author I mention further down: "The detection is performed at configure time through both CPUID flags and actual instruction execution tests on the host machine, verifying support in both the CPU and operating system." What you are currently checking is the "OS" part, or rather the compiler, since you are using only macro definitions.

Once you add this, the claim "Automatically leverages SSE4.2 and AVX2 instructions when available for maximum throughput." from the list of features on the website will be accurate.

If you are interested, someone I know (or rather, follow) has a single header file for detecting CPU features at runtime (for C), and he also has a build-time detection one, but that has many more features.
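For what "support in both the CPU and operating system" can look like at runtime, here is a rough sketch. It is not taken from the quoted project. It assumes GCC or Clang on x86 with `<cpuid.h>`, and the function names `read_xcr0` and `cpu_and_os_support_avx2` are made up for illustration. The CPU advertises AVX2 through CPUID, while the OS signals that it saves the YMM registers through XCR0, which is read with XGETBV.

```c
#include <stdbool.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read XCR0 with XGETBV. Only call this after CPUID reports OSXSAVE. */
static uint64_t read_xcr0(void) {
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

bool cpu_and_os_support_avx2(void) {
    unsigned eax, ebx, ecx, edx;

    /* CPUID leaf 1: ECX bit 27 = OSXSAVE, ECX bit 28 = AVX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    bool osxsave = (ecx >> 27) & 1;
    bool avx = (ecx >> 28) & 1;
    if (!osxsave || !avx) return false;

    /* OS side: XCR0 bits 1 (XMM state) and 2 (YMM state) must both be set. */
    if ((read_xcr0() & 0x6) != 0x6) return false;

    /* CPU side: CPUID leaf 7, subleaf 0: EBX bit 5 = AVX2. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return (ebx >> 5) & 1;
}
#else
/* Not x86, or an unknown compiler: report no AVX2. */
bool cpu_and_os_support_avx2(void) { return false; }
#endif
```

A macro such as `__AVX2__` answers neither of these two questions; it only says what the compiler was allowed to emit.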

replies(2): >>43339194 #>>43339808 #
2. CyberDildonics ◴[] No.43339194[source]
It may have SIMD features hard coded at compile time, might not have regular expressions (even though re means regular expression in grep) and the benchmarks might be so short that they are measuring process startup time (as well as mixing SIMD features and algorithms when measuring time), but that landing page is slick.
3. doctorsher ◴[] No.43339808[source]
I am interested in the CPU intrinsics detection in a single header file, if you don’t mind dropping the link.
replies(2): >>43340702 #>>43341726 #
4. ashvardanian ◴[] No.43340702[source]
Here is the code I use to detect available CPU features at runtime and dispatch the correct kernels - https://github.com/ashvardanian/StringZilla/blob/main/c/lib....

There are also several AVX-512 kernels for different CPU generations, so it’s obviously longer than just checking for AVX.
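As a rough idea of why several tiers make the detection code longer, here is a small sketch of picking the best available tier once and caching it. This is not the linked code; it assumes GCC or Clang on x86 for `__builtin_cpu_supports`, and `kernel_tier`, `detect_kernel_tier`, `active_kernel_tier` and `kernel_tier_name` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical tiers, ordered from least to most capable. */
typedef enum {
    KERNEL_SCALAR,
    KERNEL_SSE42,
    KERNEL_AVX2,
    KERNEL_AVX512F,
    KERNEL_AVX512BW
} kernel_tier;

/* Probe from the most capable tier downwards and return the first hit. */
kernel_tier detect_kernel_tier(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx512bw")) return KERNEL_AVX512BW;
    if (__builtin_cpu_supports("avx512f")) return KERNEL_AVX512F;
    if (__builtin_cpu_supports("avx2")) return KERNEL_AVX2;
    if (__builtin_cpu_supports("sse4.2")) return KERNEL_SSE42;
#endif
    return KERNEL_SCALAR;
}

/* Cache the result so hot paths do not probe the CPU on every call.
 * Note: this simple cache is not thread-safe. */
kernel_tier active_kernel_tier(void) {
    static int cached = -1;
    if (cached < 0) cached = (int)detect_kernel_tier();
    return (kernel_tier)cached;
}

const char *kernel_tier_name(kernel_tier t) {
    switch (t) {
    case KERNEL_AVX512BW: return "avx512bw";
    case KERNEL_AVX512F:  return "avx512f";
    case KERNEL_AVX2:     return "avx2";
    case KERNEL_SSE42:    return "sse4.2";
    default:              return "scalar";
    }
}
```

A real dispatcher would map each tier to a function pointer for the matching kernel; every extra CPU generation adds another check and another kernel.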

5. johnisgood ◴[] No.43341726[source]
https://zolk3ri.name/cgit/cpudetect/about/ (direct link to the header file: https://zolk3ri.name/cgit/cpudetect/tree/cpudetect.h)

If you are unsure about something, or it misses something you'd like added, feel free to e-mail the author. I am in touch with him at times and he seems friendly.

I remember he wanted to implement more than just detecting CPU instructions, but he did not want to make it bloated with checking for cache sizes and whatnot (can't remember the details).