173 points daviducolo | 1 comments
johnisgood ◴[] No.43335889[source]
Those CPU features (AVX2 and whatnot) need to be detected at runtime, too, however.

Those ifdefs only detect whether the compiler supports them (and has them enabled), i.e. at build time only.

So your program only gets the AVX2 (and other) code paths compiled in if the compiler supports them. You should therefore build where the compiler has all those features (because you want everything compiled into one executable, of course), and then use runtime checks to make sure the CPU the program runs on actually supports AVX2, for example, so that it can select the best implementation based on the available CPU features.
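A minimal sketch of that "compile everything in, pick at runtime" idea, assuming GCC or Clang on x86 (it relies on their `__builtin_cpu_supports` builtin and the `target("avx2")` function attribute). The names `sum_scalar`, `sum_avx2`, `sum_i32` and the `HAVE_X86_DISPATCH` macro are made up for illustration and are not taken from the project being discussed:

```c
#include <stddef.h>
#include <stdint.h>

/* Build-time question: can this compiler emit AVX2 code at all? */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable fallback, always compiled in. */
static int32_t sum_scalar(const int32_t *v, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if HAVE_X86_DISPATCH
/* Compiled with AVX2 enabled for this function only, so the rest of
   the binary still runs on CPUs without AVX2. */
__attribute__((target("avx2")))
static int32_t sum_avx2(const int32_t *v, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(
            acc, _mm256_loadu_si256((const __m256i *)(v + i)));

    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);

    int32_t s = 0;
    for (int k = 0; k < 8; k++)
        s += lanes[k];
    for (; i < n; i++)      /* leftover elements */
        s += v[i];
    return s;
}
#endif

/* Runtime question: does the CPU we are running on support AVX2? */
int32_t sum_i32(const int32_t *v, size_t n)
{
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
#endif
    return sum_scalar(v, n);
}
```

Callers only ever use `sum_i32`; the `#if` decides what gets built, and the `if` decides what gets executed on the machine at hand.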

To make things a bit more complicated, let me quote a part from one of the projects he has: "The detection is performed at configure time through both CPUID flags and actual instruction execution tests on the host machine, verifying support in both the CPU and operating system.". Currently what you are checking is neither the CPU nor the OS but only the compiler, since you are using only macro definitions.
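To illustrate what "support in both the CPU and operating system" can mean for AVX2, here is a rough sketch, again assuming GCC or Clang on x86 (it uses their `<cpuid.h>` helpers and inline assembly). It is not the method used by the quoted project; the function names `read_xcr0` and `avx2_usable` are hypothetical. The CPU has to advertise AVX and AVX2 through CPUID, and the OS has to have enabled saving of the XMM/YMM registers, which is visible in XCR0:

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

/* Read XCR0. Only valid once CPUID reports OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if AVX2 is supported by the CPU and enabled by the OS. */
int avx2_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1: ECX bit 27 = OSXSAVE, bit 28 = AVX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;

    /* OS side: XCR0 bits 1 (SSE state) and 2 (AVX state) must be set. */
    if ((read_xcr0() & 0x6) != 0x6)
        return 0;

    /* CPUID leaf 7, subleaf 0: EBX bit 5 = AVX2. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & (1u << 5)) != 0;
}
#else
/* Not x86, or an unknown compiler: report "not available". */
int avx2_usable(void) { return 0; }
#endif
```

A macro such as `__AVX2__` answers none of these questions; it only says what the compiler was allowed to emit.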

Once you add this, then "Automatically leverages SSE4.2 and AVX2 instructions when available for maximum throughput." from the list of features on the website will be correct / accurate.

If interested, someone I know (or rather, follow) has a single header file for detecting CPU features at runtime (for C), and he also has a build-time detection one, but that has many more features.

replies(2): >>43339194 #>>43339808 #
doctorsher ◴[] No.43339808[source]
I am interested in the CPU intrinsics detection in a single header file, if you don’t mind dropping the link.
replies(2): >>43340702 #>>43341726 #
1. johnisgood ◴[] No.43341726[source]
https://zolk3ri.name/cgit/cpudetect/about/ (direct link to the header file: https://zolk3ri.name/cgit/cpudetect/tree/cpudetect.h)

If you are unsure about something, or it misses something you'd like added, feel free to e-mail the author. I am in touch with him at times and he seems friendly.

I remember he wanted to implement more than just detecting CPU instructions, but he did not want to make it bloated with checking for cache sizes and whatnot (can't remember the details).