That said, I have several times wanted to reach for LLVM intrinsics. In Rust, these are mostly available through a nightly-only feature (in std::intrinsics). One thing that potentially unlocks is "unordered" memory semantics, which are intermediate between nonatomic and relaxed atomics, in that they allow much of the optimization of the former, while also not being UB if there's a data race. In a similar vein is the LLVM "freeze" operation, which makes read from uninitialized memory into a well-defined bit pattern. There's some discussion ([1] [2], for example) of adding those to Rust proper, but it's tricky.
[1]: https://internals.rust-lang.org/t/using-llvms-unordered-read...
[2]: https://internals.rust-lang.org/t/what-if-reading-uninit-ram...
But as another data point, for something I really want to do that's not yet expressible in Rust (fp16 SIMD operations), I would rather write NEON assembly language than LLVM IR. And I am quite certain I don't want to write any of the GPU variants by hand either.