77 points klelatti | 1 comments
bee_rider ◴[] No.43669024[source]
For some reason (not sure why, maybe it was the discussion of portability and this fun NVIDIA not-quite-assembly language), this made me wonder: has anybody gotten really good at writing LLVM IR? It seems fairly low level, but also quite portable. And… I don’t know, I’m talking about a topic I don’t know much about, so I’m happy to be corrected here, but as a static-single-assignment language maybe it is… even more machine sympathetic than assembly? (I’m under the impression that writing really high performance assembly is quite difficult, you have to keep a ton of instructions in flight at once, right?)
replies(3): >>43669297 #>>43669832 #>>43669930 #
1. raphlinus ◴[] No.43669832[source]
I read LLVM IR (or one of its many GPU-flavored variants) reasonably often, mostly to figure out where in the chain a shader miscompilation is happening. But I've never personally had to write it, and it's not easy for me to think of a use case where it would make a lot of sense. It's pretty unpleasant and fiddly, as you have to annotate all the types of the intermediate values and so on, and it doesn't have the main advantage of actual assembler: being able to reason about the performance of the code. That depends so much on the way it's compiled.

That said, I have several times wanted to reach for LLVM intrinsics. In Rust, these are mostly available through a nightly-only feature (in std::intrinsics). One thing that potentially unlocks is "unordered" memory semantics, which are intermediate between nonatomic and relaxed atomics, in that they allow much of the optimization of the former, while also not being UB if there's a data race. In a similar vein is the LLVM "freeze" operation, which turns a read from uninitialized memory into a well-defined bit pattern. There's some discussion ([1] [2], for example) of adding those to Rust proper, but it's tricky.

[1]: https://internals.rust-lang.org/t/using-llvms-unordered-read...

[2]: https://internals.rust-lang.org/t/what-if-reading-uninit-ram...
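For comparison, here is a minimal sketch of the nearest things stable Rust offers today. It is not the intrinsics themselves: it uses Relaxed atomics (the ordering LLVM calls "monotonic") in place of "unordered", and MaybeUninit, where the value must be written before it is read, in place of "freeze". The function names relaxed_handoff and init_then_read are made up for illustration.

```rust
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: one thread stores increasing values while the caller
/// reads concurrently. With a plain `u32` this race would be UB; with Relaxed
/// atomics it is defined, at the cost of some compiler optimizations.
fn relaxed_handoff(iterations: u32) -> u32 {
    let cell = Arc::new(AtomicU32::new(0));

    let writer = {
        let cell = Arc::clone(&cell);
        thread::spawn(move || {
            for i in 1..=iterations {
                cell.store(i, Ordering::Relaxed);
            }
        })
    };

    // Racy reads: each load sees some value the writer stored (or the initial
    // 0). Per-location coherence means the observed values never go backwards.
    let mut last = 0;
    for _ in 0..iterations {
        let seen = cell.load(Ordering::Relaxed);
        assert!(seen >= last);
        last = seen;
    }

    // Joining the writer orders all of its stores before this final load.
    writer.join().unwrap();
    cell.load(Ordering::Relaxed)
}

/// Hypothetical helper: stable Rust has no "freeze", so uninitialized storage
/// has to be written before it is read.
fn init_then_read() -> [u8; 4] {
    let mut buf = MaybeUninit::<[u8; 4]>::uninit();
    buf.write([1, 2, 3, 4]);
    // SAFETY: `buf` was fully initialized by the `write` call above.
    unsafe { buf.assume_init() }
}
```

Calling relaxed_handoff(1000) returns the last value stored, and init_then_read() returns the array that was written. Neither gives you what the intrinsics would: the Relaxed version still blocks some of the optimizations that "unordered" permits, and calling assume_init without the preceding write would be UB rather than yielding an arbitrary bit pattern.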

But as another data point, for something I really want to do that's not yet expressible in Rust (fp16 SIMD operations), I would rather write NEON assembly language than LLVM IR. And I am quite certain I don't want to write any of the GPU variants by hand either.