
305 points todsacerdoti | 1 comments
mmastrac ◴[] No.44061671[source]
The associated issue for comparing two u16s is interesting.

https://github.com/rust-lang/rust/issues/140167

replies(3): >>44061906 #>>44065911 #>>44066028 #
ack_complete ◴[] No.44066028[source]
I'm surprised there's no mention of store forwarding in that discussion. The -O3 codegen is bonkers, but the -O2 output is reasonable. In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads. In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
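For readers who haven't opened the issue, here is a rough sketch of the two shapes being discussed. The `Pair` struct and both function names are made up for illustration and are not taken from the issue. The second function only models the merged 32-bit comparison in safe Rust; whether the compiler actually emits one 32-bit load per struct depends on the optimization level and target.

```rust
// Hypothetical two-field struct; repr(C) keeps the fields in declaration order.
#[derive(Clone, Copy)]
#[repr(C)]
struct Pair {
    a: u16,
    b: u16,
}

// Field-by-field comparison: conceptually two 16-bit loads per struct.
fn eq_fieldwise(x: &Pair, y: &Pair) -> bool {
    x.a == y.a && x.b == y.b
}

// Models the merged form: pack both fields into one u32 and compare once.
// This does not force a 32-bit load; it only shows the equivalent result.
fn eq_merged(x: &Pair, y: &Pair) -> bool {
    let xw = (x.a as u32) | ((x.b as u32) << 16);
    let yw = (y.a as u32) | ((y.b as u32) << 16);
    xw == yw
}
```

Both functions return the same answer for every input; the discussion is only about which machine code is cheaper when the fields were just written by narrower stores.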
replies(2): >>44069905 #>>44070022 #
1. mshockwave ◴[] No.44070022[source]
> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure

It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005

> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.

I guess you're talking about stores and loads across function boundaries?

Trivia: LLVM's X86 backend has a whole pass dedicated to avoiding this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...
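A minimal sketch of the cross-function pattern in question, with hypothetical names (`Pair`, `fill`, `same`, `fill_then_compare`). The `#[inline(never)]` attributes stand in for the non-inlined scenario. Whether the writes end up as two 16-bit stores, whether the comparison becomes one 32-bit load, and whether that combination stalls are all assumptions that depend on the compiler version, flags, and the microarchitecture.

```rust
// Hypothetical two-field struct, compared with the derived PartialEq.
#[derive(Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct Pair {
    a: u16,
    b: u16,
}

// Producer: writes each field separately, which may become two 16-bit stores.
#[inline(never)]
fn fill(out: &mut Pair, seed: u16) {
    out.a = seed.wrapping_mul(3);
    out.b = seed.wrapping_add(7);
}

// Consumer: compiled on its own, it cannot see how its arguments were written,
// so it cannot know whether a merged wide load would hit a forwarding failure.
#[inline(never)]
fn same(x: &Pair, y: &Pair) -> bool {
    x == y
}

// Caller: the stores in `fill` are immediately followed by the loads in `same`.
fn fill_then_compare(seed: u16, other: &Pair) -> bool {
    let mut p = Pair { a: 0, b: 0 };
    fill(&mut p, seed);
    same(&p, other)
}
```

Calling `fill_then_compare(1, &Pair { a: 3, b: 8 })` exercises the store-then-load sequence; inspecting the generated assembly for your own target is the only way to see which load widths you actually get.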