The associated issue for comparing two u16s is interesting.
It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005
> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
I guess you're talking about stores and loads across function boundaries?
Trivia: LLVM's X86 backend has an entire pass just to prevent this partial-store-to-load forwarding issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...
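A minimal Rust sketch of the pattern under discussion, assuming a hypothetical `Pair` struct of two adjacent `u16` fields. The fields are written with two separate 16-bit stores, and the pair is then compared with a single 32-bit load that spans both. On cores with the restriction mentioned above, that wide load generally can't be forwarded from the two narrower in-flight stores and has to wait for them to commit. The names `Pair`, `eq_fields`, `eq_wide`, and `update_then_compare` are illustrative only. Whether the compiler actually keeps two narrow stores (or merges them, or merges the loads in `eq_fields`) depends on the compiler version, target, and optimization level, so check the generated assembly before drawing conclusions.

```rust
use std::ptr;

// Hypothetical type: two adjacent u16 fields, 4 bytes with no padding under repr(C).
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Pair {
    a: u16,
    b: u16,
}

// Field-by-field comparison. An optimizer may combine the two 16-bit loads
// on each side into one 32-bit load; that is the "comparing two u16s" case.
#[inline(never)]
pub fn eq_fields(x: &Pair, y: &Pair) -> bool {
    x.a == y.a && x.b == y.b
}

// Explicitly wide comparison: read each Pair as a single u32.
#[inline(never)]
pub fn eq_wide(x: &Pair, y: &Pair) -> bool {
    // SAFETY: Pair is repr(C), 4 bytes, with no padding, so every byte is
    // initialized; read_unaligned tolerates Pair's 2-byte alignment.
    let xw = unsafe { ptr::read_unaligned(x as *const Pair as *const u32) };
    let yw = unsafe { ptr::read_unaligned(y as *const Pair as *const u32) };
    xw == yw
}

// Two narrow stores followed by a wide load across a non-inlined call boundary.
#[inline(never)]
pub fn update_then_compare(p: &mut Pair, a: u16, b: u16, other: &Pair) -> bool {
    p.a = a; // 16-bit store
    p.b = b; // 16-bit store
    // The 32-bit load inside eq_wide overlaps both pending stores; on cores
    // with the restriction this can block store-to-load forwarding.
    eq_wide(p, other)
}

fn main() {
    let mut p = Pair { a: 0, b: 0 };
    let q = Pair { a: 1, b: 2 };
    assert!(update_then_compare(&mut p, 1, 2, &q));
    assert!(!update_then_compare(&mut p, 1, 3, &q));
    assert!(eq_fields(&p, &Pair { a: 1, b: 3 }));
    println!("ok");
}
```

The results are identical either way; the difference is only in how the hardware services the load, which is why a backend pass can rewrite the wide load into narrower ones without changing semantics.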