Rust is absolutely an improvement over C in every way.
Would that failure be significantly worse than separate loading?
Just negating the optimization wouldn't be much reason against doing it. A single load is simpler and in the general case faster.
It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005
> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
I guess you're talking about stores and loads across function boundaries?
Trivia: LLVM's X86 backend has a whole pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...
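To make the pattern concrete, here is a rough sketch of the sequence being discussed: two narrow stores followed by one wider load that spans both. The names store_fields and load_whole are made up for illustration, and it assumes the struct has no padding. Whether the load actually stalls depends on the uArch and on whether the compiler keeps the values in registers, so treat it as an illustration of the shape, not a benchmark.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* Assumption: no padding, so the struct is exactly 32 bits wide. */
_Static_assert(sizeof(pair) == sizeof(uint32_t), "pair must be 4 bytes");

/* Hypothetical helper: two separate 16-bit stores into the struct. */
void store_fields(pair *p, uint16_t a, uint16_t b) {
    p->a = a;
    p->b = b;
}

/* Hypothetical helper: one 32-bit load covering both earlier stores.
   memcpy sidesteps aliasing and alignment issues; compilers usually
   lower it to a single mov. */
uint32_t load_whole(const pair *p) {
    uint32_t bits;
    memcpy(&bits, p, sizeof bits);
    return bits;
}
```

If store_fields and load_whole run back to back on the same object and neither is inlined, the wide load has to be fed from two separate pending stores, which is the case the thread is about.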
Given this C code:
#include <stdint.h>

typedef struct { uint16_t a, b; } pair;

/* Structs passed by value */
int eq_copy(pair a, pair b) {
    return a.a == b.a && a.b == b.b;
}

/* Structs passed by pointer */
int eq_ref(pair *a, pair *b) {
    return a->a == b->a && a->b == b->b;
}
Clang generates clean code for the eq_copy variant, but complex code for the eq_ref variant. GCC emits pretty complex code for both variants. For example, here's eq_ref from gcc -O2:
eq_ref:
        movzx   edx, WORD PTR [rsi]
        xor     eax, eax
        cmp     WORD PTR [rdi], dx
        je      .L9
        ret
.L9:
        movzx   eax, WORD PTR [rsi+2]
        cmp     WORD PTR [rdi+2], ax
        sete    al
        movzx   eax, al
        ret
Have a play around: https://c.godbolt.org/z/79Eaa3jYf
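For comparison, the "single load" version can be written by hand. This is only a sketch: eq_ref_single_load is a made-up name, and it assumes pair has no padding so that comparing the raw 32 bits is the same as comparing both fields. Note that it always reads both fields, whereas the && version above may stop after the first mismatch.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* Assumption: no padding, so all 32 bits belong to the two fields. */
_Static_assert(sizeof(pair) == sizeof(uint32_t), "pair must be 4 bytes");

/* Hypothetical variant: one 32-bit load per operand, one compare.
   memcpy keeps this free of aliasing and alignment UB; compilers
   typically turn each call into a single mov. */
int eq_ref_single_load(const pair *a, const pair *b) {
    uint32_t x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return x == y;
}
```

Pasting this next to eq_ref in the compiler explorer should make it easy to compare the generated code for both.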