Tree Borrows

(plf.inf.ethz.ch)
572 points zdw | 2 comments
jcalvinowens ◴[] No.44513250[source]
> On the one hand, compilers would like to exploit the strong guarantees of the type system—particularly those pertaining to aliasing of pointers—in order to unlock powerful intraprocedural optimizations.

How true is this really?

Torvalds has argued for a long time that strict aliasing rules in C are more trouble than they're worth, and I find his arguments compelling. Here's one of many examples: https://lore.kernel.org/all/CAHk-=wgq1DvgNVoodk7JKc6BuU1m9Un... (the entire thread is worth reading if you find this sort of thing interesting)

Is Rust somehow fundamentally different? Based on limited experience, it seems not (at least, when unsafe is involved...).

replies(11): >>44513333 #>>44513357 #>>44513452 #>>44513468 #>>44513936 #>>44514234 #>>44514867 #>>44514904 #>>44516742 #>>44516860 #>>44517860 #
tliltocatl ◴[] No.44514904[source]
It is mostly useful for array/numeric code and probably next to useless otherwise. Numerics people were the ones who sponsored much of the compiler/optimization work in the first place; that's how strict aliasing came to be.
replies(1): >>44515020 #
dzaima ◴[] No.44515020[source]
I don't think the usefulness is that skewed towards numerics?

Both clang/llvm and gcc can do alias checking at runtime if they can't at compile-time, which makes loops vectorizable without alias info, at the cost of a bit of constant overhead for checking aliasing. (there's the exception of gather loads though, where compile-time aliasing info is basically required)
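To make the runtime check concrete, here is a hand-written sketch of the same idea in Rust. It is only an analogue of what a vectorizer might emit, not actual compiler output, and `add_into` is a hypothetical helper name. It compares the two address ranges once, takes a slice-based path when they are disjoint (where the optimizer may assume no aliasing), and otherwise falls back to an element-by-element loop through raw pointers.

```rust
use std::slice;

/// Hypothetical helper: adds `src[i]` into `dst[i]` for `n` elements.
///
/// # Safety
/// `dst` and `src` must each be valid for `n` consecutive `f32` values.
/// The two ranges are allowed to overlap.
pub unsafe fn add_into(dst: *mut f32, src: *const f32, n: usize) {
    let bytes = n * std::mem::size_of::<f32>();
    let (d, s) = (dst as usize, src as usize);

    // One-time overlap check: the "constant overhead" paid at runtime.
    let disjoint = d + bytes <= s || s + bytes <= d;

    if disjoint {
        // The ranges do not overlap, so forming `&mut [f32]` and `&[f32]`
        // is sound here, and this loop is free to be vectorized.
        let (dst, src) = unsafe {
            (slice::from_raw_parts_mut(dst, n), slice::from_raw_parts(src, n))
        };
        for (a, b) in dst.iter_mut().zip(src) {
            *a += *b;
        }
    } else {
        // Possible overlap: keep strict element order through raw pointers.
        for i in 0..n {
            unsafe { *dst.add(i) += *src.add(i) };
        }
    }
}
```

With two separate arrays the first branch runs; if `dst` points one element past `src` in the same array, the fallback loop runs and each store is visible to the next load.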

And on the other hand there's good potential for benefit to normal code (esp. code with layers of abstractions) - if you have a `&i32`, or any other immutable reference, it's pretty useful for compiler to be able to deduplicate/CSE loads/computations from it from across the whole function regardless of what intermediate writes to potentially-other things there are.
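A minimal sketch of that case, with `add_twice` as a hypothetical function name: `x` is a shared reference and `out` is a mutable one, so the two cannot point at the same `i32`. The compiler is therefore allowed (though not required) to load `*x` once and reuse the value after the intervening write.

```rust
/// Hypothetical example: `x: &i32` cannot alias `out: &mut i32`,
/// so a write through `out` cannot change the value behind `x`.
pub fn add_twice(x: &i32, out: &mut i32) -> i32 {
    *out += *x; // first read of `*x`
    *out += *x; // may reuse the earlier load instead of reading memory again
    *out
}
```

Calling it with `x = 5` and `out` starting at `1` leaves `out` at `11`. The equivalent C function taking two `int *` parameters would need `restrict` for the compiler to make the same assumption.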

replies(1): >>44521705 #
1. tliltocatl ◴[] No.44521705[source]
> pretty useful for compiler to be able to deduplicate/CSE loads/computations

Yes, but is the performance improvement significant enough? L1 latency is single cycle. Is eliminating that worth the trouble it brings to the application programmer?

replies(1): >>44527480 #
2. dzaima ◴[] No.44527480[source]
L1 latency is 4 cycles typically (1 nanosecond would be closer). And of course it gets longer if you're chasing through multiple pointers.

It of course depends on the specific program, but if you look at any optimization at the level of the individual assembly instructions it affects, everything other than mispredictions, division, and vectorization is "just a couple cycles", so that's not really a meaningful way to look at them.