Most active commenters
  • ralfj(9)
  • GolDDranks(3)
  • hamcocar(3)

←back to thread

Tree Borrows

(plf.inf.ethz.ch)
565 points zdw | 26 comments | | HN request time: 2.25s | source | bottom
Show context
jcalvinowens ◴[] No.44513250[source]
> On the one hand, compilers would like to exploit the strong guarantees of the type system—particularly those pertaining to aliasing of pointers—in order to unlock powerful intraprocedural optimizations.

How true is this really?

Torvalds has argued for a long time that strict aliasing rules in C are more trouble than they're worth, I find his arguments compelling. Here's one of many examples: https://lore.kernel.org/all/CAHk-=wgq1DvgNVoodk7JKc6BuU1m9Un... (the entire thread worth reading if you find this sort of thing interesting)

Is Rust somehow fundamentally different? Based on limited experience, it seems not (at least, when unsafe is involved...).

replies(11): >>44513333 #>>44513357 #>>44513452 #>>44513468 #>>44513936 #>>44514234 #>>44514867 #>>44514904 #>>44516742 #>>44516860 #>>44517860 #
1. ralfj ◴[] No.44513333[source]
I would agree that C's strict aliasing rules are terrible. The rules we are proposing for Rust are very different. They are both more useful for compilers and, in my opinion, less onerous for programmers. We also have an actual in-language opt-out: use raw pointers. And finally, we have a tool you can use to check your code.

But in the end, it's a trade-off, like everything in language design. (In life, really. ;) We think that in Rust we may have found a new sweet spot for this kind of optimizations. Time will tell whether we are right.

replies(3): >>44513873 #>>44514788 #>>44521382 #
2. kookamamie ◴[] No.44513873[source]
Agreed about C's aliasing rules. Fortran had a better set of defaults.
3. NobodyNada ◴[] No.44514788[source]
As someone who has been writing a lot of unsafe Rust (mostly in an embedded context), I'm thrilled about and thankful for the work that you, your students, and the opsem team are doing.

When you're working with anything below the application level, C's confusing and underspecified rules about UB are almost impossible to keep track of, especially when it comes to aliasing and volatile/MMIO. The spec is so difficult to read and full of complicated cross-references that to actually get a practical answer you have to look for a random Stack Overflow post that may or may not have a correct interpretation of the spec, and may or may not address your specific problem.

Rust right now feels a lot harder to work with, because the spec isn't done. When you have a concrete question about a piece of code, like "is this conversion from an &mut to a *mut and back sound", and you try to look for documentation on it, you get either "Nobody knows, Rust aliasing model isn't defined"; a hand-wavy explanation that is not rigorous or specific; or a model like Stack Borrows or Tree Borrows that's defined a little too formally for easy digestion :)

But when I really started digging, I realized just how much cleaner Rust's semantics are. References aren't actually hard, Tree Borrows basically boils down to "while an &mut reference is live, you can only access the value through pointers or references derived from that reference". Pointer operations have straightforward semantics, there's no confusing notions of typed memory, and no UB "just because" for random things like integer overflow. It's just so much less complicated to understand than C's abstract machine.

I'm really looking forward to things like MiniRust, and to an aliasing model making it into the Reference / other documentation, because at that point I feel like unsafe Rust will be way easier to write confidently and correctly than C.

Congrats on the publication, and thanks again for the work you all have put into this.

replies(3): >>44517236 #>>44517279 #>>44518115 #
4. gronpi ◴[] No.44517236[source]
In C, you can alias pointers if they have compatible types. Not the case in Rust for mutable references. And the rules of Rust have tripped up even senior Rust developers.

https://github.com/rust-lang/rust/commit/71f5cfb21f3fd2f1740...

Without MIRI, a lot of Rust developers would be lost, as they do not even attempt to understand unsafe. And MIRI cannot and does not cover everything, no matter how good and beloved it is.

It should have been possible for senior Rust developers to write UB-free code without having to hope that MIRI saves them.

replies(3): >>44517796 #>>44518119 #>>44518365 #
5. gronpi ◴[] No.44517279[source]
One example of MIRI not being guaranteed to handle all cases.

https://github.com/rust-lang/rust/pull/139553#issuecomment-2...

The above issue was part of diagnosing UB in Rust stdlib.

replies(1): >>44518126 #
6. GolDDranks ◴[] No.44517796{3}[source]
The situation is not that bad. The rules of unsafe code were pretty badly defined back then, but they are in process of becoming a lot clearer, and like the grandparent argues, with a well-defined aliasing model like Tree Borrows, they are easier to understand than C's.

If you look into the code you linked, the problem was about accessing undefined bytes though an aliased, differently-typed pointer – something you would have hard time doing in C to begin with. MaybeUninit was a new thing back then. I think that nowadays, a senior Rust developer would clear the hurdles better.

replies(1): >>44518178 #
7. ralfj ◴[] No.44518115[source]
Thank you for your kind words. :)
8. ralfj ◴[] No.44518119{3}[source]
C has an opt-out that works sometimes, if a compatible type exists. Rust has an opt-out that works always: use raw pointers (or interior mutable shared references) for all accesses, and you can stop worrying about aliasing altogether.
replies(2): >>44518264 #>>44518705 #
9. ralfj ◴[] No.44518126{3}[source]
Yeah, concurrency bugs that only occur in very specific situations are hard to track down with a pure testing tool. However, we have some ongoing work that should make Miri a lot better at this... we are just not sure yet whether we can get it to have usable performance. ;)
replies(1): >>44518298 #
10. hamcocar ◴[] No.44518178{4}[source]
I am very sorry, but you do not address that TBAA, like C has by default, generally is easier than just no aliasing, like what Rust has for mutable references. This is a major difference. C code can opt into a similar kind of aliasing, namely by using _restrict_, but that is opt-in, while it is always on for Rust.

And there is newer UB as well in Rust stdlib

https://github.com/rust-lang/rust/pull/139553

replies(2): >>44518319 #>>44518355 #
11. hamcocar ◴[] No.44518264{4}[source]
I think that way of describing it is really weird.

In C, you do not use any special keywords to opt into or opt out of TBAA, instead it is the rule by default that one must follow. I do not consider that 'opting out'. One can disable that in some compilers by disabling 'strict aliasing', as the Linux kernel does, but that is usually on a whole-program basis and not standard.

In Rust, using raw pointers is using a different mechanism, and mutable references are always 'no aliasing'.

An example of opting in would be C's "restrict" keyword, where one opts into a similar constraint to that of Rust's 'no aliasing' for mutable references.

>use raw pointers (or interior mutable shared references) for all accesses, and you can stop worrying about aliasing altogether.

And dereferencing a raw pointer requires 'unsafe', right? And if one messee the rules up for it, theN UB.

Can you confirm that the interaction between raw pointers and mutable references still requires care? Is this comment accurate?

>It is safe to hold a raw pointer, const T or mut T, at the same time as a mutable reference, &mut T, to the same data. However, it is Undefined Behaviour if you deference that raw pointer while the mutable reference is still live.

replies(1): >>44518327 #
12. hamcocar ◴[] No.44518298{4}[source]
>Yeah, concurrency bugs that only occur in very specific situations are hard to track down with a pure testing tool.

It would have been better to have prevented it in the first place, and it is inconsistent with "fearless concurrency" when the Rust stdlib has UB.

And there are categories of cases that Miri does not handle either, FFI being a clear example as far as I know.

replies(1): >>44519031 #
13. simonask ◴[] No.44518319{5}[source]
How is that bug in the stdlib related to any of this?
14. ralfj ◴[] No.44518327{5}[source]
> Can you confirm that the interaction between raw pointers and mutable references still requires care?

Yes, that still requires care.

> In C, you do not use any special keywords to opt into or opt out of TBAA, instead it is the rule by default that one must follow

That's exactly the problem: sometimes you need to write code where there's more aliasing, and C just makes that impossible. It tells you to use memcpy instead, causing extra copies which cost performance.

In Rust, you can still write the code without the extra copies. Yes, it requires unsafe, but it's at least possible. (Remember that writing C is like having all code be unsafe, so in a comparison with C, if 10% of the Rust code needs to be unsafe that's still 90% less unsafe than C.)

15. GolDDranks ◴[] No.44518355{5}[source]
> TBAA, like C has by default, generally is easier than just no aliasing [note: I assume no aliasing here to refer Rust's mutable references, not C's restrict]

I don't accept that statement. Rust's mutable references are statically checked, so they are impossible to get wrong (modulo compiler bugs) if you aren't using unsafe code. Using raw pointers has no no-aliasing requirements, so again, it's easier than in C.

The hard part is mixing mutable references and raw pointers. In 95% of Rust code, you are not required to do that, and you shouldn't do that. In the remaining 5%, you should understand the aliasing model. In that case, indeed, you need to know more than what TBAA requires you to know. But that's for the case where you can also _DO_ more than TBAA would allow you to do.

replies(2): >>44518661 #>>44519044 #
16. ralfj ◴[] No.44518365{3}[source]
> And the rules of Rust have tripped up even senior Rust developers.

Yeah, even senior Rust devs make mistakes. Thanks to Miri, we can catch such mistakes. No reasonable person would expect even senior Rust devs to be magic superheroes that can write tricky unsafe code without making any mistake.

How confident are you that glibc has zero Undefined Behavior? I rather doubt it. The Rust standard library has its entire test suite (well, almost everything, except for some parts in std::fs and std::net) run through Miri. That's not a proof there's no UB in corner cases not covered by the tests, but it means we are much, much more likely to find such bugs and fix them than comparable C code.

17. gryhili ◴[] No.44518661{6}[source]
Ycombinator/Hacker News is giving me trouble regarding replying, maybe the debate is not going the way that some people want it to go. Sad.
replies(2): >>44519119 #>>44528256 #
18. mgaunard ◴[] No.44518705{4}[source]
the opt-out in C is to use char*, which is specifically allowed to alias other types.

There are also GCC extensions to build other types that may alias.

replies(1): >>44519048 #
19. ralfj ◴[] No.44519031{5}[source]
> it is inconsistent with "fearless concurrency" when the Rust stdlib has UB.

It is not. "fearless X" only applies to safe code. Rust was always very clear about that. It turns out that in practice, if >90% of your code are "fearless", that actually significantly reduces the amount of bugs you have to worry about, even if the remaining 10% can still have UB. (I don't have hard numbers on how much code is unsafe, AFAIK it is much less than 10%.)

But yeah, Miri cannot find all UB bugs. We are also very clear about that. But you don't have to find all UB bugs to make progress in this space compared to the status quo in C/C++.

20. ralfj ◴[] No.44519044{6}[source]
Yes, exactly. I'd be surprised if it's 5% -- that's about the ratio of all unsafe code, and most unsafe code does not mix references and raw pointers. I think it's less than 1% of the overall code that has to worry about this, but unfortunately I don't have hard data.
replies(1): >>44519199 #
21. ralfj ◴[] No.44519048{5}[source]
That doesn't let you treat data in-place at two different types, unless one of them happens to be char. So you still need to make a copy in many cases, e.g. to convert between an array of uint32_t and an array of uint16_t.

In some cases you can use unions, but that, too, is very limited, and I am not sure it would let you do this particular case.

replies(1): >>44521399 #
22. eru ◴[] No.44519119{7}[source]
I think HN is (intentionally) delaying the ability to reply to deeply nested comments.
23. GolDDranks ◴[] No.44519199{7}[source]
Agreed, I said 95% just to be generous, should've mentioned that.
24. jcalvinowens ◴[] No.44521382[source]
Thanks for replying Ralf. I'm barely qualified to have an opinion about these things, if I am at all :)

> The rules we are proposing for Rust are very different. They are both more useful for compilers and, in my opinion, less onerous for programmers.

My question was too vague, what I meant to ask was: what aliasing optimizations will be possible in Rust that aren't possible in C?

Example 18 in the paper is one, if I'm understanding it. But that specific example with a pointer passed to a function seems analogous to what is possible with 'restrict' in C, I'm struggling to come up with more general examples that don't involve gratuitous globals.

It seems to me like having to allow for the possibility of unsafe constrains the ability to do "heroic" optimizations such as what jcranmer described elsewhere in the thread. If that is true to some extent, is there a future where Rust might optimize more aggressively if the programmer promises not to use unsafe anywhere in the program? That's something I've always been curious about.

25. mgaunard ◴[] No.44521399{6}[source]
Not if you use may alias.

Also yes, that's the point of the C++ object model, only one type may exist in one location at a time.

Since usually different register banks are used for different types, having to copy makes sense.

26. tomhow ◴[] No.44528256{7}[source]
We have rate-limiters for new accounts to limit the damage that can be done by spammers, trolls and flamewar participants.

For users who have valuable contributions to make to threads and are earnest about participating positively to HN, we're happy (really, very happy) to turn off the rate limiters.

You seem to have registered multiple different accounts in order to keep participating in the thread, then hitting the rate limiters each time. Please don't do this. It's confusing for other readers and is against the guidelines [1]:

Throwaway accounts are ok for sensitive information, but please don't create accounts routinely. HN is a community—users should have an identity that others can relate to.

If you would like us to merge these accounts and turn of the rate-limiter, please email us at hn@ycombinator.com.

[1] https://news.ycombinator.com/newsguidelines.html