169 points signa11 | 22 comments
smodo ◴[] No.41875908[source]
I’m not very well versed in kernel development. But I am a Rust dev and have observed the discussion about Rust in Linux with interest… Having said that, this part of the article has me baffled:

>> implementing these features for a smart-pointer type with a malicious or broken Deref (the trait that lets a programmer dereference a value) implementation could break the guarantees Rust relies on to determine when objects can be moved in memory. (…) [In] keeping with Rust's commitment to ensuring safe code cannot cause memory-safety problems, the RFC also requires programmers to use unsafe (specifically, implementing an unsafe marker trait) as a promise that they've read the relevant documentation and are not going to break Pin.

To the uninformed this seems like crossing the very boundary that you wanted Rust to uphold? Yes it’s only an impl Trait but still… I can hear the C devs now. ‘We pinky promise to clean up after our mallocs too!’
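For readers unfamiliar with the mechanism: an unsafe marker trait has no methods; writing `unsafe impl` for it is exactly the "promise in writing" the article describes. A minimal sketch (the trait name and contract here are illustrative, not the actual RFC's trait):

```rust
use std::ops::Deref;

/// SAFETY contract (hypothetical): implementors promise their Deref impl is
/// well-behaved -- no side effects, always dereferences to the same target.
unsafe trait StableDeref: Deref {}

// Box's Deref just follows its pointer, so this promise is sound.
// The `unsafe impl` is the programmer signing the contract above.
unsafe impl<T> StableDeref for Box<T> {}

// Safe code can now rely on the promise without being able to break it.
fn takes_stable<P: StableDeref>(p: &P) -> &P::Target {
    &**p
}

fn main() {
    let b = Box::new(42);
    assert_eq!(*takes_stable(&b), 42);
}
```

The compiler never verifies the contract itself; it only verifies that someone wrote `unsafe impl`, which is what makes the promise auditable.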

replies(7): >>41875965 #>>41876037 #>>41876088 #>>41876177 #>>41876213 #>>41876426 #>>41877004 #
foundry27 ◴[] No.41875965[source]
Rust’s whole premise of guaranteed memory safety through compiletime checks has always been undermined when confronted with the reality that certain foundational operations must still be implemented using unsafe. Inevitably folks concede that lower level libraries will have these unsafe blocks and still expect higher level code to trust them, and at that point we’ve essentially recreated the core paradigm of C: trust in the programmer’s diligence. Yeah Rust makes this trust visible, but it doesn’t actually eliminate it in “hard” code.

The punchline here, so to speak, is that for all Rust’s claims to revolutionize safety, it simply(!) formalizes the same unwritten social contract C developers have been meandering along with for decades. The uniqueness boils down to “we still trust the devs, but at least now we’ve made them swear on it in writing”.

replies(10): >>41876016 #>>41876042 #>>41876122 #>>41876128 #>>41876303 #>>41876330 #>>41876352 #>>41876459 #>>41876891 #>>41877732 #
1. wbl ◴[] No.41876016[source]
The difference is every line of C can do something wrong while very few lines of Rust can. It's much easier to scrutinize a small well contained class with tools like formal methods than a sprawling codebase.
replies(2): >>41876538 #>>41877544 #
2. uecker ◴[] No.41876538[source]
If you limit "wrong" to "memory safe", and also ignore that unsafe parts violating invariants can make safe parts of Rust wrong.
replies(1): >>41876669 #
3. Dylan16807 ◴[] No.41876669[source]
> If you limited wrong to "memory safe"

Yes, because this is a discussion about the value of "unsafe", so we're only talking about the wrongs that are enabled by "unsafe".

> and also ignore that unsafe parts violating invariants can make safe parts of Rust to be wrong.

If I run a line of code that corrupts memory, and the program crashes 400 lines later, I don't say the spot where it crashes is wrong, I say the memory corrupting line is wrong. So I disagree with you here.

replies(1): >>41877536 #
4. uecker ◴[] No.41877536{3}[source]
That you do not want to talk about a point does not invalidate it.

Regarding the second point: yes, you can then blame the "unsafe" part, but the issue is that the problem might not be as localized as the notion that "only auditing unsafe blocks is sufficient" implies. You may need to understand the subtle interaction of unsafe blocks with the rest of the program.

replies(3): >>41877958 #>>41878776 #>>41882921 #
5. ◴[] No.41877544[source]
6. dwattttt ◴[] No.41877958{4}[source]
> the problem might not be so localized as the notion of "only auditing unsafe blocks is sufficient" implies

It depends on what you consider the "problem" to be. An unsafe function needs someone to write unsafe in order to call it, and it's on that calling code to make sure the conditions needed to call the unsafe function are met.

If that function itself is safe, but still lets you trigger the unsafe function unsafely? Then that function, which had to write 'unsafe', has a bug: either it's not upholding the preconditions of the unsafe function it's calling, or it _can't_ uphold the preconditions without its own callers also being in on it, in which case it needs to be an unsafe function itself (and its design reconsidered).

In this way, you'll always find unsafe 'near' the bug.
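The pattern described above can be sketched concretely (function names are made up for illustration): an unsafe function documents a precondition, and the safe wrapper that writes `unsafe` owns the obligation to check it.

```rust
/// SAFETY: callers must ensure `idx < data.len()`.
unsafe fn get_unchecked_at(data: &[u32], idx: usize) -> u32 {
    // Skips the bounds check; sound only if the caller upheld the contract.
    unsafe { *data.get_unchecked(idx) }
}

/// Safe wrapper: it wrote `unsafe`, so it must uphold the precondition
/// for every possible input safe code can send it.
fn get_or_zero(data: &[u32], idx: usize) -> u32 {
    if idx < data.len() {
        // Precondition verified just above, so this call is sound.
        unsafe { get_unchecked_at(data, idx) }
    } else {
        0
    }
}

fn main() {
    let v = [10, 20, 30];
    assert_eq!(get_or_zero(&v, 1), 20);
    assert_eq!(get_or_zero(&v, 99), 0); // out-of-range input handled safely
}
```

If `get_or_zero` omitted the `if`, it would be the buggy line, and it is findable by grepping for `unsafe`.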

replies(1): >>41880678 #
7. Filligree ◴[] No.41878776{4}[source]
Unsafe blocks have a specific set of requirements they have to abide by.

Assuming they successfully do so, it is then guaranteed that no safe code is able to trigger undefined behaviour by calling the unsafe code.

Importantly, this can be checked without ever reading any of the safe code.

replies(1): >>41880683 #
8. uecker ◴[] No.41880678{5}[source]
In other words, somebody made an error somewhere.
replies(1): >>41882866 #
9. uecker ◴[] No.41880683{5}[source]
Let's discuss this example:

https://github.com/ejmahler/transpose/blob/e70dd159f1881d86a...

The code is buggy. Where is the bug?

replies(2): >>41882679 #>>41882899 #
10. lostdog ◴[] No.41882679{6}[source]
The most common bug in that type of code is mixing up x and y, or width and height somewhere in your loops, or maybe handling partial blocks. It's not really what Rust aims to protect against, though bounds checking is intended to be helpful here.

I don't get the arguments here. In practice, Rust lowers the risk of most of your codebase. Yeah, it doesn't handle every logic bug, but mostly you can code with confidence, and only pay extra attention when you're coding something intricate.

A language which catches even these bugs would be incredible, and I would definitely try it out. Rust ain't that language, but it still does give you more robust programs.

replies(1): >>41885772 #
11. dwattttt ◴[] No.41882866{6}[source]
You're thinking of C; Rust forced that somebody to write unsafe near it to create the bug.
replies(1): >>41885767 #
12. NobodyNada ◴[] No.41882899{6}[source]
The code uses `unsafe` blocks to call `unsafe` functions that have the documented invariant that the parameters passed in accurately describe the size of the array. However, this invariant is not necessarily held if an integer overflow occurs when evaluating the `assert` statements -- for example, by calling `transpose(&[], &mut [], 2, usize::MAX / 2 + 1)`.

To answer the question of "where is the bug" -- by definition, it is where the programmer wrote an `unsafe` block that assumes an invariant which does not necessarily hold. Which I assume is the point you're trying to make -- that a buggy assert in "safe" code broke an invariant assumed by unsafe code. And indeed, that's part of the danger of `unsafe` -- by using an `unsafe` block, you are asserting that there is no possible path that could be taken, even by safe code you're interacting with, that would break one of your assumed invariants. The use of an `unsafe` block is not just an assertion that the programmer has verified the contents of the block to be sound given a set of invariants, but also that any inputs that go into the block uphold those invariants.

And indeed, I spotted this bug by thinking about the invariants in that way. I started by reading the innermost `unsafe` functions like `transpose_small` to make sure that they can't ever access an index outside of the bounds provided. Then, I looked at all the `unsafe` blocks that call those functions, and read the surrounding code to see if I could spot any errors in the bounds calculations. I observed that `transpose_recursive` and `transpose_tiled` did not check to ensure the bounds provided were actually valid before handing them off to `unsafe` code, which meant I also had to check any safe code that called those functions to see how the bounds were calculated; and there I found the integer overflow.

So you're right that this is a case of "subtle interaction of unsafe blocks with the rest of the program", but the wonderful part of `unsafe` is that you can reduce the surface area of interaction with the rest of the program to an absolute minimum. The module you linked exposes a single function with a public, safe interface; and by convention, a safe API visible outside of its module is expected to be sound regardless of the behavior of safe code in other modules. This meant I only had to check a handful of lines of code behind the safe public interface where issues like integer overflows could break invariants. Whereas if Rust had no concept of `unsafe`, I would have to worry about potentially every single call to `transpose` across a very large codebase.
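The overflow described above can be reduced to a few lines (this is a simplified sketch, not the actual code from the linked crate; function names are hypothetical). Note that in a debug build a plain `width * height` would panic on overflow; `wrapping_mul` models the release-mode behavior where it silently wraps:

```rust
// Buggy size check: `width * height` can wrap in release mode, so the
// check can pass even though the logical product far exceeds `len`.
fn bad_check(len: usize, width: usize, height: usize) -> bool {
    len == width.wrapping_mul(height)
}

// Fixed check: `checked_mul` refuses to wrap, so impossible sizes
// are rejected before any unsafe code sees them.
fn good_check(len: usize, width: usize, height: usize) -> bool {
    width.checked_mul(height) == Some(len)
}

fn main() {
    // The counterexample from the comment above: 2 * (usize::MAX/2 + 1)
    // wraps around to 0.
    let (w, h) = (2usize, usize::MAX / 2 + 1);

    // The buggy check accepts an empty slice for this absurd shape...
    assert!(bad_check(0, w, h));
    // ...while the checked version correctly rejects it.
    assert!(!good_check(0, w, h));
}
```

Downstream `unsafe` code that trusts the buggy check would then index far out of bounds, which is exactly the invariant violation being discussed.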

replies(1): >>41885850 #
13. Dylan16807 ◴[] No.41882921{4}[source]
Unsafe blocks have to uphold their invariants while accepting any possible input that safe code can give them. Any subtle interactions enabled by "unsafe" need to be part of the invariants. If they don't do that, it's a bug in the unsafe code, not the safe code using it.

If done properly, you can and should write out all the invariants, and a third party could create a proof that your code upholds them and they prevent memory errors. That involves checking interactions between connected unsafe blocks as a combined proof, but it won't extend to "the rest of the program" outside unsafe blocks.
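A sketch of what "accepting any possible input from safe code" looks like in practice (the type and its invariant are invented for illustration): the invariant-bearing field stays private, and parameters from safe callers are validated rather than trusted.

```rust
mod buf {
    /// Invariant: `init <= data.len()`. The `unsafe` block below relies on
    /// it, so `init` must stay private -- no outside safe code can forge it.
    pub struct PrefixBuf {
        data: Vec<u8>,
        init: usize,
    }

    impl PrefixBuf {
        pub fn new(data: Vec<u8>) -> Self {
            let init = data.len(); // invariant established here
            PrefixBuf { data, init }
        }

        /// Takes input from arbitrary safe code: clamp it instead of
        /// trusting it, so the invariant survives any argument.
        pub fn truncate_prefix(&mut self, n: usize) {
            self.init = self.init.min(n);
        }

        pub fn prefix(&self) -> &[u8] {
            // SAFETY: the invariant guarantees `init <= data.len()`,
            // and nothing outside this module can break it.
            unsafe { self.data.get_unchecked(..self.init) }
        }
    }
}

fn main() {
    let mut b = buf::PrefixBuf::new(vec![1u8, 2, 3, 4]);
    b.truncate_prefix(2);
    assert_eq!(b.prefix(), &[1u8, 2][..]);
    b.truncate_prefix(100); // hostile input: clamped, no UB possible
    assert_eq!(b.prefix(), &[1u8, 2][..]);
}
```

Auditing this module alone suffices precisely because the proof never needs to inspect the callers.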

14. uecker ◴[] No.41885767{7}[source]
The bug that can lead to a violation of the assumptions required for safety of the unsafe block can be elsewhere. One can hope that it is near the block, but there is nothing in Rust enforcing this.
replies(1): >>41890304 #
15. uecker ◴[] No.41885772{7}[source]
The issue is a memory safety issue, which Rust aims to protect against.

But I am not saying Rust is bad. My issue is the completely unreasonable exaggeration in the propaganda: "C is completely dangerous and Rust is perfectly safe". And then you discuss and end up at "Rust does not protect against everything, but it is still better", which could be the start of a reasonable discussion of how much better it actually is.

replies(1): >>41887156 #
16. uecker ◴[] No.41885850{7}[source]
I agree with what you write. Also, please note that I am not saying unsafe blocks are a bad idea. In fact, I think they are a great idea. But note that people run around saying "it is sufficient to audit unsafe blocks" when they really should say "audit unsafe blocks and carefully analyze all logic elsewhere that may lead to a violation of their assumptions". You could argue "this is what they mean", but IMHO it is not quite the same thing, and it is part of the usual exaggeration of the benefits of Rust safety, which I believe to be dangerously naive.
replies(2): >>41890366 #>>41903533 #
17. biorach ◴[] No.41887156{8}[source]
> "C is completely dangerous and Rust is perfectly safe"

Nobody in this conversation said that.

If you're actually continuing an argument from somewhere else you should save everyone a lot of time and say so up front, not 10 comments in.

replies(1): >>41888635 #
18. uecker ◴[] No.41888635{9}[source]
The start of the thread was "The difference is every line of C can do something wrong while very few lines of Rust can.", and this is an exaggeration of exactly that kind.
replies(1): >>41889092 #
19. biorach ◴[] No.41889092{10}[source]
yeah well quote that line then
20. Dylan16807 ◴[] No.41890304{8}[source]
When you write "unsafe", you are promising to the compiler that the unsafe code enforces the assumptions it is making.

Unsafe code needs to keep its assumption-laden variables private, and it needs to verify the parameters that safe code sends it. If it doesn't do those things, it's breaking that promise.

21. Dylan16807 ◴[] No.41890366{8}[source]
It's more like "audit unsafe and make sure it's impossible for safe code elsewhere to lead to a violation of its assumptions".

If you need to look at the safe code that calls into you when making your safety proof, then your unsafe code is incorrect and should immediately fail the audit.

Treat external safe code as unknown and malicious. Prove your unsafe code is correct anyway.

22. wbl ◴[] No.41903533{8}[source]
The goal when writing unsafe blocks is that no call can ever lead to a violation, not "let's silently load all the footguns".