
169 points | signa11 | 3 comments
smodo ◴[] No.41875908[source]
I’m not very well versed in kernel development. But I am a Rust dev and have observed the discussion about Rust in Linux with interest… Having said that, this part of the article has me baffled:

>> implementing these features for a smart-pointer type with a malicious or broken Deref (the trait that lets a programmer dereference a value) implementation could break the guarantees Rust relies on to determine when objects can be moved in memory. (…) [In] keeping with Rust's commitment to ensuring safe code cannot cause memory-safety problems, the RFC also requires programmers to use unsafe (specifically, implementing an unsafe marker trait) as a promise that they've read the relevant documentation and are not going to break Pin.

To the uninformed this seems like crossing the very boundary that you wanted Rust to uphold? Yes it’s only an impl Trait but still… I can hear the C devs now. ‘We pinky promise to clean up after our mallocs too!’
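
For concreteness, here is a rough sketch of what "implementing an unsafe marker trait" looks like mechanically. The trait name below is made up, not the one from the RFC; the point is only the shape of the promise.

    // Hypothetical marker trait standing in for the one the RFC describes.
    //
    // # Safety
    // Implementors promise their `Deref` impl is well-behaved (no side
    // effects, always returns the same target), so Pin's guarantees hold.
    unsafe trait WellBehavedDeref: std::ops::Deref {}

    struct MyBox<T>(Box<T>);

    impl<T> std::ops::Deref for MyBox<T> {
        type Target = T;
        fn deref(&self) -> &T {
            &self.0
        }
    }

    // The `unsafe impl` is the written promise: the compiler cannot verify
    // it, so the programmer takes responsibility for the documented contract.
    unsafe impl<T> WellBehavedDeref for MyBox<T> {}

The `unsafe` here never touches raw pointers; it only marks the impl as a claim the compiler takes on trust.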

replies(7): >>41875965 #>>41876037 #>>41876088 #>>41876177 #>>41876213 #>>41876426 #>>41877004 #
foundry27 ◴[] No.41875965[source]
Rust’s whole premise of guaranteed memory safety through compile-time checks has always been undermined by the reality that certain foundational operations must still be implemented using unsafe. Inevitably folks concede that lower-level libraries will have these unsafe blocks and still expect higher-level code to trust them, and at that point we’ve essentially recreated the core paradigm of C: trust in the programmer’s diligence. Yeah, Rust makes this trust visible, but it doesn’t actually eliminate it in “hard” code.

The punchline here, so to speak, is that for all Rust’s claims to revolutionize safety, it simply(!) formalizes the same unwritten social contract C developers have been meandering along with for decades. The uniqueness boils down to “we still trust the devs, but at least now we’ve made them swear on it in writing”.

replies(10): >>41876016 #>>41876042 #>>41876122 #>>41876128 #>>41876303 #>>41876330 #>>41876352 #>>41876459 #>>41876891 #>>41877732 #
wbl ◴[] No.41876016[source]
The difference is that every line of C can do something wrong, while very few lines of Rust can. It's much easier to scrutinize a small, well-contained class with tools like formal methods than a sprawling codebase.
replies(2): >>41876538 #>>41877544 #
uecker ◴[] No.41876538{3}[source]
Only if you limit "wrong" to memory safety violations, and also ignore that unsafe parts violating invariants can make safe parts of Rust wrong.
replies(1): >>41876669 #
Dylan16807 ◴[] No.41876669{4}[source]
> Only if you limit "wrong" to memory safety violations

Yes, because this is a discussion about the value of "unsafe", so we're only talking about the wrongs that are enabled by "unsafe".

> and also ignore that unsafe parts violating invariants can make safe parts of Rust wrong.

If I run a line of code that corrupts memory, and the program crashes 400 lines later, I don't say the spot where it crashes is wrong, I say the memory corrupting line is wrong. So I disagree with you here.

replies(1): >>41877536 #
uecker ◴[] No.41877536{5}[source]
That you do not want to talk about it does not invalidate the argument.

Regarding the second point: yes, you can then blame the "unsafe" part, but the issue is that the problem might not be as localized as the notion that "only auditing unsafe blocks is sufficient" implies. You may need to understand the subtle interaction of the unsafe blocks with the rest of the program.

replies(3): >>41877958 #>>41878776 #>>41882921 #
Filligree ◴[] No.41878776{6}[source]
Unsafe blocks have a specific set of requirements they have to abide by.

Assuming they successfully do so, it is then guaranteed that no safe code is able to trigger undefined behaviour by calling the unsafe code.

Importantly, this can be checked without ever reading any of the safe code.
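
A toy illustration (hypothetical function, not from any real codebase): the unsafe block's precondition is discharged by the check right next to it, so no caller, however adversarial, can make it misbehave, and auditing it never requires reading the callers.

    // The emptiness check alone makes the unsafe block sound, regardless of
    // what any safe caller passes in.
    fn first_or_zero(values: &[u32]) -> u32 {
        if values.is_empty() {
            return 0;
        }
        // SAFETY: the slice was just checked to be non-empty, so index 0 exists.
        unsafe { *values.get_unchecked(0) }
    }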

replies(1): >>41880683 #
uecker ◴[] No.41880683{7}[source]
Let's discuss this example:

https://github.com/ejmahler/transpose/blob/e70dd159f1881d86a...

The code is buggy. Where is the bug?

replies(2): >>41882679 #>>41882899 #
NobodyNada ◴[] No.41882899{8}[source]
The code uses `unsafe` blocks to call `unsafe` functions that have the documented invariant that the parameters passed in accurately describe the size of the array. However, this invariant is not necessarily held if an integer overflow occurs when evaluating the `assert` statements -- for example, by calling `transpose(&[], &mut [], 2, usize::MAX / 2 + 1)`.
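
To make that concrete, here is a minimal sketch of the failure mode (hypothetical code with the same shape as the crate's entry point, not the actual source): in a release build the multiplication wraps, so the length checks pass for empty slices and the unsafe indexing then reads far out of bounds.

    // Hypothetical sketch, not the real `transpose` crate. With width = 2 and
    // height = usize::MAX / 2 + 1, `width * height` wraps to 0 in release
    // builds, so empty slices pass both asserts.
    fn transpose(input: &[f64], output: &mut [f64], width: usize, height: usize) {
        assert_eq!(input.len(), width * height);
        assert_eq!(output.len(), width * height);
        for x in 0..width {
            for y in 0..height {
                unsafe {
                    // SAFETY (intended): both slices hold `width * height`
                    // elements -- an invariant the overflow above has broken.
                    *output.get_unchecked_mut(x * height + y) =
                        *input.get_unchecked(y * width + x);
                }
            }
        }
    }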

To answer the question of "where is the bug" -- by definition, it is where the programmer wrote an `unsafe` block that assumes an invariant which does not necessarily hold. Which I assume is the point you're trying to make -- that a buggy assert in "safe" code broke an invariant assumed by unsafe code. And indeed, that's part of the danger of `unsafe` -- by using an `unsafe` block, you are asserting that there is no possible path that could be taken, even by safe code you're interacting with, that would break one of your assumed invariants. The use of an `unsafe` block is not just an assertion that the programmer has verified the contents of the block to be sound given a set of invariants, but also that any inputs that go into the block uphold those invariants.

And indeed, I spotted this bug by thinking about the invariants in that way. I started by reading the innermost `unsafe` functions like `transpose_small` to make sure that they can't ever access an index outside of the bounds provided. Then, I looked at all the `unsafe` blocks that call those functions, and read the surrounding code to see if I could spot any errors in the bounds calculations. I observed that `transpose_recursive` and `transpose_tiled` did not check to ensure the bounds provided were actually valid before handing them off to `unsafe` code, which meant I also had to check any safe code that called those functions to see how the bounds were calculated; and there I found the integer overflow.

So you're right that this is a case of "subtle interaction of unsafe blocks with the rest of the program", but the wonderful part of `unsafe` is that you can reduce the surface area of interaction with the rest of the program to an absolute minimum. The module you linked exposes a single function with a public, safe interface; and by convention, a safe API visible outside of its module is expected to be sound regardless of the behavior of safe code in other modules. This meant I only had to check a handful of lines of code behind the safe public interface where issues like integer overflows could break invariants. Whereas if Rust had no concept of `unsafe`, I would have to worry about potentially every single call to `transpose` across a very large codebase.

replies(1): >>41885850 #
uecker ◴[] No.41885850{9}[source]
I agree with what you write. Also, please note that I am not saying unsafe blocks are a bad idea; in fact, I think they are a great idea. But note that people run around saying "it is sufficient to audit unsafe blocks" when they really should say "audit unsafe blocks and carefully analyze all logic elsewhere that may lead to a violation of their assumptions". You could argue "this is what they mean", but IMHO it is not quite the same thing, and it is part of the usual exaggeration of the benefits of Rust's safety, which I believe to be dangerously naive.
replies(2): >>41890366 #>>41903533 #
Dylan16807 ◴[] No.41890366[source]
It's more like "audit unsafe and make sure it's impossible for safe code elsewhere to lead to a violation of its assumptions".

If you need to look at the safe code that calls into you when making your safety proof, then your unsafe code is incorrect and should immediately fail the audit.

Treat external safe code as unknown and malicious. Prove your unsafe code is correct anyway.
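
Applied to the transpose example discussed above (again hypothetical code, not the actual crate), that discipline just means validating the invariant with checked arithmetic inside the same function that contains the unsafe block, so no safe caller can break it:

    // The overflow is caught here, so callers -- however hostile -- cannot
    // invalidate the SAFETY comment below.
    fn transpose(input: &[f64], output: &mut [f64], width: usize, height: usize) {
        let len = width
            .checked_mul(height)
            .expect("width * height overflows usize");
        assert_eq!(input.len(), len);
        assert_eq!(output.len(), len);
        for x in 0..width {
            for y in 0..height {
                unsafe {
                    // SAFETY: both slices hold exactly `width * height`
                    // elements, and both computed indices are below that.
                    *output.get_unchecked_mut(x * height + y) =
                        *input.get_unchecked(y * width + x);
                }
            }
        }
    }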

wbl ◴[] No.41903533[source]
The goal when writing unsafe blocks is that no call can ever lead to a violation, not "let's silently load all the footguns".