
169 points signa11 | 10 comments
smodo ◴[] No.41875908[source]
I’m not very well versed in kernel development. But I am a Rust dev and have observed the discussion about Rust in Linux with interest… Having said that, this part of the article has me baffled:

>> implementing these features for a smart-pointer type with a malicious or broken Deref (the trait that lets a programmer dereference a value) implementation could break the guarantees Rust relies on to determine when objects can be moved in memory. (…) [In] keeping with Rust's commitment to ensuring safe code cannot cause memory-safety problems, the RFC also requires programmers to use unsafe (specifically, implementing an unsafe marker trait) as a promise that they've read the relevant documentation and are not going to break Pin.

To the uninformed this seems like crossing the very boundary that you wanted Rust to uphold? Yes it’s only an impl Trait but still… I can hear the C devs now. ‘We pinky promise to clean up after our mallocs too!’
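For what it's worth, the pattern being described looks roughly like this (a minimal sketch with made-up names, not the actual RFC's API): the `unsafe` keyword sits on the *impl* of a marker trait, as a one-time written attestation, while downstream code that relies on the marker stays entirely safe.

```rust
// Hypothetical marker trait modeled on the Send/Sync pattern; `unsafe trait`
// means implementing it is a promise the compiler cannot itself check.
unsafe trait WellBehavedDeref {}

struct MyBox<T>(T);

// SAFETY: this `unsafe impl` is the "promise in writing" -- the implementor
// asserts that MyBox's pointer behavior upholds the documented invariants.
unsafe impl<T> WellBehavedDeref for MyBox<T> {}

// Callers need no unsafe at all; they just require the marker as a bound.
fn takes_trusted<P: WellBehavedDeref>(_p: &P) -> bool {
    true
}

fn main() {
    assert!(takes_trusted(&MyBox(42)));
}
```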

replies(7): >>41875965 #>>41876037 #>>41876088 #>>41876177 #>>41876213 #>>41876426 #>>41877004 #
foundry27 ◴[] No.41875965[source]
Rust’s whole premise of guaranteed memory safety through compile-time checks has always been undermined by the reality that certain foundational operations must still be implemented using unsafe. Inevitably folks concede that lower-level libraries will have these unsafe blocks and still expect higher-level code to trust them, and at that point we’ve essentially recreated the core paradigm of C: trust in the programmer’s diligence. Yeah, Rust makes this trust visible, but it doesn’t actually eliminate it in “hard” code.

The punchline here, so to speak, is that for all Rust’s claims to revolutionize safety, it simply(!) formalizes the same unwritten social contract C developers have been meandering along with for decades. The uniqueness boils down to “we still trust the devs, but at least now we’ve made them swear on it in writing”.
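To make the "visible trust" point concrete, here's a minimal toy sketch (my own example, not from the article) of how an unsafe block typically hides behind a safe API: the audit burden is confined to one small function, and everything built on top of it is checked by the compiler.

```rust
// Safe wrapper around an unsafe operation: the bounds check justifies the
// unchecked access, and the SAFETY comment records the reasoning an auditor
// has to verify.
fn first_or_zero(v: &[i32]) -> i32 {
    if v.is_empty() {
        0
    } else {
        // SAFETY: we just checked the slice is non-empty, so index 0 is
        // in bounds.
        unsafe { *v.get_unchecked(0) }
    }
}

fn main() {
    assert_eq!(first_or_zero(&[7, 8, 9]), 7);
    assert_eq!(first_or_zero(&[]), 0);
}
```

Callers of `first_or_zero` never see the unsafe; whether that is "the same social contract as C" or a meaningful narrowing of it is exactly what's being debated here.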

replies(10): >>41876016 #>>41876042 #>>41876122 #>>41876128 #>>41876303 #>>41876330 #>>41876352 #>>41876459 #>>41876891 #>>41877732 #
jchw ◴[] No.41876122[source]
I think when people come to these conclusions it's largely due to a misunderstanding of what the point of most programming-language safety measures actually is and why they make sense.

Something that people often ponder is why you can't just solve the null safety problem by forcing every pointer dereference to be checked, with no other changes. Of course, you can do that. But simply checking that the pointer is non-null at the point of dereference gets you surprisingly little. What you're (ostensibly) trying to do is reduce the number of null pointer dereferences, but in practice what happens is that you now have to explicitly handle them. And in a lot of cases there's really nothing sensible to do: the pointer not being null is an invariant that was supposed to be upheld and it wasn't, and now, at the point of dereference, at runtime, there's nothing left to do except crash. Which is what would've happened anyways, so what's the point? What you really want isn't to handle null pointer dereferences, it's to uphold the invariant that the pointer is non-null in the first place, ideally before you leave compile time.
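As an illustration of the difference (a toy sketch using Rust's Option rather than raw pointers): the checked version is forced to invent *some* behavior for the null case at runtime, while the second version makes the invariant part of the signature, so the null case can't even be expressed.

```rust
// Runtime-checked: the None arm has to do *something*, sensible or not.
fn len_checked(s: Option<&str>) -> usize {
    match s {
        Some(s) => s.len(),
        None => 0, // arbitrary fallback; crashing would be the other option
    }
}

// Invariant in the type: callers simply cannot pass "nothing", so there is
// no check and no crash path inside the function at all.
fn len_guaranteed(s: &str) -> usize {
    s.len()
}

fn main() {
    assert_eq!(len_checked(None), 0);
    assert_eq!(len_checked(Some("abc")), 3);
    assert_eq!(len_guaranteed("abc"), 3);
}
```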

Disallowing "unsafe" operations unless they're explicitly marked unsafe doesn't give you a whole lot by itself, but what you can do is expand the set of explicitly safe operations to cover more of what you want to do. How Rust, and many other programming languages, have been accomplishing this is by expanding the type system and combining it with control flow analysis. Lifetimes in Rust are a prime example, but there are many more, such as nullability in languages like TypeScript. When you do it this way, the safety of such "safe" operations can be guaranteed, and while these guarantees do have some caveats, they are robust in a lot of situations where human code review is not, such as an unsafe combination of two otherwise-safe changesets.
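Lifetimes, the example named above, in a small self-contained sketch of my own: the borrow checker ties the returned reference to both inputs, and the commented-out line shows the kind of cross-scope mistake it rejects at compile time.

```rust
// The returned reference borrows from whichever argument is longer, so its
// lifetime 'a must be contained within the lifetimes of *both* inputs.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let s1 = String::from("long string");
    let result;
    {
        let s2 = String::from("short");
        result = longest(&s1, &s2);
        assert_eq!(result, "long string"); // fine: `s2` is still alive here
    }
    // Uncommenting the next line fails to compile ("`s2` does not live long
    // enough"), because `result` may borrow from the already-dropped `s2`:
    // println!("{}", result);
}
```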

It's actually totally fine that some code will probably remain hard to statically verify; the point is to reduce the amount of such code to be as small as possible. In the future we can apply heavier machinery to the unsafe blocks themselves, such as theorem provers that try to prove the correctness of "unsafe" code. But even just reducing the amount of not-necessarily-memory-safe code is an enormous win, for obvious reasons: it dramatically reduces the surface area for vulnerabilities. Moreover, time and time again it is validated that most new vulnerabilities come from relatively recent changes in code, which is another huge win: a lot of the unsafe foundations don't need to change very often.

There is absolutely nothing special about code written in Rust; it's doing the same shit that C code has been doing for decades (in the abstract, anyway; I'm not trying to downplay how much more expressive it is). What Rust mainly offers is a significantly more advanced type system that allows validating many more invariants at compile time. God knows C developers on large projects like the Linux kernel care about validating invariants: large amounts of effort have been poured into static checking tools for C that do exactly this. Rust goes a step further, though: the safe subset of Rust provides guarantees that you basically can't tack onto C with more static checking tools alone.

replies(2): >>41876308 #>>41876615 #
1. sfvisser ◴[] No.41876308{3}[source]
Isn’t the argument that by checking for NULL you can now safely crash/panic instead of going into undefined behavior and being a potential security hazard?
replies(2): >>41876394 #>>41877163 #
2. jchw ◴[] No.41876394[source]
The potential for undefined behavior is, I will agree, potentially fairly serious, depending on the specific circumstances... (In most cases it should reliably hit an unmapped page and cause an exception, but there are exceptions on weird targets or with huge offsets.) But you can pretty much entirely ignore it if you can guarantee that the pointer isn't NULL in the first place. That not only removes the worry about undefined behavior, it also rules out the incorrect code that might violate the invariant in the first place, since the invariant is statically checked.

If you were only afraid of the undefined behavior, you could augment the compiler to insert runtime checks anywhere undefined behavior could occur (which obviously can be done with Clang sanitizers.) However, the undefined behavior problem is really just a symptom of incorrect code, so it'd be even better if we could just prevent that instead.

In high level languages like Java and Python there is just as much, if not more, interest in preventing null reference exceptions, even though they are "safe".

replies(2): >>41876616 #>>41877503 #
3. eru ◴[] No.41876616[source]
> (In most cases it should reliably hit an unmapped page and cause an exception, but there are exceptions on weird targets or with huge offsets.)

The kernel is one such exception.

replies(1): >>41878410 #
4. immibis ◴[] No.41877163[source]
If that was the only point, we could simply add a compiler flag to make null pointer deref defined behaviour (raise SIGSEGV). It's already defined behaviour everywhere except the compiler's optimizer - unlike say a use after free.
5. thayne ◴[] No.41877503[source]
> In most cases it should reliably hit an unmapped page and cause an exception, but there are exceptions on weird targets or with huge offsets

Perhaps the most important exception is when the optimizer assumed the pointer was non-null, so optimized it in a way that produces completely unexpected behavior when it is null.

Also, use-after-free and use of uninitialized pointers are more likely to point to incorrect, but mapped, locations.

replies(1): >>41878531 #
6. jchw ◴[] No.41878410{3}[source]
Depends a lot on the system, but I don't think this is much of a problem with modern Linux systems. Looking on my machine, vm.mmap_min_addr is set to 65536, not to mention the mitigations modern CPUs have for preventing unintended access to user pages. Just as in userspace, a null dereference on a modern Linux system is almost guaranteed to hit a trap.

That said, a potentially bigger problem is what happens when handling it. Instead of a kernel panic, nowadays you get a kernel oops. That's definitely going to have weird side-effects that could have e.g. security implications. But honestly, this all goes back to the original problem: in a lot of cases, there just isn't really a more sensible thing to do anyways. Even if the null dereference itself is potentially scary, by the time you get to the point where it might happen, you've already missed the actual underlying problem, and there might not be anything reasonable you can do.

I will grant you though that there are definitely some exotic cases where null dereferences won't trap. But this wasn't the point, I glossed over it for a reason.

replies(1): >>41878560 #
7. jchw ◴[] No.41878531{3}[source]
> Perhaps the most important exception is when the optimizer assumed the pointer was non-null, so optimized it in a way that produces completely unexpected behavior when it is null.

> Also use-after-free and use of uninitialized pointers is more likely to point to incorrect, but mapped, locations.

I stuck to the null pointer dereference because it's useful for demonstration: the side-effect of hitting one is usually not a huge deal, though it actually wouldn't matter either way. The point I'm trying to make, and maybe not making obvious enough, is that the null pointer dereference is just a symptom of other invariants not being upheld; it's not just about preventing an unsafe operation, it's about preventing the kinds of incorrect code that lead to it. The same goes for a use-after-free. That's exactly why I'm a fan of Rust's borrow checker: you can statically eliminate the problem that causes use-after-frees.
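A tiny sketch of that last point (a toy example of mine, not kernel code): the reference's lifetime is tracked through the helper, so freeing the storage while the reference is still live is a compile error rather than a latent use-after-free.

```rust
// Lifetime elision ties the returned reference to the input slice, so the
// compiler knows exactly how long the backing storage must stay alive.
fn first_element(v: &[i32]) -> &i32 {
    &v[0]
}

fn main() {
    let v = vec![1, 2, 3];
    let first = first_element(&v);
    // drop(v); // rejected at compile time: cannot move out of `v`
    //          // while it is borrowed (error[E0505])
    assert_eq!(*first, 1);
}
```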

It isn't really that hard to construct a memory-safe programming language, but the "obvious" ways of doing it have trade-offs that are undesirable or infeasible for some use cases. Rather than making the operations "more safe" by duct-taping on runtime checks, Rust forces the code to be more correct by statically checking invariants.

8. eru ◴[] No.41878560{4}[source]
See https://lwn.net/Articles/342330/
replies(2): >>41878656 #>>41891962 #
9. jchw ◴[] No.41878656{5}[source]
We're really going far out into the unrelated weeds now, but that exploit relied on a myriad of bugs that have since been fixed (like MMAP_PAGE_ZERO overriding mmap_min_addr, and MMAP_PAGE_ZERO not being cleared when exec'ing a setuid/setgid binary) and would be thwarted by modern processor mitigations (like SMAP and SMEP) which make this entire class of exploit usually impossible. You have to work a lot harder to have an exploitable null pointer dereference these days, and when you do, it's usually not the dereference itself that's exploitable but what happens after trapping.
10. throw16180339 ◴[] No.41891962{5}[source]
If you're a kernel developer, build with -fno-delete-null-pointer-checks. There's nothing profound about this, just code compiled with the wrong settings 15 years ago.