A less baity title might be "Rust pitfalls: Runtime correctness beyond memory safety."
This regularly drives C++ programmers mad: the statement "C++ is all unsafe" is taken as some kind of hyperbole, attack or dogma, while the intent may well be to factually point out the lack of statically checked guarantees.
It is subtle but not inconsistent that strong static checks ("safe Rust") may still leave the possibility of runtime errors. So there is a legitimate, useful broader notion of "safety" where Rust's static checking is not enough. That's a bit hard to express in a title - "correctness" is not bad, but maybe a bit too strong.
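A minimal illustration of that first point: the following passes every one of safe Rust's static checks, yet still fails at runtime.

    fn main() {
        let v = vec![1, 2, 3];
        // Compiles under all of safe Rust's checks, yet panics at
        // runtime with an out-of-bounds error: "safe" rules out
        // memory unsafety, not runtime failure.
        println!("{}", v[10]);
    }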
You might be talking about "correctness", and it's true that Rust generally favors correctness more than most other languages (e.g. Rust being obstinate about turning a byte array into a file path, because not all file paths are byte arrays, or e.g. the myriad string types that encode their semantics).
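A small sketch of what that obstinacy looks like in practice, using only standard library APIs: there is no direct, portable &[u8] -> &Path conversion, so you either go through a checked UTF-8 conversion or opt in to a platform-specific one.

    use std::path::Path;
    use std::str;

    fn main() {
        let bytes: &[u8] = b"/tmp/\xff/log";

        // Path::new(bytes) does not compile: paths are not byte
        // arrays on every platform (Windows uses 16-bit units).
        // The checked conversion forces you to handle failure:
        match str::from_utf8(bytes) {
            Ok(s) => println!("valid UTF-8 path: {:?}", Path::new(s)),
            Err(e) => println!("not UTF-8, decide what to do: {e}"),
        }

        // On Unix only, an explicit platform-specific opt-in exists:
        #[cfg(unix)]
        {
            use std::os::unix::ffi::OsStrExt;
            let p = Path::new(std::ffi::OsStr::from_bytes(bytes));
            println!("unix-only path: {:?}", p);
        }
    }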
But this can be used with other enums too, and in those cases, having a zero NonZero would essentially transmute the enum into an unexpected variant, which may cause an invariant to break, thus potentially causing memory unsafety in whatever required that invariant.
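To see the niche concretely, a small demonstration with std's NonZero (the generic form needs Rust 1.79+): Option<NonZero<u8>> is the same size as u8, because the value 0 is reserved to represent None.

    use std::mem::size_of;
    use std::num::NonZero;

    fn main() {
        // No separate tag byte is needed: the compiler uses 0 as
        // the niche, so None is represented as the byte 0.
        assert_eq!(size_of::<NonZero<u8>>(), 1);
        assert_eq!(size_of::<Option<NonZero<u8>>>(), 1);

        // A zero smuggled into a NonZero<u8> would therefore read
        // back as None here: the "unexpected variant" described above.
    }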
By that standard anything and everything might be tainted as "unsafe", which is precisely GP's point. Whether the unsafety should be blamed on the outside code that's allowed to create a 0-valued NonZero<…> or on the code that requires this purported invariant in the first place is ultimately a matter of judgment, that people may freely disagree about.
The issue is that this could potentially allow creating a struct whose invariants are broken in safe Rust. This breaks encapsulation, which means modules that use unsafe code internally (like `std::vec`) have no way to stop safe code from handing them values whose safety invariants are already broken. Let me give an example, starting with an enum definition:
    // Assume std::vec has this definition
    struct Vec<T> {
        capacity: usize,
        length: usize,
        arena: *mut T,
    }

    enum Example {
        First {
            capacity: usize,
            length: usize,
            arena: usize,
            discriminator: NonZero<u8>,
        },
        Second {
            vec: Vec<u8>,
        },
    }
Now assume the compiler has applied the niche optimization: if the byte corresponding to `discriminator` is 0, the enum is `Example::Second`, while if that byte is non-zero, the enum is `Example::First` with `discriminator` equal to the given non-zero value. Furthermore, assume that `Example::First`'s `capacity`, `length`, and `arena` fields sit at the same offsets as the fields of the same name in `Example::Second.vec`. If we allowed `fn NonZero::new_unchecked(u8) -> NonZero<u8>` to be a safe function, we could create an invalid Vec:

    fn main() {
        let evil = NonZero::new_unchecked(0);
        // We write an Example::First,
        // but it is read back as an Example::Second,
        // because discriminator == 0 and niche optimization
        let first = Example::First {
            capacity: 9001,
            length: 9001,
            arena: 0x20202020,
            discriminator: evil,
        };
        if let Example::Second { vec: bad_vec } = first {
            // If the layout of Example is as I described,
            // and no optimizations occur, we end up in here.
            // This writes 255 to address 0x20202020
            bad_vec[0] = 255;
        }
    }
So if we allowed new_unchecked to be safe, it would be impossible to write a sound definition of Vec.

It's not a matter of judgment, though. NonZero<T> has an invariant that a zero value is undefined behavior. Therefore, any API that allows creating a zero-valued NonZero<T> must be unsafe. This is a very straightforward case.
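And that is exactly how the standard library splits the API today (sketch using the generic NonZero, Rust 1.79+): the checked constructor is safe because it forces the caller to handle zero, while the unchecked one shifts that proof obligation to the caller and is marked unsafe.

    use std::num::NonZero;

    fn main() {
        // Safe: the zero case is surfaced as None, so the
        // invariant cannot be silently broken.
        assert!(NonZero::<u8>::new(0).is_none());
        let ok = NonZero::<u8>::new(42).unwrap();
        println!("{}", ok.get());

        // Unsafe: the caller promises the value is non-zero.
        // Passing 0 here would be undefined behavior.
        let also_ok = unsafe { NonZero::<u8>::new_unchecked(42) };
        assert_eq!(ok, also_ok);
    }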