Pitfalls of Safe Rust

(corrode.dev)
168 points pjmlp | 16 comments
nerdile ◴[] No.43603402[source]
Title is slightly misleading but the content is good. It's the "Safe Rust" in the title that's weird to me. These apply to Rust altogether, you don't avoid them by writing unsafe Rust code. They also aren't unique to Rust.

A less baity title might be "Rust pitfalls: Runtime correctness beyond memory safety."

replies(1): >>43603739 #
burakemir ◴[] No.43603739[source]
It is consistent with the way the Rust community uses "safe": as "passes static checks and thus protects from many runtime errors."

This regularly drives C++ programmers mad: the statement "C++ is all unsafe" is taken as some kind of hyperbole, attack or dogma, while the intent may well be to factually point out the lack of statically checked guarantees.

It is subtle but not inconsistent that strong static checks ("safe Rust") may still leave the possibility of runtime errors. So there is a legitimate, useful broader notion of "safety" where Rust's static checking is not enough. That's a bit hard to express in a title - "correctness" is not bad, but maybe a bit too strong.

replies(5): >>43603865 #>>43603876 #>>43603929 #>>43604918 #>>43605986 #
whytevuhuni ◴[] No.43603865[source]
No, the Rust community almost universally understands "safe" as referring to memory safety, as per Rust's documentation, and especially the unsafe book, aka Rustonomicon [1]. In that regard, Safe Rust is safe, Unsafe Rust is unsafe, and C++ is also unsafe. I don't think anyone is saying "C++ is all unsafe."

You might be talking about "correct", and that's true, Rust generally favors correctness more than most other languages (e.g. Rust being obstinate about turning a byte array into a file path, because not all file paths are made of byte arrays, or e.g. the myriad string types to denote their semantics).
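
As an illustration of the file-path point, here is a minimal sketch (not from the thread). The helper name `path_from_utf8_bytes` is hypothetical. It only accepts bytes that are valid UTF-8, which is one portable way to go from bytes to a path; platform-specific conversions such as `std::os::unix::ffi::OsStrExt::from_bytes` exist for the cases this rejects.

```rust
use std::path::PathBuf;

/// Hypothetical helper: build a path from raw bytes, but only when the
/// bytes are valid UTF-8. Anything else is rejected rather than guessed at.
fn path_from_utf8_bytes(bytes: &[u8]) -> Option<PathBuf> {
    // `from_utf8` validates the input and returns an error on bad bytes.
    let text = std::str::from_utf8(bytes).ok()?;
    Some(PathBuf::from(text))
}

fn main() {
    // Valid UTF-8 converts cleanly.
    println!("{:?}", path_from_utf8_bytes(b"logs/app.txt"));
    // 0xff can never appear in UTF-8, so this prints None.
    println!("{:?}", path_from_utf8_bytes(&[0xff, 0xfe]));
}
```

The conversion returns an `Option`, so the caller has to decide what to do with bytes that are not text instead of silently getting a mangled path.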

[1] https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html

replies(3): >>43604067 #>>43604190 #>>43604779 #
pjmlp ◴[] No.43604067[source]
Mostly. There is a subculture that promotes tainting everything that could be used incorrectly as unsafe, rather than only memory-safety-related operations.
replies(2): >>43604325 #>>43605297 #
1. dymk ◴[] No.43604325{4}[source]
That subculture is called “people who haven’t read the docs”, and I don’t see why anyone would give a whole lot of weight to their opinion on what technical terms mean
replies(3): >>43604715 #>>43605171 #>>43606488 #
2. arccy ◴[] No.43604715[source]
I don't see why people would drop the "memory" part of "memory safe" and just promote the false advertising of "safe rust"
replies(1): >>43605178 #
3. pkhuong ◴[] No.43605171[source]
Someone tell that to the standard library. No memory safety involved in non-zero numbers https://doc.rust-lang.org/std/num/struct.NonZero.html#tymeth...
replies(1): >>43605264 #
4. an_ko ◴[] No.43605178[source]
It sounds like you should read the docs. It's just a subject-specific abbreviation, not an advertising trick.
replies(1): >>43605564 #
5. whytevuhuni ◴[] No.43605264[source]
There is, since the zero is used as a niche value optimisation for enums, so that Option<NonZero<u32>> occupies the same amount of memory as u32.
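
The size claim can be checked directly. This is a minimal sketch, not from the thread; it uses `std::num::NonZeroU32`, which on recent toolchains is an alias for `NonZero<u32>`.

```rust
use std::mem::size_of;
use std::num::NonZeroU32; // alias for NonZero<u32> on recent toolchains

fn main() {
    // Zero is never a valid NonZeroU32, so that bit pattern is free to
    // represent `None` and no separate tag is needed.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // A plain u32 uses every bit pattern, so Option<u32> has to be larger.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    println!("Option<NonZeroU32>: {} bytes", size_of::<Option<NonZeroU32>>());
    println!("Option<u32>:        {} bytes", size_of::<Option<u32>>());
}
```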

But this can be used with other enums too, and in those cases, having a zero NonZero would essentially transmute the enum into an unexpected variant, which may cause an invariant to break, thus potentially causing memory unsafety in whatever required that invariant.

replies(1): >>43605313 #
6. zozbot234 ◴[] No.43605313{3}[source]
> which may cause an invariant to break, thus potentially causing memory unsafety in whatever required that invariant

By that standard anything and everything might be tainted as "unsafe", which is precisely GP's point. Whether the unsafety should be blamed on the outside code that's allowed to create a 0-valued NonZero<…> or on the code that requires this purported invariant in the first place is ultimately a matter of judgment, that people may freely disagree about.

replies(3): >>43606286 #>>43607183 #>>43612667 #
7. arccy ◴[] No.43605564{3}[source]
but it is false advertising when it's used all over the internet with: rust is safe! telling the whole world to rtfm for your co-opting of the generic word "safe" is like advertisers telling you to read the fine print: a sleazy tactic.
replies(1): >>43607855 #
8. genrilz ◴[] No.43606286{4}[source]
EDIT: A summary of this is that it is impossible to write a sound `std::vec::Vec` implementation if NonZero::new_unchecked is a safe function. This is specifically because creating a NonZero whose value is 0 is undefined behavior, and niche optimization exploits that. If you created your own `struct MyNonZero(u8)`, then you wouldn't need to mark MyNonZero::new_unchecked as unsafe, because MyNonZero(0) is a "valid" value as far as the compiler is concerned and creating it doesn't trigger undefined behavior.
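
The `MyNonZero` contrast can be sketched as follows. This is an illustration under the assumptions of the paragraph above, not code from the thread; `NonZeroU8` stands in for `NonZero<u8>`.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Plain wrapper: "non-zero" is only a convention here, the compiler
/// treats every u8 bit pattern as a valid MyNonZero.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MyNonZero(u8);

impl MyNonZero {
    /// Safe on purpose: MyNonZero(0) breaks our convention, which is a
    /// logic bug at worst, not undefined behavior.
    fn new_unchecked(value: u8) -> MyNonZero {
        MyNonZero(value)
    }
}

fn main() {
    let odd = MyNonZero::new_unchecked(0);
    println!("{:?}", odd);

    // MyNonZero has no spare bit pattern, so Option needs extra room.
    assert!(size_of::<Option<MyNonZero>>() > size_of::<MyNonZero>());
    // NonZeroU8 reserves 0 as a niche, so Option still fits in one byte.
    assert_eq!(size_of::<Option<NonZeroU8>>(), 1);
}
```

The size difference is the visible sign of the niche: the compiler only reuses the zero pattern for types where it is allowed to assume zero never occurs.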

The issue is that this could potentially allow creating a struct whose invariants are broken in safe rust. This breaks encapsulation, which means modules which use unsafe code (like `std::vec`) have no way to stop safe code from calling them with the invariants they rely on for safety broken. Let me give an example starting with an enum definition:

  // Assume std::vec has this definition
  struct Vec<T> {
    capacity: usize,
    length:   usize,
    arena:    *mut T
  }
  
  enum Example {
    First {
      capacity: usize,
      length:   usize,
      arena:    usize,
      discriminator: NonZero<u8>
    },
    Second {
      vec: Vec<u8>
    }
  }

Now assume the compiler has used niche optimization so that if the byte corresponding to `discriminator` is 0, then the enum is `Example::Second`, while if the byte corresponding to `discriminator` is not 0, then the enum is `Example::First` with discriminator being equal to its given non-zero value. Furthermore, assume that `Example::First`'s `capacity`, `length`, and `arena` fields are in the same position as the fields of the same name for `Example::Second.vec`. If we allow `fn NonZero::new_unchecked(u8) -> NonZero<u8>` to be a safe function, we can create an invalid Vec:

  fn main() {
    let evil = NonZero::new_unchecked(0);
  
    // We write as an Example::First,
    // but this is read as an Example::Second
    // because discriminator == 0 and niche optimization
    let first = Example::First {
      capacity: 9001, length: 9001,
      arena: 0x20202020,
      discriminator: evil
    };

    if let Example::Second { vec: mut bad_vec } = first {
      // If the layout of Example is as I described,
      // and no optimizations occur, we should end up in here.

      // This writes 255 to address 0x20202020
      bad_vec[0] = 255;
    }
  }
So if we allowed new_unchecked to be safe, then it would be impossible to write a sound definition of Vec.
9. Guthur ◴[] No.43606488[source]
Because of cult-like belief structures growing up around Rust. It's clear as day for us on the outside. I see it from the evangelists in the company I work for: "Rust is faster and safer to develop with when compared to C++". I'm no C++ fan, but it's obviously nonsense.

I feel people took the comparison of rust to c and extrapolated to c++ which is blatantly disingenuous.

replies(2): >>43607202 #>>43608104 #
10. rcxdude ◴[] No.43607183{4}[source]
Yeah, anything can (and should) be marked unsafe if it could lead to memory safety problems. And so if it potentially breaks an invariant which is relied on for memory safety, it should be marked unsafe (conversely, code should not rely on an unchecked, safe condition for memory safety). That's basically how it works, Rust has the concept of unsafe functions so that libraries can communicate to users about what can and can't be relied on to keep memory safety without manual checking. This requires a common definition of 'safe', but it then means there isn't any argument about where the bug is: if the invariant isn't enforced by the compiler in safe code, then other code should not rely on it. If it is, then the bug is in the unsafe code that broke the invariant.
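
A minimal sketch of that convention, using a hypothetical `InBounds` type that is not from the thread: the checked constructor is safe, the unchecked one is `unsafe`, and the unsafe code in `get` relies only on the invariant both constructors promise.

```rust
/// Hypothetical type. Invariant: `idx < slice.len()`.
/// In real code this would live in its own module so the fields stay
/// private and safe code cannot build an invalid value by hand.
struct InBounds<'a> {
    slice: &'a [u8],
    idx: usize,
}

impl<'a> InBounds<'a> {
    /// Safe constructor: enforces the invariant with a runtime check.
    fn new(slice: &'a [u8], idx: usize) -> Option<Self> {
        if idx < slice.len() {
            Some(InBounds { slice, idx })
        } else {
            None
        }
    }

    /// # Safety
    /// The caller must guarantee `idx < slice.len()`.
    unsafe fn new_unchecked(slice: &'a [u8], idx: usize) -> Self {
        InBounds { slice, idx }
    }

    fn get(&self) -> u8 {
        // SAFETY: every constructor upholds `idx < slice.len()`.
        unsafe { *self.slice.get_unchecked(self.idx) }
    }
}

fn main() {
    let data = [10u8, 20, 30];
    let checked = InBounds::new(&data, 1).expect("index in range");
    println!("{}", checked.get());

    // SAFETY: 2 < data.len().
    let unchecked = unsafe { InBounds::new_unchecked(&data, 2) };
    println!("{}", unchecked.get());

    // Out-of-range indices never produce a value through the safe path.
    assert!(InBounds::new(&data, 3).is_none());
}
```

If `new_unchecked` were a safe function, a caller could pass an out-of-range index without writing `unsafe` anywhere, and `get` would read out of bounds; marking it `unsafe` is what puts the responsibility on the caller.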
11. rcxdude ◴[] No.43607202[source]
Care to explain the obvious, then? Rust is quite a lot nicer to write than C++ in my experience (and in fact, it seems like rust is most attractive to people who were already writing C++: people who still prefer C are a lot less likely to like Rust).
replies(1): >>43607234 #
12. Guthur ◴[] No.43607234{3}[source]
There is nothing attractive about c++ or rust, I really don't understand how anyone can think so, it has to be some sort of Stockholm syndrome. Think about it, before you started programming what about your experiences would make you appreciate the syntax soup of rust and c++?
replies(1): >>43607274 #
13. rcxdude ◴[] No.43607274{4}[source]
I dunno, there's not much about my previous experience that would indicate much one way or the other. I have found, though, that I tend to prefer slightly denser, heterogeneous code and syntax than average. Low-syntax languages like Haskell and Lisps make my head hurt because the code is so formless it becomes hard for me to parse, while languages with more syntax and symbols are easier (though, there is a limit, APL,k, etc, are a little far I find)
14. goku12 ◴[] No.43607855{4}[source]
It's not that either, and you are validating the GP's point. Rust has a very specific 'unsafe' keyword that every Rust developer interprets implicitly and instinctively as 'potentially memory-unsafe'. Consequently, 'safe' is interpreted as the opposite - 'guaranteed memory-safe'. Using that word as an abbreviation among Rust developers is therefore not uncommon.

However, while speaking about the Rust language in general, all half-decent Rust developers specify that it's about memory safety. Even the Rust language homepage has only two instances of the word - 'memory-safety' and 'thread-safety'. The accusations of sleaziness and false advertising are disingenuous at best.

15. goku12 ◴[] No.43608104[source]
The cult that I see growing online a lot are those who are invested in attacking Rust for some reason, though their arguments often indicate that they haven't even tried it. I believe that we're focusing so much on Rust evangelists that we're neglecting the other end of the zealotry spectrum - the irrational haters.

The Rust developers I meet are more interested in showing off their creations than in evangelizing the language. Even those on dedicated Rust forums are generally very receptive to other languages - you can see that in action on topics like goreleaser or Zig's comptime.

And while you have already dismissed the other commenter's experience of finding Rust nicer than C++ to program in, I would like to add that I share their experience. I have nothing against C++, and I would like to relearn it so that I can contribute to some projects I like. But the reason why I started with Rust in 2013 was because of the memory-safety issues I was facing with C++. There are features in Rust that I find surprisingly pleasant, even with 6 additional years of experience in Python. Your opinion that Rust is unpleasant to the programmer is not universal and its detractions are not nonsense.

I appreciate the difficulty in learning Rust - especially getting past the stage of fighting the borrow checker. That's the reason why I don't promote Rust for immediate projects. However, I feel that the knowledge required to get past that stage is essential even for correct C and C++. Rust was easy for me to get started in, because of my background in digital electronics, C and C++. But once you get past that peak, Rust is full of very elegant abstractions that are similar to what's seen in Python. I know it works because I have trained js and python developers in Rust. And their feedback corroborates those assumptions about learning Rust.

16. steveklabnik ◴[] No.43612667{4}[source]
> Whether the unsafety should be blamed on the outside code that's allowed to create a 0-valued NonZero<…> or on the code that requires this purported invariant in the first place is ultimately a matter of judgment, that people may freely disagree about.

It's not, though. NonZero<T> has an invariant that a zero value is undefined behavior. Therefore, any API which allows for the ability to create one must be unsafe. This is a very straightforward case.
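
For reference, this is how the two constructors look in the standard library today. A minimal sketch using `std::num::NonZeroU8` (the `NonZero<u8>` alias): the safe `new` returns an `Option`, while `new_unchecked` can only be called inside `unsafe`.

```rust
use std::num::NonZeroU8;

fn main() {
    // Safe path: zero is rejected at runtime, so no invalid value exists.
    assert!(NonZeroU8::new(0).is_none());
    let five = NonZeroU8::new(5).expect("5 is non-zero");

    // Unsafe path: no check, the caller promises the value is not zero.
    // SAFETY: 5 != 0.
    let also_five = unsafe { NonZeroU8::new_unchecked(5) };

    assert_eq!(five, also_five);
    println!("{}", also_five.get());
}
```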