Pitfalls of Safe Rust

(corrode.dev)

nerdile [06 Apr 25 17:56 UTC] No.43603402

>>43585742 (OP)

Title is slightly misleading but the content is good. It's the "Safe Rust" in the title that's weird to me. These apply to Rust altogether, you don't avoid them by writing unsafe Rust code. They also aren't unique to Rust.

A less baity title might be "Rust pitfalls: Runtime correctness beyond memory safety."

burakemir [06 Apr 25 18:33 UTC] No.43603739

>>43603402

It is consistent with the way the Rust community uses "safe": as "passes static checks and thus protects from many runtime errors."

This regularly drives C++ programmers mad: the statement "C++ is all unsafe" is taken as some kind of hyperbole, attack or dogma, while the intent may well be to factually point out the lack of statically checked guarantees.

It is subtle but not inconsistent that strong static checks ("safe Rust") may still leave the possibility of runtime errors. So there is a legitimate, useful broader notion of "safety" where Rust's static checking is not enough. That's a bit hard to express in a title - "correctness" is not bad, but maybe a bit too strong.
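A minimal Rust sketch of that distinction, assuming only the standard library (`safe_div` and `safe_add` are illustrative names, not part of any API). Everything here is safe Rust that the compiler accepts, yet plain `/` and `+` on the same inputs could still fail at runtime, which static memory-safety checks do not rule out.

```rust
/// Safe Rust still has runtime failure modes: `a / b` panics when `b == 0`,
/// and `a + b` panics on overflow in debug builds (and wraps in release
/// builds with default settings). The checked variants return `None` instead.
fn safe_div(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}

fn safe_add(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}
```

Dividing by a zero that only shows up at runtime would panic; the checked variant hands that case back to the caller as `None`.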

whytevuhuni [06 Apr 25 18:45 UTC] No.43603865

>>43603739

No, the Rust community almost universally understands "safe" as referring to memory safety, as per Rust's documentation, and especially the unsafe book, aka Rustonomicon [1]. In that regard, Safe Rust is safe, Unsafe Rust is unsafe, and C++ is also unsafe. I don't think anyone is saying "C++ is all unsafe."

You might be talking about "correct", and that's true, Rust generally favors correctness more than most other languages (e.g. Rust being obstinate about turning a byte array into a file path, because not all file paths are made of byte arrays, or e.g. the myriad string types to denote their semantics).
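To make the file-path example concrete, here is a small sketch (the helper `path_from_bytes` is hypothetical). The portable route from raw bytes to a `PathBuf` goes through UTF-8 validation, because the standard library does not assume that every platform's paths are byte strings.

```rust
use std::path::PathBuf;

/// Portable conversion: accept the bytes only if they are valid UTF-8.
/// On Unix, std::os::unix::ffi::OsStrExt::from_bytes accepts arbitrary bytes,
/// but no such API exists on Windows, where paths are 16-bit code units.
fn path_from_bytes(bytes: &[u8]) -> Option<PathBuf> {
    std::str::from_utf8(bytes).ok().map(PathBuf::from)
}
```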

[1] https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html

1. ampere22 [06 Apr 25 20:37 UTC] No.43604779

>>43603865

If a C++ developer decides to use purely containers and smart pointers when starting a new project, how are they going to develop unsafe code?

Containers like std::vector and smart pointers like std::unique_ptr seem to offer all of the same statically checked guarantees that Rust does.

I just do not see how Rust is a superior language compared to modern C++.

2. criddell [06 Apr 25 20:49 UTC] No.43604855

>>43604779 (TP)

C++ devs need to understand the difference between:

   Vec1[0];
   Vec1.at(0);

Even the at method isn’t statically checked. If you want static checking, you probably need to use std::array.
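For comparison, a sketch of the Rust side using only the standard library (`element` and `index_panics` are illustrative helpers). `get` is the checked accessor that returns an `Option`, and plain indexing is bounds-checked too, panicking instead of reading past the end. `catch_unwind` appears only so the panic can be observed in a demo.

```rust
use std::panic;

/// `slice::get` is the non-panicking checked accessor: it returns Option<&T>.
fn element(v: &[i32], i: usize) -> Option<i32> {
    v.get(i).copied()
}

/// Plain indexing `v[i]` is also bounds-checked: out of range it panics
/// rather than reading arbitrary memory.
fn index_panics(v: &[i32], i: usize) -> bool {
    panic::catch_unwind(|| v[i]).is_err()
}
```

For fixed-size arrays indexed by a constant, rustc goes a step further and rejects an obviously out-of-range index at compile time (the deny-by-default `unconditional_panic` lint), which is roughly the `std::array` case.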

3. ddulaney [06 Apr 25 20:54 UTC] No.43604887

>>43604779 (TP)

Unfortunately, operator[] on std::vector is inherently unsafe. You can potentially try to ban it (using at() instead), but that has its own problems.

There’s a great talk by Louis Brandy called “Curiously Recurring C++ Bugs at Facebook” [0] that covers this really well, along with std::map’s operator[] and some more tricky bugs. An interesting question to ask if you try to watch that talk is: How does Rust design around those bugs, and what trade offs does it make?

[0]: https://m.youtube.com/watch?v=lkgszkPnV8g
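One way Rust designs around the `std::map::operator[]` bug from that talk, sketched with the standard `HashMap` (`count_words` is just an illustrative function): lookups never insert, and insert-with-default has to be written out through the `entry` API.

```rust
use std::collections::HashMap;

/// Unlike C++ std::map::operator[], reading a missing key never inserts it:
/// `get` returns None, and creating a default must go through `entry`.
fn count_words<'a>(words: &[&'a str]) -> HashMap<&'a str, u32> {
    let mut counts = HashMap::new();
    for &w in words {
        *counts.entry(w).or_insert(0) += 1;
    }
    counts
}
```

The trade-off is verbosity: there is no one-character spelling for "get or create", and `counts["z"]` on a missing key panics rather than inserting.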

4. phoenk [06 Apr 25 20:55 UTC] No.43604895

>>43604779 (TP)

The commonly given response to this question is two-fold, and both parts have a similar root cause: smart pointers and "safety" being bolted-on features developed decades after the fact. The first part is the standard library itself. You can put your data in a vec for instance, but if you want to iterate, the standard library gives you back a regular pointer that can be dereferenced unchecked, and is intended to be invalidated while still held in the event of a mutation. The second part is third party libraries. You may be diligent about managing memory with smart pointers, but odds are any library you might use probably wants a dumb pointer, and whether or not it assumes responsibility for freeing that pointer later is at best documented in natural language.

This results in an ecosystem where safety is opt-in, which means in practice most implementations are largely unsafe. Even if an individual developer wants to be proactive about safety, the ecosystem isn't there to support them to the same extent as in rust. By contrast, safety is the defining feature of the rust ecosystem. You can write code, and the language and ecosystem support you in doing so rather than being a barrier you have to fight against.
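A minimal sketch of the ownership point in Rust terms (`Buffer`, `inspect` and `consume` are hypothetical): who frees the data is part of the function signature rather than something documented in prose.

```rust
/// Hypothetical buffer type used for illustration.
struct Buffer {
    data: Vec<u8>,
}

/// Borrows: the caller keeps ownership and stays responsible for the buffer.
fn inspect(buf: &Buffer) -> usize {
    buf.data.len()
}

/// Takes ownership: the buffer is freed when this function returns.
fn consume(buf: Box<Buffer>) -> usize {
    buf.data.len()
}
```

Calling `inspect(&b)` after `consume(b)` would be a compile error, because `b` has been moved.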

5. josephg [06 Apr 25 21:12 UTC] No.43604997

>>43604895

Yep. Safe rust also protects you from UB resulting from incorrect multi-threaded code.

In C++ (and C#, Java, Go and many other “memory safe languages”), it’s very easy to mess up multithreaded code. Bugs from multithreading are often insanely difficult to reproduce and debug. Rust’s safety guardrails make many of these bugs impossible.

This is also great for performance. C++ libraries have to decide whether it’s better to be thread safe (at a cost of performance) or to be thread-unsafe but faster. Lots of libraries are thread safe “just in case”. And you pay for this even when your program / variable is single threaded. In rust, because the compiler prevents these bugs, libraries are free to be non-threadsafe for better performance if they want - without worrying about downstream bugs.
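The standard library itself shows this trade-off. A sketch assuming only `std` (`local_share` and `threaded_sum` are illustrative names): `Rc` and `Arc` are the same shared-ownership idea with non-atomic and atomic reference counts, and the compiler decides which one may cross a thread boundary.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Single-threaded sharing: Rc uses a cheaper, non-atomic reference count.
/// Rc is !Send, so `thread::spawn(move || b.len())` is rejected at compile
/// time (E0277) instead of becoming a data race on the count.
fn local_share() -> (usize, usize) {
    let a = Rc::new(vec![1, 2, 3]);
    let b = Rc::clone(&a);
    (Rc::strong_count(&a), b.len())
}

/// Cross-thread sharing: Arc pays for atomic counting and is Send + Sync.
fn threaded_sum() -> i32 {
    let shared = Arc::new(vec![1, 2, 3]);
    let worker = {
        let s = Arc::clone(&shared);
        thread::spawn(move || s.iter().sum::<i32>())
    };
    worker.join().unwrap()
}
```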

6. ampere22 [06 Apr 25 21:49 UTC] No.43605244

>>43604887

Thank you for sharing. Seems I still have more to learn!

It seems the bug you are flagging here is a null reference bug - I know Rust has Optional as a workaround for “null”

Are there any pitfalls in Rust when Optional does not return anything? Or does Optional close this bug altogether? I saw Optional pop up in Java to quiet down complaints on null pointer bugs, but remained skeptical about whether or not it was better to design around the fact that there could be the absence of “something” coming into existence when it should have been initialized.

7. int_19h [06 Apr 25 22:11 UTC] No.43605386

>>43604895

The standard library doesn't give you a regular pointer, though (unless you specifically ask for that). It gives you an iterator, which is pointer-like, but exists precisely so that other behaviors can be layered. There's no reason why such an iterator can't do bounds checking etc, and, indeed, in most C++ implementations around, iterators do make such checks in debug builds.

The problem, rather, is that there's no implementation of checked iterators that's fast enough for release build. That's largely a culture issue in C++ land; it could totally be done.

8. int_19h [06 Apr 25 22:13 UTC] No.43605404

>>43605244

It's not so much Optional that deals with the bug. It's the fact that you can't just use a value that could possibly be null in a way that would break at runtime if it is null - the type system won't allow you, forcing an explicit check. Different languages do this in different ways - e.g. in C# and TypeScript you still have null, but references are designated as nullable or non-nullable - and an explicit comparison to null changes the type of the corresponding variable to indicate that it's not null.
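For Rust, a minimal sketch of the same idea (`describe` is a made-up function): the compiler will not let an `Option<&str>` be used as a `&str`, and pattern matching plays the role of the explicit null check.

```rust
/// `name.len()` on an Option<&str> would not compile: Option has no `len`.
/// Matching is the explicit check, and inside the `Some` arm `n` is a plain
/// &str, much like a null comparison narrowing the type in C# or TypeScript.
fn describe(name: Option<&str>) -> String {
    if let Some(n) = name {
        format!("{} has {} bytes", n, n.len())
    } else {
        String::from("no name")
    }
}
```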

9. ddulaney [06 Apr 25 23:46 UTC] No.43606020

>>43605244

Rust’s Option does close this altogether, yes. All (non-unsafe) users of Option are required to have some defined behavior in both cases. This is enforced by the language in the match statement, and most of the “member functions” on Option use match under the hood.

This is an issue with the C++ standardization process as much as with the language itself. AIUI when std::optional (and std::variant, which has similar issues) were defined, there was a push to get new syntax into the language itself that would’ve been similar to Rust’s match statement.

However, that never made it through the standardization process, so we ended up with “library variants” that are not safe in all circumstances.

Here’s one of the papers from that time, though there are many others arguing different sides: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p00...
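A small sketch of what a language-level `match` buys over a library-only `std::variant` (the `Shape` enum is hypothetical): the compiler checks that every case is handled, and a variant's fields cannot be read without first matching on it.

```rust
/// A sum type playing the role of std::variant.
enum Shape {
    Circle { r: f64 },
    Rect { w: f64, h: f64 },
}

fn area(s: &Shape) -> f64 {
    // Deleting either arm is a compile-time error (E0004, non-exhaustive
    // patterns), and there is no way to read a Rect's fields from a Circle.
    match s {
        Shape::Circle { r } => std::f64::consts::PI * r * r,
        Shape::Rect { w, h } => w * h,
    }
}
```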

10. spookie [06 Apr 25 23:53 UTC] No.43606061

>>43604997

I've written some multithreaded rust and I've gotta say, this does not reflect my experience. It's just as easy to make a mess, as in any other language.

11. josephg [07 Apr 25 01:09 UTC] No.43606447

>>43606061

Me too. I agree that it's not a bed of roses, and all the memory safety guarantees in the world don't stop you from making a huge mess. But I haven't run into any of the impossible-to-debug crashes / heisenbugs in my multithreaded rust code that I have in C/C++.

I think rust delivers on its safety promise.

12. tialaramex [07 Apr 25 01:14 UTC] No.43606485

>>43605404

I think sum types in general, and Option<T> in particular, are nicer. But the reason C# has nullability isn't that they disagree with me; it's that fundamentally the CLR has the same model as Java: all these types can be null. Even though in the modern C# language you can say "No, not null, that's never OK", at runtime on the CLR, too bad, maybe it's null.

For example if I write a C# function which takes a Goose, specifically a Goose, not a Goose? or similar - well, too bad the CLR says my C# function can be called by this obsolete BASIC code which has no idea what a Goose is, but it's OK because it passed null. If my code can't cope with a null? Too bad, runtime exception.

In real C# apps written by an in-house team this isn't an issue. Ollie may not be the world's best programmer, but he's not going to figure out how to explicitly call this API with a null; he's going to be stopped by the C# compiler diagnostic saying it needs a Goose, and worst case he says "Hey tialaramex, why do I need a Goose?". But if you make stuff that's used by people you've never met, it can be an issue.

13. dwattttt [07 Apr 25 02:37 UTC] No.43607058

>>43606485

> For example if I write a C# function which takes a Goose, specifically a Goose, not a Goose? or similar - well, too bad the CLR says my C# function can be called by this obsolete BASIC code which has no idea what a Goose is, but it's OK because it passed null. If my code can't cope with a null? Too bad, runtime exception.

That's actually no different to Rust still; if you try, you can pass a 0 value to a function that only accepts a reference (i.e. a non-zero pointer), be it by unsafe, or by assembly, or whatever.

Disagreeing with another comment on this thread, this isn't a matter of judgement around "whose bug is it? Should the callee check for null, or the caller?". Rust's win comes from clearly articulating that the API takes non-zero, so the caller is buggy.

As you mention, it can still be an issue, but there should be no uncertainty around whose mistake it is.
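A sketch of how that non-zero contract shows up in Rust's types, assuming only `std` (the helper names are illustrative). References are guaranteed non-null, which is also why `Option<&T>` can use the null bit pattern for `None` at no extra size, and integer types like `NonZeroU32` check their invariant once, at construction.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// References can never be null, so None is stored as the null pattern:
/// Option<&T> is the same size as a bare reference.
fn option_ref_is_free() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// The same contract for integers: "non-zero" is checked once, at
/// construction, and from then on is carried by the type.
fn non_zero(n: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(n)
}
```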

14. rcxdude [07 Apr 25 03:10 UTC] No.43607240

>>43604779 (TP)

To add on another pitfall: iterator invalidation. In C++ you generally aren't allowed to modify a container while you're iterating through it, because it may re-allocate the memory and leave dangling pointers in the iterator, but the compiler doesn't check this. Rust's lifetime analysis closes this particular issue.

(Basically, the 'newer' C++ features do help a little with memory safety, but it's still fairly easy to trip up even if you restrict your own code from 'dangerous' operations. It's not at all obvious that a useful memory-safe subset of C++ exists. Even if you were to re-write the standard library to correct previous mistakes, it seems likely you would still need something like the borrow checker once you step beyond the surface level).
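A minimal sketch of the iterator-invalidation case in Rust (`duplicate_evens` and `drop_odds` are illustrative names). Mutating a `Vec` while iterating over it is rejected at compile time, so the code ends up either deciding first and mutating afterwards, or using an API like `retain` that performs the in-place mutation itself.

```rust
/// `for x in &v { v.push(*x) }` is rejected by the borrow checker (E0502):
/// the loop holds a shared borrow of `v` while `push` needs a mutable one.
/// Accepted pattern: collect what to add, then mutate after the borrow ends.
fn duplicate_evens(mut v: Vec<i32>) -> Vec<i32> {
    let extra: Vec<i32> = v.iter().copied().filter(|x| x % 2 == 0).collect();
    v.extend(extra);
    v
}

/// In-place removal goes through a dedicated API instead of a manual loop.
fn drop_odds(mut v: Vec<i32>) -> Vec<i32> {
    v.retain(|&x| x % 2 == 0);
    v
}
```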

16. pjmlp [07 Apr 25 07:05 UTC] No.43608646

>>43606447

Most likely because all that multi-threaded code accesses in-memory data structures internal to the process memory, which is the only scenario in multi-threaded systems that Rust has some support for.

Make those threads access external resources simultaneously, or memory mapped to external writers, and there is no support from Rust type system.

17. pjmlp [07 Apr 25 07:06 UTC] No.43608657

>>43605386

VC++ checked iterators are fast enough for my use cases; not everyone is trying to win an F1 race when having to deal with code written in C++.

18. pjmlp [07 Apr 25 07:08 UTC] No.43608672

>>43604855

Many also need to learn that there are configuration settings on their compilers that make those two cases the same, enabling bounds checking on operator[]().

19. ViewTrick1002 [07 Apr 25 08:57 UTC] No.43609320

>>43606061

Safe rust prevents you from writing data races. All concurrent access is forced to be guarded by synchronization primitives. Eliminating an entire class of bugs.

You can still create a mess from logical race conditions, deadlocks and similar bugs, but you won’t get segfaults because, on the tenth iteration, you forgot to diligently manage the mutex.

Personally I feel that in rust I can mostly reason locally, compared to say Go when I need to understand a global context whenever I touch multithreaded code.
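A small sketch of why forgetting the mutex is hard to express in safe Rust, assuming Rust 1.63+ for `std::thread::scope` (`parallel_sum` is an illustrative name). The `Mutex` owns the data, so the only way to reach it is through `lock()`, and the guard releases the lock when it is dropped.

```rust
use std::sync::Mutex;
use std::thread;

/// The Mutex owns the total: there is no path to the data except `lock()`.
fn parallel_sum(chunks: &[Vec<u64>]) -> u64 {
    let total = Mutex::new(0u64);
    thread::scope(|s| {
        for chunk in chunks {
            let total = &total;
            s.spawn(move || {
                let sum: u64 = chunk.iter().sum();
                // The guard is dropped (unlocking) at the end of this statement.
                *total.lock().unwrap() += sum;
            });
        }
    });
    total.into_inner().unwrap()
}
```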

20. sksxihve [07 Apr 25 11:06 UTC] No.43609957

>>43608646

What mainstream language has type system features that make multi-threaded access to external resources safe?

Managing something like that is a design decision of the software being implemented not a responsibility of the language itself.

21. criddell [07 Apr 25 11:54 UTC] No.43610249

>>43608672

Sure, but at() is guaranteed to throw an exception and operator[] can throw an exception when you go out of bounds. C++26 is tweaking this, but it's still going to differ implementation to implementation.

At least that's my understanding of the situation. Happy to be corrected though.

22. pjmlp [07 Apr 25 12:20 UTC] No.43610456

>>43609957

None; however, the "fearless concurrency" sales pitch usually leaves that scenario as a footnote.

23. steveklabnik [07 Apr 25 15:40 UTC] No.43612736

>>43604779 (TP)

Here's a program that uses only std::unique_ptr:

  #include<iostream>
  #include<memory>
  
  int main() {

      std::unique_ptr<int> null_ptr;
    
      std::cout << *null_ptr << std::endl; // Undefined behavior
  }

Clang 20 compiles this code with `-std=c++23 -Wall -Werror`. If you add -fsanitize=undefined, it will print

  ==1==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55589736d8ea bp 0x7ffe04a94920 sp 0x7ffe04a948d0 T1)

or similar.
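For contrast, a sketch of the nearest Rust equivalent (`show` is a made-up function). `Box<i32>` has no empty state, so the analogue of a default-constructed `std::unique_ptr<int>` is `Option<Box<i32>>`, and dereferencing that directly does not compile.

```rust
/// There is no empty Box<i32>. Writing `*maybe` on an Option<Box<i32>> is a
/// compile error, so the None case must be handled before the value is read.
fn show(maybe: Option<Box<i32>>) -> String {
    match maybe.as_deref() {
        Some(v) => v.to_string(),
        None => String::from("empty"),
    }
}
```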

24. steveklabnik [07 Apr 25 15:56 UTC] No.43612881

>>43605244

> whether or not it was better to design around the fact that there could be the absence of “something” coming into existence when it should have been initialized

So this is actually why "no null, but optional types" is such a nice spot in the programming language design space. Because by default, you are making sure it "should have been initialized," that is, in Rust:

  struct Point {
      x: i32,
      y: i32,
  }

You know that x and y can never be null. You can't construct a Point without those numbers existing.

By contrast, here's a point where they could be:

  struct Point {
      x: Option<i32>,
      y: Option<i32>,
  }

You know by looking at the type if it's ever possible for something to be missing or not.

> Are there any pitfalls in Rust when Optional does not return anything?

So, Rust will require you to handle both cases. For example:

    let x: Option<i32> = Some(5); // adding the type for clarity

    dbg!(x + 7); // try to debug print the result

This will give you a compile-time error:

     error[E0369]: cannot add `{integer}` to `Option<i32>`
       --> src/main.rs:4:12
        |
    4   |     dbg!(x + 7); // try to debug print the result
        |          - ^ - {integer}
        |          |
        |          Option<i32>
        |
    note: the foreign item type `Option<i32>` doesn't implement `Add<{integer}>`

It's not so much "pitfalls" exactly, but you can choose to do the same thing you'd get in a language with null: you can choose not to handle that case:

    let x: Option<i32> = Some(5); // adding the type for clarity
    
    let result = match x {
        Some(num) => num + 7,
        None => panic!("we don't have a number"),
    };

    dbg!(result); // try to debug print the result

This will successfully print, but if we change `x` to `None`, we'll get a panic, and our current thread dies.

Because this pattern is useful, there's a method on Option called `unwrap()` that does this:

  let result = x.unwrap() + 7;

And so, you can argue that Rust doesn't truly force you to do something different here. It forces you to make an active choice, to handle it or not to handle it, and in what way. Another option, for example, is to return a default value. Here it is written out, and then with the convenience method:

    let result = match x {
        Some(num) => num + 7,
        None => 0,
    };

  let result = x.map(|num| num + 7).unwrap_or(0);

And you have other choices, too. These are just two examples.
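Two more of those choices, sketched with illustrative function names: propagating the absence to the caller with `?`, and bailing out locally with `let ... else` (stable since Rust 1.65).

```rust
/// Propagate the absence to the caller instead of deciding here.
fn add_seven(x: Option<i32>) -> Option<i32> {
    let num = x?; // returns None early if x is None
    Some(num + 7)
}

/// Or bail out locally with let-else.
fn add_seven_or_zero(x: Option<i32>) -> i32 {
    let Some(num) = x else { return 0 };
    num + 7
}
```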

--------------

But to go back to the type thing for a bit, knowing statically you don't have any nulls allows you to do what some dynamic language fans call "confident coding," that is, you don't always need to be checking if something is null: you already know it isn't! This makes code clearer and more robust.

If you take this strategy to its logical end, you arrive at "parse, don't validate," which uses Haskell examples but applies here too: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
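A minimal "parse, don't validate" sketch in Rust (the `NonEmptyName` type is hypothetical): the check happens once, in the constructor, and every function that receives the type can rely on it afterwards.

```rust
/// Hypothetical newtype: once constructed, a NonEmptyName is known to be
/// non-empty, so code that receives one never has to re-check.
#[derive(Debug)]
pub struct NonEmptyName(String);

impl NonEmptyName {
    /// The single place where the check happens ("parse").
    pub fn parse(s: &str) -> Result<Self, String> {
        let trimmed = s.trim();
        if trimmed.is_empty() {
            Err(String::from("name must not be empty"))
        } else {
            Ok(NonEmptyName(trimmed.to_string()))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// No emptiness check here: the type already carries that guarantee.
fn greet(name: &NonEmptyName) -> String {
    format!("Hello, {}!", name.as_str())
}
```

In a real crate the type would live in its own module, so the private field forces every caller through `parse`.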

25. int_19h [07 Apr 25 17:22 UTC] No.43613809

>>43607058

The difference is that C# has well-defined behavior in this case: a non-nullable annotation is really "not-nullable-ish", and there are cases even in the language itself where code without any casts in it will observe null values of such types. It's just a type system hole they allow for convenience and back-compat.

OTOH with Rust you'd have to violate its safety guarantees, which if I understand correctly triggers UB.

26. steveklabnik [07 Apr 25 19:26 UTC] No.43614953

>>43613809

> which if I understand correctly triggers UB.

Yes, your parent's example would be UB, and require unsafe.
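For illustration, a sketch of what a careful boundary looks like when foreign code hands Rust a raw pointer (`read_value` is hypothetical). The null check has to happen on the raw pointer, before any reference is formed, and the remaining validity requirements are pushed onto the caller through an `unsafe fn` contract.

```rust
use std::ptr::NonNull;

/// Hypothetical FFI-style entry point: foreign code hands us a raw pointer.
/// Turning a null raw pointer into a `&u32` would be UB, so the null check
/// happens before any reference exists.
///
/// # Safety
/// If `raw` is non-null, it must point to a valid, aligned, live `u32`.
unsafe fn read_value(raw: *const u32) -> Option<u32> {
    let p = NonNull::new(raw as *mut u32)?;
    // SAFETY: non-null was checked above; validity is the caller's obligation.
    Some(unsafe { *p.as_ptr() })
}
```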

27. josephg [07 Apr 25 21:40 UTC] No.43616250

>>43608646

> Make those threads access external resources simultaneously, or memory mapped to external writers, and there is no support from Rust type system.

I don’t think that’s true.

External thread-unsafe resources like that are similar in a way to external C libraries: they’re sort of unsafe by default. It’s possible to misuse them to violate rust’s safe memory guarantees. But it’s usually also possible to create safe struct / API wrappers around them which prevent misuse from safe code. If you model an external, thread-unsafe resource as a struct that isn’t Send / Sync then you’re forced to use the appropriate threading primitives to interact with the resource from multiple threads. When you use it like that, the type system can be a great help. I think the same trick can often be done for memory mapped resources - but it might come down to the specifics.

If you disagree, I’d love to see an example.
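A minimal sketch of that trick, assuming a hypothetical `Device` wrapper whose `write` method stands in for real I/O. A `PhantomData<Cell<()>>` marker makes the type `Send` but not `Sync`, so sharing it across threads only compiles once it is wrapped in something like a `Mutex`. This coordinates threads inside one process; by itself it says nothing about other processes touching the same resource.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical wrapper around an external resource that must not be used
/// from two threads at once. PhantomData<Cell<()>> makes it Send but !Sync.
struct Device {
    writes: u32,
    _not_sync: PhantomData<Cell<()>>,
}

impl Device {
    fn new() -> Self {
        Device { writes: 0, _not_sync: PhantomData }
    }

    /// Stand-in for a real I/O call on the external resource.
    fn write(&mut self, _byte: u8) {
        self.writes += 1;
    }
}

/// Sharing Arc<Device> across threads would not compile (Device is !Sync),
/// so the type system pushes callers toward Arc<Mutex<Device>>.
fn write_from_threads(n: u8) -> u32 {
    let dev = Arc::new(Mutex::new(Device::new()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let dev = Arc::clone(&dev);
            thread::spawn(move || {
                dev.lock().unwrap().write(i);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let writes = dev.lock().unwrap().writes;
    writes
}
```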

28. pjmlp [08 Apr 25 06:29 UTC] No.43618901

>>43616250

Shared memory, shared files, hardware DMA, shared database connections to the same database.

You can control safety as much as you feel like from the Rust side; there is no way to validate that the data coming into the process memory doesn't get corrupted by the other side while it is being read from the Rust side.

The only exception is when access is built so that all parties accessing the resource have to play by the same validation rules before writing into it, using OS IPC resources like shared mutexes, semaphores, and critical sections.

The kind of typical readers-writers algorithms in distributed computing.
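For the in-process half of that picture, a sketch of the classic readers-writers setup with the standard `RwLock`, assuming Rust 1.63+ for `std::thread::scope` (`readers_and_writer` is an illustrative name). This only constrains parties that go through the same lock, so it does not cover another process writing to shared memory or a file.

```rust
use std::sync::RwLock;
use std::thread;

/// In-process readers-writers: any number of concurrent readers, or one
/// writer. It cannot stop another process from writing underneath it.
fn readers_and_writer() -> usize {
    let log = RwLock::new(vec![String::from("v1")]);
    thread::scope(|s| {
        for _ in 0..3 {
            s.spawn(|| {
                let entries = log.read().unwrap();
                assert!(!entries.is_empty());
            });
        }
        s.spawn(|| {
            log.write().unwrap().push(String::from("v2"));
        });
    });
    let len = log.read().unwrap().len();
    len
}
```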