Flattening Rust’s learning curve

(corrode.dev)

Animats ◴[14 May 25 00:22 UTC] No.43979394[source]▶

It's like reading "A Discipline of Programming", by Dijkstra. That morality play approach was needed back then, because nobody knew how to think about this stuff.

Most explanations of ownership in Rust are far too wordy. See [1]. The core concepts are mostly there, but hidden under all the examples.

    - Each data object in Rust has exactly one owner.
      - Ownership can be transferred in ways that preserve the one-owner rule.
      - If you need multiple ownership, the real owner has to be a reference-counted cell. 
        Those cells can be cloned (duplicated.)
      - If the owner goes away, so do the things it owns.

    - You can borrow access to a data object using a reference. 
      - There's a big distinction between owning and referencing.
      - References can be passed around and stored, but cannot outlive the object.
        (That would be a "dangling pointer" error).
      - This is strictly enforced at compile time by the borrow checker.

That explains the model. Once that's understood, all the details can be tied back to those rules.
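A minimal sketch of the rules listed above, assuming a hypothetical `Config` type and two illustrative functions (`consume`, `inspect`) that are not from any library. It only shows moving, borrowing, and reference-counted sharing; the lines that would be rejected by the compiler are left as comments.

```rust
use std::rc::Rc;

// Hypothetical type used only for illustration.
#[derive(Debug)]
struct Config {
    name: String,
}

// Takes ownership: the caller can no longer use the value afterwards.
fn consume(cfg: Config) -> usize {
    cfg.name.len()
} // `cfg` is dropped here, and so is the String it owns

// Borrows: the caller keeps ownership.
fn inspect(cfg: &Config) -> usize {
    cfg.name.len()
}

fn main() {
    let a = Config { name: String::from("alpha") };
    let b = a; // ownership moves from `a` to `b`; `a` is now unusable
    // println!("{:?}", a); // would not compile: use of moved value

    assert_eq!(inspect(&b), 5); // borrow; `b` still owns the value
    assert_eq!(consume(b), 5); // move into the callee

    // Shared ownership goes through a reference-counted cell.
    let shared = Rc::new(Config { name: String::from("beta") });
    let other = Rc::clone(&shared); // clones the handle, not the Config
    assert_eq!(Rc::strong_count(&shared), 2);
    drop(other);
    assert_eq!(Rc::strong_count(&shared), 1);
}
```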

[1] https://doc.rust-lang.org/book/ch04-01-what-is-ownership.htm...

replies(18): >>43979460 #>>43979907 #>>43980199 #>>43981064 #>>43981313 #>>43981587 #>>43981720 #>>43982074 #>>43982249 #>>43982619 #>>43982747 #>>43983156 #>>43984730 #>>43988460 #>>43990363 #>>43996196 #>>44008391 #>>44028129 #

1. frankie_t ◴[14 May 25 09:37 UTC] No.43982619[source]▶

>>43979394 #

Maybe it's my learning limitations, but I find it hard to follow explanations like these. I had similar feelings about encapsulation explanations: they would say I can hide information, without going into much detail. Why, from whom? How is it hiding if I can _see it on my screen_?

Similarly here, I can't understand for example _who_ is the owner. Is it a stack frame? Why would a stack frame want to move ownership to its callee, when by the nature of LIFO the callee stack will always be destroyed first, so there is no danger in hanging to it until callee returns. Is it for optimization, so that we can get rid of the object sooner? Could owner be something else than a stack frame? Why can mutable reference be only handed out once? If I'm only using a single thread, one function is guaranteed to finish before the other starts, so what is the harm in handing mutable references to both? Just slap my hands when I'm actually using multiple threads.

Of course, there are reasons for all of these things and they probably are not even that hard to understand. Somehow, every time I want to get into Rust I start chasing these things and give up a bit later.

replies(7): >>43983021 #>>43983228 #>>43983276 #>>43983536 #>>43985111 #>>43988282 #>>43991211 #

2. kibwen ◴[14 May 25 10:57 UTC] No.43983021[source]▶

>>43982619 (TP) #

> Why can mutable reference be only handed out once?

Here's a single-threaded program which would exhibit dangling pointers if Rust allowed handing out multiple references (mutable or otherwise) to data that's being mutated:

    let mut v = Vec::new();
    v.push(42);
    
    // Address of first element: 0x6533c883fb10
    println!("{:p}", &v[0]);
    
    // Put something after v on the heap
    // so it can't be grown in-place
    let v2 = v.clone();
    
    v.push(43);
    v.push(44);
    v.push(45);
    // Exceed capacity and trigger reallocation
    v.push(46);
    
    // New address of first element: 0x6533c883fb50
    println!("{:p}", &v[0]);
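For comparison, here is a rough sketch of what the borrow checker does when a reference to the first element is held across a push. The function name is made up for illustration. The commented-out line is the one the compiler would reject; everything else compiles because the borrow ends before the mutation.

```rust
// Returns the first element as seen before and after the vector grows.
fn first_before_and_after_growth() -> (i32, i32) {
    let mut v = vec![42];

    let first = &v[0]; // shared borrow of `v` starts here
    let before = *first; // last use of `first`, so the borrow ends here

    // Uncommenting the `println!` below would extend the borrow past the
    // push, and the compiler would reject the program with error E0502
    // (cannot borrow `v` as mutable because it is also borrowed as immutable).
    v.push(43);
    // println!("{}", first);

    // Re-borrowing after the mutation is fine: this access goes through
    // whatever buffer the vector uses now.
    let after = v[0];
    (before, after)
}

fn main() {
    assert_eq!(first_before_and_after_growth(), (42, 42));
}
```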

replies(2): >>43988334 #>>43989574 #

3. dwattttt ◴[14 May 25 11:33 UTC] No.43983228[source]▶

>>43982619 (TP) #

> Why would a stack frame want to move ownership to its callee, when by the nature of LIFO the callee stack will always be destroyed first, so there is no danger in hanging to it until callee returns.

It definitely takes some getting used to, but there are absolutely times when you want to move ownership of something into a called function, and letting the caller keep it afterwards would be wrong.

An example would be if it represents something you can only do once, e.g. deleting a file. Once you've done it, you don't want to be able to do it again.
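A sketch of that idea, assuming a hypothetical `PendingFile` handle. The deletion is simulated with a string so the example has no effect on a real filesystem; the point is only that `delete` takes `self` by value.

```rust
// Hypothetical handle type; the "deletion" is simulated.
struct PendingFile {
    path: String,
}

impl PendingFile {
    fn new(path: &str) -> Self {
        PendingFile { path: path.to_owned() }
    }

    // Takes `self` by value, so the handle is consumed by the call.
    fn delete(self) -> String {
        format!("deleted {}", self.path)
    }
}

fn main() {
    let file = PendingFile::new("report.txt");
    let message = file.delete();
    assert_eq!(message, "deleted report.txt");

    // file.delete(); // would not compile: use of moved value `file`
}
```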

4. lucozade ◴[14 May 25 11:39 UTC] No.43983276[source]▶

>>43982619 (TP) #

> _who_ is the owner. Is it a stack frame?

The owned memory may be on a stack frame or it may be heap memory. It could even be in the memory mapped binary.

> Why would a stack frame want to move ownership to its callee

Because it wants to hand full responsibility to some other part of the program. Let's say you have allocated some memory on the heap and handed a reference to a callee then the callee returned to you. Did they free the memory? Did they hand the reference to another thread? Did they hand the reference to a library where you have no access to the code? Because the answer to those questions will determine if you are safe to continue using the reference you have. Including, but not limited to, whether you are safe to free the memory.

If you hand ownership to the callee, you simply don't care about any of that because you can't use your reference to the object after the callee returns. And the compiler enforces that. Now the callee could, in theory, give you back ownership of the same memory but, if it does, you know that it didn't destroy (or otherwise dispose of) that data, otherwise it couldn't give it back to you. And, again, the compiler is enforcing all that.
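Roughly, the two cases look like this. Both function names are invented for the example.

```rust
// Takes ownership and does not give it back: the buffer is freed on return.
fn sum_and_discard(data: Vec<i32>) -> i32 {
    data.iter().sum()
}

// Takes ownership and returns it: the caller knows the buffer still exists,
// because otherwise there would be nothing to return.
fn append_and_return(mut data: Vec<i32>, value: i32) -> Vec<i32> {
    data.push(value);
    data
}

fn main() {
    let data = vec![1, 2, 3];
    let data = append_and_return(data, 4); // ownership goes out and comes back
    assert_eq!(data, vec![1, 2, 3, 4]);

    let total = sum_and_discard(data); // ownership goes out and stays out
    assert_eq!(total, 10);
    // data.len(); // would not compile: `data` was moved into the callee
}
```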

> Why can mutable reference be only handed out once?

Let's say you have 2 references to arrays of some type T and you want to copy from one array to the other. Will it do what you expect? It probably will if they are distinct but what if they overlap? memcpy has this issue and "solves" it by making overlapped copies undefined. With a single mutable reference system, it's not possible to get that scenario because, if there were 2 overlapping references, you couldn't write to either of them. And if you could write to one, then the other has to be a reference (mutable or not) to some other object.

There are also optimisation opportunities if you know 2 objects are distinct. That's why C added the restrict keyword.
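As a sketch of the array-copy case in Rust (the helper names `copy_into` and `copy_first_half_to_second` are made up): the signature alone rules out overlap, and the standard slice methods `split_at_mut` and `copy_within` cover the disjoint and the deliberately overlapping cases.

```rust
// `dst` and `src` cannot overlap: a `&mut` borrow excludes every other
// borrow of the same data for as long as it is live.
fn copy_into(dst: &mut [i32], src: &[i32]) {
    dst.copy_from_slice(src); // panics if the lengths differ
}

fn copy_first_half_to_second(buf: &mut [i32]) {
    let mid = buf.len() / 2;
    // `split_at_mut` hands out two provably disjoint mutable halves.
    let (front, back) = buf.split_at_mut(mid);
    copy_into(&mut back[..mid], front);
}

fn main() {
    let mut buf = [1, 2, 3, 4];
    // copy_into(&mut buf[1..], &buf[..3]); // would not compile: E0502
    copy_first_half_to_second(&mut buf);
    assert_eq!(buf, [1, 2, 1, 2]);

    // Overlapping copies need an API that is defined for them.
    let mut buf = [1, 2, 3, 4];
    buf.copy_within(0..3, 1);
    assert_eq!(buf, [1, 1, 2, 3]);
}
```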

> If I'm only using a single thread

If you're just knocking up small scripts or whatever then a lot of this is overkill. But if you're writing libraries, large applications, multi-dev systems etc then you may be single threaded but who's confirming that for every piece of the system at all times? People are generally really rubbish at that sort of long range thinking. That's where these more automated approaches shine.

> hide information...Why, from whom?

The main reason is that you want to expose a specific contract to the rest of the system. It may be, for example, that you have to maintain invariants eg double entry book-keeping or that the sides of a square are the same length. Alternatively, you may want to specify a high level algorithm eg matrix inversion, but want it to work for lots of varieties of matrix implementation eg sparse, square. In these cases, you want your consumer to be able to use your objects, with a standard interface, without them knowing, or caring, about the detail. In other words you're hiding the implementation detail behind the interface.
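To make the square example concrete, here is one possible sketch. The `shapes` module and `Square` type are invented for illustration. "Hiding" here means the compiler refuses access from outside the module, not that the source is invisible.

```rust
mod shapes {
    // Width and height are private, so code outside this module cannot
    // make them disagree.
    pub struct Square {
        width: f64,
        height: f64,
    }

    impl Square {
        pub fn new(side: f64) -> Self {
            Square { width: side, height: side }
        }

        // The only way to change the size keeps both sides equal.
        pub fn resize(&mut self, side: f64) {
            self.width = side;
            self.height = side;
        }

        pub fn area(&self) -> f64 {
            self.width * self.height
        }
    }
}

use shapes::Square;

fn main() {
    let mut sq = Square::new(2.0);
    sq.resize(3.0);
    assert_eq!(sq.area(), 9.0);
    // sq.width = 10.0; // would not compile: field `width` is private (E0616)
}
```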

5. kibwen ◴[14 May 25 12:08 UTC] No.43983536[source]▶

>>43982619 (TP) #

> Why would a stack frame want to move ownership to its callee

Rust's system of ownership and borrowing effectively lets you hand out "permissions" for data access. The owner gets the maximum permissions, including the ability to hand out references, which grant lesser permissions.

In some cases these permissions are useful for performance, yes. The owner has the permission to eagerly destroy something to instantly free up memory. It also has the permission to "move out" data, which allows you to avoid making unnecessary copies.

But it's useful for other reasons too. For example, threads don't follow a stack discipline; a callee is not guaranteed to terminate before the caller returns, so passing ownership of data sent to another thread is important for correctness.

And naturally, the ability to pass ownership to higher stack frames (from callee to caller) is also necessary for correctness.
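A small sketch of both directions, using a hypothetical `process_in_worker` function: ownership of the vector moves into a spawned thread, and moves back to the caller through the join handle.

```rust
use std::thread;

// Hypothetical worker: receives ownership of the buffer, extends it, and
// hands ownership back through the join handle.
fn process_in_worker(mut data: Vec<u32>) -> Vec<u32> {
    let handle = thread::spawn(move || {
        data.push(4);
        data // ownership travels back to whoever joins the thread
    });
    handle.join().expect("worker thread panicked")
}

fn main() {
    let data = vec![1, 2, 3];
    let data = process_in_worker(data);
    assert_eq!(data, vec![1, 2, 3, 4]);
}
```

Without `move`, the closure would only borrow `data`, and `thread::spawn` would reject it because the thread may outlive the calling function.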

In practice, people write functions that need the least permissions necessary. It's overwhelmingly common for callees to take references rather than taking ownership, because what they're doing just doesn't require ownership.
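For instance, three made-up functions that each ask for only what they need might look like this:

```rust
// Read-only access: a shared reference is enough.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// In-place modification: needs a mutable reference, but not ownership.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// Converts the vector into a boxed slice, so it takes ownership.
fn freeze(values: Vec<i32>) -> Box<[i32]> {
    values.into_boxed_slice()
}

fn main() {
    let mut values = vec![1, 2, 3];
    assert_eq!(total(&values), 6);
    double_all(&mut values);
    assert_eq!(total(&values), 12);

    let frozen = freeze(values); // `values` is moved and unusable from here
    assert_eq!(frozen.len(), 3);
    assert_eq!(total(&frozen), 12);
}
```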

6. Hackbraten ◴[14 May 25 14:39 UTC] No.43985111[source]▶

>>43982619 (TP) #

I think your comment has received excellent replies. However, no one has tackled your actual question so far:

> _who_ is the owner. Is it a stack frame?

I don’t think that it’s helpful to call a stack frame the owner in the sense of the borrow checker. If the owner was the stack frame, then why would it have to borrow objects to itself? The fact that the following code doesn’t compile seems to support that:

    fn main() {
        let a: String = "Hello".to_owned();
        let b = a;
        println!("{}", a);  // error[E0382]: borrow of moved value: `a`
    }

User lucozade’s comment has pointed out that the memory where the object lives is actually the thing that is being owned. So that can’t be the owner either.

So if neither a) the stack frame nor b) the memory where the object lives can be called the owner in the Rust sense, then what is?

Could the owner be the variable to which the owned chunk of memory is bound at a given point in time? In my mental model, yes. That would be consistent with all borrow checker semantics as I have understood them so far.

Feel free to correct me if I’m not making sense.

replies(1): >>43986175 #

7. adastra22 ◴[14 May 25 16:06 UTC] No.43986175[source]▶

>>43985111 #

I believe this answer is correct. Ownership exists at the language level, not the machine level. Thinking of a part of the stack or a piece of memory as owning something isn’t correct. A language entity, like a variable, is what owns another object in Rust. When that owner goes out of scope, its resources are released, including all the things it owns.
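One way to watch this happen is a sketch with a hypothetical `Noisy` type that records its own drop in a shared log. The names `Noisy`, `Owner`, and `drop_order` are invented for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical type that records when it is dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Owns two `Noisy` values; dropping the owner drops both of them.
struct Owner {
    _first: Noisy,
    _second: Noisy,
}

fn drop_order() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _owner = Owner {
            _first: Noisy { name: "a", log: Rc::clone(&log) },
            _second: Noisy { name: "b", log: Rc::clone(&log) },
        };
        log.borrow_mut().push("end of scope".to_string());
    } // `_owner` goes out of scope; its fields drop in declaration order
    let result = log.borrow().clone();
    result
}

fn main() {
    assert_eq!(drop_order(), vec!["end of scope", "drop a", "drop b"]);
}
```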

replies(2): >>43992517 #>>44002676 #

8. kazinator ◴[14 May 25 19:25 UTC] No.43988282[source]▶

>>43982619 (TP) #

> Why would a stack frame want to move ownership to its callee

Happens all the time in modern programming:

    callee(foo_string + "abc")

Argument expression foo_string + "abc" constructs a new string. That is not captured in any variable here; it is passed to the callee. Only the callee knows about this.

This situation can expose bugs in a run-time's GC system. If callee is something written in a low level language that is responsible for indicating "nailed" objects to the garbage collector, and it forgets to nail the argument object, GC can prematurely collect it because nothing else in the image knows about that object: only the callee. The bug won't surface in situations like callee(foo_string) where the caller still has a reference to foo_string (at least if that variable is live: has a next use).
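For what it's worth, a rough Rust rendering of the same call might look like the sketch below. There is no garbage collector involved, so the concern above does not apply directly: the callee's parameter is the temporary's only owner. The `clone` is an assumption added so the caller keeps its own string, since `String + &str` consumes its left operand.

```rust
// Takes ownership of whatever string it is given.
fn callee(s: String) -> usize {
    s.len()
} // the temporary string is freed here, by its only owner

fn main() {
    let foo_string = String::from("foo");

    // The concatenated string has no name in the caller; it is moved
    // straight into the callee.
    let n = callee(foo_string.clone() + "abc");
    assert_eq!(n, 6);

    // `foo_string` was only cloned, so the caller still owns it.
    assert_eq!(foo_string, "foo");
}
```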

9. kazinator ◴[14 May 25 19:32 UTC] No.43988334[source]▶

>>43983021 #

The analogous program in pretty much any modern language under the sun has no problem with this, in spite of multiple references being casually allowed.

To have a safe reference to the cell of a vector, we need a "locative" object for that, which keeps track of v, and the offset 0 into v.

replies(2): >>43989760 #>>43993972 #

10. Someone ◴[14 May 25 21:50 UTC] No.43989574[source]▶

>>43983021 #

> // Put something after v on the heap

> // so it can't be grown in-place

> let v2 = v.clone();

I doubt rust guarantees that “Put something after v on the heap” behavior.

The whole idea of a heap is that you give up control over where allocations happen in exchange for an easy way to allocate, free and reuse memory.

replies(2): >>43989757 #>>43993920 #

11. steveklabnik ◴[14 May 25 22:17 UTC] No.43989757{3}[source]▶

>>43989574 #

That’s correct.

12. steveklabnik ◴[14 May 25 22:17 UTC] No.43989760{3}[source]▶

>>43988334 #

That’s a different implementation, and one you can do in Rust too.
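A sketch of what such a "locative" could look like in Rust, assuming single-threaded shared ownership via `Rc<RefCell<...>>`. The `Locative` type is hypothetical, not a standard library item. It pays for an index lookup and a runtime borrow check on every access instead of holding a direct pointer.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical "locative": remembers which vector and which index,
// rather than holding a raw address into the vector's buffer.
struct Locative {
    vec: Rc<RefCell<Vec<i32>>>,
    index: usize,
}

impl Locative {
    fn get(&self) -> Option<i32> {
        self.vec.borrow().get(self.index).copied()
    }

    fn set(&self, value: i32) -> bool {
        match self.vec.borrow_mut().get_mut(self.index) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }
}

fn main() {
    let v = Rc::new(RefCell::new(vec![42]));
    let first = Locative { vec: Rc::clone(&v), index: 0 };

    // Grow the vector while the locative exists; reallocation is harmless
    // because the locative re-resolves the index on every access.
    for n in 43..=46 {
        v.borrow_mut().push(n);
    }
    assert_eq!(first.get(), Some(42));
    assert!(first.set(7));
    assert_eq!(v.borrow()[0], 7);

    v.borrow_mut().clear();
    assert_eq!(first.get(), None); // a stale index is detected, not dangling
}
```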

13. oconnor663 ◴[15 May 25 02:17 UTC] No.43991211[source]▶

>>43982619 (TP) #

> Could owner be something else than a stack frame?

Yes. There are lots of ways an object might be owned:

- a local variable on the stack

- a field of a struct or a tuple (which might itself be owned on the stack, or nested in yet another struct, or one of the other options below)

- a heap-allocating container, most commonly basic data structures like Vec or HashMap, but also including things like Box (std::unique_ptr in C++), Arc (std::shared_ptr), and channels

- a static variable -- note that in Rust these are always const-initialized and never destroyed

I'm sure there are others I'm not thinking of.
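A compact sketch of those kinds of owners, using a made-up `Session` type and `LIMITS` static:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// A static owns its value for the whole program and is never dropped.
static LIMITS: [u32; 3] = [1, 2, 3];

// Hypothetical struct whose field owns a String.
struct Session {
    user: String,
}

fn main() {
    // 1. A local variable owns the String.
    let name = String::from("ferris");

    // 2. A struct field owns it after the move.
    let session = Session { user: name };

    // 3. A container owns its elements, and a Box owns its pointee.
    let mut sessions: HashMap<u32, Box<Session>> = HashMap::new();
    sessions.insert(1, Box::new(session));

    // 4. An Arc shares ownership; the map lives until the last handle drops.
    let shared = Arc::new(sessions);
    let handle = Arc::clone(&shared);
    assert_eq!(Arc::strong_count(&shared), 2);
    assert_eq!(handle[&1].user, "ferris");

    // 5. The static is available everywhere, for the entire run.
    assert_eq!(LIMITS.len(), 3);
}
```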

> Why would a stack frame want to move ownership to its callee, when by the nature of LIFO the callee stack will always be destroyed first

Here are some example situations where you'd "pass by value" in Rust:

- You might be dealing with "Copy" types like integers and bools, where (just like in C or C++ or Go) values are easier to work with in a lot of common cases.

- You might be inserting something into a container that will own it. Maybe the callee gets a reference to that longer-lived container in one of its other arguments, or maybe the callee is a method on a struct type that includes a container.

- You might pass ownership to another thread. For example, the main() loop in my program could listen on a socket, and for each of the connections it gets, it might spawn a worker thread to own the connection and handle it. (Using async and "tasks" is pretty much the same from an ownership perspective.)

- You might be dealing with a type that uses ownership to represent something besides just memory. For example, owning a MutexGuard gives you the ability to unlock the Mutex by dropping the guard. Passing a MutexGuard by value tells the callee "I have taken this lock, but now you're responsible for releasing it." Sometimes people also use non-Copy enums to represent fancy state machines that you have to pass around by value, to guarantee whatever property they care about regarding the state transitions.
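The MutexGuard case can be sketched like this, with a hypothetical `finish_update` function standing in for the callee that inherits the lock:

```rust
use std::sync::{Mutex, MutexGuard};

// Receives the guard by value: this function is now responsible for
// releasing the lock, which happens when `guard` is dropped on return.
fn finish_update(mut guard: MutexGuard<'_, Vec<i32>>, value: i32) -> usize {
    guard.push(value);
    guard.len()
}

fn main() {
    let shared = Mutex::new(vec![1, 2]);

    let guard = shared.lock().unwrap(); // caller takes the lock
    let len = finish_update(guard, 3); // callee releases it
    assert_eq!(len, 3);

    // The lock is free again, so `try_lock` succeeds.
    assert!(shared.try_lock().is_ok());
    assert_eq!(*shared.lock().unwrap(), vec![1, 2, 3]);
}
```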

14. carlmr ◴[15 May 25 06:58 UTC] No.43992517{3}[source]▶

>>43986175 #

I think it's funny how I had this kind of sort of "clear" understanding of Rust ownership from experience, and asking "why" repeatedly puts a few holes in the illusion of my understanding being clear. It's mostly familiarity of concepts from working with C++ and RAII and solving some ownership issues. It's kind of like when people ask you for the definition of a word, and you know what it means, but you also can't quite explain it.

I would say you're correct that ownership is something that only exists on the language level. Going back to the documentation: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.htm...

The first part that gives a hint is this

>Rust uses a third approach: memory is managed through a system of ownership with a set of rules that the compiler checks.

This clearly means ownership is a concept in the Rust language. Defined by a set of rules checked by the compiler.

Later:

>First, let’s take a look at the ownership rules. Keep these rules in mind as we work through the examples that illustrate them:

>*Each value in Rust has an owner*.

>There can only be one owner at a time.

>*When the owner goes out of scope*, the value will be dropped.

So the owner can go out of scope and that leads to the value being dropped. At the same time each value has an owner.

So from this we gather: an owner can go out of scope, so an owner would be something that lives within a scope. A variable declaration perhaps? Further on in the text this seems to be confirmed. A variable can be an owner.

>Rust takes a different path: the memory is automatically returned once the variable that owns it goes out of scope.

Ok, so variables can own values. And borrowed variables (references) are owned by the variables they borrow from, this much seems clear. We can recurse all the way down. What about up? Who owns the variables? I'm guessing the program or the scope, which in turn is owned by the program.

So I think variables own values directly, references are owned by the variables they borrow from. All variables are owned by the program and live as long as they're in scope (again something that only exists at program level).

15. kibwen ◴[15 May 25 11:32 UTC] No.43993920{3}[source]▶

>>43989574 #

It certainly doesn't guarantee it, this is just what's needed to induce a relocation in this particular instance. But this makes Rust's ownership tracking even more important, because it would be trivial for this to "accidentally work" in something like C++, only for it to explode as soon as any future change either perturbs the heap or pushes enough items to the vec that a relocation is suddenly triggered.

16. kibwen ◴[15 May 25 11:41 UTC] No.43993972{3}[source]▶

>>43988334 #

> The analogous program in pretty much any modern language under the sun has no problem with this, in spite of multiple references being casually allowed.

And then every time the underlying data moves, the program's runtime either needs to do a dynamic lookup of all pointers to that data and then iterate over all of them to point to the new location, or otherwise you need to introduce yet another layer of indirection (or even worse, you could use linked lists). Many languages exist in domains where they don't mind paying such a runtime cost, but Rust is trying to be as fast as possible while being as memory-safe as possible.

In other words, pick your poison:

1. Allow mutable data, but do not support direct interior references.

2. Allow interior references, but do not allow mutable data.

3. Allow mutable data, but only allow indirect/dynamically adjusted references.

4. Allow both mutable data and direct interior references, force the author to manually enforce memory-safety.

5. Allow both mutable data and direct interior references, use static analysis to ensure safety by only allowing references to be held when mutation cannot invalidate them.

17. Animats ◴[16 May 25 07:30 UTC] No.44002676{3}[source]▶

>>43986175 #

> Ownership exists at the language level, not the machine level.

Right. That's the key here. "Move semantics" can let you move something from the stack to the heap, or the heap to the stack, provided that a lot of fussy rules are enforced. It's quite common to do this. You might create a struct on the stack, then push it onto a vector, to be appended at the end. Works fine. The data had to be copied, and the language took care of that. It also took care of preventing you from doing that if the struct can't safely be moved.
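A minimal sketch of that stack-to-heap move and back, with a made-up `Point` struct. The commented-out lines are ones the compiler would refuse.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    let mut points: Vec<Point> = Vec::new();

    let p = Point { x: 1.0, y: 2.0 }; // a local variable in main
    points.push(p); // moved: the value now lives in the vector's heap buffer
    // println!("{:?}", p); // would not compile: borrow of moved value `p`

    let borrowed = &points[0];
    // points.push(Point { x: 0.0, y: 0.0 }); // rejected while `borrowed` is used below
    assert_eq!(*borrowed, Point { x: 1.0, y: 2.0 });

    // Moving back out of the vector, heap to stack.
    let back = points.pop().expect("vector is not empty");
    assert_eq!(back, Point { x: 1.0, y: 2.0 });
}
```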

C++ now has "move semantics", but for legacy reasons, enforcement is not strict enough to prevent moves which should not be allowed.
