Compiling C to Safe Rust, Formalized

(arxiv.org)

291 points love2read | 1 comments | 20 Dec 24 23:30 UTC | HN request time: 0.195s | source

Show context

pizlonator ◴[21 Dec 24 01:12 UTC] No.42476714[source]▶

Compiling a tiny subset of C, that is. It might be so tiny as to be useless in practice.

I have low hopes for this kind of approach; it’s sure to hit the limits of what’s possible with static analysis of C code. Also, choosing Rust as the target makes the problem unnecessarily hard because Rust’s ownership model is so foreign to how real C programs work.

replies(4): >>42476809 #>>42476961 #>>42477085 #>>42477236 #

pornel ◴[21 Dec 24 02:08 UTC] No.42476961[source]▶

>>42476714 #

Rust's ownership model is close enough for translating C. It's just more explicit and strongly typed, so the translation needs to figure out what a more free-form C code is trying to do, and map that to Rust's idioms.

For example, C's buffers obviously have lengths, but in C the length isn't explicitly tied to a pointer, so the translator has to deduce how the C program tracks the length to convert that into a slice. It's non-trivial even if the length is an explicit variable, and even trickier if it's calculated or changes representations (e.g. sometimes used in the form of one-past-the-end pointer).

Other C patterns like `bool should_free_this_pointer` can be translated to Rust's enum of `Owned`/`Borrowed`, but again it requires deducing which allocation is tied to which boolean, and what's the true safe scope of the borrowed variant.

replies(4): >>42477145 #>>42477151 #>>42477477 #>>42477822 #

pizlonator ◴[21 Dec 24 02:47 UTC] No.42477145[source]▶

>>42476961 #

Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

That’s just one example of how C code is nowhere near meeting Rust’s requirements. There are lots of others.

replies(3): >>42477256 #>>42477615 #>>42482450 #

orf ◴[21 Dec 24 03:10 UTC] No.42477256[source]▶

>>42477145 #

> Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

It’s literally in the standard library

https://doc.rust-lang.org/std/collections/struct.LinkedList....

replies(4): >>42477296 #>>42477337 #>>42477424 #>>42477565 #

singron ◴[21 Dec 24 03:33 UTC] No.42477337[source]▶

>>42477256 #

This implementation uses unsafe. You can write a linked list in safe rust (e.g. using Rc), but it probably wouldn't resemble the one you write in C.

In practice, a little unsafe is usually fine. I only bring it up since the article is about translating to safe rust.

replies(3): >>42477355 #>>42477984 #>>42479070 #

1. imtringued ◴[21 Dec 24 11:34 UTC] No.42479070[source]▶

>>42477337 #

I don't really see it as a big "owning" of Rust that a complex pointer heavy structure with runtime defined ownership cannot be checked statically. Almost every language that people use doubly linked lists in has a GC, making the discussion kind of meaningless.

So C and C++ are the exceptions to the rule, but how do they make it easy to write doubly linked lists? Obviously, the key assumption is that that the developer makes sure that node->next->prev = node->prev->next = node (Ignoring nullptr).

With this restriction, you can safely write a doubly linked list even without reference counting.

However, this isn't true on the pointer level. The prev pointers could be pointing at the elements in a completely random order. For example tail->prev = head, head->prev = second_last and so on. So that going backwards from the tail is actually going forwards again!

Then there is also the problem of having a pointer from the outside of the linked list pointing directly at a node. You would need a weak pointer, because another pointer could have requested deletion from the linked list, while you're still holding a reference.

If you wanted to support this generic datastructure, rather than the doubly linked list you have in your head, then you would need reference counting in C/C++ as well!

What this tells you, is that Rust isn't restrictive enough to enforce these memory safe contracts. Anyone with access to the individual nodes could break the contract and make the code unsafe.

↑