Compiling C to Safe Rust, Formalized

(arxiv.org)

291 points love2read | 1 comments | 20 Dec 24 23:30 UTC | HN request time: 0.259s | source

Show context

pizlonator ◴[21 Dec 24 01:12 UTC] No.42476714[source]▶

Compiling a tiny subset of C, that is. It might be so tiny as to be useless in practice.

I have low hopes for this kind of approach; it’s sure to hit the limits of what’s possible with static analysis of C code. Also, choosing Rust as the target makes the problem unnecessarily hard because Rust’s ownership model is so foreign to how real C programs work.

replies(4): >>42476809 #>>42476961 #>>42477085 #>>42477236 #

pornel ◴[21 Dec 24 02:08 UTC] No.42476961[source]▶

>>42476714 #

Rust's ownership model is close enough for translating C. It's just more explicit and strongly typed, so the translation needs to figure out what a more free-form C code is trying to do, and map that to Rust's idioms.

For example, C's buffers obviously have lengths, but in C the length isn't explicitly tied to a pointer, so the translator has to deduce how the C program tracks the length to convert that into a slice. It's non-trivial even if the length is an explicit variable, and even trickier if it's calculated or changes representations (e.g. sometimes used in the form of one-past-the-end pointer).

Other C patterns like `bool should_free_this_pointer` can be translated to Rust's enum of `Owned`/`Borrowed`, but again it requires deducing which allocation is tied to which boolean, and what's the true safe scope of the borrowed variant.

replies(4): >>42477145 #>>42477151 #>>42477477 #>>42477822 #

pizlonator ◴[21 Dec 24 02:47 UTC] No.42477145[source]▶

>>42476961 #

Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

That’s just one example of how C code is nowhere near meeting Rust’s requirements. There are lots of others.

replies(3): >>42477256 #>>42477615 #>>42482450 #

1. Rusky ◴[21 Dec 24 21:25 UTC] No.42482450[source]▶

>>42477145 #

Rust's ownership model has two aspects:

- A dynamic part specifies what is actually allowed, and totally supports doubly linked lists and other sorts of cyclic mutable data structures.

- A static part conservatively approximates the dynamic part, but is still flexible enough to express the interfaces and usage of these data structures even if it can't check their implementations.

This is the important difference over traditional static analysis of C. It enables `unsafe` library code to bridge the dynamic and static rules in a modular way, so that that extensions to the static rules can be composed safely, downstream of their implementation.

Rust's strategy was never for the built-in types like `&mut T`/`&T` to be a complete final answer to safety. It actually started with a lot more built-in tools to support these patterns, and slowly moved them out of the compiler and runtime and into library code, as it turned out their APIs could still be expressed safely with a smaller core.

Something like Fil-C would be totally complementary to this approach- not only could the dynamic checking give you stronger guarantees about the extensions to the static rules, but the static checks could give the compiler more leverage to elide the dynamic checks.

↑