Compiling C to Safe Rust, Formalized

(arxiv.org)

291 points love2read | 1 comments | 20 Dec 24 23:30 UTC

pizlonator ◴[21 Dec 24 01:12 UTC] No.42476714[source]▶

Compiling a tiny subset of C, that is. It might be so tiny as to be useless in practice.

I have low hopes for this kind of approach; it’s sure to hit the limits of what’s possible with static analysis of C code. Also, choosing Rust as the target makes the problem unnecessarily hard because Rust’s ownership model is so foreign to how real C programs work.

replies(4): >>42476809 #>>42476961 #>>42477085 #>>42477236 #

pornel ◴[21 Dec 24 02:08 UTC] No.42476961[source]▶

>>42476714 #

Rust's ownership model is close enough for translating C. It's just more explicit and strongly typed, so the translation needs to figure out what more free-form C code is trying to do, and map that to Rust's idioms.

For example, C's buffers obviously have lengths, but in C the length isn't explicitly tied to a pointer, so the translator has to deduce how the C program tracks the length to convert that into a slice. It's non-trivial even if the length is an explicit variable, and even trickier if it's calculated or changes representations (e.g. sometimes used in the form of a one-past-the-end pointer).
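To make the pointer-plus-length point concrete, here is a minimal sketch of what the target of such a translation might look like. The function names and the C prototypes in the comments are hypothetical, chosen only for illustration; they do not come from the paper.

```rust
/// Hypothetical C originals, shown for comparison only:
///     long sum(const int *buf, size_t len);
///     long sum_range(const int *begin, const int *end); /* end is one past the last element */
///
/// In Rust the pointer and the length travel together in one slice.
pub fn sum(buf: &[i32]) -> i64 {
    buf.iter().map(|&x| i64::from(x)).sum()
}

/// The begin/end form becomes a bounds-checked sub-slice.
/// An invalid range yields `None` instead of an out-of-bounds read.
pub fn sum_range(buf: &[i32], begin: usize, end: usize) -> Option<i64> {
    buf.get(begin..end).map(sum)
}
```

The hard part described above is not writing this signature but proving that `len` (or `end`) in the C code really describes the buffer it is passed alongside.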

Other C patterns like `bool should_free_this_pointer` can be translated to Rust's enum of `Owned`/`Borrowed`, but again it requires deducing which allocation is tied to which boolean, and what's the true safe scope of the borrowed variant.
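As a rough illustration of the `Owned`/`Borrowed` idea, the sketch below assumes a hypothetical C struct that pairs a `char *` with a `should_free_this_pointer` flag. The type `Name` and the function `normalize` are invented for this example; the standard library's `std::borrow::Cow` has the same shape.

```rust
/// Hypothetical translation of a C struct such as
///     struct name { char *ptr; bool should_free_this_pointer; };
/// The boolean becomes the enum discriminant.
pub enum Name<'a> {
    /// The C flag was true: this value frees the allocation when dropped.
    Owned(String),
    /// The C flag was false: someone else owns the data for lifetime 'a.
    Borrowed(&'a str),
}

impl<'a> Name<'a> {
    pub fn as_str(&self) -> &str {
        match self {
            Name::Owned(s) => s.as_str(),
            Name::Borrowed(s) => *s,
        }
    }

    pub fn is_owned(&self) -> bool {
        matches!(self, Name::Owned(_))
    }
}

/// Allocates only when the input actually has to change.
pub fn normalize(input: &str) -> Name<'_> {
    if input.chars().any(|c| c.is_ascii_uppercase()) {
        Name::Owned(input.to_ascii_lowercase())
    } else {
        Name::Borrowed(input)
    }
}
```

The lifetime `'a` is where the "true safe scope of the borrowed variant" has to be written down; the C version leaves it implicit.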

replies(4): >>42477145 #>>42477151 #>>42477477 #>>42477822 #

pizlonator ◴[21 Dec 24 02:47 UTC] No.42477145[source]▶

>>42476961 #

Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

That’s just one example of how C code is nowhere near meeting Rust’s requirements. There are lots of others.

replies(3): >>42477256 #>>42477615 #>>42482450 #

orf ◴[21 Dec 24 03:10 UTC] No.42477256[source]▶

>>42477145 #

> Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

It’s literally in the standard library:

https://doc.rust-lang.org/std/collections/struct.LinkedList....
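For reference, a small sketch of using that standard-library type from entirely safe code. The helper name `build` is hypothetical.

```rust
use std::collections::LinkedList;

/// Builds the list 1 <-> 2 <-> 3 using only the safe public API.
pub fn build() -> LinkedList<i32> {
    let mut list = LinkedList::new();
    list.push_back(2);
    list.push_back(3);
    list.push_front(1);
    list
}
```

The list can be walked in either direction with `iter()` and `iter().rev()`.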

replies(4): >>42477296 #>>42477337 #>>42477424 #>>42477565 #

singron ◴[21 Dec 24 03:33 UTC] No.42477337[source]▶

>>42477256 #

This implementation uses `unsafe`. You can write a linked list in safe Rust (e.g. using `Rc`), but it probably wouldn't resemble the one you'd write in C.

In practice, a little `unsafe` is usually fine. I only bring it up since the article is about translating to safe Rust.
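A minimal sketch of what a doubly linked list in safe Rust might look like, assuming the usual `Rc<RefCell<_>>` forward links and `Weak` back links. All type and method names here are hypothetical, and this is not how the standard library implements `LinkedList`.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

pub struct Node {
    pub value: i32,
    next: Option<Rc<RefCell<Node>>>,   // strong link: owns the next node
    prev: Option<Weak<RefCell<Node>>>, // weak link: avoids a reference cycle
}

#[derive(Default)]
pub struct List {
    head: Option<Rc<RefCell<Node>>>,
    tail: Option<Rc<RefCell<Node>>>,
}

impl List {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn push_back(&mut self, value: i32) {
        let node = Rc::new(RefCell::new(Node { value, next: None, prev: None }));
        match self.tail.take() {
            Some(old_tail) => {
                node.borrow_mut().prev = Some(Rc::downgrade(&old_tail));
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => self.head = Some(Rc::clone(&node)),
        }
        self.tail = Some(node);
    }

    /// Collects values by following `next` links from the head.
    pub fn forward(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = self.head.clone();
        while let Some(node) = cur {
            out.push(node.borrow().value);
            cur = node.borrow().next.clone();
        }
        out
    }

    /// Collects values by following `prev` links from the tail.
    pub fn backward(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = self.tail.clone();
        while let Some(node) = cur {
            out.push(node.borrow().value);
            cur = node.borrow().prev.as_ref().and_then(Weak::upgrade);
        }
        out
    }
}
```

Compared with a C version that stores two raw pointers per node, this one pays for reference counts and runtime borrow checks, which is the sense in which it does not resemble the C original.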

replies(3): >>42477355 #>>42477984 #>>42479070 #

1. oconnor663 ◴[21 Dec 24 07:02 UTC] No.42477984[source]▶

>>42477337 #

More important than whether you use a little unsafe or a lot is whether you can find a clean boundary above which everything can be safe. Something like a hash function or a block cipher can be piles and piles of assembly under the covers, but since the API is bytes-in-bytes-out, the safety concerns are minimal. On the other hand, memory-mapping a file is just one FFI function call, but the uncontrollable mutability of the whole thing tends to poison everything above it with unsafety.
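A toy sketch of such a boundary, assuming a made-up `checksum` function (not a real hash) whose body uses raw pointers while its signature stays bytes-in, number-out:

```rust
/// Hypothetical bytes-in, number-out API. Callers never see a raw pointer,
/// so nothing unsafe leaks past this function's signature.
pub fn checksum(data: &[u8]) -> u32 {
    let mut acc: u32 = 0;
    let ptr = data.as_ptr();
    for i in 0..data.len() {
        // SAFETY: `i < data.len()`, so `ptr.add(i)` stays inside the slice,
        // and the shared borrow of `data` keeps the memory valid and
        // unmodified for the duration of the call.
        let byte = unsafe { *ptr.add(i) };
        acc = acc.rotate_left(5) ^ u32::from(byte);
    }
    acc
}
```

The safety argument fits in one comment because the borrow on `data` pins down everything the unsafe code relies on. A memory-mapped file offers no such guarantee, since another process can change the bytes behind a `&[u8]` at any time.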