Compiling C to Safe Rust, Formalized

(arxiv.org)

291 points love2read | 1 comments | 20 Dec 24 23:30 UTC | HN request time: 0.209s | source

Show context

pizza234 ◴[21 Dec 24 17:55 UTC] No.42481083[source]▶

I've ported some projects to Rust (including C, where I've used C2Rust as first step), and I've drawn some conclusions.

1. Converting a C program to Rust, even if it includes unsafe code, often uncovers bugs quickly thanks to Rust’s stringent constraints (bounds checking, strict signatures, etc.).

2. automated C to Rust conversion is IMO something that will never be solved entirely, because the design of C program is fundamentally different from Rust; such conversions require a significant redesign to be made safe (of course, not all C programs are the same).

3. in some cases, it’s plain impossible to port a program from C to Rust while preserving the exact semantics, because unsafety can be inherent in the design.

That said, tooling is essential to porting, and as tools continue to evolve, the process will become more streamlined.

replies(2): >>42481307 #>>42482340 #

int_19h ◴[21 Dec 24 21:13 UTC] No.42482340[source]▶

>>42481083 #

> automated C to Rust conversion is IMO something that will never be solved entirely

Automated conversion of C to safe fast Rust is hard. Automated conversion of C to safe Rust in general is much easier - you just need to represent memory as an array, and treat pointers as indices into said array. Now you can do everything C can do - unchecked pointer arithmetic, unions etc - without having to fight the borrow checker. Semantics fully preserved. Similar techniques have been used for C-to-Java for a long time now.

Of course, the value of such a conversion is kinda dubious. You basically end up with something like C compiled to wasm, but even slower, and while the resulting code is technically "safe", it is still susceptible to issues buffer overflows producing invalid state, dangling pointers allowing access to data in contexts where it shouldn't be allowed etc.

replies(2): >>42482663 #>>42486313 #

1. zozbot234 ◴[21 Dec 24 21:58 UTC] No.42482663[source]▶

>>42482340 #

You can do a lot better than that. You can treat memory ranges coming from separate allocations as distinct segments, and pointers as tuples of a segment ID and a linear offset within the segment. This is essentially what systems like CHERI are built on, and how C and C++ are implemented on segmented architectures like the 8086 and 80286. The C standard includes a somewhat limited notion of "objects" that's intended to support this exact case.

↑