Compiling C to Safe Rust, Formalized

(arxiv.org)

291 points love2read | 1 comments | 20 Dec 24 23:30 UTC

pizza234 ◴[21 Dec 24 17:55 UTC] No.42481083

I've ported some projects to Rust (including some from C, where I used C2Rust as a first step), and I've drawn some conclusions.

1. Converting a C program to Rust, even if it includes unsafe code, often uncovers bugs quickly thanks to Rust’s stringent constraints (bounds checking, strict signatures, etc.).

2. Automated C-to-Rust conversion is IMO something that will never be solved entirely, because the design of a C program is fundamentally different from that of a Rust program; such conversions require a significant redesign to be made safe (of course, not all C programs are the same).

3. In some cases, it’s plainly impossible to port a program from C to Rust while preserving the exact semantics, because unsafety can be inherent in the design.

That said, tooling is essential to porting, and as tools continue to evolve, the process will become more streamlined.
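As a minimal sketch of point 1, assume a C loop with an off-by-one bound (`i <= len`) has been ported literally. The function names `sum_inclusive` and `sum_fixed` are hypothetical and only serve the illustration. In C the extra read is undefined behaviour that may go unnoticed; in Rust the checked access reports it immediately.

```rust
/// Hypothetical literal port of a C loop that iterates with `i <= len`.
/// Returns None as soon as an index falls outside the slice.
fn sum_inclusive(data: &[u32]) -> Option<u32> {
    let mut total = 0u32;
    for i in 0..=data.len() {
        // `get` yields None for the out-of-range index instead of
        // reading past the end of the buffer as the C code would.
        let value = *data.get(i)?;
        total = total.checked_add(value)?;
    }
    Some(total)
}

/// Idiomatic rewrite: iterating over the slice cannot go out of bounds.
fn sum_fixed(data: &[u32]) -> u32 {
    data.iter().sum()
}
```

With plain indexing (`data[i]`) the same bug would surface as a panic rather than `None`; either way it shows up on the first run.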

replies(2): >>42481307 #>>42482340 #

LPisGood ◴[21 Dec 24 18:35 UTC] No.42481307

>>42481083 #

>because unsafety can be inherent in the design

I agree in principle, and I have strong feelings based on my experience that this is the case, but I think it would be illustrative to have some hard examples in mind. Does anyone know any simple cases to ground this discussion in?

replies(2): >>42481571 #>>42481575 #

nuancebydefault ◴[21 Dec 24 19:16 UTC] No.42481571

>>42481307 #

Suppose it is a DLL whose exported functions return or accept unsafe strings. There is no way to make it safe without changing the API.

replies(1): >>42482125 #

tatref ◴[21 Dec 24 20:38 UTC] No.42482125

>>42481571 #

In Rust, there is no unsafe String; only blocks of code can be unsafe, no?

replies(1): >>42482331 #

whytevuhuni ◴[21 Dec 24 21:12 UTC] No.42482331

>>42482125 #

They likely mean a char* pointer to a null-terminated string, or a char* pointer and a length, as is usual for C.

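If Rust were forced to expose such an API (to be on par with C's old API), it would have to use a raw pointer such as `*const u8` in its signature. Converting that to something that can be used in Rust is unsafe, because the compiler cannot verify that the pointer is valid or that the buffer is properly terminated.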

Even once converted to &[u8], the code now has to deal with non-UTF8 inputs throughout the whole codebase, which is a lot more inconvenient. Many methods, like .split_ascii_whitespace, are missing on &[u8], and many libraries won't take anything but a &str.

Or the author might be tempted to convert such an input to a String, in which case the semantics will differ (the conversion will now fail, or panic if the result is unwrapped, on non-UTF8 inputs).
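A rough sketch of the conversions described in this comment, assuming a NUL-terminated `char*` input. The helper names (`bytes_from_c`, `as_str`, `as_lossy_string`) are hypothetical, and the sketch uses `*const c_char` from the standard library instead of `*const u8`, since that is the type `CStr::from_ptr` accepts.

```rust
use std::ffi::{c_char, CStr};
use std::str::Utf8Error;

/// Hypothetical helper: borrow the bytes of a NUL-terminated C string.
///
/// # Safety
/// `ptr` must be non-null and point to a NUL-terminated buffer that stays
/// valid and unmodified for the lifetime `'a`. The compiler cannot check this.
unsafe fn bytes_from_c<'a>(ptr: *const c_char) -> &'a [u8] {
    // This is the unsafe step: the pointer is trusted, not verified.
    unsafe { CStr::from_ptr(ptr) }.to_bytes()
}

/// Strict conversion: non-UTF-8 input is returned as an error,
/// which becomes a panic only if the caller unwraps it.
fn as_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Lossy conversion: invalid sequences are replaced with U+FFFD,
/// so the resulting data no longer matches what the C caller passed in.
fn as_lossy_string(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Either conversion changes behaviour relative to the C original for non-UTF-8 input: the strict one rejects it, and the lossy one silently alters it.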
