
146 points returningfory2 | 1 comments
mmastrac ◴[] No.43645485[source]
This is a great way to see why invalid UTF-8 strings and Unicode chars cause undefined behaviour in Rust. `char` is a special integer type, known to have a valid range which is a sub-range of its storage type. Outside of dataless enums, this is the only datatype with this behaviour (EDIT: I neglected NonZero<...>/NonZeroXXX and some other zero-niche types).

If you manage to construct an invalid char from an invalid string or any other way, you can defeat the niche optimization code and accidentally create yourself an unsound transmute, which is game over for soundness.
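To make the niche concrete, here is a minimal sketch, assuming a current rustc on a typical target. The helper name `sizes` is made up for illustration. It compares the size of `char` with `Option<char>` and `Option<u32>`, and shows that the safe constructors refuse values outside the valid range. The `Option<char>` layout is what the compiler does in practice, not something I'd lean on as a documented guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the sizes of `char`, `Option<char>` and `Option<u32>`.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<char>(),
        size_of::<Option<char>>(),
        size_of::<Option<u32>>(),
    )
}

fn main() {
    let (c, opt_c, opt_u32) = sizes();

    // `char` is stored in 4 bytes, but only 0..=0xD7FF and 0xE000..=0x10FFFF are valid.
    // The unused bit patterns are the niche that lets `Option<char>` encode `None`
    // without a separate tag.
    println!("char: {c} bytes, Option<char>: {opt_c} bytes");

    // Every bit pattern of `u32` is valid, so `Option<u32>` needs extra room for a tag.
    println!("Option<u32>: {opt_u32} bytes");

    // Safe code cannot build an out-of-range `char`: the checked constructor says no.
    assert_eq!(char::from_u32(0x41), Some('A'));
    assert_eq!(char::from_u32(0xD800), None); // surrogate code point
    assert_eq!(char::from_u32(0x11_0000), None); // beyond U+10FFFF

    // Likewise, invalid UTF-8 is rejected when building a `&str` the safe way.
    assert!(std::str::from_utf8(&[0xFF, 0xFE]).is_err());
}
```

The only ways around these checks are `unsafe` (`char::from_u32_unchecked`, `std::str::from_utf8_unchecked`, `transmute`), which is where the undefined behaviour comes from.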

replies(5): >>43645776 #>>43645961 #>>43646463 #>>43646643 #>>43651356 #
imtringued ◴[] No.43651356[source]
Fine, I'll take the 4-byte hit for security- and safety-critical software then.

Edit: In retrospect, the optimization doesn't actually cause any security or safety problems, because unsafe code can break any invariant, including an enum with a separated tag and value. The particular memory layout of the enum is irrelevant.

replies(1): >>43651820 #
1. tialaramex ◴[] No.43651820[source]
Exactly. The correct C for working with what may or may not be a valid file descriptor produces the same machine code as the Rust with Option<OwnedFd>. But if you mess this up and forget to handle an invalid descriptor, the C compiler has no idea and your bug goes unnoticed, whereas the Rust won't compile because None isn't Some(OwnedFd).
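A rough sketch of what that looks like, assuming a Unix target and Rust 1.66+ for `std::os::fd`. The helpers `open_fd` and `describe` and the paths are invented for illustration. Since `OwnedFd` never holds -1, `Option<OwnedFd>` can use that value for `None` and stays the size of a raw descriptor.

```rust
use std::fs::File;
use std::mem::size_of;
use std::os::fd::{AsRawFd, OwnedFd, RawFd};

/// Hypothetical helper: opens `path` read-only, giving `None` if that fails.
fn open_fd(path: &str) -> Option<OwnedFd> {
    File::open(path).ok().map(OwnedFd::from)
}

/// The caller has to match on the `Option` before it can reach the descriptor.
fn describe(fd: &Option<OwnedFd>) -> String {
    match fd {
        Some(fd) => format!("open, raw fd {}", fd.as_raw_fd()),
        None => "no descriptor".to_string(),
    }
}

fn main() {
    // The `Option` wrapper costs no extra space over a plain C-style int descriptor.
    assert_eq!(size_of::<Option<OwnedFd>>(), size_of::<RawFd>());

    // "/dev/null" is assumed to exist; the second path is assumed not to.
    let good = open_fd("/dev/null");
    let bad = open_fd("/hypothetical/path/that/does/not/exist");
    println!("{}", describe(&good));
    println!("{}", describe(&bad));

    // Something like `good.as_raw_fd()` does not compile: `Option<OwnedFd>` has no
    // such method, so forgetting the `None` case is a type error, not a latent bug.
}
```

The C equivalent would return a plain `int` and rely on the caller remembering to compare against -1.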