146 points | returningfory2 | 1 comment
mmastrac No.43645485
This is a great way to see why invalid UTF-8 strings and invalid Unicode `char`s cause undefined behaviour in Rust. `char` is effectively a special integer type whose valid values are a subset of its 32-bit storage type: only Unicode scalar values (0 to 0x10FFFF, excluding the surrogates 0xD800 to 0xDFFF) are allowed. Outside of dataless enums, this is the only datatype with this behaviour (EDIT: I neglected `NonZero<...>`/`NonZeroXXX` and some other zero-niche types).
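To make that valid range concrete, here is a small sketch (not from the thread) that relies only on the standard library's checked conversions, `char::from_u32` and `std::str::from_utf8`. The helper names `is_valid_char` and `is_valid_utf8` are hypothetical.

```rust
/// Hypothetical helper: true if `n` is a Unicode scalar value, i.e. a legal `char`.
/// `char::from_u32` is the safe, checked way to turn an integer into a `char`.
fn is_valid_char(n: u32) -> bool {
    char::from_u32(n).is_some()
}

/// Hypothetical helper: true if `bytes` is valid UTF-8, so it could back a `&str`.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // Valid: plain ASCII and the largest scalar value.
    assert!(is_valid_char(0x41));
    assert!(is_valid_char(0x10FFFF));
    assert_eq!(char::MAX as u32, 0x10FFFF);

    // Invalid: both ends of the surrogate gap, and the first value past the top.
    assert!(!is_valid_char(0xD800));
    assert!(!is_valid_char(0xDFFF));
    assert!(!is_valid_char(0x110000));
    // Valid again right after the gap.
    assert!(is_valid_char(0xE000));

    // The byte 0xFF never appears in UTF-8, so this slice is rejected.
    assert!(is_valid_utf8("h\u{e9}llo".as_bytes()));
    assert!(!is_valid_utf8(&[0xFF]));
    println!("all checks passed");
}
```

Note that the valid set is not one contiguous range because of the surrogate gap, which is why the checks sit on both sides of it.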

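If you manage to construct an invalid `char`, whether from an invalid string or any other way, you can defeat the niche optimization. The compiler is free to store other data (such as the `None` of an `Option<char>`) in bit patterns that a valid `char` never uses, so an invalid `char` can be misread as something else entirely. That amounts to an accidental unsound transmute, which is game over for soundness.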
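The niche can be observed directly by comparing type sizes. The sketch below assumes a current rustc: the layout of `Option<char>` is an implementation detail rather than a documented guarantee, and the function names are hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical helper: size of `Option<char>` in bytes. With the niche
/// optimization this equals `size_of::<char>()`, because `None` is encoded
/// in a bit pattern that no valid `char` can have.
fn option_char_size() -> usize {
    size_of::<Option<char>>()
}

/// Hypothetical helper: size of `Option<u32>`. Every `u32` bit pattern is
/// valid, so there is no niche and a separate discriminant must be added.
fn option_u32_size() -> usize {
    size_of::<Option<u32>>()
}

fn main() {
    println!("char:         {} bytes", size_of::<char>());
    println!("Option<char>: {} bytes", option_char_size());
    println!("Option<u32>:  {} bytes", option_u32_size());
}
```

On current compilers this reports 4, 4 and 8 bytes. The extra bytes in `Option<u32>` are exactly what `char`'s invalid values save, and also why producing one of those values breaks things.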

timerol No.43645961
> Outside of dataless enums, this is the only datatype with this behaviour.

Note that there are non-zero integer types that can be used the same way, like `NonZeroU8` (https://doc.rust-lang.org/std/num/type.NonZeroU8.html). The null pointer is also used as a niche (for references, `Box`, and `NonNull`), and you can create your own as well, as documented in https://www.0xatticus.com/posts/understanding_rust_niche/
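As a sketch of the niches mentioned here: zero for the `NonZero` integers, null for pointer-like types, and a user-defined type that inherits a niche by wrapping one. The `UserId` newtype is a hypothetical example. `#[repr(transparent)]` is used because the `std::option` docs list transparent wrappers around these types among the cases where `Option` keeps the same size.

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroU8};
use std::ptr::NonNull;

/// Hypothetical ID type: wrapping a `NonZeroU32` inherits its niche (zero),
/// so `Option<UserId>` stays 4 bytes without any unsafe code.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UserId(NonZeroU32);

impl UserId {
    /// Returns `None` for 0, the one value reserved for the niche.
    fn new(raw: u32) -> Option<UserId> {
        NonZeroU32::new(raw).map(UserId)
    }
}

fn main() {
    // Zero is the niche for the NonZero integer types.
    assert_eq!(size_of::<Option<NonZeroU8>>(), 1);
    assert!(NonZeroU8::new(0).is_none());

    // The null pointer is the niche for references, `Box`, and `NonNull`.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());
    assert_eq!(size_of::<Option<NonNull<u64>>>(), size_of::<NonNull<u64>>());

    // A transparent newtype around a niche-carrying type keeps the niche.
    assert_eq!(size_of::<Option<UserId>>(), 4);
    assert_eq!(UserId::new(0), None);
    println!("ok");
}
```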

mmastrac No.43646020
Ack, yeah. I forgot about those despite having used them. That's a good point and I stand corrected. Edited post above.
deathanatos No.43646436
I guess it depends on whether the sentence is restricted to "integer" types, but `bool` is sort of the same way, no? A `bool` must be either 0 or 1 (false or true), or it's UB.

(And, I think, for much the same reason: the niche optimization. `Option<bool>` is 1 byte.)
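A quick way to check the `Option<bool>` claim is the sketch below. The 1-byte result reflects current rustc layout and is not a documented guarantee. The idea that a value like 2 encodes `None` is an assumption about the compiler's choice of niche value, so the code does not rely on it. The helper name is hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical helper: size in bytes of `Option<bool>`. A valid `bool` is only
/// ever 0 or 1, so the compiler can encode `None` in some other byte value
/// instead of adding a separate tag byte.
fn option_bool_size() -> usize {
    size_of::<Option<bool>>()
}

fn main() {
    // `bool` itself is guaranteed to be 1 byte.
    assert_eq!(size_of::<bool>(), 1);
    println!("Option<bool>: {} byte(s)", option_bool_size());
}
```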

(And for the non-Rustaceans: the only way to get a `bool` that is neither false nor true, i.e., neither 0 nor 1, is `unsafe {}` code. Put differently, a `bool` never being "2" is an invariant that unsafe code must not violate. (IIRC, at all times, even in unsafe code.))
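In practice, upholding that invariant means checking raw bytes before they become a `bool` instead of transmuting them. Here is a minimal sketch. `byte_to_bool` is a hypothetical helper, not a std API, since the standard library has no checked `u8`-to-`bool` conversion.

```rust
/// Hypothetical helper: convert a raw byte (e.g. read from a file or over FFI)
/// into a `bool` without ever creating an invalid one. Any value other than
/// 0 or 1 is rejected rather than reinterpreted.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    // The safe direction: `bool as u8` always yields 0 or 1.
    assert_eq!(true as u8, 1);
    assert_eq!(false as u8, 0);

    // The reverse direction must be checked. By contrast,
    // `unsafe { std::mem::transmute::<u8, bool>(2) }` would be UB.
    for b in [0u8, 1, 2, 255] {
        println!("{b} -> {:?}", byte_to_bool(b));
    }
}
```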