If you manage to construct an invalid char from an invalid string or any other way, you can defeat the niche optimization code and accidentally create yourself an unsound transmute, which is game over for soundness.
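To make the niche point concrete, here is a minimal sketch. The helper names `option_char_is_free` and `checked` are made up for illustration, and the size equality is what current rustc produces rather than something I'd call a written guarantee. The safe constructor refuses exactly the bit patterns that the compiler may reuse for `None`:

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `Option<char>` needs no extra tag,
/// i.e. `None` is stored in a bit pattern that is not a valid `char`.
fn option_char_is_free() -> bool {
    size_of::<Option<char>>() == size_of::<char>()
}

/// Hypothetical helper: the checked, safe way to build a `char`.
/// Surrogates (0xD800..=0xDFFF) and values above 0x10FFFF are rejected.
fn checked(raw: u32) -> Option<char> {
    char::from_u32(raw)
}
```

So `checked(0x41)` gives `Some('A')`, while `checked(0xD800)` and `checked(0x110000)` give `None`. Getting an invalid `char` requires bypassing this check with `unsafe` (for example `char::from_u32_unchecked`), and that is the point where the niche assumption breaks.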
What does "undefined behavior" mean without a spec? Wouldn't the behavior rustc produces today be de-facto defined behavior? It seems like the contention is violating some transmute constraint, but does this not result in reproducible runtime behavior? In what context are you framing "soundness"?
EDIT: I'm honestly befuddled why anyone would downvote this. I certainly don't think this is detracting from the conversation at all—how can you understand the semantics of the above comment without understanding what the intended meaning of "undefined behavior" or "soundness" is?
You don't need a full language spec to declare something UB. And, arguably, from the compiler correctness perspective, there is no fundamental difference between the walls of prose in the C/C++ "spec" and the "informal spec" currently used by Rust. (Well, there is the CompCert exception, but it's quite far from the mainstream compilers in many regards.)
Incorrect with respect to an assumption furnished where? Your sibling comment mentions RFCs—is this behavior tied to some kind of documented expectation?
> A simpler example is `Option<NonZeroU8>`, the compiler assumes that `NonZeroU8` can never contain 0, thus it can use 0 as value for `None`. Now, if you take a reference to the inner `NonZeroU8` stored in `Some` and write 0 to it, you changed `Some` to `None`, while other optimizations may rely on the assumption that references to the content of `Some` can not flip the enum variant to `None`.
That seems to be the intended behavior, unless I'm reading incorrectly. Why else would you write a 0 to it? Also, does this not require using the `unsafe` keyword? So is tricking the compiler into producing the behavior you described not the expected and intended behavior?
In the definition of the `NonZeroU8` type. Or, in more practical terms, in LLVM: when we generate LLVM IR, we communicate this property to LLVM, and it in turn uses it to apply optimizations to our code.
> Also, does this not require using the `unsafe` keyword?
Yes, it requires `unsafe` and the point is that writing 0 to `NonZeroU8` is UB since it breaks the locality principle critical for correctness of optimizations. Applying just one incorrect (because of the broken assumption) optimization together with numerous other (correct) optimizations can easily lead to very surprising results, which are practically impossible to predict and debug. This is why it's considered such anathema to have UB in code, since having UB in one place may completely break code somewhere far away.
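A minimal sketch of the `NonZeroU8` case, staying entirely in safe code. The helper names `sizes` and `try_store` are hypothetical, and the 2-byte size of `Option<u8>` is what current rustc produces, not a documented promise:

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Hypothetical helper: sizes of `Option<NonZeroU8>` and `Option<u8>`.
/// The first is 1 byte because 0 is reused as `None`; the second needs
/// a separate tag because every `u8` bit pattern is a valid value.
fn sizes() -> (usize, usize) {
    (size_of::<Option<NonZeroU8>>(), size_of::<Option<u8>>())
}

/// Hypothetical helper: the checked constructor returns `None` for 0,
/// so safe code can never put a zero inside a `NonZeroU8`.
fn try_store(raw: u8) -> Option<NonZeroU8> {
    NonZeroU8::new(raw)
}

// Deliberately not executed anywhere: this is the UB under discussion.
// unsafe { NonZeroU8::new_unchecked(0) }
```

Here `try_store(0)` is `None` and `try_store(7)` is `Some` of a value whose `get()` returns 7. The only way to get a zero in there is the `unsafe` route in the last comment, and its documented safety requirement is that the value must not be zero.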