
146 points returningfory2 | 4 comments
mmastrac ◴[] No.43645485[source]
This is a great way to see why invalid UTF-8 strings and unicode chars cause undefined behaviour in Rust. `char` is a special integer type, known to have a valid range which is a sub-range of its storage type. Outside of dataless enums, this is the only datatype with this behaviour (EDIT: I neglected NonZero<...>/NonZeroXXX and some other zero-niche types).

If you manage to construct an invalid char from an invalid string or any other way, you can defeat the niche optimization code and accidentally create yourself an unsound transmute, which is game over for soundness.
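
For readers who want to see the niche concretely, here is a minimal sketch using only the standard library. It is an illustration of the point above, not something from the linked article, and the helper names (`option_char_fits_in_char`, `checked_char`, `checked_str`) are made up for this example. The size comparison reflects what current rustc does in practice. The checked constructors are the safe route that keeps invalid values from ever existing. The `unsafe` escape hatches (`char::from_u32_unchecked`, `std::str::from_utf8_unchecked`) are deliberately not called here, because feeding them invalid input is exactly the UB being discussed.

```rust
use std::mem::size_of;

/// True when `Option<char>` is no larger than `char`, i.e. `None` is stored
/// in a bit pattern that a valid `char` can never take (the "niche").
fn option_char_fits_in_char() -> bool {
    size_of::<Option<char>>() == size_of::<char>()
}

/// Checked constructor: yields `None` for surrogates and for values above
/// U+10FFFF, so no invalid `char` is ever created.
fn checked_char(raw: u32) -> Option<char> {
    char::from_u32(raw)
}

/// Checked constructor for strings: rejects byte sequences that are not
/// valid UTF-8 instead of handing back a broken `&str`.
fn checked_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    println!("Option<char> fits in char: {}", option_char_fits_in_char());
    println!("0x41     -> {:?}", checked_char(0x41));
    println!("0xD800   -> {:?}", checked_char(0xD800));
    println!("0x110000 -> {:?}", checked_char(0x11_0000));
    println!("[0xFF]   -> {:?}", checked_str(&[0xFF]));
}
```

If an out-of-range value were smuggled into a `char` through unsafe code, it could collide with whatever bit pattern the compiler picked for `None`, which is the "unsound transmute" scenario described above.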

replies(5): >>43645776 #>>43645961 #>>43646463 #>>43646643 #>>43651356 #
NoTeslaThrow ◴[] No.43645776[source]
> This is a great way to see why invalid UTF-8 strings and unicode chars cause undefined behaviour in Rust.

What does "undefined behavior" mean without a spec? Wouldn't the behavior rustc produces today be de-facto defined behavior? It seems like the contention is violating some transmute constraint, but does this not result in reproducible runtime behavior? In what context are you framing "soundness"?

EDIT: I'm honestly befuddled why anyone would downvote this. I certainly don't think this is detracting from the conversation at all—how can you understand the semantics of the above comment without understanding what the intended meaning of "undefined behavior" or "soundness" is?

replies(5): >>43645920 #>>43645923 #>>43646838 #>>43647769 #>>43648876 #
mmastrac ◴[] No.43645920[source]
> What does "undefined behavior" mean without a spec?

While not as formalized as C/C++, Rust's "spec" exists in the reference, nomicon, RFCs and documentation. I believe that there is a desire for a spec, but enough resources exist that the community can continue without one with no major negative side-effects (unless you want to re-implement the compiler from scratch, I suppose).

The compiler may exploit "lack of UB" for optimizations, e.g., using a known-invalid value as a niche, optimizing away safety checks, etc.

> Wouldn't the behavior rustc produces today be de-facto defined behavior?

Absolutely not. Bugs are fixed and the behaviour changes. Not often, but it happens.

This post probably answers a lot of your reply as well: https://jacko.io/safety_and_soundness.html
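
As a rough illustration of "using a known-invalid value as a niche", the sketch below uses `NonZeroU32` from the standard library. The helper names (`option_nonzero_fits_in_u32`, `checked_nonzero`) are invented for this example. The idea: because zero is never a valid `NonZeroU32`, the compiler can reuse the all-zero bit pattern to mean `None`, so the `Option` wrapper costs no extra space. That trick is only sound as long as no code ever produces a zero `NonZeroU32`, which is why the unchecked constructor is `unsafe` and is not called here.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// True when `Option<NonZeroU32>` occupies the same space as a plain `u32`,
/// meaning the zero bit pattern is being reused to represent `None`.
fn option_nonzero_fits_in_u32() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}

/// Checked constructor: returns `None` for zero, so the invariant the
/// optimization relies on cannot be broken from safe code.
fn checked_nonzero(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}

fn main() {
    println!("Option<NonZeroU32> fits in u32: {}", option_nonzero_fits_in_u32());
    println!("0 -> {:?}", checked_nonzero(0));
    println!("7 -> {:?}", checked_nonzero(7));
}
```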

replies(1): >>43645935 #
NoTeslaThrow ◴[] No.43645935[source]
EDIT:

> While not as formalized as C/C++, Rust's "spec" exists in the reference, nomicon, RFCs and documentation. I believe that there is a desire for a spec, but enough resources exist that the community can continue without one with no major negative side-effects (unless you want to re-implement the compiler from scratch, I suppose).

Thank you, I was unaware that this is a thing.

> This post probably answers a lot of your reply as well: https://jacko.io/safety_and_soundness.html

This appears to also rely on "undefined behavior" as a meaningful term.

replies(1): >>43646003 #
mmastrac ◴[] No.43646003[source]
> This appears to also rely on "undefined behavior" as a meaningful term.

I assure you it is a meaningful term:

https://llvm.org/docs/UndefinedBehavior.html

replies(1): >>43646034 #
NoTeslaThrow ◴[] No.43646034[source]
Ok, but in the context of the language at hand? Presumably the IR has distinct semantics from the language that generates the IR. Does UB just strictly resolve to LLVM UB? That's very reasonable!
replies(2): >>43646173 #>>43646377 #
fc417fc802 ◴[] No.43646173[source]
No. UB is a term of art here.

Consider a hypothetical non-LLVM full reimplementation of the compiler. If it optimizes and there are invalid assumptions then there is likely UB. LLVM isn't involved in that case though.

replies(1): >>43646570 #
NoTeslaThrow ◴[] No.43646570{3}[source]
> If it optimizes and there are invalid assumptions then there is likely UB.

It's distinguishing UB from ordinary bugs that concerns me.

replies(3): >>43646620 #>>43646899 #>>43649091 #
fc417fc802 ◴[] No.43646620{4}[source]
I don't follow. Isn't UB a subset of bugs, or alternatively a follow-on consequence that causes observable behavior to deviate further?
replies(1): >>43647058 #
1. NoTeslaThrow ◴[] No.43647058{5}[source]
> Isn't UB a subset of bugs

No, not at all. UB can still produce correct and expected results for the entire input domain.

replies(3): >>43647136 #>>43648979 #>>43649194 #
2. fc417fc802 ◴[] No.43647136[source]
If I have a bug that only triggers between 9 and 10 am EST on Mondays that is still a bug, no? Now extend that to "rand(1.0) < 0.01". Now extend that to a check using __TIME__ that goes off at compile time instead of runtime (some binaries are buggy, some aren't). Now extend that to UB.
3. ◴[] No.43648979[source]
4. pharrington ◴[] No.43649194[source]
"can" is extremely different than "will"!