
146 points returningfory2 | 1 comments
mmastrac (No.43645485)
This is a great way to see why invalid UTF-8 strings and Unicode chars cause undefined behaviour in Rust. `char` is a special scalar type: it is stored in 32 bits, but only Unicode scalar values (0 to 0xD7FF and 0xE000 to 0x10FFFF) are valid, so its valid range is a sub-range of its storage type. Outside of dataless enums, this is the only datatype with this behaviour (EDIT: I neglected `bool`, `NonZero<...>`/`NonZeroXXX`, and some other zero-niche types).
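
A minimal sketch of the "valid sub-range" point, using only the standard library's checked `char::from_u32`; the helper name `is_valid_char_bits` is hypothetical, introduced just for illustration. The unused bit patterns are what the compiler can use as a niche; on current rustc `Option<char>` is 4 bytes, but that layout is observed behaviour here, not something the sketch asserts as a guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if `n` is a bit pattern a `char` may hold,
/// i.e. a Unicode scalar value.
fn is_valid_char_bits(n: u32) -> bool {
    char::from_u32(n).is_some()
}

fn main() {
    // A `char` occupies the same 4 bytes as a `u32`...
    assert_eq!(size_of::<char>(), size_of::<u32>());

    // ...but only 0..=0xD7FF and 0xE000..=0x10FFFF are valid values.
    assert!(is_valid_char_bits(0x41)); // 'A'
    assert!(!is_valid_char_bits(0xD800)); // UTF-16 surrogate
    assert!(!is_valid_char_bits(0x11_0000)); // past U+10FFFF

    // The invalid bit patterns form a "niche" the compiler may use to
    // encode `None`, so `Option<char>` can avoid a separate tag.
    println!("Option<char>: {} bytes", size_of::<Option<char>>());
}
```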

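If you manage to construct an invalid `char` from an invalid string, or in any other way, you can defeat the niche optimization and accidentally create yourself an unsound transmute (e.g. a `Some(c)` whose invalid `c` happens to match the bit pattern the compiler uses for `None` in `Option<char>` may read back as `None`), which is game over for soundness.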
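
For contrast, the safe way in is checked. A minimal sketch, assuming the bytes come from untrusted input: `std::str::from_utf8` rejects anything that is not valid UTF-8 (including encoded surrogates), so no `str`, and therefore no `char` from `.chars()`, can be built from those bytes without `unsafe`. The wrapper name `decode` is hypothetical.

```rust
/// Hypothetical wrapper around the checked decoder. Its unchecked
/// counterpart, `std::str::from_utf8_unchecked`, is `unsafe` because it
/// trusts the caller that the bytes are already valid UTF-8.
fn decode(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // "é" encoded as UTF-8 is accepted.
    assert_eq!(decode(&[0xC3, 0xA9]), Some("é"));

    // 0xED 0xA0 0x80 would encode the surrogate U+D800, which UTF-8
    // forbids; the checked API refuses it instead of yielding a `str`
    // whose `chars()` could produce an invalid `char`.
    assert_eq!(decode(&[0xED, 0xA0, 0x80]), None);

    // A lone continuation byte is rejected as well.
    assert_eq!(decode(&[0x80]), None);
}
```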

NoTeslaThrow (No.43645776)
> This is a great way to see why invalid UTF-8 strings and unicode chars cause undefined behaviour in Rust.

What does "undefined behavior" mean without a spec? Wouldn't the behavior rustc produces today be de-facto defined behavior? It seems like the contention is violating some transmute constraint, but does this not result in reproducible runtime behavior? In what context are you framing "soundness"?

EDIT: I'm honestly befuddled why anyone would downvote this. I certainly don't think this is detracting from the conversation at all—how can you understand the semantics of the above comment without understanding what the intended meaning of "undefined behavior" or "soundness" is?
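
Roughly, in Rust's own usage: the Rust Reference lists producing an invalid `char` (such as a surrogate) among the behaviors considered undefined, meaning the compiler may assume it never happens; a safe API is called *sound* if no safe caller can make it happen. A minimal sketch of that idea, using a hypothetical function `char_from_code_point` that wraps the real `unsafe` constructor `char::from_u32_unchecked` and checks its precondition first:

```rust
/// Hypothetical safe wrapper: it checks the precondition of the `unsafe`
/// constructor itself, so no caller, whatever `u32` it passes, can end up
/// holding an invalid `char`. That property is what "sound" refers to.
fn char_from_code_point(n: u32) -> Option<char> {
    let is_scalar = n < 0xD800 || (0xE000..=0x10FFFF).contains(&n);
    if is_scalar {
        // SAFETY: `n` was just checked to be a Unicode scalar value,
        // which is exactly what `from_u32_unchecked` requires.
        Some(unsafe { char::from_u32_unchecked(n) })
    } else {
        None
    }
}

fn main() {
    assert_eq!(char_from_code_point(0x1F980), Some('🦀'));
    assert_eq!(char_from_code_point(0xD800), None);

    // Agrees with the standard library's checked constructor at the edges.
    for n in [0u32, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000] {
        assert_eq!(char_from_code_point(n), char::from_u32(n));
    }
}
```

Passing an unchecked value such as `0xD800` straight to `from_u32_unchecked` instead would be undefined behaviour, which is why that function is marked `unsafe`.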

duckerude (No.43647769)
It means that anything strange that happens next isn't a language bug.

Whether something is a bug or not is sometimes hard to pin down because there's no formal spec. Most of the time it's pretty clear though. Most software doesn't have a formal spec and manages to categorize bugs anyway.

NoTeslaThrow (No.43649494)
> It means that anything strange that happens next isn't a language bug.

This is even more vague. The language is getting blamed regardless. This makes no sense.

dwattttt (No.43651143)
No: the language defines that, e.g., a `NonZeroU8` can't contain 0, and the only way it could is via illegal means. You don't need a formal proof to describe that.
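
A minimal sketch of that rule in safe code, using the real `std::num::NonZeroU8` API; the helper name `nonzero` is hypothetical. The checked constructor refuses 0 at the boundary, and because 0 can never be a valid `NonZeroU8`, the standard library documents `Option<NonZeroU8>` as having the same size as `u8`.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Hypothetical helper around the checked constructor: zero is refused
/// here rather than stored.
fn nonzero(n: u8) -> Option<NonZeroU8> {
    NonZeroU8::new(n)
}

fn main() {
    assert!(nonzero(0).is_none());
    assert_eq!(nonzero(7).map(NonZeroU8::get), Some(7));

    // 0 is free to represent `None`, so no extra tag byte is needed.
    assert_eq!(size_of::<Option<NonZeroU8>>(), size_of::<u8>());
}
```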

Trying to characterise what any compiler, hypothetical or not, does if you nonetheless produce one (again, via invalid means) isn't meaningful.