146 points returningfory2 | 7 comments
mmastrac ◴[] No.43645485[source]
This is a great way to see why invalid UTF-8 strings and unicode chars cause undefined behaviour in Rust. `char` is a special integer type, known to have a valid range which is a sub-range of its storage type. Outside of dataless enums, this is the only datatype with this behaviour (EDIT: I neglected NonZero<...>/NonZeroXXX and some other zero-niche types).

If you manage to construct an invalid char from an invalid string or any other way, you can defeat the niche optimization code and accidentally create yourself an unsound transmute, which is game over for soundness.
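
A minimal sketch of the point above, using only safe standard-library calls. The helper name `option_char_uses_niche` is made up for illustration, and the size equality is what current rustc does in practice rather than something this thread establishes as a guarantee:

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `Option<char>` needs no extra space for `None`,
/// i.e. the compiler stored `None` in a bit pattern that is not a valid `char`.
fn option_char_uses_niche() -> bool {
    size_of::<Option<char>>() == size_of::<char>()
}

fn main() {
    // `char` occupies 4 bytes, but only Unicode scalar values are valid.
    assert_eq!(size_of::<char>(), 4);
    assert!(option_char_uses_niche());

    // The checked constructor rejects values outside the valid range.
    assert_eq!(char::from_u32(0x41), Some('A'));
    assert_eq!(char::from_u32(0xD800), None); // surrogate code point
    assert_eq!(char::from_u32(0x11_0000), None); // past char::MAX

    // Invalid UTF-8 is rejected the same way when building a string.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());
    assert!(std::str::from_utf8(b"ok").is_ok());

    // The unchecked routes (`char::from_u32_unchecked`,
    // `String::from_utf8_unchecked`, `mem::transmute`) skip these checks;
    // feeding them invalid data is undefined behaviour.
}
```

The safe constructors are where the invariant is enforced; the `unsafe` ones hand that responsibility to the caller.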

replies(5): >>43645776 #>>43645961 #>>43646463 #>>43646643 #>>43651356 #
NoTeslaThrow ◴[] No.43645776[source]
> This is a great way to see why invalid UTF-8 strings and unicode chars cause undefined behaviour in Rust.

What does "undefined behavior" mean without a spec? Wouldn't the behavior rustc produces today be de-facto defined behavior? It seems like the contention is violating some transmute constraint, but does this not result in reproducible runtime behavior? In what context are you framing "soundness"?

EDIT: I'm honestly befuddled why anyone would downvote this. I certainly don't think this is detracting from the conversation at all—how can you understand the semantics of the above comment without understanding what the intended meaning of "undefined behavior" or "soundness" is?

replies(5): >>43645920 #>>43645923 #>>43646838 #>>43647769 #>>43648876 #
1. ben0x539 ◴[] No.43648876[source]
> I'm honestly befuddled why anyone would downvote this.

I think there are two parts to this. First, there's a bit of a history of people making disingenuous jabs at Rust for not having an "ISO C++" style spec. Typically people would try to suggest that Rust can't be ready for production or shouldn't receive support in other ecosystems without being certified by some manner of international committee. Second, Rust by now has an extensive tradition of people discussing memory safety invariants, what soundness means, formal models of what is a valid memory access, desirable optimizations, etc, etc, so your question about what undefined behavior means could be taken to be, like, polemically reductive or dismissive.

In context I don't think it's what you're doing, but I would also not be surprised if a lot of people reading Rust-related HN discussions are just super tired of anything that even slightly looks like an effort to re-litigate undefined behavior from first principles, because it tends to derail more specific discussions.

replies(2): >>43649500 #>>43652296 #
2. NoTeslaThrow ◴[] No.43649500[source]
Tbh, I just really hate the term "undefined behavior". It really feels like laziness in terms of what the possible damage might entail.
replies(3): >>43650357 #>>43651484 #>>43661990 #
3. arlort ◴[] No.43650357[source]
It is a term of art in compilers/language design though, isn't it?

If you break an invariant the compiler is relying on for optimization, then you can't say for sure what the effect will be after all optimisation passes, or in future versions of the compiler. It's just "undefined".
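
As a rough illustration of an invariant the compiler relies on, here is a sketch with `NonZeroU32` from the standard library. The `divide` function is a hypothetical example, not something from the thread:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical helper: the type promises `parts` is never zero,
/// so no division-by-zero check is needed at the call site.
fn divide(total: u32, parts: NonZeroU32) -> u32 {
    total / parts.get()
}

fn main() {
    // The compiler uses the forbidden value 0 to represent `None`,
    // so the `Option` wrapper costs no extra space.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // The safe constructor upholds the invariant.
    assert!(NonZeroU32::new(0).is_none());

    let parts = NonZeroU32::new(4).expect("4 is non-zero");
    assert_eq!(divide(12, parts), 3);

    // `NonZeroU32::new_unchecked(0)` in an `unsafe` block would break the
    // invariant; what the optimised program then does is not specified.
}
```

Because `None` and a zero-valued `NonZeroU32` would share the same bit pattern, code compiled on the assumption that they are distinct has no defined meaning once that assumption is violated.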

4. imtringued ◴[] No.43651484[source]
Yeah I personally think the problem isn't undefined behavior itself, but the C development culture where undefined behavior is sprinkled all over the language to the point where it has become unavoidable plus the inevitable assignment of blame onto C developers, because everyone knows there is enough time in the day for fuzzing your entire code base.
5. zozbot234 ◴[] No.43652296[source]
> Second, Rust by now has an extensive tradition of people discussing memory safety invariants, what soundness means, formal models of what is a valid memory access

Rust is still lacking a definitive formal model of "soundness" in unsafe code. I'm not sure why you're suggesting that this is not a valid criticism or remark, it's just a fact.

replies(1): >>43659145 #
6. ben0x539 ◴[] No.43659145[source]
Showing up out of nowhere pretending like they haven't even thought about what it means isn't helpful though.
7. Dylan16807 ◴[] No.43661990[source]
In a situation like this, causing UB is basically saying you deliberately corrupted your memory.

How are you supposed to be specific about what the possible damage might entail for corrupted memory? If you have a function with an "if" or a "while" or a "switch" in it, and you break the variable being evaluated, you might cause the program to skip over the choices and run whatever happens to be next in memory. What's the non-lazy listing of possible outcomes at that point?
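
A small sketch of the "switch" case described above, assuming a made-up enum `Op` with helpers `from_raw` and `apply` (all hypothetical names). The `match` has no fallback arm because the compiler is allowed to assume an `Op` always holds one of its three variants:

```rust
/// Hypothetical three-variant enum stored in one byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Op {
    Add = 0,
    Sub = 1,
    Mul = 2,
}

impl Op {
    /// Checked conversion: bytes that name no variant are rejected.
    fn from_raw(raw: u8) -> Option<Op> {
        match raw {
            0 => Some(Op::Add),
            1 => Some(Op::Sub),
            2 => Some(Op::Mul),
            _ => None,
        }
    }
}

/// Exhaustive match: there is deliberately no "anything else" branch.
fn apply(op: Op, a: i32, b: i32) -> i32 {
    match op {
        Op::Add => a + b,
        Op::Sub => a - b,
        Op::Mul => a * b,
    }
}

fn main() {
    assert_eq!(Op::from_raw(1), Some(Op::Sub));
    assert_eq!(Op::from_raw(3), None);
    assert_eq!(apply(Op::Mul, 3, 4), 12);

    // Producing an `Op` from the byte 3 via `unsafe { std::mem::transmute(3u8) }`
    // would be undefined behaviour: `apply` has no branch for it, and the
    // language does not say what the generated code does in that case.
}
```

The checked conversion is the only place an out-of-range byte can be handled; after that point there is no code path left to describe.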