←back to thread

177 points signa11 | 2 comments | | HN request time: 0.566s | source
Show context
Arch-TK ◴[] No.42160944[source]
I have memorised the UB rules for C. Or rather, more accurately, I have memorised the subset of UB rules I need to memorise to be productive in the language and am very strict in sticking to only writing code which I know is well defined (and know my way around the C standard at a level where any obscure code I sometimes need to write can be verified to be well defined without too much hassle). I think Rust may be difficult But, if I forget something, or make a mistake, I'm screwed. Yes there's ubsan, there's tests, but ubsan and tests aren't guaranteed to work when ub is involved.

This is why I call C a minefield.

On that note, C++ has such an explosion of UB that I don't generally believe anyone who claims to know C++ because it seems to me to be almost infeasible to both learn all the rules, or at least the subset required to be productive, and then to write/modify code without getting lost.

With rust, the amount of rules I need to learn to understand rust's borrow checker is about the same or even less. And if I forget the rules, the borrow checker is there to back me up.

I still think that unless you need the performance, you should use a higher level language which hides this from you. It's genuinely easier to think about.

That being said, writing correct rust which is going to a: work as I intended and b: not have UB is much less mentally taxing, even when I have to reach for unsafe.

If you find it more taxing than writing C or C++ it's probably either because you haven't internalised the rules of the borrow checker, or because your C or C++ are riddled with various kinds of serious issues.

replies(7): >>42161052 #>>42161225 #>>42161510 #>>42162166 #>>42162494 #>>42162555 #>>42162621 #
bargainbot3k ◴[] No.42161510[source]
Embedded. Your UB is my opportunity.
replies(2): >>42161955 #>>42184081 #
cyberax ◴[] No.42161955[source]
Really? So far it seems like most of the UBs in C are caused either by:

1. Masochism

2. Underspecification, in a vain attempt to make a language that can theoretically be used on PDP computers.

replies(1): >>42162106 #
amluto ◴[] No.42162106[source]
You’re missing #3, which accounts for an absolutely enormous amount of loss:

3. The fact that an inappropriate write through a pointer results in behavior that is so undefined that it can lead to remote code execution and hence do literally anything.

No amount of additional specification can fix #3, and masochism cannot explain it.

One could mitigate #3 to some extent with techniques like control flow integrity or running in a strongly sandboxed environment.

replies(3): >>42162379 #>>42162414 #>>42162592 #
1. cyberax ◴[] No.42162592[source]
There's nothing really you can do with out-of-bounds write in C except say that it can do "anything". This UB is unavoidable.

I'm talking more about the nonsense like "c++ + ++c". There's no reason but masochism to keep it undefined. Just pick one unambiguous option and codify it.

An example of #2 is stuff like signed overflow. There are only so many ways to handle it: wraparound, saturate, error out. So C should just document them and provide a way to detect which behavior is active (like it does with endianness).

replies(1): >>42164505 #
2. jcranmer ◴[] No.42164505[source]
It's someone disingenuous to purposefully ignore what is the most common kind of UB in C. It's also ultimately not a very useful dichotomy, especially because it misunderstands why behavior ends up being undefined. For example:

> I'm talking more about the nonsense like "c++ + ++c". There's no reason but masochism to keep it undefined. Just pick one unambiguous option and codify it.

It's because there's an underlying variance in what the compilers (and the hardware [1]) translated for expressions like that, and codifying any option would have broken several of them, which was anathema in the days of ANSI C standardization. (It's still pretty frowned upon, but "get one person to change behavior so that everybody gets a consistent standard" is something the committees are more willing to countenance nowadays).

> An example of #2 is stuff like signed overflow. There are only so many ways to handle it: wraparound, saturate, error out.

Funnily enough, none of the ways you mention turn out to be the way it's actually implemented in the compiler nowadays.

As for why UB actually exists, there are several reasons. Sometimes, it's essential because the underlying behavior is impossible to rationally specify (e.g., errant pointer dereferences, traps). Sometimes, it's because you have optimization hints where you don't want to constrain violation of those hints (e.g., restrict, noreturn). Sometimes, it's erroneous behavior that's hard to consistently diagnose (e.g., signed overflow). Sometimes, it's for explicit implementation-defined behavior, but for various reasons, the standard authors didn't think it could be implemented as unspecified or implementation-defined behavior.

[1] Remember, this is the days of CISC, and not the x86 only-very-barely-not-RISC kind of CISC, the heady days of CISC where things like "*p++ = --q" is a single instruction.