177 points signa11 | 4 comments

Arch-TK No.42160944
I have memorised the UB rules for C. Or rather, more accurately, I have memorised the subset of UB rules I need to be productive in the language, and I am very strict about only writing code which I know is well defined (and I know my way around the C standard well enough that any obscure code I sometimes need to write can be verified as well defined without too much hassle). I think Rust may be difficult. But if I forget something, or make a mistake, I'm screwed. Yes, there's UBSan, and there are tests, but neither UBSan nor tests are guaranteed to work when UB is involved.
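As a concrete illustration of "only writing code which I know is well defined", here is a minimal C sketch for one well-known rule: signed integer overflow is undefined, so the range check has to happen before the addition, never after it. The helper name checked_add_int() is hypothetical and not from the comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out only if the mathematical result
 * fits in an int, and returns whether it did. Each bound is computed in a
 * way that cannot itself overflow. */
static bool checked_add_int(int a, int b, int *out)
{
    assert(out);
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* sum would go below INT_MIN */
        return false;
    }
    *out = a + b; /* proven not to overflow */
    return true;
}

/* A tempting check such as (a + b < a) is itself UB when the sum overflows,
 * so the compiler is allowed to assume it is false and delete it. */
```

Compiling the naive a + b with GCC or Clang and -fsanitize=undefined reports the overflow at run time, but only for inputs the tests actually exercise. C23 also adds ckd_add() in <stdckdint.h> for this job.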

This is why I call C a minefield.

On that note, C++ has such an explosion of UB that I don't generally believe anyone who claims to know C++. It seems to me almost infeasible both to learn all the rules (or at least the subset required to be productive) and then to write or modify code without getting lost.

With Rust, the number of rules I need to learn to understand the borrow checker is about the same, or even smaller. And if I forget the rules, the borrow checker is there to back me up.
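As an example of a rule the borrow checker enforces and C does not, consider keeping a pointer into a growable buffer across an operation that may reallocate it. This is a minimal C sketch using a hypothetical int_vec type: the pointer version compiles silently and becomes UB once realloc() moves the buffer, while the equivalent Rust (taking &v[0], calling v.push(), then using the reference) is rejected at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable int array, used only to illustrate the hazard. */
struct int_vec {
    int *data;
    size_t len;
    size_t cap;
};

/* Appends x, growing the buffer with realloc when it is full.
 * Returns 0 on success, -1 on failure (v is left unchanged). */
static int int_vec_push(struct int_vec *v, int x)
{
    if (v->len == v->cap) {
        size_t new_cap;
        if (v->cap == 0)
            new_cap = 4;
        else if (v->cap > SIZE_MAX / (2 * sizeof *v->data))
            return -1; /* doubling would overflow the byte count */
        else
            new_cap = v->cap * 2;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (!p)
            return -1;
        /* realloc may have moved the block: every pointer previously
         * derived from v->data is now dangling. */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Bounds-checked read by index. An index survives reallocation; a pointer
 * such as int *first = &v->data[0]; does not. Reading *first after a push
 * that reallocated is UB, and a C compiler accepts it without complaint. */
static int int_vec_get(const struct int_vec *v, size_t i)
{
    assert(i < v->len);
    return v->data[i];
}
```

Keeping indices instead of pointers is the usual C workaround, but nothing in the language checks that you remembered to do it.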

I still think that unless you need the performance, you should use a higher-level language which hides this from you. It's genuinely easier to think about.

That being said, writing correct Rust which (a) works as I intended and (b) has no UB is much less mentally taxing, even when I have to reach for unsafe.

If you find it more taxing than writing C or C++, it's probably either because you haven't internalised the rules of the borrow checker, or because your C or C++ code is riddled with various kinds of serious issues.

1. tialaramex No.42161225
The ISO document for C has an appendix which lists all the known categories of Undefined Behaviour. It's not exactly a small list, but it's something you could memorize if you wanted to, like the list of all US interstates, where they start and where they end.

There has been a proposal to attempt this for C++, but IMO progress on such an appendix is slower than the rate of change of the language, making it a never-ending task. The task is made larger still by the fact that, on top of Undefined Behaviour, C++ also explicitly has IFNDR: programs which it declares to be Ill-Formed (i.e. they are not C++) but for which No Diagnostic is Required (i.e. your compiler may not know that it's not C++). This is much worse than UB.

2. blub No.42162516
This only makes sense if one wants to write a PhD thesis on C++ UB and needs the exhaustive list.

For the rest of us, there’s cppreference, UBSan and quite a few books on writing correct C++ code. Of course, these still won’t suffice to write 100% memory-safe code, which is a pretty arbitrary goal that just happens to match what Rust offers and is pushed a lot by Rust advocates.

It’s a nice goal, but not everybody works on software that’s attacked all day every day.

3. kimixa No.42163162
Also, memory safety isn't the only "bug". I'd even argue that the majority of "memory" issues in unsafe languages like C are actually the result of a logic error or a mismatch of interface expectations, and a memory error is often just the "first noticed failure". In the trivial strcpy() examples people love to use, unexpectedly truncating a string often means the program has "failed" in its intended task just as much as a segfault or other memory corruption would.
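To make the strcpy() point concrete: a bounded copy removes the memory error but not the logic error. Below is a minimal C sketch with a hypothetical helper, copy_string(), built on snprintf(). snprintf() never writes past the buffer, but it returns the length the full string would have needed, so truncation can be reported to the caller instead of being silently accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (dstsz bytes of capacity) and
 * reports whether the whole string fit. With dstsz > 0, snprintf always
 * NUL-terminates and never overruns dst, so the copy is memory safe either
 * way; the return value is what tells the caller the task itself failed. */
static bool copy_string(char *dst, size_t dstsz, const char *src)
{
    assert(dst && dstsz > 0);
    int n = snprintf(dst, dstsz, "%s", src);
    /* n is the length of the untruncated result, or negative on error. */
    return n >= 0 && (size_t)n < dstsz;
}
```

The caller still has to decide what a truncated name, path or message means for the program, which is exactly the part no memory-safety guarantee can decide for it.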

I'm extremely positive about highlighting as many of these problems as possible before the code gets into users' hands, even more so if it happens as early as a compile step, as with the borrow checker. But let's not delude ourselves that they are the only possible issue software has, or forget that in many languages it's a tooling issue (or a cultural issue around accepting that tooling...) rather than a fundamental language difference.

On a side note, with the prevalence of things like WASM, I feel some people are just redefining what "memory safety" is. Defining a block of memory and using offsets within it is just reinventing pointers, and the runtime ensuring that any offsets are within that block just mirrors the MMU and process isolation. We should really be looking at why those mechanisms aren't well used, rather than just reimplementing a new version on top for "security". If those reasons aren't really "technical" (e.g. poor isolation, where "Trusted" and "Untrusted" data processing isn't split into separate processes because not doing so is "Easier"), we need to ensure we don't just do the same things again; and if they are technical, we can fix them.
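The "offsets within a block" model can be sketched in a few lines of C. This is a hypothetical illustration (the names, the 64 KiB size and the host-endian layout are assumptions, not how any particular WASM runtime is implemented): every guest access is an offset that the host checks against the block's bounds, playing the role the MMU plays for a process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sandbox memory: the guest only ever sees offsets into mem. */
#define SANDBOX_MEM_SIZE 65536u

struct sandbox {
    uint8_t mem[SANDBOX_MEM_SIZE];
};

/* Reads a 32-bit value at guest offset off. Returns false (a "trap")
 * instead of touching memory outside the block. The check is written as
 * off > SIZE - 4 so that off + 4 can never wrap around. */
static bool sandbox_load_u32(const struct sandbox *sb, uint32_t off, uint32_t *out)
{
    assert(sb && out);
    if (off > SANDBOX_MEM_SIZE - sizeof *out)
        return false;
    memcpy(out, &sb->mem[off], sizeof *out); /* memcpy: no alignment UB */
    return true;
}

/* Writes a 32-bit value at guest offset off, with the same bounds check. */
static bool sandbox_store_u32(struct sandbox *sb, uint32_t off, uint32_t val)
{
    assert(sb);
    if (off > SANDBOX_MEM_SIZE - sizeof val)
        return false;
    memcpy(&sb->mem[off], &val, sizeof val);
    return true;
}
```

Real WebAssembly memories are little-endian, and runtimes often rely on guard pages rather than an explicit comparison on every access; the sketch only shows the bounds-checking idea.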

4. Arch-TK No.42163562
That's the appendix containing the documented UB. The standard also explicitly states that any behaviour not explicitly defined by the standard is undefined, meaning that there are things which aren't in that list. And I can confirm that there are things you can do in C which are UB but which are not on that list.