Stop making me memorize the borrow checker

(erikmcclure.com)

1. Arch-TK ◴[17 Nov 24 00:43 UTC] No.42160944[source]▶

>>42160501 (OP) #

I have memorised the UB rules for C. Or rather, more accurately, I have memorised the subset of UB rules I need to memorise to be productive in the language and am very strict in sticking to only writing code which I know is well defined (and know my way around the C standard at a level where any obscure code I sometimes need to write can be verified to be well defined without too much hassle). I think Rust may be difficult, but in C, if I forget something or make a mistake, I'm screwed. Yes, there's UBSan, there's tests, but UBSan and tests aren't guaranteed to work when UB is involved.

This is why I call C a minefield.

On that note, C++ has such an explosion of UB that I don't generally believe anyone who claims to know C++ because it seems to me to be almost infeasible to both learn all the rules, or at least the subset required to be productive, and then to write/modify code without getting lost.

With rust, the amount of rules I need to learn to understand rust's borrow checker is about the same or even less. And if I forget the rules, the borrow checker is there to back me up.
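As a rough illustration of the checker "backing you up", here is a minimal Rust sketch (the function name `first_then_push` is made up for this example). The commented-out line is the kind of slip that can become a dangling pointer in C or C++ when the container reallocates; rustc rejects it at compile time instead.

```rust
/// Returns a copy of the first element and the final length of the vector.
fn first_then_push() -> (String, usize) {
    let mut names = vec![String::from("alpha")];

    // Shared borrow pointing into the vector's storage.
    let first = &names[0];

    // Uncommenting the next line is a compile error (E0502): `names` cannot
    // be borrowed mutably while `first` is still in use, because `push` may
    // reallocate and leave `first` dangling.
    // names.push(String::from("beta"));

    let copy = first.clone(); // last use of `first`

    // The shared borrow has ended, so mutation is allowed again.
    names.push(String::from("beta"));
    (copy, names.len())
}
```

Forgetting the rule here costs a compiler error, not a latent bug.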

I still think that unless you need the performance, you should use a higher level language which hides this from you. It's genuinely easier to think about.

That being said, writing correct rust which is going to a: work as I intended and b: not have UB is much less mentally taxing, even when I have to reach for unsafe.

If you find it more taxing than writing C or C++ it's probably either because you haven't internalised the rules of the borrow checker, or because your C or C++ are riddled with various kinds of serious issues.

replies(7): >>42161052 #>>42161225 #>>42161510 #>>42162166 #>>42162494 #>>42162555 #>>42162621 #

2. akira2501 ◴[17 Nov 24 01:01 UTC] No.42161052[source]▶

>>42160944 (TP) #

> This is why I call C a minefield.

Computing is a series of "minefields." At least you get a map of this particular one.

I'm far more confronted by public facing APIs that involve user authentication than I am of any particular documented set of language facts.

3. tialaramex ◴[17 Nov 24 01:32 UTC] No.42161225[source]▶

>>42160944 (TP) #

The ISO document for C has an appendix which lists all the known categories of Undefined Behaviour. It's not exactly a small list, but it's something you could memorize if you wanted to, like the list of all US interstates, where they start and where they end.

There has been a proposal to attempt this for C++, but IMO the progress on making such an appendix is slower than the rate of change of the language, making it a never-ending task. It is also made harder by the fact that, on top of Undefined Behaviour, C++ explicitly has IFNDR: programs which it declares to be Ill-Formed (i.e. they are not C++) but No Diagnostic Required (i.e. your compiler doesn't have to tell you that it's not C++). This is much worse than UB.

replies(2): >>42162516 #>>42163562 #

4. bargainbot3k ◴[17 Nov 24 02:29 UTC] No.42161510[source]▶

>>42160944 (TP) #

Embedded. Your UB is my opportunity.

replies(2): >>42161955 #>>42184081 #

5. cyberax ◴[17 Nov 24 04:25 UTC] No.42161955[source]▶

>>42161510 #

Really? So far it seems like most of the UBs in C are caused either by:

1. Masochism

2. Underspecification, in a vain attempt to make a language that can theoretically be used on PDP computers.

replies(1): >>42162106 #

6. amluto ◴[17 Nov 24 05:04 UTC] No.42162106{3}[source]▶

>>42161955 #

You’re missing #3, which accounts for an absolutely enormous amount of loss:

3. The fact that an inappropriate write through a pointer results in behavior that is so undefined that it can lead to remote code execution and hence do literally anything.

No amount of additional specification can fix #3, and masochism cannot explain it.

One could mitigate #3 to some extent with techniques like control flow integrity or running in a strongly sandboxed environment.
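A different mitigation from the two named above is bounds checking at the language level, which turns the inappropriate write into a defined failure. A minimal Rust sketch, assuming a hypothetical helper `checked_write`:

```rust
/// Writes `value` at `index` if it is in bounds.
/// Returns false, and touches nothing, when `index` is outside the buffer.
fn checked_write(buf: &mut [u8], index: usize, value: u8) -> bool {
    match buf.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}
```

With a 4-byte buffer, `checked_write(&mut buf, 4, 9)` returns `false` rather than overwriting whatever happens to sit after the buffer.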

replies(3): >>42162379 #>>42162414 #>>42162592 #

7. PittleyDunkin ◴[17 Nov 24 05:20 UTC] No.42162166[source]▶

>>42160944 (TP) #

> I still think that unless you need the performance, you should use a higher level language which hides this from you.

Exporting and consuming the full c abi with very little effort is also another huge thing in rust's favor. Languages have opted heavily for supporting calling into the c abi and being hosted by the c abi, so naturally support for rust on the same terms comes for free. There's even rust in linux now.
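For a sense of how little ceremony that takes, here is a minimal sketch of a Rust function that uses the C calling convention (the name `add_i32` is hypothetical):

```rust
/// Adds two integers using the C calling convention.
extern "C" fn add_i32(a: i32, b: i32) -> i32 {
    // Wrap explicitly so overflow behaves the same in debug and release builds.
    a.wrapping_add(b)
}
```

To export the symbol under that exact name from a library, the function would additionally carry the `no_mangle` attribute (spelled `#[unsafe(no_mangle)]` in the 2024 edition); it is left off here so the sketch compiles unchanged across editions.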

8. Dylan16807 ◴[17 Nov 24 06:29 UTC] No.42162379{4}[source]▶

>>42162106 #

That's not missing, I think they left it out of the "most" criticism on purpose. A dangling pointer is one of the few really good cases for UB. (Though good arguments can be made to give the compiler less leeway in that situation.)

9. thaumasiotes ◴[17 Nov 24 06:45 UTC] No.42162414{4}[source]▶

>>42162106 #

> The fact that an inappropriate write through a pointer results in behavior that is so undefined that it can lead to remote code execution

This is a strange way to look at it. You'd get remote code execution only if the result of writing through the pointer was exactly what you'd expect: that the value you tried to write was copied into the memory indexed by the pointer.

10. blub ◴[17 Nov 24 07:12 UTC] No.42162494[source]▶

>>42160944 (TP) #

I think you’re missing the author’s point, but OTOH he undermined it himself by stating that learning the rules helps: because Rust requires that the ownership and relationships are encoded in the type system, it requires significant design changes when those relationships change.

Learning the rules only partly mitigates this, because sometimes one does exploratory programming and isn’t sure what the final types are or they just want to change something.

Rust thrives on over-specification which calcifies the APIs.

Anyway, just as the author’s allegedly holding Rust wrong, one could say that you’re holding C++ wrong - the right approach is to learn how to write correct code and then the exceptions. Also accept and be at peace with the fact that your code will have some bugs. I don’t know why the average Rust developer is so obsessed with getting things perfect and no less with memory safety when the overall software quality is the way it is. I mean if someone’s researching the topic or works on Rust, sure, be the Stallman of memory correctness.

replies(1): >>42164159 #

11. blub ◴[17 Nov 24 07:17 UTC] No.42162516[source]▶

>>42161225 #

This only makes sense if one wants to write a PhD on C++ UB and needs the exhaustive list.

For the rest of us, there’s cppreference, UBsan and quite a few books on writing correct C++ code. Of course, these will still not suffice to write 100% memory safe code, which is a pretty arbitrary goal that just happens to match what Rust offers and is pushed a lot by Rust advocates.

It’s a nice goal, but not everybody works on software that’s attacked all day every day.

replies(1): >>42163162 #

12. scott_w ◴[17 Nov 24 07:27 UTC] No.42162555[source]▶

>>42160944 (TP) #

After reading the article, it’s clear the author approves of the fact Rust has these rules (and prefers it over C++). They’re highlighting the natural challenges that brings so future iterations or competitors can see what needs to be improved.

13. cyberax ◴[17 Nov 24 07:35 UTC] No.42162592{4}[source]▶

>>42162106 #

There's nothing really you can do with out-of-bounds write in C except say that it can do "anything". This UB is unavoidable.

I'm talking more about the nonsense like "c++ + ++c". There's no reason but masochism to keep it undefined. Just pick one unambiguous option and codify it.

An example of #2 is stuff like signed overflow. There are only so many ways to handle it: wraparound, saturate, error out. So C should just document them and provide a way to detect which behavior is active (like it does with endianness).
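For comparison only: Rust, rather than C, exposes each of those three policies as an explicit method on its integer types. A minimal sketch (the helper name `add_three_ways` is hypothetical):

```rust
/// Applies the three overflow policies to `a + b`.
fn add_three_ways(a: i32, b: i32) -> (Option<i32>, i32, i32) {
    (
        a.checked_add(b),    // "error out": None on overflow
        a.wrapping_add(b),   // wraparound (two's complement)
        a.saturating_add(b), // saturate at i32::MIN / i32::MAX
    )
}
```

Calling `add_three_ways(i32::MAX, 1)` yields `(None, i32::MIN, i32::MAX)`, so the caller picks the behaviour per call site.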

replies(1): >>42164505 #

14. chipdart ◴[17 Nov 24 07:42 UTC] No.42162621[source]▶

>>42160944 (TP) #

> I have memorised the UB rules for C.

Why? What's wrong with using one of the many static code analysis tool to tell you about them if/when they appear?

replies(1): >>42163225 #

15. kimixa ◴[17 Nov 24 09:57 UTC] No.42163162{3}[source]▶

>>42162516 #

Also, memory safety isn't the only "bug" - I'd even argue that the majority of "memory" issues in unsafe languages like C are actually the result of a logic error or mismatch of interface expectations, and a memory error is often the "first noticed failure". In the trivial strcpy() examples people love to use, unexpectedly truncating a string often means the program has "failed" in its intended task just as much as a segfault or other memory corruption.
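One way to surface that mismatch of expectations is to make the copy refuse instead of truncating. A minimal Rust sketch, assuming a hypothetical helper `copy_if_fits`:

```rust
/// Copies `src` into the start of `dst` only if it fits entirely.
/// Returns the number of bytes copied, or Err(required length) if it does not fit.
fn copy_if_fits(dst: &mut [u8], src: &[u8]) -> Result<usize, usize> {
    if src.len() > dst.len() {
        // Refuse rather than silently truncate; the caller must decide what to do.
        return Err(src.len());
    }
    dst[..src.len()].copy_from_slice(src);
    Ok(src.len())
}
```

The logic error is still there if the caller ignores the `Err`, but it can no longer hide as a shortened string.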

I'm extremely positive on highlighting as many of these problems before it gets to the user's hands, even more so if it's as early as a compile step as in the borrow checker, but let's not delude ourselves that they are the only possible issue software has. Or that in many languages it's a tooling issue (or culture issue accepting that tooling...) rather than a fundamental language difference.

On a side note, with the prevalence of things like WASM I feel some people are just redefining what "memory safety" is. Defining a block of memory and using offsets within that is just reinventing pointers, and the runtime ensuring that any offsets are within that block is just mirroring the MMU and process isolation. We should really be looking at why that isn't well used rather than just reimplementing a new version on top for "security": if those reasons aren't really "technical" (i.e. poor isolation between "Trusted" and "Untrusted" data processing in separate processes due to it being "Easier"), we need to ensure we don't just do the same things again, and if they are technical we can fix them.
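The "offsets into a block are just pointers again" point can be sketched with a toy model. This is not how any real WASM runtime is implemented; `LinearMemory` and its methods are hypothetical names for illustration.

```rust
/// A toy "linear memory": one block of bytes addressed by plain offsets.
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size] }
    }

    /// Stores `data` at `offset`; fails only if the range leaves the block.
    fn store(&mut self, offset: usize, data: &[u8]) -> bool {
        let end = match offset.checked_add(data.len()) {
            Some(end) if end <= self.bytes.len() => end,
            _ => return false,
        };
        self.bytes[offset..end].copy_from_slice(data);
        true
    }

    /// Loads `len` bytes at `offset`, or None if the range leaves the block.
    fn load(&self, offset: usize, len: usize) -> Option<&[u8]> {
        let end = offset.checked_add(len)?;
        self.bytes.get(offset..end)
    }
}

/// Places two logical objects side by side and overflows the first one.
fn demo_neighbour_overwrite() -> (bool, bool, Vec<u8>) {
    let mut mem = LinearMemory::new(8);
    // Pretend offsets 0..4 hold a "name" and offsets 4..8 hold a "flag".
    mem.store(4, &[1, 1, 1, 1]);
    // A 6-byte write into the 4-byte "name" stays inside the block: accepted.
    let inside = mem.store(0, &[9; 6]);
    // A write that would leave the block is refused.
    let outside = mem.store(6, &[9; 4]);
    let flag = mem.load(4, 4).map(|s| s.to_vec()).unwrap_or_default();
    (inside, outside, flag)
}
```

The block-level check stops the second write, but the first one still corrupts the neighbouring "flag" object, which is the same class of bug the MMU also fails to catch inside a single process.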

16. rcxdude ◴[17 Nov 24 10:16 UTC] No.42163225[source]▶

>>42162621 #

Those tools can't reliably identify undefined behaviour.

replies(1): >>42163678 #

17. Arch-TK ◴[17 Nov 24 11:34 UTC] No.42163562[source]▶

>>42161225 #

That's the appendix containing documented UB. The standard also explicitly states that any behaviour not explicitly defined by the standard is undefined meaning that there are things which aren't in that list. And I can confirm, there are things which you can do in C which are UB but which are not on that list.

18. chipdart ◴[17 Nov 24 11:52 UTC] No.42163678{3}[source]▶

>>42163225 #

> Those tools can't reliably identify undefined behaviour.

I'm sorry, can you explain what leads you to believe your hypothetical scenario is an argument rejecting the use of static code analysis tools?

I mean, I'm stating the fact that there are many many tools out there that can pick up these problems. This is a known fact. You're saying that hypothetically perhaps they might not catch each and every single hypothetical case. So what?

replies(1): >>42164203 #

19. Arch-TK ◴[17 Nov 24 13:45 UTC] No.42164159[source]▶

>>42162494 #

I think unless your code is guaranteed to never interact with any untrusted input it is nowadays an increasingly unacceptable compromise to just accept that your program might have serious flaws which can lead to remote code execution or worse.

Moreover, it becomes increasingly unpleasant and unworkable to deal with code which progressively gets more and more unreliable.

It's expected that if the complexity of a program grows, the state space that the program can occupy grows with it. But with UB that you can run into by accident, that state space seems to grow exponentially in comparison to a language like Rust.

If you are required to write code at that low level, I would not use anything other than something like rust.

If you are not required to write code at that level, there are many languages with much less uncertainty than C++ which are much more productive than either C++ or Rust.

replies(1): >>42166957 #

20. rcxdude ◴[17 Nov 24 13:58 UTC] No.42164203{4}[source]▶

>>42163678 #

They're a good idea, but not a substitute for knowing the rules. And they don't just miss theoretical cases, they miss problems in practice even when used rigorously.

replies(1): >>42164441 #

21. chipdart ◴[17 Nov 24 14:46 UTC] No.42164441{5}[source]▶

>>42164203 #

> They're a good idea, but not a substitute for knowing the rules.

It's a good thing no one made that claim, then.

The whole point is that we're seeing people in this thread making all sorts of wild claims on how it's virtually impossible to catch these errors in C++ even though back in reality there are a myriad of static analysis and memory checker tools that do just that.

Your average developer also knows how to type in a space character but still it's a good idea to onboard linters and automatic code formatters.

replies(2): >>42164644 #>>42165992 #

22. jcranmer ◴[17 Nov 24 14:58 UTC] No.42164505{5}[source]▶

>>42162592 #

It's somewhat disingenuous to purposefully ignore what is the most common kind of UB in C. It's also ultimately not a very useful dichotomy, especially because it misunderstands why behavior ends up being undefined. For example:

> I'm talking more about the nonsense like "c++ + ++c". There's no reason but masochism to keep it undefined. Just pick one unambiguous option and codify it.

It's because there's an underlying variance in what the compilers (and the hardware [1]) translated for expressions like that, and codifying any option would have broken several of them, which was anathema in the days of ANSI C standardization. (It's still pretty frowned upon, but "get one person to change behavior so that everybody gets a consistent standard" is something the committees are more willing to countenance nowadays).
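To make the variance concrete, here is a sketch of three ways an implementation could plausibly have evaluated `c++ + ++c`, written out as explicitly sequenced Rust. The three strategies and their function names are hypothetical illustrations, not a description of any particular compiler or machine.

```rust
/// Left operand first, each side effect applied immediately.
fn left_to_right(mut c: i32) -> (i32, i32) {
    let left = c; // c++ yields the old value...
    c += 1; // ...then increments
    c += 1; // ++c increments...
    let right = c; // ...then yields the new value
    (left + right, c)
}

/// Right operand first, each side effect applied immediately.
fn right_to_left(mut c: i32) -> (i32, i32) {
    c += 1; // ++c
    let right = c;
    let left = c; // c++
    c += 1;
    (left + right, c)
}

/// Both operands computed from the original value; both stores write old + 1.
fn deferred_side_effects(c: i32) -> (i32, i32) {
    let left = c; // c++ yields the old value
    let right = c + 1; // ++c yields old value + 1
    (left + right, c + 1)
}
```

Starting from `c = 1`, the first two give a sum of 4 with `c` ending at 3, while the third gives a sum of 3 with `c` ending at 2, so picking one option would have changed results for somebody.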

> An example of #2 is stuff like signed overflow. There are only so many ways to handle it: wraparound, saturate, error out.

Funnily enough, none of the ways you mention turn out to be the way it's actually implemented in the compiler nowadays.

As for why UB actually exists, there are several reasons. Sometimes, it's essential because the underlying behavior is impossible to rationally specify (e.g., errant pointer dereferences, traps). Sometimes, it's because you have optimization hints where you don't want to constrain violation of those hints (e.g., restrict, noreturn). Sometimes, it's erroneous behavior that's hard to consistently diagnose (e.g., signed overflow). Sometimes, it's for explicit implementation-defined behavior, but for various reasons, the standard authors didn't think it could be implemented as unspecified or implementation-defined behavior.

[1] Remember, these were the days of CISC, and not the x86 only-very-barely-not-RISC kind of CISC, but the heady days of CISC where things like "*p++ = --q" were a single instruction.

23. eddd-ddde ◴[17 Nov 24 15:24 UTC] No.42164644{6}[source]▶

>>42164441 #

You made the claim

> Why? What's wrong with using one of the many static code analysis tool to tell you about them if/when they appear?

You clearly pose static analysers as an alternative to understanding UB. You still need to understand how things work.

24. rcxdude ◴[17 Nov 24 18:44 UTC] No.42165992{6}[source]▶

>>42164441 #

It's not impossible to catch those errors in C and C++. In fact, every time you run a new tool against a large C or C++ codebase you will find new ones. What none of these tools do is catch all the issues, as demonstrated by the fact that people keep finding new ones.

25. sfink ◴[17 Nov 24 20:32 UTC] No.42166957{3}[source]▶

>>42164159 #

> I think unless your code is guaranteed to never interact with any untrusted input it is nowadays an increasingly unacceptable compromise to just accept that your program might have serious flaws which can lead to remote code execution or worse.

I think that's too strong a statement, because it applies to in-development programs. I agree with you if you're talking about released programs, but there can be benefit in leaving open the possibility of detectable flaws, serious or otherwise, while your code is still in development.

It's analogous to only compiling and running in debug mode throughout your development, and then switching to release mode for the final binary. The binary is suboptimal throughout your development process; it's too slow. But as long as the `--release` flag doesn't require any code changes, it's still a better idea than developing entirely in release mode.

Similarly, the binary could be suboptimal from a correctness standpoint, as long as removing the `--devel` flag only works when the compiler is fully happy. `--devel` could turn some borrow checking failures into warnings and still give you a runnable binary. Or it could allow leaving types underspecified in interfaces, and do an unsound type inference. Best case, it could even do runtime checks and/or coercions to establish the assumptions that the callee was compiled with.
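The debug/release half of that analogy already exists in today's Rust for runtime checks; `--devel` above is the commenter's proposal, not an existing flag. A minimal sketch of a check that disappears in release builds without any code change (the function names are hypothetical):

```rust
/// Averages a slice, assuming the caller never passes an empty one.
fn mean(values: &[f64]) -> f64 {
    // Checked when debug assertions are on; compiled out when they are off,
    // which is the default for `cargo build --release`.
    debug_assert!(!values.is_empty(), "mean() called with no values");
    values.iter().sum::<f64>() / values.len() as f64
}

/// Reports whether this build has debug assertions enabled.
fn checks_enabled() -> bool {
    cfg!(debug_assertions)
}
```

What the proposal asks for is the mirror image: a build mode that relaxes compile-time checks during development, which this sketch does not attempt.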

Whether it would be worth the complexity is an open question, but it seems reasonably clear that Rust has a problem with brittleness to development-time change.

replies(1): >>42168913 #

26. Arch-TK ◴[18 Nov 24 01:28 UTC] No.42168913{4}[source]▶

>>42166957 #

If you develop C or C++ haphazardly in such a way that you leave a bunch of UB on the table during development then there's little to no chance that you'll have actually erased all presence of it by the end of development.

There currently exist no tools which with complete reliability point out all UB in your program. If any part of your program can have UB and you didn't write it with the explicit intention of not having UB in it at any point then you're going to be left with a tough situation to deal with.

I've read a lot of C in my time and there's codebases which I read and find easy and quick to review because they stick to the rules and only bend them sparingly and then there's codebases which are a pain to evaluate even the most basic parts for errors and UB.

There's no such thing as "detectable UB", there's only UB which your tools have luckily managed to detect.

Leave the UB to the people who can't avoid it, stick to safe languages when you can.

27. Arch-TK ◴[19 Nov 24 14:55 UTC] No.42184081[source]▶

>>42161510 #

Not really.

In embedded environments you're constrained by toolchain and platform but it's still a bad idea to rely on any behaviour which your compiler doesn't provide a definition for (which might be more behaviour than what your standard provides a definition for) because changes to the version of the compiler or even changes to surrounding code can trigger issues caused by reliance on UB.

It's not actually that hard to write embedded code which does not invoke UB outside of register access and even there it's possible to limit yourself to invoking behaviours which the combination of hardware + compiler does provide documented behaviour for.
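As a sketch of keeping register access inside documented behaviour, here is a read-modify-write using volatile accesses. It is written in Rust rather than C for illustration, and the "register" is an ordinary variable standing in for hardware; no real address or device is assumed.

```rust
/// Sets bit `bit` (0..=31) of a register-like location using volatile accesses.
fn set_bit(reg: &mut u32, bit: u32) {
    assert!(bit < 32, "bit index out of range");
    let p: *mut u32 = reg;
    // SAFETY: `p` comes from a live `&mut u32`, so it is valid and aligned.
    unsafe {
        let current = core::ptr::read_volatile(p);
        core::ptr::write_volatile(p, current | (1u32 << bit));
    }
}
```

On real hardware the pointer would come from the device's documented address map, and that one conversion is where the unavoidable implementation-defined part lives.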

(source: I've written embedded code which did not knowingly/intentionally invoke UB outside of register access and in those cases the implementation did define behaviour.)
