93 points by endorphine | 16 comments
1. ajross ◴[] No.43537017[source]
This headline is badly misunderstanding things. C/C++ date from an era where "correctness" in the sense the author means wasn't a feasible feature. There weren't enough cycles at build time to do all the checking we demand from modern environments (e.g. building medium-scale Rust apps on a Sparcstation would be literally *weeks* of build time).

And more: the problem faced by the ANSI committee wasn't that they were tempted to "cheat" by leaving behavior undefined. It's that there was live C code in the world that did this stuff, for real and valid reasons, and they knew that if they published a language that wasn't compatible, no one would use it. But there were also variant platforms and toolchains that didn't do things the same way. So instead of trying to enumerate them all individually (which probably wasn't possible anyway), they identified the areas where they knew they could define firm semantics and allowed the stuff outside that boundary to be "undefined", so existing environments could continue to implement them compatibly.

Is that a good idea for a new language? No. But ANSI wasn't writing a new language. They were adding features to the language in which Unix was already written.

replies(5): >>43537270 #>>43537327 #>>43537466 #>>43537560 #>>43537849 #
2. bgirard ◴[] No.43537270[source]
Did anything prevent them from transitioning undefined behavior towards defined behavior over time?

> It's that there was live C code in the world that did this stuff, for real and valid reasons.

If behavior is left undefined, you can later move towards more strictly defined behavior without any forward-compatibility risk and without breaking existing live C code. For instance, in the `EraseAll` example you could define the behavior in a more useful way rather than saying 'anything at all is allowed'.
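
For readers who haven't seen it, the article's example is roughly this (reconstructed from memory, so details may differ from the original listing):

    #include <cstdlib>

    typedef int (*Function)();

    static Function Do;                    // null unless NeverCalled runs

    static int EraseAll() {
        return std::system("rm -rf /");    // never meant to run
    }

    void NeverCalled() {
        Do = EraseAll;
    }

    int main() {
        return Do();                       // Do is null here: undefined behavior
    }

Because calling through a null function pointer is UB, the compiler is allowed to assume Do holds the only value ever stored into it and turn main into a direct call to EraseAll, even though NeverCalled is never invoked. A definition like "call whatever address is stored, or trap" would arguably be the more useful choice.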

replies(1): >>43537710 #
3. VWWHFSfQ ◴[] No.43537327[source]
I don't think the headline is misunderstanding anything.

These things are both true:

> C and C++ Prioritize Performance over Correctness

> C/C++ date from an era where "correctness" in the sense the author means wasn't a feasible feature.

So correctness wasn't feasible, and therefore wasn't a priority. The machines were limited, and so performance was the priority.

4. rocqua ◴[] No.43537466[source]
> So instead of trying to enumerate them all individually (which probably wasn't possible anyway), they identified the areas where they knew they could define firm semantics and allowed the stuff outside that boundary to be "undefined", so existing environments could continue to implement them compatibly.

These things didn't become undefined behavior. They became implementation defined behavior. The distinction is that for implementation defined behavior, a compiler has to make a decision consistently.

The classic example of implementation defined behavior is ones' vs. two's complement representation of signed integers. I believe right-shifting a negative signed int is also considered implementation defined.

For implementation defined behavior, the optimization of "assume it never happens" isn't allowed by the standard.
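
A minimal sketch of the distinction (my own illustration, with hypothetical function names, not anything from the standard text):

    // Implementation defined: right-shifting a negative signed value.
    // The implementation must pick a behavior (typically an arithmetic
    // shift), document it, and apply it consistently; it may not
    // assume this "never happens".
    int half(int x) {
        return x >> 1;         // result for negative x is implementation defined
    }

    // Undefined: signed overflow. The compiler may assume it never
    // happens, so this function can legally be folded to "return false".
    bool will_overflow(int x) {
        return x + 1 < x;      // UB when x == INT_MAX
    }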

replies(1): >>43537696 #
5. pjmlp ◴[] No.43537560[source]
The author is a famous compiler writer, including work on C and C++ compilers as a GCC contributor. Regardless of how Go is designed, he does know what he is talking about.
replies(1): >>43538066 #
6. bluGill ◴[] No.43537696[source]
They did have implementation-defined behavior, but a large part of undefined behavior was exactly that: never defined anywhere, and it could have been raised to implementation-defined if they had thought to mention it.
replies(1): >>43538472 #
7. bluGill ◴[] No.43537710[source]
No, and that has been happening over time. C++26, for example, looked at uninitialized variables and defined them. The default value is intentionally unreasonable, which in practice forces everyone to initialize; the unreasonable value also makes it easy for runtime tools to detect the issue when the compiler cannot.
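
A sketch of the change as I understand it (hypothetical function, not from the standard text): in C++26, reading an uninitialized automatic variable becomes "erroneous behavior" rather than UB.

    int f() {
        int x;          // never initialized
        return x;       // before C++26: undefined behavior
                        // C++26: erroneous behavior -- x holds an
                        // implementation-chosen fill value and tools are
                        // expected to diagnose the read
    }
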
8. jayd16 ◴[] No.43537849[source]
But we write new code in C and C++ today. We make these tradeoffs today. So it's not some historical oddity; that is the tradeoff we make.
9. ajross ◴[] No.43538066[source]
It's still a bad headline. UB et al. weren't added to the language for "performance" reasons, period. They were then, and remain today, compatibility features.
replies(2): >>43538405 #>>43541974 #
10. pjmlp ◴[] No.43538405{3}[source]
That is what implementation-defined behavior was supposed to be.
11. moefh ◴[] No.43538472{3}[source]
I don't doubt that what you're saying is true; I have heard similar things many, many times over the years. The problem is that it's always stated somewhat vaguely, never with concrete examples, and it doesn't match my (perhaps naive) reading of any of the standards.

For example, I just checked C99[1]: it says in many places "If <X>, the behavior is undefined". It also says in even more places "<X> is implementation-defined" (although from my cursory inspection, most -- but not all -- of these seem to be about the behavior of library functions, not the compiler per se).

So it seems to me that the standards writers were actually very particular about the difference between implementation-defined behavior and undefined behavior.

[1] https://port70.net/~nsz/c/c99/n1256.html

replies(2): >>43539071 #>>43539082 #
12. bluGill ◴[] No.43539071{4}[source]
What you are not seeing are the cases where the standard didn't say anything at all.
replies(1): >>43539341 #
13. jcranmer ◴[] No.43539082{4}[source]
I think bluGill might be referring to cases of undefined behavior which are undefined because the specification literally never mentions the behavior, as opposed to explicitly saying the behavior is undefined.

My canonical example of such a case is what happens if you call qsort where the comparison function is "int compare(const void*, const void*) { return 1; }".
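
Concretely, something like this (my own reconstruction of the example):

    #include <cstdlib>
    #include <cstdio>

    // Claims every element is greater than every other element;
    // not the consistent ordering the standard requires.
    static int compare(const void *, const void *) {
        return 1;
    }

    int main() {
        int values[] = {3, 1, 2};
        // The standard never says what happens with an inconsistent
        // comparator: real implementations have looped forever, read
        // out of bounds, or returned an arbitrary order.
        std::qsort(values, 3, sizeof values[0], compare);
        std::printf("%d %d %d\n", values[0], values[1], values[2]);
        return 0;
    }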

14. moefh ◴[] No.43539341{5}[source]
Are those instances of undefined behavior relevant to what's being discussed here? The vast majority of undefined behavior that people argue/warn/complain about, including in the original article, is behavior that is explicitly defined to be undefined. (I say that with the caveat that I almost never use C++ and have never read any C++ standard closely, so my perception is biased towards C; things might be different for C++.)

What I mean to say is that the "problem" of undefined behavior does seem to be intentionally introduced by the authors of the standard, not an oversight.

15. fooker ◴[] No.43541974{3}[source]
You are wrong. The formalized concept of UB was introduced exactly because of this.

Let's take something as simple as divide by zero. Now, suppose you have a bunch of code with random arithmetic operations.

A compiler cannot optimize this code at all without somehow proving that all denominators are nonzero. What UB brings you is that you can optimize the program based on the assumption that UB never occurs. If it actually does occur, who cares; the program would have done something bogus anyway.
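
A small illustration of what that licenses (my own hypothetical example, not from the comment):

    int scale(int n, int d) {
        int q = n / d;      // UB if d == 0
        if (d == 0)         // the division above already executed, so the
            return 0;       // compiler may assume d != 0 and delete this
        return q;           // check entirely
    }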

Now think about pointer dereferences, etc etc.

replies(1): >>43546319 #
16. ajross ◴[] No.43546319{4}[source]
UB was not introduced to facilitate optimization, period. At the time the ANSI standard was being written, such optimizations didn't even exist yet. The edge-case trickery around "assume behavior is always defined" didn't start showing up until the late '90s, a full decade and a half later.

UB was introduced to allow for variant/incompatible platform behavior (in your example, how the hardware treats a divide-by-zero condition) in a way that allowed pre-existing code to remain valid on the platform it was written for, but to leave the core language semantics clear for future code.