93 points endorphine | 33 comments
1. uecker ◴[] No.43537642[source]
You can implement C in completely different ways. For example, I like that signed overflow is UB because it is trivial to catch, whereas unsigned wraparound, while defined, leads to extremely difficult-to-find bugs.
replies(4): >>43537908 #>>43538002 #>>43538056 #>>43538186 #
2. dehrmann ◴[] No.43537908[source]
Some version of ints doing bad things plagues lots of other languages. Java, Kotlin, C#, etc. silently overflow, and JavaScript numbers can look and act like ints until they don't. Python is the notable exception.
3. AlotOfReading ◴[] No.43538002[source]
There's 3 reasonable choices for what to do with unsigned overflow: wraparound, saturation, and trapping. Of those, I find wrapping behavior by far the most intuitive and useful.

Saturation breaks the successor relation S(x) != x: at the type max, S(MAX) == MAX. Sometimes you want that, but it's extremely situational, and rarely do you want saturation precisely at the type max. Saturation is better served by functions in C.
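For illustration, such a function might look like this (a minimal sketch; the name `sat_add` is hypothetical):

    #include <limits.h>

    // Saturating signed add: clamp to the type bounds instead of wrapping.
    int sat_add(int a, int b) {
        if (b > 0 && a > INT_MAX - b) return INT_MAX; // would overflow up
        if (b < 0 && a < INT_MIN - b) return INT_MIN; // would overflow down
        return a + b;
    }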

Trapping is fine conceptually, but it means all your arithmetic operations can now error. That's a severe ergonomic issue, isn't particularly well defined for many systems, and introduces a bunch of thorny issues with optimizations. Again, better as functions in C.

On the other hand, wrapping is the mathematical basis for CRCs, error-correcting codes, cryptography, bitwise math, and more. There's no wasted bits, it's the natural implementation in hardware, it's familiar behavior to students from a young age as "clock arithmetic", compilers can easily insert debug mode checks for it (the way Rust does when you forget to use Wrapping<T>), etc.

It's obviously not perfect either, as it has the same problem as all fixed-size representations: it diverges from the unbounded math people are actually trying to do. But I don't think the alternatives would be better.
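To make the debug-mode-checks point concrete, here is a minimal C sketch of the kind of check Rust inserts in debug builds (the helper name `checked_uadd` is hypothetical):

    #include <assert.h>
    #include <limits.h>

    // Debug builds abort on wraparound; release builds (NDEBUG) compile
    // the check away and get plain wrapping behavior.
    unsigned checked_uadd(unsigned a, unsigned b) {
        assert(a <= UINT_MAX - b && "unsigned wraparound");
        return a + b;
    }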

replies(4): >>43538187 #>>43538933 #>>43539073 #>>43539198 #
4. Strilanc ◴[] No.43538056[source]
Could you provide the code you use to trivially catch signed overflows? My impression is the opposite: unsigned is trivial (just test `a+b < b`) while signed is annoying (especially because evaluating a potentially-overflowing expression would cause the UB I'm trying to avoid).
replies(6): >>43538172 #>>43538229 #>>43538928 #>>43539940 #>>43539954 #>>43541133 #
5. uecker ◴[] No.43538172[source]
Oh, I meant it is trivial to catch bugs caused by overflow by using a sanitizer and difficult to find bugs caused by wraparound.

But checking for signed overflow is also simple with C23: https://godbolt.org/z/ebKejW9ao
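For reference, a minimal self-contained sketch using the C23 checked-arithmetic macros from `<stdckdint.h>` (the godbolt link may differ in detail):

    #include <limits.h>
    #include <stdckdint.h>
    #include <stdio.h>

    int main(void) {
        int sum;
        // ckd_add returns true when the mathematical result does not fit.
        if (ckd_add(&sum, INT_MAX, 1))
            puts("signed overflow caught");
        else
            printf("%d\n", sum);
    }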

6. tsimionescu ◴[] No.43538186[source]
Signed overflow being UB does not make it easier to find in any way. Any detection for signed overflow you can write given that it's UB could equally be written if it were defined. There are plenty of sanitizers for behaviors that are not UB, at least for other languages, so it's not even an ecosystem advantage.
replies(1): >>43543580 #
7. uecker ◴[] No.43538187[source]
I am fine with unsigned wraparound, I just think one should avoid using these types for indices and sizes, and use them only for the applications where modulo arithmetic makes sense mathematically.
8. ndiddy ◴[] No.43538229[source]
Compiling with -ftrapv will cause your program to trap on signed overflow/underflow, so when you run it in a debugger you can immediately see where and why the overflow/underflow occurred.
replies(1): >>43542268 #
9. o11c ◴[] No.43538928[source]
Avoiding UB for performing the signed addition/subtraction/multiplication is trivial - just cast to unsigned, do the operation, cast back. In standard C23 or GNU C11, you can write a `make_unsigned` and `make_signed` macro using `typeof` and `_Generic`.
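A minimal sketch of the cast trick for the plain `int` case (the generic macro version is omitted; `wrapping_add` is a hypothetical name, and the conversion back to `int` is implementation-defined in standard C when the value doesn't fit, though it wraps as expected on two's-complement implementations):

    // Wrapping signed add with no UB: the arithmetic happens in unsigned,
    // which is defined to wrap modulo 2^N.
    int wrapping_add(int a, int b) {
        return (int)((unsigned)a + (unsigned)b);
    }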
10. jcranmer ◴[] No.43538933[source]
> There's 3 reasonable choices for what to do with unsigned overflow: wraparound, saturation, and trapping.

There's a 4th reasonable choice: pretend it doesn't happen. Now, before you crucify me for daring to suggest that undefined behavior can be a good thing, let me explain:

When you start working on a lot of peephole optimizations, you quickly come to the discovery that there are quite a few cases where two pieces of code are almost equivalent, except that they end up giving different answers if someone overflowed (or some other edge case you don't really care about). Rather interestingly, even if you put a lot of effort into a compiler to make it aggressively infer that code can't overflow, you still run into problems because those assumptions don't really compose well (e.g., knowing that (A + (B + C)) can't overflow doesn't mean that ((A + B) + C) can't overflow--imagine B = INT_MAX and C = INT_MIN to see why).

And sure, individual peephole optimizations don't have much of a performance effect. But they can sometimes have want-of-a-nail side effects, where a failure due to inability to assume nonoverflow in one place causes another optimization to fail to kick in, and the domino effect results in measurable slowdowns. In one admittedly extreme example, I've seen a single this-might-overflow result in a 10× slowdown, since it alone was responsible for the autoparallelization framework failing to kick in. This has happened enough to me that there are times I just want to shake the computer and scream "I DON'T FUCKING CARE ABOUT EDGE CASES, JUST GIVE ME THE DAMN FASTEST CODE."

The problem with undefined behavior isn't that it risks destroying your code if you hit it (that's a good thing!); the problem is that it too frequently comes without a way to opt-out of it. And there is room to argue if it should be opt-in or opt-out, but completely absent is a step too far for me.

(Slight apologies for the rant, I'm currently in the middle of tracking down a performance hit caused by... inability to infer non-overflow of an operation.)

replies(2): >>43539519 #>>43540958 #
11. cwzwarich ◴[] No.43539073[source]
> it's the natural implementation in hardware

The natural implementation in hardware is that addition of two N-bit numbers produces an N+1-bit number. Most architectures even expose this extra bit as a carry bit.
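From C you can recover that extra bit with the GCC/Clang overflow builtins (a sketch, not standard C; for unsigned operands the builtin's return value is exactly the carry out):

    #include <stdbool.h>

    bool add_with_carry(unsigned a, unsigned b, unsigned *sum) {
        // true means the N-bit result wrapped, i.e. the carry bit was set
        return __builtin_add_overflow(a, b, sum);
    }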

replies(1): >>43539787 #
12. itishappy ◴[] No.43539198[source]
Does wrapping not break the successor relationship as well? I suppose it's a different problem than saturation, but the symptoms are the same: the relationship between a number and its successor is no longer intuitive (not injective for saturation, not ordered for wrapping).
13. AlotOfReading ◴[] No.43539519{3}[source]
You can get exactly the same "benefits" without the side effects by simply making signed overflow unspecified rather than undefined. There are better alternatives of course, but this is the one that has essentially no tradeoffs.

I don't consider destroying code semantics if you hit it a good thing, especially when there's no reliable and automatic way to observe it.

replies(1): >>43539840 #
14. AlotOfReading ◴[] No.43539787{3}[source]
Addition of two 1-bit numbers produces a 1-bit number, which is simple and fundamental enough that we call it XOR. If you take that N-bit adder and drop the final carry (i.e., use a couple of XORs instead of the full adder), you get wrapping addition. It's a pretty natural implementation, especially for fixed-width circuits where a carry flag to hold the (N+1)th bit may not exist, like RISC-V.
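A minimal C sketch of that construction, chaining full adders and discarding the final carry (`ripple_add` is a hypothetical name):

    #include <stdint.h>

    uint8_t ripple_add(uint8_t a, uint8_t b) {
        uint8_t sum = 0, carry = 0;
        for (int i = 0; i < 8; i++) {
            uint8_t ai = (a >> i) & 1, bi = (b >> i) & 1;
            sum |= (uint8_t)((ai ^ bi ^ carry) << i);   // sum bit: XORs
            carry = (ai & bi) | (carry & (ai ^ bi));    // carry out
        }
        return sum; // the carry out of bit 7 is dropped: wraparound
    }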
15. jcranmer ◴[] No.43539840{4}[source]
> You can get exactly the same "benefits" without the side effects by simply making signed overflow unspecified rather than undefined.

Actually, you don't. Unspecified behavior means that you have some sort of limited "blast radius" for the behavior; in LLVM's terminology, it would be roughly equivalent to "freeze poison"--i.e., the overflow returns some value, which is chosen nondeterministically, but that is the limit of what you can do with it. By contrast, the way LLVM handles undefined overflow is to treat it as "poison", which itself is already weaker than C/C++'s definition of UB (which LLVM also has) [1].

Now poison seems weirder in that it has properties like "can be observed to be a different value by different uses (even within the same instruction)." But it also turns out from painful experience that if you don't jump to that level of weird behavior, you end up accidentally making optimizations like "x * 2" -> "x + x" illegal because oops our semantics accidentally makes the number of uses of a value something that can't be increased because formal semantics are hard. (Hats off to Nuno Lopes et al for working on this! We need more formalism in our production compilers!)

[1] IMHO, C itself could benefit from fronting a definition of poison-like UB to the standard, rather than having just one kind of undefined behavior. But that also requires getting the committee to accept that UB isn't an inherently evil thing that needs to be stamped out at all costs, and I don't have the energy to push hard on that front myself.

replies(2): >>43540139 #>>43543491 #
16. ◴[] No.43539940[source]
17. steveklabnik ◴[] No.43539954[source]
You're right that this test would be UB for signed integers.

See here for that in action, as well as one way to test it that does work: https://godbolt.org/z/sca6hxer4

If you're on C23, uecker's advice to use these standardized functions is the best, of course.

18. AlotOfReading ◴[] No.43540139{5}[source]
My understanding is that unspecified behavior is poison as the standard understands it. The standard pretty much says the implementation is free to do whatever it thinks is reasonable. I'm not deeply familiar with the nuances of LLVM's terminology here, but that would be frozen poison, I think?
replies(1): >>43543516 #
19. amavect ◴[] No.43540958{3}[source]
I feel like this all motivates a very expressive type system for integers. Add different types for wraparound, saturation, trapping, and undefined overflow. Probably require theorem proving in the language so that undefined-overflow integers provably never overflow.

>knowing that (A + (B + C)) can't overflow doesn't mean that ((A + B) + C) can't overflow

Here, the associative property works for unsigned integers, but those don't get the optimizations for assuming overflow can't happen, which feels very disappointing. Again, adding more types would make this an option.

replies(1): >>43554303 #
20. amavect ◴[] No.43541133[source]
>unsigned is trivial (just test `a+b < b`)

Nitpicking, the test itself should avoid overflowing. Instead, test "a <= UINT_MAX - b" to prove no overflow occurs.

For signed integers, we need to prove the following without overflowing in the test: "a+b <= INT_MAX && a+b >= INT_MIN". The algorithm follows: test "b >= 0", which implies "INT_MAX-b <= INT_MAX && a+b >= INT_MIN", so then test "a <= INT_MAX-b". Otherwise, "b < 0", which implies "INT_MIN-b >= INT_MIN && a+b <= INT_MAX", so then test "a >= INT_MIN-b".
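That algorithm in code (a minimal sketch; `add_ok` is a hypothetical name):

    #include <limits.h>
    #include <stdbool.h>

    // True if a + b is representable; no overflow in the test itself.
    bool add_ok(int a, int b) {
        if (b >= 0)
            return a <= INT_MAX - b;  // INT_MAX - b cannot overflow here
        else
            return a >= INT_MIN - b;  // INT_MIN - b cannot overflow here
    }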

replies(1): >>43542957 #
21. AlotOfReading ◴[] No.43542268{3}[source]
It's worth mentioning that GCC's -ftrapv has been unreliable and partially broken for 20+ years at this point. It's recommended that you use the -fsanitize traps instead, and there's an open ticket to switch the -ftrapv implementation over to using them under the hood:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101521
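For example (a sketch; flag spellings per current GCC/Clang documentation, check your toolchain):

    /* Build with:
     *   cc -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error overflow.c
     * to get a trap instruction on signed overflow rather than relying on -ftrapv.
     */
    #include <limits.h>

    int main(void) {
        int x = INT_MAX;
        return x + 1; // traps at run time when built as above
    }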

replies(1): >>43549143 #
22. lelanthran ◴[] No.43542957{3}[source]
> Nitpicking, the test itself should avoid overflowing.

Why? Overflowing is well defined for unsigned.

replies(1): >>43552153 #
23. uecker ◴[] No.43543491{5}[source]
Clang now has "frozen poison" because the original poison was essentially a flawed concept that led to incorrect optimizations. I certainly do not think we should import this into the C standard.

It is similar to the Microsoft guys inventing time-travel UB, which led to their compiler being broken. As a standards committee we need to push back against such nonsense instead of endorsing it. I am all for formalizing this stuff precisely, though.

replies(1): >>43546491 #
24. uecker ◴[] No.43543516{6}[source]
Unspecified means you pick a value at a specific point in time (the standard defines this point in time) and then you treat it as a regular value. It is not the same as poison. It might be similar to frozen poison, but I am not 100% sure what clang does there.
25. uecker ◴[] No.43543580[source]
One can have sanitizers also for defined behavior. The issue is that a sanitizer that has no false positives is about 100x more useful than a sanitizer that has false positives. You can treat each case where a sanitizer detects signed overflow as an error, even in production. You cannot do the same when the behavior is defined and not an error. (You can make it an error and still define it, but there is not much of a practical difference.)
replies(1): >>43543907 #
26. tsimionescu ◴[] No.43543907{3}[source]
If you think signed overflow is a mistake, you could forbid it from your code base even if it weren't UB, and then any instance of it that a sanitizer finds would not be a false positive, because your code style forbids signed integer overflow.
replies(2): >>43545471 #>>43579710 #
27. uecker ◴[] No.43545471{4}[source]
If you only have little self-contained projects where you can control everything, then yes.
28. jcranmer ◴[] No.43546491{6}[source]
It was undef that was broken, not poison.

(See https://llvm.org/devmtg/2016-11/Slides/Lopes-LongLivePoison.... for the start of the discussion that leads to the development of the current situation).

replies(1): >>43548932 #
29. uecker ◴[] No.43548932{7}[source]
Ah right, thanks! undef is what I meant.
30. ndiddy ◴[] No.43549143{4}[source]
Thanks, I hadn't heard of that.
31. amavect ◴[] No.43552153{4}[source]
Personal preference, hence nitpicking. It forms a special case of the signed integer algorithm, which feels nice.
32. uecker ◴[] No.43554303{4}[source]
I am not sure more types are the solution. I like types, but I do not like complicated things.

The practical solution is simply -fsanitize=signed-integer-overflow. In the rare cases where I needed complete assurance that there cannot be a trap at run-time, just looking at the optimized code and making sure the traps have been optimized out was surprisingly effective.

33. fc417fc802 ◴[] No.43579710{4}[source]
You can do even better than this. You can _make_ it undefined behavior with __builtin_unreachable().

No idea if the optimization potential is the same but at least then you can feel like your sanitizer is necessary in all cases (to help you find and add these calls).
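A minimal sketch of that idea (hypothetical helper; `__builtin_unreachable` is a GCC/Clang builtin):

    #include <limits.h>

    // Promise the compiler a + b never overflows: if it would, execution
    // reaches __builtin_unreachable(), which is explicitly UB.
    int add_assume_no_overflow(int a, int b) {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            __builtin_unreachable();
        return a + b;
    }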