93 points by endorphine | 32 comments
1. pcwalton ◴[] No.43537392[source]
I was disappointed that Russ didn't mention the strongest argument for making arithmetic overflow UB. It's a subtle thing that has to do with sign extension and loops. The best explanation is given by ryg here [1].

As a summary: The most common way given in C textbooks to iterate over an array is "for (int i = 0; i < n; i++) { ... array[i] ... }". The problem comes from these three facts: (1) i is a signed integer; (2) i is 32-bit; (3) pointers nowadays are usually 64-bit. That means a compiler that can't prove the increment on "i" won't overflow (perhaps because "n" was passed in as a function parameter) has to perform a sign extension on every loop iteration, which adds extra instructions in what could be a hot loop, especially since you can't fold a sign-extending index into an addressing mode on x86. Since this pattern is so common, compiler developers are loath to change the semantics here--even a 0.1% fleet-wide slowdown has a cost to FAANG measured in the millions.

Note that the problem goes away if you use pointer-width indices for arrays, which many other languages do. It also goes away if you use C++ iterators. Sadly, the C-like pattern persists.
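
To make the contrast concrete, here's a minimal sketch of the two patterns (illustrative only; the function names are made up, and whether the int version actually pays a per-iteration sign extension depends on what the compiler can prove about the loop, as discussed downthread):

    #include <stddef.h>
    #include <stdint.h>

    /* Textbook pattern: 32-bit signed index on a 64-bit target. */
    int64_t sum_int_index(const int32_t *a, int n) {
        int64_t sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    /* Pointer-width index: no widening of "i" is ever needed. */
    int64_t sum_size_index(const int32_t *a, size_t n) {
        int64_t sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }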

[1]: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...

replies(6): >>43537702 #>>43537771 #>>43537976 #>>43538026 #>>43538237 #>>43538348 #
2. AlotOfReading ◴[] No.43537702[source]
There are half a dozen better ways this could have been addressed at any time in the past decade.

Anything from making it implementation defined to unspecified behavior to just throwing a diagnostic warning or having a clang-tidy performance rule.

I'm also incredibly suspicious of the idea that FAANG in particular won't accept minor compiler slowdowns for useful safety. Google and Apple, for example, have both talked publicly about how they're pushing bounds checking by default internally, and you can see that in the Apple buffer hardening RFC and the Abseil hardening modes.

replies(1): >>43537833 #
3. dcrazy ◴[] No.43537771[source]
The C language does not specify that `int` is 32 bits. That is a choice made by compiler developers to make compiling non-portable code written for 32-bit platforms easier, because most codebases wind up baking in assumptions about variable sizes.

In Swift, for example, `Int` is 64 bits wide on 64-bit targets. If we ever move to 128-bit CPUs, the Swift project will be forced to decide whether to stick to their guns or make `Int` 64 bits on 128-bit targets.

replies(1): >>43537843 #
4. pcwalton ◴[] No.43537833[source]
> Anything from making it implementation defined to unspecified behavior to just throwing a diagnostic warning or having a clang-tidy performance rule.

To be clear, you're proposing putting a warning on "for (int i = 0; i < n; i++)"? The most common textbook way to write a loop in C?

> I'm also incredibly suspicious of the idea that FAANG in particular won't accept minor compiler slowdowns for useful safety.

I worked on compilers at FAANG for quite a while and know quite well how these teams justify their existence. Telling executives "we cost the company $1M a quarter, but good news, we made the semantics of the language easier for programming language nerds to understand" instead of "we saved the company $10M last quarter" is an excellent strategy for getting the team axed next time downsizing comes around.

replies(4): >>43538103 #>>43538139 #>>43538208 #>>43551655 #
5. pcwalton ◴[] No.43537843[source]
> The C language does not specify that `int` is 32 bits. That is a choice made by compiler developers to make compiling non-portable code written for 32-bit platforms easier, because most codebases wind up baking in assumptions about variable sizes.

Making int 32-bit also results in not-insignificant memory savings.

replies(1): >>43538095 #
6. tmoravec ◴[] No.43537976[source]
size_t has been in the C standard since C89. "for (int i = 0..." might have its uses, so it doesn't make sense to disallow it. But I'd argue that it's not really a common textbook way to iterate over an array.
replies(1): >>43537997 #
7. pcwalton ◴[] No.43537997[source]
The first example program that demonstrates arrays in The C Programming Language 2nd edition (page 22) uses signed integers for both the induction variable and the array length (the literal 10 becomes int).
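
For reference, that example is in this style (paraphrased from memory, not a verbatim quote):

    int main(void) {
        int i;
        int ndigit[10];

        /* Both the index and the bound are plain ints, as in the book. */
        for (i = 0; i < 10; ++i)
            ndigit[i] = 0;

        return 0;
    }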
replies(2): >>43538148 #>>43539066 #
8. Someone ◴[] No.43538026[source]
> Sadly, the C-like pattern persists.

I think that’s the main problem: C-style “arrays are almost identical to pointers” and C-style for loops may be good ideas for the first version of your compiler, but once you’ve bootstrapped your compiler, you should ditch them.

9. bobmcnamara ◴[] No.43538095{3}[source]
And it even wastes cycles on MCUs where size_t is 16-bit.
replies(2): >>43538244 #>>43538284 #
10. frumplestlatz ◴[] No.43538103{3}[source]
Even if it is the most common method in textbooks (I’m not sure that’s true), it’s also almost always wrong. The index must always be sized to fit what you’re indexing over.

As for your compiler statement — yes. At least at Apple, there is ongoing clang compiler work, focused on security, that actively makes things slower, and there has been for years.

replies(1): >>43538414 #
11. AlotOfReading ◴[] No.43538139{3}[source]
No, I'm saying that there could be anything from a one-word change to the standard that doesn't affect compilers at all, to safety by default with a clang-tidy performance warning.

Clang-tidy and the core guidelines have already broken the textbook Hello, World! with the performance-avoid-endl warning, so I don't see why the common textbook way to write things should be our guiding principle here. Of course, the common textbook way to write things would continue working regardless; it'd just have a negligible performance cost.

replies(1): >>43538290 #
12. frumplestlatz ◴[] No.43538148{3}[source]
The language has evolved significantly, and we’ve learned a lot about how to write safer C since that was published in 1988.
13. agwa ◴[] No.43538208{3}[source]
> I worked on compilers at FAANG for quite a while and know quite well how these teams justify their existence. Telling executives "we cost the company $1M a quarter, but good news, we made the semantics of the language easier for programming language nerds to understand" instead of "we saved the company $10M last quarter" is an excellent strategy for getting the team axed next time downsizing comes around.

And yet, Google is willing to take a performance hit of not 0.1% but 0.3% for improved safety: https://security.googleblog.com/2024/11/retrofitting-spatial...

And obviously there are better justifications for this than "we made the semantics of the language easier for programming language nerds to understand".

replies(1): >>43538271 #
14. sapiogram ◴[] No.43538237[source]
Thank you so much for this comment. I think Russ Cox (along with many others) is way too quick to declare that removing one source of UB is worth a (purportedly) minuscule performance reduction. While I'm sure that's sometimes true, he hasn't measured it, and even a 1% slowdown of all C/C++ would have huge costs globally.
15. moefh ◴[] No.43538244{4}[source]
Is there any MCU where `size_t` is 16 bits but `int` is 32 bits? I'm genuinely curious, I have never seen one.
replies(2): >>43538605 #>>43541701 #
16. pcwalton ◴[] No.43538271{4}[source]
> And obviously there are better justifications for this than "we made the semantics of the language easier for programming language nerds to understand".

There are not. For all the noise that we make on message boards about signed overflow being UB, there are very few actual security problems traceable to it. The problems from it are largely theoretical.

That entire blog post talking about the 0.3% regression illustrates just how much of an effort it is, and how unusual it is, to make those kinds of changes. The reason why Google engineers managed to justify that change is that memory safety results in actual measurable security improvements by enforcing spatial safety. Signed overflow being UB just doesn't make the cut (and let's be clear: at that scale the cost of a 0.3% regression is measured in millions per year).

17. dcrazy ◴[] No.43538284{4}[source]
Now that you mention it, at least on Wintel, compiler vendors did not preserve the definition of `int` during the transition from 16-bit to 32-bit. I started in the 386 era myself, so I have no frame of reference for porting code from the 286. But Windows famously retains a lot of 16-bit heritage, such as defining `DWORD` as 32 bits, which makes it now a double anachronism. I wonder if the decision to model today’s popular 64-bit processors as LP64 is related to not wanting to go through that again.

Edit: of course, I completely forgot that Windows chose LLP64, not LP64, for x86_64 and AArch64. Raymond Chen has commented on this [1], but only as an addendum to reasons given elsewhere which have since bitrotted.

[1]: https://devblogs.microsoft.com/oldnewthing/20050131-00/?p=36...
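
For anyone who doesn't have the jargon loaded: under Windows' LLP64 model, int and long stay 32 bits and only pointers (and long long) are 64; under LP64 (Linux, macOS), long widens along with the pointer. A quick, illustrative way to check a given target:

    #include <stdio.h>

    int main(void) {
        /* LLP64 (Win64):              int=4 long=4 long long=8 void*=8
           LP64 (64-bit Linux/macOS):  int=4 long=8 long long=8 void*=8 */
        printf("int=%zu long=%zu long long=%zu void*=%zu\n",
               sizeof(int), sizeof(long), sizeof(long long), sizeof(void *));
        return 0;
    }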

replies(1): >>43541674 #
18. pcwalton ◴[] No.43538290{4}[source]
Have you measured the performance cost? I highly suspect it is not negligible, given that it's the most common way to write inner loops, which are the most performance-sensitive type of code.
replies(1): >>43538438 #
19. dooglius ◴[] No.43538348[source]
As mentioned in the linked post, the compiler could in fact prove that the increment on i won't overflow, and in my testing, -fwrapv does produce identical assembly (YMMV). The post talks about hypothetical, more complex cases where the compiler would not prove the loop bound. But if -fwrapv semantics were mandated by spec, then presumably compilers would at least hardcode a check for such a common optimization (if they are not doing so already).
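
For anyone who wants to check this themselves, here's a minimal version of the loop in question (illustrative; compare e.g. `gcc -O2 -S` output with and without `-fwrapv`):

    /* The canonical pattern from the top of the thread: because
       i < n <= INT_MAX at the point of the increment, the increment
       can be shown not to overflow, so -fwrapv need not change the
       generated code here. */
    long sum(const int *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }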
replies(1): >>43538430 #
20. pcwalton ◴[] No.43538414{4}[source]
> Even if it is the most common method in textbooks (I’m not sure that’s true), it’s also almost always wrong. The index must always be sized to fit what you’re indexing over.

"The code was wrong, so it was OK that I made it slower" is a message board argument, not a business argument.

> As for your compiler statement — yes. At least at Apple, there is ongoing clang compiler work, focused on security, that actively makes things slower, and there has been for years.

The performance of code that runs on consumer devices has less of a measurable economic impact than that of code that runs on the server.

replies(1): >>43574658 #
21. pcwalton ◴[] No.43538430[source]
> But if -fwrapv semantics were mandated by spec, then presumably compilers would at least hardcode a check for such a common optimization (if they are not doing so already).

I don't know what this means. The optimization becomes invalid if fwrapv is mandated, so compilers can't do it anymore.

replies(1): >>43538931 #
22. AlotOfReading ◴[] No.43538438{5}[source]
There's a range of possible performance costs. Let's talk about the lowest: the standard changes "the behavior is undefined" to "the behavior is unspecified". Everything else is the same, except that suddenly your program still has semantic meaning if there's signed overflow.

Why would that be problematic?

replies(1): >>43538560 #
23. pcwalton ◴[] No.43538560{6}[source]
What does "unspecified" mean in this context? If you mean "the value of a signed integer is unspecified if it overflows", then I can still construct cases in which you lose the optimization that widens a loop induction variable from 32-bit to 64-bit. For example, take this code:

    for (int i = 0; i != n; i++)
        sum += a[i];
If n is -2, will this function include the element at a[(uint64_t)INT_MAX + 1] in the sum? If signed overflow is UB, then the answer is "maybe", and thus i and n can legally be promoted to 64-bit. If signed overflow is unspecified, but the result must still fit in a 32-bit integer, then the answer is "no", and i and n can't legally be promoted.
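
In other words, when overflow is UB the compiler may rewrite it along these lines (a sketch of the legal transformation, not actual compiler output):

    /* i is widened once up front, so no per-iteration sign extension
       is needed. If n were -2, the original loop would have to
       overflow i (UB), so any behavior is permitted, including this
       non-terminating one. */
    for (int64_t i = 0; i != (int64_t)n; i++)
        sum += a[i];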
replies(1): >>43538650 #
24. dcrazy ◴[] No.43538605{5}[source]
Me either, but it wouldn’t be unreasonable if the target has 32-bit ALUs but only 16 address lines and no MMU.
25. AlotOfReading ◴[] No.43538650{7}[source]
"unspecified" in the standards sense of "unspecified behavior", i.e. any chosen behavior is permissible. The compiler doesn't have to document the behavior and the behavior isn't constrained, it just isn't undefined. That's still better than where we are today.
replies(1): >>43539976 #
26. dooglius ◴[] No.43538931{3}[source]
The optimization is still valid under -fwrapv semantics. To see this, observe the invariant (0 <= i && i < count) when entering the loop body, and (i == 0 || (0 < i && i <= count)) when doing the loop test -- in particular, 0 <= i < INT_MAX when entering the loop body (since count <= INT_MAX), so wraparound cannot occur.
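
Spelled out as assertions, under the assumption that the loop is the usual "i < count" form (a sketch, not anyone's actual code):

    #include <assert.h>

    void walk(const int *a, int count) {
        for (int i = 0; i < count; i++) {
            assert(0 <= i && i < count);  /* holds on entry to the body */
            /* count is an int, so count <= INT_MAX; hence i + 1 cannot
               wrap even under -fwrapv. */
            (void)a[i];                   /* ... use a[i] ... */
        }
    }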
27. Maxatar ◴[] No.43539066{3}[source]
From what I see, that book was published in 1988.
28. steveklabnik ◴[] No.43539976{8}[source]
I looked this up after you mentioned it upthread; I thought you meant implementation-defined, but I learned that yeah, this exists too. The difference is that implementation-defined requires documentation of the choice made, whereas unspecified does not. However,

> the behavior isn't constrained

C++23, at least, does say

> The range of possible behaviors is usually delineated by this document.

So there are sometimes some constraints.

29. bobmcnamara ◴[] No.43541674{5}[source]
Some of the 8-bit MCUs I started with defaulted to a standards-noncompliant 8-bit int. 16-bit was an option, but it was slower and took much more code.
30. bobmcnamara ◴[] No.43541701{5}[source]
The original 32-bit machine, the Manchester Baby, would've likely had a 32-bit int, but with only 32 words of RAM, C would be rather limited, though static-stack implementations would work.
31. UncleMeat ◴[] No.43551655{3}[source]
I know the team at Google that is doing exactly this. They've very explicitly accepted a small but significant performance overhead in order to improve safety.
32. frumplestlatz ◴[] No.43574658{5}[source]
> “The code was wrong, so it was OK that I made it slower" is a message board argument, not a business argument.

And yet, here I am, in a business, watching that argument play out in realtime.

> The performance of code that runs on consumer devices has less of a measurable economic impact than that of code that runs on the server.

The performance of the device that is literally in the entire world’s hands every day is arguably quite a bit more impactful than Facebook having to buy more servers.