93 points endorphine | 8 comments
pcwalton No.43537392

I was disappointed that Russ didn't mention the strongest argument for making arithmetic overflow UB. It's a subtle thing that has to do with sign extension and loops. The best explanation is given by ryg here [1].

As a summary: the most common way C textbooks give to iterate over an array is "for (int i = 0; i < n; i++) { ... array[i] ... }". The problem comes from three facts: (1) i is a signed integer; (2) i is 32 bits wide; (3) pointers nowadays are usually 64 bits. If signed overflow were defined to wrap, a compiler that can't prove the increment on "i" won't overflow (perhaps because "n" was passed in as a function parameter) would have to sign-extend i on every loop iteration. That adds extra instructions to what could be a hot loop, especially since you can't fold a sign-extending index into an addressing mode on x86. Because this pattern is so common, compiler developers are loath to change the semantics here: even a 0.1% fleet-wide slowdown has a cost to FAANG measured in the millions.

Note that the problem goes away if you use pointer-width indices for arrays, which many other languages do. It also goes away if you use C++ iterators. Sadly, the C-like pattern persists.

[1]: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...

1. dcrazy No.43537771

The C language does not specify that `int` is 32 bits (it only requires at least 16). That is a choice made by compiler developers to make it easier to compile non-portable code written for 32-bit platforms, because most codebases wind up baking in assumptions about variable sizes.

In Swift, for example, `Int` is 64 bits wide on 64-bit targets. If we ever move to 128-bit CPUs, the Swift project will have to decide whether to stick to its guns or make `Int` 64 bits on 128-bit targets.

2. pcwalton No.43537843

> The C language does not specify that `int` is 32-bits. That is a choice made by compiler developers to make compiling non-portable code written for 32-bit platforms easier, because most codebases wind up baking in assumptions about variable sizes.

Making int 32-bit also results in not-insignificant memory savings.

3. bobmcnamara No.43538095

And it even wastes cycles on MCUs with a 16-bit `size_t`.

4. moefh No.43538244

Is there any MCU where `size_t` is 16 bits but `int` is 32 bits? I'm genuinely curious; I have never seen one.

5. dcrazy No.43538284

Now that you mention it, at least on Wintel, compiler vendors did not preserve the definition of `int` during the transition from 16-bit to 32-bit. I started in the 386 era myself, so I have no frame of reference for porting code from the 286. But Windows famously retains a lot of 16-bit heritage, such as defining `DWORD` as 32 bits, which now makes it a double anachronism. I wonder whether the decision to model today’s popular 64-bit processors as LP64 is related to not wanting to go through that again.

Edit: of course, I completely forgot that Windows chose LLP64, not LP64, for x86_64 and AArch64. Raymond Chen has commented on this [1], but only as an addendum to reasons given elsewhere which have since bitrotted.

[1]: https://devblogs.microsoft.com/oldnewthing/20050131-00/?p=36...

6. dcrazy No.43538605

Me neither, but it wouldn’t be unreasonable on a target with 32-bit ALUs but only 16 address lines and no MMU.

7. bobmcnamara No.43541674

Some of the 8-bit MCUs I started with defaulted to a standards-noncompliant 8-bit `int`. A 16-bit `int` was an option, but it was slower and took much more code.

8. bobmcnamara No.43541701

The original 32-bit machine, the Manchester Baby, would likely have had a 32-bit `int`, but with only 32 words of RAM, C would be rather limited, though static-stack implementations would work.