https://c9x.me/compile/bib/ubc.pdf
via https://c9x.me/compile/bib/
Related HN discussion: https://news.ycombinator.com/item?id=26925314
See also: SQLite's amalgamation. Others (iirc Philippe Gaultier) have called this a unity build: https://sqlite.org/amalgamation.html
Rob Pike on systems software research: http://doc.cat-v.org/bell_labs/utah2000/utah2000.html
EDIT: typo
Fabien Sanglard on driving compilers: https://fabiensanglard.net/dc/
GNU binutils with their own take on how to process static archives (libfoo.a) https://sourceware.org/bugzilla/show_bug.cgi?id=32006
Linkers: Mold: https://news.ycombinator.com/item?id=26233244
Wild: https://news.ycombinator.com/item?id=42814683
List of FOSS C linkers:
GNU ld
GNU gold
LLVM lld
mold (by LLVM lld author)
wild
EDIT: typesetting
https://github.com/wolfi-dev/os/blob/main/openssf-compiler-o...
All significant C++ code-bases and projects I've worked on have had 10s of lines (if not screens) of compiler and linker options - a maintenance nightmare, particularly the stuff related to optimization. This stuff is so brittle: who knows when (with which release of the compiler or linker) a particular combination of optimization flags was actually beneficial? How do you regression test this stuff? So everyone is afraid to touch it.
Other compiled languages have similar issues but none to the extent of C++ that I've experienced.
If anything, there are tonnes of options people should be using more of.
The problem with all these hardening options, though, is that they noticeably reduce performance.
If you don't ever expose something to untrusted input, then you're probably fine. But be VERY careful, because you should defensively consider anything downloaded off the internet to be untrusted input.
As for permissions, if you run a tool inside of a sandbox inside of a virtual machine on an airgapped computer inside a Faraday cage six stories underground, then you're probably fine.
There aren't many optimization flags that people get fine-grained with; the exception is floating point, because -ffast-math alone is extremely inadvisable.
This has nothing to do with hardlinks; the same applies to symlinks. On Linux the status quo is that the dynamic loader finds the library by symlink; the convention is `libfoo.so.x -> libfoo.so.a.b.c`, where `x` is the ABI version and `a.b.c` the full version.
But if `libfoo.so.x -> /absolute/path/libfoo.so.a.b.c` and it has `$ORIGIN/libbar.so.y` in DT_NEEDED, those are resolved relative to the directory of the symlink, not to the realpath of the symlink.
That makes sense, because it would be a lot of startup overhead to lstat every path component of every library that uses $ORIGIN.
I don't see the point of including this gotcha in a security overview to be honest.
That’s not a threat model. What are the attackers going to do if there are vulnerabilities in your executable? Is it connected to a web server?
Does it have access to privileged resources?
Yep. What I would really like is 2 lists, one for debug/checked mode and one for release.
I've been eyeing Zig recently. It makes a lot of choices straightforward yet explicit, e.g. you choose between four optimisation strategies: debug, safety, size, perf. Individual programs/libraries can have a default or force one (for the whole program or a compilation unit), but it's customary to delegate that choice to the person actually building from source.
Even simpler story with Go. It's been designed by people who favour correctness over performance, and most compiler flags (like -race, -asan, -clobberdead) exist to help debug problems.
I've been observing a lot of people complain about declining software quality; yearly update treadmills delivering unwanted features and creating two bugs for each one fixed. Simplicity and correctness still seem to be a niche thing; I salute everyone who actually cares.
Tl;dr: python gevent messes up your x87 float registers (yes.)
https://moyix.blogspot.com/2022/09/someones-been-messing-wit...
For example, recursive filters (even the humble averaging filter) will suffer untold pain without enabling DAZ/FTZ mode.
fwiw the linked issue has been remedied in recent compilers and isn't a python problem, it's a gcc problem. Even that said, if your algorithm requires subnormal numbers, for the love of numeric stability, guard your scopes and set the mxcsr register accordingly!
Most importantly: Are the warnings show-stoppers? Deciding that is above my pay grade.
There is a pragma to ignore specific warnings: `#pragma GCC diagnostic ignored "-Wsome-warning"`. It is useful when dealing with several versions of the GCC compiler.
Yes, it happens.
Like with the Java sin() fixes: if you don't care about the results being correct why not constant-fold an arbitrary number? Way faster at run-time.
* -fPIE is enabled with --enable-default-pie in GCC's ./configure script
* -fstack-protector-strong is enabled with --enable-default-ssp in GCC's ./configure script
* -Wl,-z,relro is enabled with --enable-relro in Binutils' ./configure script
* -Wp,-D_FORTIFY_SOURCE=2, -fstack-clash-protection, -Wl,-z,now and -fcf-protection=full are enabled by default through patches to GCC in Gentoo.
* -Wl,--as-needed is enabled through the default LDFLAGS
For reference, here's the default compiler flags for a few other distributions. Note that these don't include GCC patches:
* Arch Linux: https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
* Alpine Linux: https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/d...
* Debian: It's a tiny bit more obscure, but running `dpkg-buildflags` on a fresh container returns the following: CFLAGS=-g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/home/<myuser>=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection
The best places (code quality wise) I've ever worked were the strictest on compiler warnings. Turn on all warnings, turn on extra warnings, treat warnings as errors, and forbid disabling warnings via #pragma. The absolute worst was the one where compiling the software using the compiler's default warning level produced a deluge of 40,000 warnings, and the culture was to disable warnings when they became annoying (vs. you know, fixing them).
My philosophy: Compilers don't issue warnings for fun. Every one of them is a potential problem and they are almost always worth fixing.
I also adhere to this in my personal hobby projects, too. It can be challenging when integrating with third party libraries, where the library maintainer doesn't care as much. I once submitted a patch to an open source project I won't name here, which fixed a bunch of warnings that seem to be only present in macOS builds (XCode's defaults tend to be quite strict). The response was not to merge it because "I don't regularly do macOS builds, and besides, they're just warnings." Alright, bro, sorry I tried to help.
And yea, the fact that crt1.o being linked into shared libraries fucked up the precision of some computations depending on library dependencies (and the order they're loaded!) was bad, but it lingered in the entire Linux ecosystem for over a decade. So how bad was it, if it took that long to notice?
If you have a numerical algorithm that requires subnormal arithmetic to converge: a) don't, that's super shaky; b) set/unset mxcsr at the top/bottom of your function and ensure you never unwind the stack without resetting it. It's preserved across context switches, so you're not going to get blown away by the OS scheduler.
This isn't practical numerical methods in C 101 but it's at least 201. In practice you don't trust floats for bit exact math. Use different types for that.
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.h...
Like, "oh you want faster FP operations? well then surely you have no need to ever be able to detect infinite or NaN values.."
There are people that do. HPC and the demoscene have numerous examples. Most of the people I met here are capable of reading gcc's manual and picking the optimizations they actually need. And they know how to debug this stuff.
If it's not obvious who gcc's defaults should cater to, then redefine human-friendly until it becomes obvious.
Your framing of a compiler exploiting UB in programs to gain performance has an undeserved negative connotation. The fact is, programs are mathematical structures/arguments, and if any single step in the program code or execution is wrong, no matter how small, it can render the whole program invalid. Drawing from math analogies where one wrong step leads to an absurd conclusion:
* https://en.wikipedia.org/wiki/All_horses_are_the_same_color
* https://en.wikipedia.org/wiki/Principle_of_explosion
* https://proofwiki.org/wiki/False_Statement_implies_Every_Sta...
* https://en.wikipedia.org/wiki/Mathematical_fallacy#Division_...
Back to programming, hopefully this example will not be controversial: If a program contains at least one write to an arbitrary address (e.g. `*(char*)0x123 = 0x456;`), the overall behavior will be unpredictable and effectively meaningless. In this case, I would fully agree with a compiler deleting, reordering, and manipulating code as a result of that particular UB.
You could argue that C shouldn't have been designed so that reading out of bounds is UB. Instead, it should read some arbitrary value without crashing or cleanly segfault at that instruction, with absolutely no effects on any surrounding code.
You could argue that C/C++ shouldn't have made it UB to dereference a null pointer for reading, but I fully agree that dereferencing a null pointer for a method call or writing a field must be UB.
Another analogy in programming is, let's forget about UB. Let's say you're writing a hash table in Java (in the normal safe subset without using JNI or Unsafe). If you get even one statement wrong in the data structure implementation, there still might be arbitrarily large consequences like dropping values when you shouldn't, miscounting how many values exist, duplicating values when you shouldn't, having an incorrect state that causes subtle failures far in the future, etc. The consequences are not as severe and pervasive as UB at the language level, but it will still result in corrupt data and/or unpredictable behavior for the user of that library code, which can in turn have arbitrarily large consequences. I guess the only difference compared to C/C++ UB is that for C/C++, there is more "spooky action at a distance", where some piece of UB can have very non-local consequences. But even incorrect code in safe Java can produce large consequences, maybe just not as large on average.
I am not against compilers "exploiting" UB for performance gain. But these are the ways forward that I believe in, for any programming language in general:
* In the language specification, reduce the number of cases/places that are undefined. Not only does it reduce the chances of bad things happening, but it also makes the rules easier to remember for humans, thus making it easier to avoid triggering these cases.
* Adding to that point, favor compile-time errors over run-time UB. For example, reading from an uninitialized local variable is a compile error in Java but UB in C. Rust's whole shtick about lifetimes and borrowing is one huge transformation of run-time problems into compile-time problems.
* Overwhelmingly favor safety by default. For example, array accesses should be bounds-checked using the convenient operator like `array[index]`, whereas the unsafe unchecked version should be something obnoxious and ugly like `unsafe { array.get_unchecked(index) }`. Make the safe way easy and make the unsafe way hard - the exact opposite of C/C++.
* Provide good (and preferably complete) sanitizer tools to check that UB isn't triggered at run time. C/C++ did not have these for the first few decades of their lives, and you were flying blind when triggering UB.
Compiler writers should revisit their option matrices and come up with easy defaults today.
Disclaimer: Used to work on the GCC code for option handling back in the day. Options like -O2 map to a whole bunch of individual options, and people only needed to remember adding -O2, which corresponded to different values in every era and yet subjectively meant: decently optimized code.
If your program is going to be used for some non-critical work internally you don't have to bother much about attack surface/vectors etc. Just use some standard "healthy" compiler options and you are good.
If you would like to know more on this subject, i recommend reading the classic The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities by Mark Dowd et al.
If you really know (and want to know) what you are doing, turning this stuff off may help. Some people even advocate brute-forcing all 2^32 single floats in your test cases, because it is kind of feasible to do so: https://news.ycombinator.com/item?id=34726919
You're failing to understand the problem domain, and consequently you're oblivious to how UB is actually a solution to problems.
There are two sides to UB: the one associated with erroneous programs, because clueless developers unwittingly do things that the standards explicitly state lead to unknown and unpredictable behavior, and the one which leads to valid programs, because developers knowingly adopted an implementation that specifies exactly what behavior they should expect from doing things that the standards leave as UB.
Somehow, those who mindlessly criticize UB only parrot the simplistic take on UB, the "nasal demons" blurb. They don't even stop to think about what undefined behavior is, and why a programming language specification would purposely leave specific behavior undefined instead of unspecified or even implementation-defined. They do not understand what they are discussing, and don't invest a moment trying to understand why things are the way they are and what problems are solved by them. They just parrot clichés.
Yes, it has. By "history" you actually mean "production software that is expected to not break just because someone upgrades a compiler". Yes, C++ does have a lot of that.
> All significant C++ code-bases and projects I've worked on have had 10s of lines (if not screens) of compiler and linker options - a maintenance nightmare particularly with stuff related to optimization.
No, not really. That is definitely not the norm, at all. I can tell you as a matter of fact that release builds of some production software that's even a household name is built with only a couple of basic custom compiler flags, such as specifying the exact version of the target language.
Moreover, if your project uses a build system such as CMake and your team is able to spend 5 minutes reading an onboarding guide onto modern CMake, you do not even need or care to set compiler flags. You set a few high-level target properties and you never look at it ever again.
I disagree. A disproportionate number of the random C and C++ code bases I've encountered in my career failed to build because some new warning was introduced. And this is precisely because compiler options are so bad that a lot of projects resort to -Wall, -Wextra and -Werror.
Also, the way undefined behavior is exploited means that you don't really know if your software that worked fine 10 years ago will actually work fine today, unless you have exhaustive tests.
-f options are technically machine-independent.
-m options are for machine-dependent behavior.
So if you are telling me all these security features can be implemented without per-machine support, then the -f spelling makes sense.
From where I stand, compilers are tools to aid the programmer. We invented them, because we found out that it was more productive than writing machine code by hand[1]. If an off-by-one error or a null pointer dereference[2] in a trivial program can invoke time travel several frames up the call stack[3], it isn't just missing the entire point of having a compiler - it can drive people insane.
[1]: https://en.wikipedia.org/wiki/Grace_Hopper#UNIVAC
[2]: https://en.wikipedia.org/wiki/Tony_Hoare#Research_and_career
[3]: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...
As far as I can tell, no popular language created in the past 30 years (including those with official specs and multiple implementations) makes heavy use of UB.
Technically, the compilers can choose to make undefined-behavior implementation-defined-behavior instead. But they don't.
That's kind of also how C++ std::span wound up without bounds checks in practice. And my_arr.at(i) just isn't really being used by anybody.
Seems very user-hostile to me.
Endless loops are technically undefined behavior, so they can be dropped entirely, leaving only their assembly jump label as an entry point, which then collides with the next function's label.
All because of UB.
Huge headache. Try debugging that.
And interaction loops in games sometimes do wait endlessly for input.
> There are default gcc and/or clang compiler flags in distros' default build tools; e.g. `make` specifies additional default compiler flags (that e.g. cmake, ninja, gn, or bazel/buck/pants may not also specify for you).
Is there a good reference for comparing these compile-time build flags and their defaults with Make, CMake, Ninja Build, and other build systems, on each platform and architecture?
From https://news.ycombinator.com/item?id=41306658 :
> From "Building optimized packages for conda-forge and PyPI" at EuroSciPy 2024: https://pretalx.com/euroscipy-2024/talk/JXB79J/ :
>> Since some time, conda-forge defines multiple "cpu-levels". These are defined for sse, avx2, avx512 or ARM Neon. On the client-side the maximum CPU level is detected and the best available package is then installed. This opens the doors for highly optimized packages on conda-forge that support the latest CPU features.
But those are per-arch performance flags, not security flags.
int d[16];

int SATD (void)
{
    int satd = 0, dd, k;
    for (dd = d[k = 0]; k < 16; dd = d[++k]) {
        satd += (dd < 0 ? -dd : dd);
    }
    return satd;
}
This was “optimized” by a pre-release of gcc-4.8 into the following infinite loop:

SATD:
.L2:
        jmp .L2

(simply because k<16 is always true, as k is used as an index into an array with a known size)
I mean that's just sort of nuts: how do you loop over an array in a UB-free manner, then? The paper referred to this situation being remediated:
"The GCC maintainers subsequently disabled this optimization for the case occuring in SPEC"
I try to keep up with the UB thing, while for current code I just use -O0 because it's fast enough and apparently allows me to keep an array index in bounds. Reading about this leaves me thinking that some of this UB criticism might not be so mindless.
Both the parent comment and the referenced paper fail to mention the out-of-bounds access of d[16]. At best, the paper says:
> The compiler assumed that no out-of-bounds access to d would happen, and from that derived that k is at most 15 after the access
Here is my analysis. By unrolling the loop and tracing the statements and values, we get:
k = 0; dd = d[k];
k is 0; k < 16 is true; loop body; ++k; k is 1; dd = d[k];
k is 1; k < 16 is true; loop body; ++k; k is 2; dd = d[k];
k is 2; k < 16 is true; loop body; ++k; k is 3; dd = d[k];
k is 3; k < 16 is true; loop body; ++k; k is 4; dd = d[k];
k is 4; k < 16 is true; loop body; ++k; k is 5; dd = d[k];
k is 5; k < 16 is true; loop body; ++k; k is 6; dd = d[k];
k is 6; k < 16 is true; loop body; ++k; k is 7; dd = d[k];
k is 7; k < 16 is true; loop body; ++k; k is 8; dd = d[k];
k is 8; k < 16 is true; loop body; ++k; k is 9; dd = d[k];
k is 9; k < 16 is true; loop body; ++k; k is 10; dd = d[k];
k is 10; k < 16 is true; loop body; ++k; k is 11; dd = d[k];
k is 11; k < 16 is true; loop body; ++k; k is 12; dd = d[k];
k is 12; k < 16 is true; loop body; ++k; k is 13; dd = d[k];
k is 13; k < 16 is true; loop body; ++k; k is 14; dd = d[k];
k is 14; k < 16 is true; loop body; ++k; k is 15; dd = d[k];
k is 15; k < 16 is true; loop body; ++k; k is 16; dd = d[k]; OUT OF BOUNDS!
As long as we enter the loop, the loop must eventually execute undefined behavior. Furthermore, every instance of testing `k < 16` is true before we hit UB. Therefore it can be simplified to true without loss of functionality, because after we hit UB, we are allowed to do absolutely anything. In my ancestor post where I said that any mistake, no matter how small, can have unbounded consequences, I fully mean it and believe it.

Please stop blaming the compiler. The problem is buggy code. Either fix the code, or fix the language specification so that wild reads either return an arbitrary value or crash cleanly at that instruction.
Note that we cannot change the spec to give definite behavior to writing out of bounds, because it is always possible to overwrite something critical like a return address or an instruction, and then it is literally undefined behavior and anything can happen.
> I mean thats just sort of nuts, how do you loop over an array then in an UB free manner?
The code is significantly transformed, but the nasty behavior can be prevented by designing code that does not read out of bounds! The trick is that the test `k < 16` must be false before any attempt to read/write `d[k]`. Which 99.99% of programmers get right, especially by writing a loop in the standard way and not in the obtuse way demonstrated in the referenced code. The obvious and correct implementation is:
for (int k = 0; k < 16; k++) {
    int dd = d[k];
    satd += dd < 0 ? -dd : dd;
}
The fact that the SPEC code chose to load `d[k]` before checking that `k` is still in bounds is an overly clever, counterproductive "jumping the gun" tactic. Putting assignment statements into indexing expressions is also needless obfuscation (which I untangled in the unrolled analysis).

int d[16];

int SATD (void)
{
    int satd = 0, k;
    for (k = 0; k < 16; ++k)
        satd += d[k] < 0 ? -d[k] : d[k];
    return satd;
}
There is nothing to disagree with. It is a statement of fact that there is production software that is expected not to break just because someone upgrades a compiler. This is not up for debate. Setting flags like -Werror is not even relevant, because that is an explicit choice of development teams, and one which is strongly discouraged beyond local builds.
> Also the way undefined behavior is exploited means that you don't really know of your software that worked fine 10 years ago will actually work fine today, unless you have exhaustive tests.
No, not really. There are only two scenarios with UB: either you unwittingly invoked UB and thus introduced an error, or you purposely used a feature provided by your specific choice of compiler+OS+hardware that leverages what the standard leaves as UB.
The latter involves a ton of due diligence and pinning your particular platform, particularly compiler version.
So either you don't know what you're doing, or you are very well aware and very specific about what you're doing.