Compiler Options Hardening Guide for C and C++

(best.openssf.org)

232 points pjmlp | 1 comments | 31 Mar 25 11:01 UTC

derriz ◴[31 Mar 25 12:58 UTC] No.43534525[source]▶

Sane defaults should be table stakes for toolchains but C++ has "history".

All significant C++ code-bases and projects I've worked on have had 10s of lines (if not screens) of compiler and linker options - a maintenance nightmare particularly with stuff related to optimization. This stuff is so brittle, who knows when (with which release of the compiler or linker) a particular combination of optimization flags was actually beneficial? How do you regression test this stuff? So everyone is afraid to touch this stuff.

Other compiled languages have similar issues but none to the extent of C++ that I've experienced.

replies(4): >>43534781 #>>43535229 #>>43535747 #>>43543362 #

motorest ◴[01 Apr 25 06:09 UTC] No.43543362[source]▶

>>43534525 #

> Sane defaults should be table stakes for toolchains but C++ has "history".

Yes, it has. By "history" you actually mean "production software that is expected to not break just because someone upgrades a compiler". Yes, C++ does have a lot of that.

> All significant C++ code-bases and projects I've worked on have had 10s of lines (if not screens) of compiler and linker options - a maintenance nightmare particularly with stuff related to optimization.

No, not really. That is definitely not the norm, at all. I can tell you as a matter of fact that release builds of some production software that's even a household name are built with only a couple of basic custom compiler flags, such as specifying the exact version of the target language.

Moreover, if your project uses a build system such as CMake and your team is able to spend 5 minutes reading an onboarding guide to modern CMake, you do not even need or care to set compiler flags. You set a few high-level target properties and you never look at it ever again.

replies(1): >>43545033 #

rowanG077 ◴[01 Apr 25 10:20 UTC] No.43545033[source]▶

>>43543362 #

> Yes, it has. By "history" you actually mean "production software that is expected to not break just because someone upgrades a compiler". Yes, C++ does have a lot of that.

I disagree. Disproportionately in my career random C and C++ code bases failed to build because some new warning was introduced. And this is precisely because compiler options are so bad that a lot of projects use -Wall, -Wextra and -Werror.

Also the way undefined behavior is exploited means that you don't really know if your software that worked fine 10 years ago will actually work fine today, unless you have exhaustive tests.

replies(1): >>43553616 #

motorest ◴[02 Apr 25 04:12 UTC] No.43553616[source]▶

>>43545033 #

> I disagree. Disproportionately in my career random C and C++ code bases failed to build because some new warning was introduced. And this is precisely because compiler options are so bad that a lot of projects use -Wall, -Wextra and -Werror.

There is nothing to disagree with. It is a statement of fact that there is production software that is not expected to break just because someone upgrades a compiler. This is not up for debate. Setting flags like -Werror is not even relevant, because that is an explicit choice of development teams and one which is strongly discouraged beyond local builds.

> Also the way undefined behavior is exploited means that you don't really know if your software that worked fine 10 years ago will actually work fine today, unless you have exhaustive tests.

No, not really. There are only two scenarios with UB: either you unwittingly used UB and thus you introduced an error, or you purposely used a feature provided by your specific choice of compiler+OS+hardware that leverages UB.

The latter involves a ton of due diligence and pinning your particular platform, particularly compiler version.

So either you don't know what you're doing, or you are very well aware and very specific about what you're doing.

replies(1): >>43555041 #

rowanG077 ◴[02 Apr 25 09:41 UTC] No.43555041[source]▶

>>43553616 #

What you are doing is pushing all responsibility on the user. Which is exactly the ridiculous mindset that continues to make C and C++ software shit. It's like having a lawless society and complaining your shit is getting stolen. You can blame the thieves, but that is a pretty stupid way to handle things. Maybe just maybe the environment has a huge impact on how people act. And telling people "lol, this single line of code here is invoking UB in your 100mloc codebase. Too bad that we emptied out the database, removed the backups and launched world ending nukes" is just bonkers insane. In no other human made tool is this acceptable. It's like having a hammer that only accepts striking nails with between 13.732 newtons and 13.733 newtons of force; if you go over or under, it shoots you and your family in the face. "Skill issue lel, just hit it with the right amount of force", no how about you fix your fucking tool so it doesn't shoot me in the face at the slightest mistake.

> There is nothing to disagree with. It is a statement of fact that there is production software that is not expected to break just because someone upgrades a compiler. This is not up for debate. Setting flags like -Werror is not even relevant, because that is an explicit choice of development teams and one which is strongly discouraged beyond local builds.

Those two statements don't mix. You can't claim C++ has "a lot of that" backwards compatibility and in the same breath blame the user when the same compiler flags break their code on a new compiler version. It's not like this is hard. Don't add any warnings to existing container flags. Specify new ones instead. Don't exploit UB that you didn't exploit in the previous version. The C++ people just don't really care about maintaining backwards compatibility. They may think they do, but they really don't.

> So either you don't know what you're doing, or you are very well aware and very specific about what you're doing.

I dare say that almost no non-trivial C++ codebase contains no UB at all in all scenarios. So you can make the argument that all those people don't know what they are doing. But then you admit that C++ is just not a tool that should be touched by mere mortals.

replies(1): >>43573253 #

nayuki ◴[03 Apr 25 18:02 UTC] No.43573253[source]▶

>>43555041 #

> What you are doing is pushing all responsibility on the user. Which is exactly the ridiculous mindset that continues to make C and C++ software shit.

Correct, because that's what the C and C++ language standards say. Once the programmer writes code that hits undefined behavior, the standard says that there is no requirement on the compiler and runtime to behave in any certain way. Don't shoot the messenger; the compiler is maximizing its legal exploitation within the language standard. Blame the language standard and petition for change. (See my comment https://news.ycombinator.com/item?id=43539554 .)

> It's like having a lawless society and complaining your shit is getting stolen.

Quite the contrary; the law is written explicitly. The language standard says that something is undefined. You don't get to complain that your shit got stolen if it's undefined. It's like leaving your wallet on the street and complaining that it gets stolen, when there is absolutely no law saying that you should expect your stuff to be untouched when it's not within your private property.

> And telling people "lol, this single line of code here is invoking UB in your 100mloc codebase. Too bad that we emptied out the database, removed the backups and launched world ending nukes" is just bonkers insane.

Sadly, it's not insane. A single erroneous write can corrupt a return address or an instruction, and then the attacker can start a remote shell on your system, and nuke your machine. Not theoretical. When it comes to math, logic, and programming, any single error, no matter how small, has the potential to bring the whole house down.

Or, your program already has routines for formatting your hard drive and launching nukes, and it's protected by one flimsy if-statement. You corrupt the variable that controls that condition, and boom, everything's dead.

I hope you don't find it controversial that one arbitrary write to memory (like `*(char*)0x123 = 0x456;`) is sufficient grounds that nobody can predict the unbounded consequences of it. I do heavily prefer reads and overflows to be not UB though - like in Java and most other languages that came after C & C++. Again, blame the language standards, not the compiler.

> In no other human made tool is this acceptable. It's like having a hammer that only accepts striking nails with between 13.732 newtons and 13.733 newtons of force; if you go over or under, it shoots you and your family in the face.

Hah, I take it that you're unfamiliar with mechanical engineering. For things like carbon fiber bike frames, every bolt must be tightened to a specific torque or you risk cracking the frame and possibly dying. Like, "the stem bolt must be tightened to 5 to 6 newton-metres; too loose and it'll come undone as you ride; too tight and it can damage the part". Likewise, there are examples of airplane crashes that happened because one component that should've been 1.50 mm was manufactured to like 1.40 mm, caused inappropriate rubbing with another part, caused a fuel line to burst, and you can predict the rest. At least debugging software is hell of a lot easier than debugging physical materials.

> Those two statements don't mix. You can't claim C++ has "a lot of that" backwards compatibility.

You can achieve a lot of backwards compatibility if you write code that is within the language standard and don't touch UB!

Also, if you haven't done so already, read up on the notion of the C abstract machine. When the compiler analyzes your code, it doesn't think "oh you're on x86, and the add instruction doesn't trap, so your overflow is safe". No, it thinks, "the C abstract machine says that addition overflow is undefined, so we will assume that it never happens, and simplify any logic as a consequence of that assumption".
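To make the abstract-machine reasoning concrete, here is a minimal sketch in C. The function names are hypothetical and only illustrate the point: the first form relies on signed overflow, so an optimizer is permitted to simplify it under the assumption that overflow never happens, while the second form asks the same question without ever overflowing. Whether a given compiler actually performs the simplification depends on the compiler and optimization level.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Overflow-prone form. When x == INT_MAX, x + 1 overflows, which is undefined
   behavior, so an optimizer may assume it never happens and treat this
   function as if it always returned true. */
bool next_is_larger_naive(int x) {
    return x + 1 > x;
}

/* Well-defined form: compare against INT_MAX before doing any arithmetic. */
bool next_is_larger_checked(int x) {
    return x < INT_MAX;
}

/* Usage sketch (hypothetical helper). */
void overflow_demo(void) {
    /* Both forms agree wherever no overflow occurs. */
    assert(next_is_larger_naive(41) && next_is_larger_checked(41));
    /* Only the checked form may safely be asked about INT_MAX. */
    assert(!next_is_larger_checked(INT_MAX));
}
```

The checked form gives the same answer on every target because it never leaves the range of defined arithmetic, which is the kind of code the comment above describes as staying within the standard.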

replies(1): >>43575029 #

rowanG077 ◴[03 Apr 25 20:34 UTC] No.43575029[source]▶

>>43573253 #

> Correct, because that's what the C and C++ language standards say. Once the programmer writes code that hits undefined behavior, the standard says that there is no requirement on the compiler and runtime to behave in any certain way. Don't shoot the messenger; the compiler is maximizing its legal exploitation within the language standard. Blame the language standard and petition for change. (See my comment https://news.ycombinator.com/item?id=43539554 .)

I take issue with the standard and by extension the entire C abstract machine itself. I thought that was clear. Compiler writers aren't blameless; they were huffing some strong stuff when they decided what to do on undefined behavior in some cases. But the original sin is in the standard.

> Quite the contrary; the law is written explicitly. The language standard says that something is undefined. You don't get to complain that your shit got stolen if it's undefined. It's like leaving your wallet on the street and complaining that it gets stolen, when there is absolutely no law saying that you should expect your stuff to be untouched when it's not within your private property.

The language standard is the entire reason that C++ is the shit show it is. You are holding it up like it's a book handed down by god. It's not. It's an extremely flawed document that has cost the world trillions of dollars and has meaningfully set back the advancement of the human race.

> Or, your program already has routines for formatting your hard drive and launching nukes, and it's protected by one flimsy if-statement. You corrupt the variable that controls that condition, and boom, everything's dead.

So you agree with me how ridiculous this is? I'm not even sure anymore.

> I hope you don't find it controversial that one arbitrary write to memory (like `*(char*)0x123 = 0x456;`) is sufficient grounds that nobody can predict the unbounded consequences of it. I do heavily prefer reads and overflows to be not UB though - like in Java and most other languages that came after C & C++. Again, blame the language standards, not the compiler.

I indeed don't find it controversial that one arbitrary write to memory (like `*(char*)0x123 = 0x456;`) is sufficient grounds that nobody can predict the unbounded consequences of it. I do find it controversial that this is possible to achieve by default in any modern language.

> Hah, I take it that you're unfamiliar with mechanical engineering. For things like carbon fiber bike frames, every bolt must be tightened to a specific torque or you risk cracking the frame and possibly dying. Like, "the stem bolt must be tightened to 5 to 6 newton-metres; too loose and it'll come undone as you ride; too tight and it can damage the part". Likewise, there are examples of airplane crashes that happened because one component that should've been 1.50 mm was manufactured to like 1.40 mm, caused inappropriate rubbing with another part, caused a fuel line to burst, and you can predict the rest. At least debugging software is hell of a lot easier than debugging physical materials.

Mechanical engineers have to contend with physical reality. Programming language designers don't, programming languages are literally made up. Besides, the specific torque your bike frame must be tightened to is not a torque that 99.99999% of trained mechanics are unable to hit. Because that is approximately the number of professional C++ programmers that are incapable of writing undefined behaviour free code.

> You can achieve a lot of backwards compatibility if you write code that is within the language standard and don't touch UB!

"We got a lot of that backwards compatibility. If you are a once in a lifetime super genius.". Very convincing...

> Also, if you haven't done so already, read up on the notion of the C abstract machine. When the compiler analyzes your code, it doesn't think "oh you're on x86, and the add instruction doesn't trap, so your overflow is safe". No, it thinks, "the C abstract machine says that addition overflow is undefined, so we will assume that it never happens, and simplify any logic as a consequence of that assumption".

The C abstract machine is an abomination that should be banished to the deepest pits of hell and never see the light of day again. The whole concept of undefined behavior is something that should never have existed. It's the original sin.

replies(1): >>43576000 #

nayuki ◴[03 Apr 25 22:01 UTC] No.43576000[source]▶

>>43575029 #

Thanks for your response and taking my comments seriously.

> I take issue with the standard ... the original sin is in the standard

Me too! As per my other comments, I think the C and C++ standards are insane with so many cases of UB, and even totally preventable cases like "if a source file doesn't end with newline". Pretty much every other language has far less UB than C and C++.

> Compiler writers aren't blameless

This is probably where we differ. Yes, I acknowledge that compilers became more aggressive over the years. But as someone who writes math proofs, I appreciate the notion of deriving logical consequences of things. For example, if you sneak just a single division-by-zero step, you can prove that 1 = 2, and in turn prove that 0 = 1. I maintain that compilers are maximizing what they can exploit within the bounds of UB; i.e. if it can prove that you triggered UB, then it can do anything it wants.

> Why should I care about that when this machine decided it is oke to irradiate me to lethal levels?

Look, even if you had the friendliest C compiler, there can still be a million other reasons why the machine irradiated you. Maybe there's a manufacturing defect and a wire is flailing around and short circuited something. Maybe you wrote code in assembly language and read an out-of-bounds memory address and simply got back the same value that was most recently on the memory bus (open bus syndrome; a bunch of SNES exploits rely on that); that is UB at a hardware level. And then maybe you dereferenced that value as a pointer.

I am aware that many people are offended by compilers exploiting UB, but that sentiment seems extremely misdirected because they're not willing to confront the fact that the code writer did not comply with the preconditions of the language standard; they wrote faulty code and so the compiler made a faulty executable, GIGO.

> The language standard is the entire reason that C++ is the shit show it is. You are holding it up like it's a book handed down by god. It's not.

I am extremely frustrated with C/C++ too ( https://www.nayuki.io/page/undefined-behavior-in-c-and-cplus... , https://www.nayuki.io/page/summary-of-c-cpp-integer-rules ). I fully know that humans wrote that, very fallible humans who chose questionable compromises with respect to performance and compatibility. My other comment ( https://news.ycombinator.com/item?id=43539554 ) already advocates for tightening the standard to reduce as many cases of UB as possible - I even said that I think reading out of bounds should either return an arbitrary value or crash cleanly, which might be more radical than most viewpoints!

> It's an extremely flawed document that has cost the world trillions of dollars and has meaningfully set back the advancement of the human race.

I agree. The C and C++ committees refuse to curb UB, and the rest of us have to deal with the consequences.

> when [compilers] decided what to do on undefined behavior in some cases

Correct, because that's what the C and C++ language standards say. Once the programmer writes code that hits undefined behavior, the standard says that there is no requirement on the compiler and runtime to behave in any certain way. Don't shoot the messenger; the compiler is maximizing its legal exploitation within the language standard. Blame the language standard and petition for change.

> I indeed don't find it controversial that one arbitrary write to memory (like `*(char*)0x123 = 0x456;`) is sufficient grounds that nobody can predict the unbounded consequences of it. I do find it controversial that this is possible to achieve by default in any modern language.

Sure, a compiler writer like GCC could pledge that they have a dialect of C that is gentler than the standard. They could, like my example above, guarantee that reading out of bounds will simply generate a machine instruction, and let the machine either read some value or page fault, and not infer the fact that the act of reading implies to the rest of the program that the index is in bounds (relating to the `k < 16`, `d[k]` subthread). But let's call it what it is - it would be a dialect of C, probably only supported by GCC, and not by any other compiler (LLVM, Microsoft, Intel, etc.). It would be an island, not a standard. If they want their semantics to be universally accepted, they will have no choice but to propagate it up to the C standard.

> Programming language designers don't [contend with physical reality], programming languages are literally made up.

There's some truth in that, but it's incomplete. One thing that's not debatable is that programming and math are intertwined - from basic stuff like arithmetic, to inequalities/ranges, to iteration and recursion, to full-blown proofs about behavior. Once you see it in that lens, you realize that certain features lead to contradictions that would make for a very confusing programming language.

In the case of C/C++, this is how I see it: If you have arrays as objects that live in memory, then naturally you have bounds; the array is finite in start and end. If you index that array, you need to decide what happens if the index is out of bounds. You can either go through extensive math proofs to show at compile time that the index is always in bounds, so you can go ahead with no run-time overhead. Otherwise, you either need to pay for run-time checks, or you throw your hands up and say "whatever happens, happens" - and that's a source of UB.
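As a rough illustration of the "pay for run-time checks" option, here is a minimal sketch in C. The names `TABLE_LEN`, `table_get` and `table_get_demo` are hypothetical, and the length 16 only echoes the `k < 16` example mentioned earlier. The accessor reports an out-of-range index to the caller instead of dereferencing it, so every call has a defined outcome at the cost of one comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TABLE_LEN = 16 };  /* hypothetical size, chosen for illustration */

/* Run-time check: store d[k] in *out and return true only when k is in
   bounds. An out-of-range k is reported instead of being dereferenced. */
bool table_get(const int d[TABLE_LEN], size_t k, int *out) {
    if (k >= TABLE_LEN) {
        return false;  /* defined outcome: no read happens */
    }
    *out = d[k];
    return true;
}

/* Usage sketch (hypothetical helper). */
void table_get_demo(void) {
    int d[TABLE_LEN] = {0};
    int value = -1;
    d[3] = 42;
    assert(table_get(d, 3, &value) && value == 42);
    assert(!table_get(d, 16, &value));  /* one past the end is rejected */
    assert(value == 42);                /* *out is untouched on failure */
}
```

The unchecked alternative, a bare `d[k]`, skips the comparison and leaves the out-of-range case to "whatever happens, happens".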

> Because that is approximately the number of professional C++ programmers that are incapable of writing undefined behaviour free code.

I fully agree with this. In my C & C++ learning journey, I had to learn a lot of awful habits that were taught to me implicitly from other people's writing and code. As the simplest example, the notion that you can overflow a signed integer and then print its value and examine what happened - no, the moment you overflowed it, all bets are off and you cannot debug that in general (unless you have UBSan enabled, which is basically a dialect). Even now that I'm very aware of UB, it is very mentally taxing to remember all the rules and check against them for every line of code that I write. There's a reason I don't use these languages, and use managed ones like Java/JavaScript/Python or Rust, because I don't have enough time and brain cells to write 100% perfect C/C++ code 100% of the time. Based on the difficulties I faced, I have low trust in other people writing correct C & C++ code.

> The C abstract machine is an abomination that should be banished

Sorry, no. If anything, I somewhat appreciate that it calls out the name "abstract machine" instead of implying it through semantics.

If C code can be compiled for x86 and ARM, with or without an MMU, the compiler is reasoning about an abstract machine and not doing all its transformations and optimizations with respect to each concrete machine that it supports. It is literally an abstraction.

Can the abstract machine be tightened up to be less programmer-hostile? Absolutely. Change the language standard.

Even in the case of the Java virtual machine, it is still an abstract machine because there are still a bunch of implementation details that differ somewhat on real JVMs (from different vendors) running on real CPUs. And of course many code optimizations/transformations are first done with respect to the abstract JVM, and only later refined for the physical machine (e.g. the choice of ADD vs. LEA on x86; specific instructions like POPCNT vs. fallback).

> The whole concept of undefined behavior is something that should never have existed.

Sorry, nope, this is impossible. Look at Rust; even it acknowledges UB, which is a subset of C and C++ UB - https://doc.rust-lang.org/reference/behavior-considered-unde... . Even Java has UB if you use sun.misc.Unsafe, which can be useful for performance-sensitive code, native memory management, building reflective frameworks; also of course when using JNI.

I guess you can fully eliminate UB if you heavily restrict the semantics of the language and/or accept performance reductions. For example, you can definitely make a UB-free version of Brainfuck with its mere 8 instructions (just need to define how you want to handle overflow and the left side of the tape). You can also make UB-free C/C++ if you bounds-check all your writes and keep track of variable lifetimes - actually, someone did that as the Fil-C project, and can compile and run existing code with a small performance penalty.
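To sketch what "define how you want to handle overflow and the left side of the tape" could look like, here is a small Brainfuck interpreter in C. Everything about it is an assumption made for illustration: the names `bf_run` and `bf_demo`, the tape length, the step budget, and each of the chosen edge-case behaviors listed in the comment. Other choices would be equally valid; the point is only that each case gets some defined outcome.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TAPE_LEN = 30000 };  /* assumed tape size */

/* Runs prog, writing output bytes to out (capacity cap, NUL-terminated when
   cap > 0). Returns 0 on success, -1 when the step budget runs out, -2 when
   a jump needs a bracket that has no match. Chosen semantics:
   - cell overflow wraps modulo 256 (unsigned char arithmetic);
   - moving past either end of the tape is a no-op;
   - ',' at end of input stores 0;
   - output beyond the buffer capacity is dropped. */
int bf_run(const char *prog, const char *input, unsigned char *out,
           size_t cap, unsigned long max_steps) {
    unsigned char tape[TAPE_LEN] = {0};
    size_t ptr = 0, pc = 0, n_out = 0, len = strlen(prog);
    unsigned long steps = 0;
    int status = 0;

    while (status == 0 && pc < len) {
        if (steps++ >= max_steps) { status = -1; break; }
        switch (prog[pc]) {
        case '>': if (ptr + 1 < TAPE_LEN) ptr++; break;
        case '<': if (ptr > 0) ptr--; break;
        case '+': tape[ptr]++; break;
        case '-': tape[ptr]--; break;
        case '.': if (n_out + 1 < cap) out[n_out++] = tape[ptr]; break;
        case ',':
            if (*input) tape[ptr] = (unsigned char)*input++;
            else tape[ptr] = 0;
            break;
        case '[':
            if (tape[ptr] == 0) {  /* skip forward to the matching ']' */
                size_t depth = 1;
                while (depth > 0 && status == 0) {
                    if (++pc >= len) status = -2;
                    else if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[ptr] != 0) {  /* jump back to the matching '[' */
                size_t depth = 1;
                while (depth > 0 && status == 0) {
                    if (pc == 0) { status = -2; }
                    else {
                        pc--;
                        if (prog[pc] == ']') depth++;
                        else if (prog[pc] == '[') depth--;
                    }
                }
            }
            break;
        default: break;  /* any other character is a comment */
        }
        pc++;
    }
    if (cap > 0) out[n_out] = 0;
    return status;
}

/* Usage sketch (hypothetical helper). */
void bf_demo(void) {
    unsigned char out[8];
    /* 8 * 8 + 1 = 65, the character 'A'. */
    assert(bf_run("++++++++[>++++++++<-]>+.", "", out, sizeof out, 10000) == 0);
    assert(out[0] == 'A' && out[1] == 0);
    assert(bf_run("-.", "", out, sizeof out, 10000) == 0 && out[0] == 255);
    assert(bf_run("<+.", "", out, sizeof out, 10000) == 0 && out[0] == 1);
    assert(bf_run("+[]", "", out, sizeof out, 1000) == -1);  /* budget hit */
    assert(bf_run("[", "", out, sizeof out, 1000) == -2);    /* no match */
}
```

The price of this totality is visible in the code: every tape move, output and jump carries a check, which is the restriction-or-overhead tradeoff described above.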

For better or for worse, C & C++ emphasize backwards compatibility and avoiding runtime costs - to the detriment of almost every other desirable feature like human comprehension. You probably don't like this choice, but it is the tradeoff that they made because they prioritized some features over others.
