386 points ingve | 2 comments

SleepyMyroslav ◴[] No.35738552[source]
If being branchless is an important property of the algorithm, then it is better to enforce it, or at least test for it. If his GCC version gets an update and stops producing the assembly he wants, no one will ever know.
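
For instance, a minimal sketch of what "test for it" could look like (the function and the objdump check are just an illustration, and the grep pattern is x86-specific):

    /* A select() written to encourage branchless codegen. Nothing in C
     * guarantees the compiler won't emit a branch anyway, so one blunt
     * enforcement is a CI step that disassembles the object file and
     * fails the build on any conditional jump, e.g.:
     *   objdump -d select.o | grep -Eq 'j(e|ne|l|le|g|ge|b|be|a|ae)\b' && exit 1
     */
    #include <stdint.h>

    int32_t select_branchless(int32_t cond, int32_t a, int32_t b) {
        int32_t mask = -(cond != 0);     /* all ones if cond, else zero */
        return (a & mask) | (b & ~mask); /* pick a or b without branching */
    }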

Which brings us back to the regular discussion: C (and C++) does not match hardware anymore. There is no real control over important properties of the generated code. Programmers need tools to control what they write. Plug-and-pray compilation is not a solid engineering approach.

replies(6): >>35738629 #>>35738633 #>>35738677 #>>35738943 #>>35741009 #>>35745436 #
TuringTest ◴[] No.35738677[source]
Wasn't C created to avoid matching hardware? I.e. not caring about word size, number of registers, instruction set, etc.

I thought the whole point of writing programs in C was originally being able to write portable software (and a portable OS) that could be executed on different machines? It only became specialized in giving machine-specific instructions when other languages took over.

replies(3): >>35739004 #>>35739054 #>>35740495 #
1. segfaultbuserr ◴[] No.35740495[source]
> Wasn't C created to avoid matching hardware?

Not exactly. C code from the early Research Unix days often makes very specific assumptions about how the hardware behaves. For a start, C was created in an age when 36-bit mainframes still ruled the world, yet it settled on the 8-bit byte as its base unit rather than 6-bit characters, because the PDP-11 is a 16-bit machine with 8-bit bytes.

More appropriately, you can say C was created to avoid writing assembly code. Or, put another way, the original purpose of C was to create a minimal high-level programming language that clears the lowest bar of portability, and no more. C itself is a paradox; its nickname "portable assembly" captures the contradiction well.

On one hand, it provided a lightweight abstraction layer that allows basic high-level programming, while still being simple enough that a compiler could be written for, or ported to, a new platform easily (for early C, at least).

On the other hand, C was in fact intimately tied to the hardware platform it ran on. The operations in C were originally designed to compile directly to its first platform, the PDP-11, rather than being defined by a formal mathematical specification. So the behavior of C was basically "the most natural result of the platform it's running on." This is why C has a ton of undefined behavior - but paradoxically, this is also what made C portable: it could be mapped directly to hardware without heavy abstractions, and thus C stayed simple.
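
As a concrete illustration, an oversized shift shows how "the most natural result of the platform" turns into undefined behavior on paper:

    #include <stdio.h>

    int main(void) {
        int n = 33;
        /* Undefined behavior: shifting a 32-bit int by >= its width.
         * x86 masks the shift count to 5 bits, so the hardware would
         * compute 1 << 1 == 2; other machines may yield 0. The C
         * standard simply declines to choose: the "whatever the
         * machine does" legacy described above. */
        printf("%d\n", 1 << n);
        return 0;
    }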

Today we like to see C as portable, so "never rely on unspecified or undefined behavior" is the rule, and language lawyers tell us that C should be understood as an abstract machine in the symbolic sense. Compilers perform increasingly complicated and aggressive symbolic transformations for optimization and vectorization, under the assumption that undefined behavior never occurs.
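
A stock example of that assumption at work: since signed overflow is assumed never to happen, gcc and clang at -O2 typically compile this to an unconditional "return 1":

    /* The optimizer may treat "x + 1 overflows" as impossible, so the
     * comparison folds to "always true". Compile with -fwrapv (or use
     * unsigned arithmetic) and the check survives, returning 0 for
     * INT_MAX. */
    int always_greater(int x) {
        return x + 1 > x;
    }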

But if you read early C programs on Unix, you'll see that developers made liberal use of unspecified and undefined behaviors, with very specific assumptions about their machines - in early C, undefined behavior was arguably a feature, not a bug. C didn't support floating-point until a hardware FPU was installed on the PDP-11, and even then it only supported double-precision math, not single-precision, simply because the PDP-11 FPU had a global mode bit that made mode-switching messy, and the Unix developers didn't want to manage it in the kernel. The famous Unix comment "You are not expected to understand this" marked code that went as far as depending on the exact code generated by the compiler (to be fair, it was only a temporary hack and was later removed, but it shows what C was capable of being used for).

Meanwhile, under today's portability requirements, C programmers are not even supposed to assume signed integers use 2's complement encoding (only C23 finally mandates it), and signed overflow remains undefined even in C23!
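
That two's-complement point is easy to trip over in practice; a classic bit-hack like the following quietly bakes in exactly those machine assumptions:

    #include <stdint.h>

    /* Branchless abs(). It leans on two things ISO C historically did
     * not promise: two's complement encoding, and >> on a negative
     * value being an arithmetic shift (implementation-defined). It
     * also invokes outright UB for INT32_MIN, where the result
     * overflows. */
    int32_t abs_bithack(int32_t x) {
        int32_t mask = x >> 31;   /* 0 if x >= 0, -1 if x < 0 */
        return (x ^ mask) - mask;
    }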

So there's an inherent contradiction inside C: it is simultaneously portable and machine-dependent.

The original C was "it is what the hardware is" (yet still largely portable, thanks to its simplicity), and today's C is "it is what the abstract machine is, as defined by esoteric rules from the language lawyers."

To show this conflict, I would quote Linus Torvalds:

> Yeah, let's just say that the original C designers were better at their job than a gaggle of standards people who were making bad crap up to make some Fortran-style programs go faster.

I don't exactly agree with Linus. I don't believe today's heavy symbolic transformations and auto-vectorization should be taken away from C, and I don't believe we should go back to "pcc", where the compiler did nothing more than straight translation. It's reasonable to demand highly optimized code, of course. I'm just saying that there is a mismatch between C's hacker-friendly roots and its role as a general-purpose language in the industry after it took over the world (ironically, precisely because of its hacker-friendliness). The original hacker-friendly design is just not the most appropriate tool for this job; it was never designed to do it, and that has created this unfortunate situation.

So C in today's form is neither hacker-friendly nor production-friendly. But its old "hacker-friendly" image is still deeply attractive, even if it's illusory.

replies(1): >>35750731 #
2. rocqua ◴[] No.35750731[source]
I always believed the difference between unspecified and undefined behavior was specifically tuned so that unspecified behavior was fine to use un-portably, whereas undefined behavior was never meant to be used. Hence the careful choice of wording in the standard.

Of course, this makes it surprising that relying on 2's complement is undefined rather than unspecified, as with other examples like left-shifting a 1 into the sign bit (1 << 31).
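
To make the distinction concrete with the usual textbook examples:

    #include <stdio.h>

    int f(void) { puts("f"); return 1; }
    int g(void) { puts("g"); return 2; }

    int main(void) {
        /* Unspecified: whether f or g runs first varies between
         * compilers, but every allowed order is a well-defined
         * execution you could (un-portably) rely on. */
        int sum = f() + g();

        int s = 31;
        /* Undefined: shifting a 1 into the sign bit of a 32-bit int.
         * The standard places no requirements on the program at all. */
        int bad = 1 << s;

        printf("%d %d\n", sum, bad);
        return 0;
    }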

From this story, it sounds like the distinction between undefined and unspecified behavior is newer than I thought, and perhaps it was about optimization from the beginning?