
296 points | gyre007 | 3 comments
_han (No.21281004)
The top comment on YouTube raises a valid point:

> I've programmed both functional and non-functional (not necessarily OO) programming languages for ~2 decades now. This misses the point. Even if functional programming helps you reason about ADTs and data flow, monads, etc, it has the opposite effect for helping you reason about what the machine is doing. You have no control over execution, memory layout, garbage collection, you name it. FP will always occupy a niche because of where it sits in the abstraction hierarchy. I'm a real time graphics programmer and if I can't mentally map (in rough terms, specific if necessary) what assembly my code is going to generate, the language is a non-starter. This is true for any company at scale. FP can be used at the fringe or the edge, but the core part demands efficiency.

journalctl (No.21283424)
Unless you’ve written a modern, optimizing C/C++ compiler, you have absolutely no idea what kind of machine code a complex program is going to spit out. It’s not 1972 anymore, and C code is no longer particularly close to the metal. It hasn’t been for some time.
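To make this concrete - purely as an illustration, and the exact output depends on the compiler version and flags - even a trivial loop can come out looking nothing like the source:

    unsigned sum_to(unsigned n) {
        unsigned total = 0;
        for (unsigned i = 0; i < n; i++)
            total += i;
        /* recent GCC/Clang at -O2 will often replace the whole loop with a
           closed-form n*(n-1)/2 computation; check with "gcc -O2 -S" or on
           godbolt.org */
        return total;
    }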
Crinus (No.21283537)
This is wrong: you absolutely can have an idea of what a C (and, most of the time, C++) compiler will generate. You may not know the exact instructions, but if you are familiar with the target CPU you can have a general idea of what sort of instructions will be generated. And the more you check the assembly that a compiler generates for pieces of code, the better your idea will be.

Note that you almost never need to care about what the compiler will generate for the entirety of a "complex program" - but you often do need to care about what it will generate for the specific pieces you are working on.
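For example (the function and command here are just an illustration), the usual workflow is to put the piece you care about in a small translation unit and look at the output:

    /* illustration only: a small, isolated piece of code whose assembly is
       easy to inspect, e.g. with "gcc -O2 -S -fverbose-asm dot.c", objdump,
       or Compiler Explorer (godbolt.org) */
    float dot4(const float *a, const float *b) {
        float acc = 0.0f;
        for (int i = 0; i < 4; i++)
            acc += a[i] * b[i]; /* on x86-64 you would roughly expect scalar
                                   mulss/addss here, or SSE/AVX forms once
                                   the vectorizer kicks in */
        return acc;
    }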

The C language itself might be defined in terms of an abstract machine, but it is still implemented by real compilers - compilers that, btw, you also have control over and that often provide a lot of options for how they generate code.
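For instance (an illustrative command line, not a recommendation), you can pin down quite a lot of the code generation yourself:

    gcc -O2 -march=native -fno-tree-vectorize -fno-unroll-loops -S foo.c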

And honestly, if you have "absolutely no idea what kind of machine code" your C compiler will generate, then perhaps it would be a good idea to get some understanding.

(Though I'd agree that it isn't easy, since a lot of people treat compiler options as wishing wells where they throw in "-O90001", and compiler developers are perfectly fine with that - there is even a literal "-Ofast" nowadays - instead of documenting what exactly these options do.)

gpderetta (No.21283652)
To be fair, the set of optimization passes enabled by the various -Ox levels, including x=fast, is mostly documented.

In GCC at least, though, there are a few optimizations included in the various -O levels that have no corresponding fine-grained flag (usually because they affect optimization pass ordering or tuning parameters).

1. Crinus (No.21289891)
Yes, they are documented, though the documentation is really something like "-fawesome-optimization, enabled by default at -O3", while "-fawesome-optimization" itself is documented as "enables awesome optimization" without explaining much more than that.

And even then, pretty much every project out there writes "-Ofast" instead of the individual flags that "-Ofast" enables, without caring about what it does or how its behavior will change across compiler versions.

2. gpderetta (No.21291055)
-Ofast enables fast-math optimizations and is generally not standards compliant. I hope projects do not enable it without thinking (as they say, it is hard to make things foolproof because fools are so resourceful).
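A minimal sketch of the kind of thing that can go wrong (whether it actually fires depends on the compiler and version): -Ofast implies -ffast-math, which includes -ffinite-math-only, so the compiler is allowed to assume NaN and Inf never occur and may drop checks like this one:

    #include <math.h>

    double safe_div(double a, double b) {
        double r = a / b;
        if (isnan(r)) /* with -ffinite-math-only this test may be folded to
                         "false" and removed entirely */
            return 0.0;
        return r;
    }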
3. Crinus (No.21292581)
My point was that options like -O<number> and -Ofast aren't the actual optimization switches: they turn on other switches, and you do not know exactly what you'll get - essentially wishing for fast code and hoping you'll get some (I mentioned -Ofast explicitly because of its name).

For example, according to the GCC 7.4 documentation, -O3 turns on:

    -fgcse-after-reload
    -finline-functions
    -fipa-cp-clone
    -fpeel-loops
    -fpredictive-commoning
    -fsplit-paths
    -ftree-loop-distribute-patterns
    -ftree-loop-vectorize
    -ftree-partial-pre
    -ftree-slp-vectorize
    -funswitch-loops
    -fvect-cost-model
whereas in GCC 9.2, -O3 turns on all of the above, plus:

    -floop-interchange 
    -floop-unroll-and-jam 
    -ftree-loop-distribution 
    -fversion-loops-for-strides
So unless you control the exact version of the compiler that will build the binaries you give out, you do not know exactly what specifying "-O3" will do.
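The closest you can get is to ask the specific compiler binary you are actually using what a given level enables (output below is abridged; the exact list is precisely what differs between versions, which is rather the point):

    $ gcc -O3 -Q --help=optimizers
      -faggressive-loop-optimizations       [enabled]
      -falign-functions                     [enabled]
      ...
      -ftree-slp-vectorize                  [enabled]
      -funswitch-loops                      [enabled]
      ...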

Moreover, even when you do know the switches, their documentation tells you basically nothing. For a random example, what does "-floop-unroll-and-jam" do? The GCC 9.2 documentation lumps it together with "-ftree-loop-linear", "-floop-interchange", "-floop-strip-mine" and "-floop-block", and all it says is:

> Perform loop nest optimizations. Same as -floop-nest-optimize. To use this code transformation, GCC has to be configured with --with-isl to enable the Graphite loop transformation infrastructure.

...what does that even mean? What sort of effect will those transformations have on the code? Why are they all jumbled into one explanation? Are they exactly the same? Why does it say they are the same as "-floop-nest-optimize"? Which option is the same? All of them? The "-floop-nest-optimize" documentation says:

> Enable the isl based loop nest optimizer. This is a generic loop nest optimizer based on the Pluto optimization algorithms. It calculates a loop structure optimized for data-locality and parallelism. This option is experimental.

Based on the Pluto optimization algorithms? Even assuming this refers to "PLUTO - An automatic parallelizer and locality optimizer for affine loop nests" (this is a guess; there are no other references in the GCC documentation as far as I can tell), does it mean they are the same as the code in Pluto, that they are based on that code with modifications, or that they are merely based on the general ideas/concepts/algorithms?
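(For what it's worth, "unroll and jam" does have a conventional textbook meaning - roughly, unroll the outer loop and fuse ("jam") the resulting copies of the inner loop - though whether GCC's pass does exactly this I cannot tell from the documentation. A rough sketch on a made-up loop nest:)

    /* before: each pass over j reloads b[j] for a single row of a */
    void scale(float a[4][1024], const float b[1024]) {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 1024; j++)
                a[i][j] *= b[j];
    }

    /* after unroll-and-jam by 2: the outer loop is unrolled and the inner
       bodies are fused, so each load of b[j] is reused for two rows */
    void scale_uaj(float a[4][1024], const float b[1024]) {
        for (int i = 0; i < 4; i += 2)
            for (int j = 0; j < 1024; j++) {
                a[i][j]     *= b[j];
                a[i + 1][j] *= b[j];
            }
    }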

--

So it isn't really a surprise that most people simply throw in "-Ofast" (or -O3 or -O2 or whatever) and hope for the best. They do not know better, and they cannot know better, since their compiler doesn't provide them any further information. And this is where much of the FUD and fear about C's undefined behavior comes from - people not knowing what exactly happens because they are not even told.