    133 points chmaynard | 19 comments
    1. BSDobelix ◴[] No.42061819[source]
    One can try it out with CachyOS/Arch:

    https://cachyos.org/blog/2411-kernel-autofdo/

    replies(3): >>42065875 #>>42068991 #>>42069221 #
    2. kardos ◴[] No.42063801[source]
    Does it work with Intel Fortran-compiled code?
    replies(1): >>42067765 #
    3. JoelJacobson ◴[] No.42064455[source]
    Here is another interesting BOLT article, this one on PostgreSQL optimization:

    https://vondra.me/posts/playing-with-bolt-and-postgres/

    "results are unexpectedly good, in some cases up to 40%"

    replies(1): >>42065575 #
    4. vsskanth ◴[] No.42065443[source]
    Anyone know of a Windows equivalent to BOLT?
    replies(2): >>42069021 #>>42071879 #
    5. pfdietz ◴[] No.42065575[source]
    That's amazing.
    6. knowitnone ◴[] No.42065875[source]
    Wanted to see what CachyOS is about. In this review it came in second place to Clear Linux, which is not bad: https://www.phoronix.com/review/cachyos-linux-perf/5
    7. OnlyMortal ◴[] No.42066401[source]
    Back in the day on the Mac, the order of source files in your project would determine locality in the binary.

    If memory serves, this was with MPW C or maybe CodeWarrior.

    You could see the jump (jmp) instructions use short jumps rather than long ones.

    replies(3): >>42067152 #>>42068345 #>>42072091 #
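
To illustrate the point about file order and jump size, here is a minimal sketch. It assumes a toy model in which a branch can use the short encoding only when the target starts within a fixed distance of the caller. The function names, sizes, call list, and `SHORT_REACH` are all hypothetical, not values from MPW C, CodeWarrior, or any real instruction set.

```python
# Hypothetical function sizes in bytes and a hypothetical call list.
SIZES = {"a": 100, "b": 400, "c": 100, "d": 50}
CALLS = [("a", "c"), ("c", "d"), ("a", "d")]
SHORT_REACH = 200  # assumed short-branch limit, not a real ISA constant


def short_branches(order):
    """Count calls whose target start lies within SHORT_REACH of the caller start."""
    addr, offset = {}, 0
    for name in order:
        # Functions are laid out back to back in the given order.
        addr[name] = offset
        offset += SIZES[name]
    return sum(1 for src, dst in CALLS if abs(addr[dst] - addr[src]) <= SHORT_REACH)


# Same code, two layouts: the large, unrelated function "b" either sits
# between the functions that call each other or is moved out of the way.
print(short_branches(["a", "b", "c", "d"]))
print(short_branches(["a", "c", "d", "b"]))
```

In this toy model, moving `b` to the end lets more calls fall inside the short range, which is the effect described above.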
    8. rurban ◴[] No.42067152[source]
    This is still relevant. I had great success writing an order optimizer for perl5.
    9. kijiki ◴[] No.42067765[source]
    As long as you relink with relocations preserved in the final ELF binary, it should.
    replies(1): >>42071248 #
    10. fsflyer ◴[] No.42068345[source]
    The Metrowerks profiler and linker worked together to optimize locality in the binary; the focus was on PowerPC code. The linker could generate the static call tree, while the profiler could generate a dynamic call tree of what was actually called. The goal was to separate the cold portions of the call tree into portions of the executable that didn't get paged in.

    I worked on the Profiler and I seem to remember that Microsoft was one of the developers that put a bunch of effort into using this to optimize the Office suite on Mac. I remember the release of Word that used it was snappier.
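
The static versus dynamic call tree idea can be sketched roughly as follows. This is not the Metrowerks algorithm, only an assumed simplification: a function counts as hot if the profile saw it called, and everything else goes to a cold section. All function names and counts are hypothetical.

```python
# Static call graph, as a linker could derive it (hypothetical data).
static_calls = {
    "main": ["parse", "render", "report_error"],
    "parse": ["tokenize", "report_error"],
    "render": ["draw"],
    "tokenize": [],
    "draw": [],
    "report_error": ["dump_state"],
    "dump_state": [],
}

# Dynamic call counts, as a profiling run could record them (hypothetical data).
dynamic_counts = {
    ("main", "parse"): 1,
    ("main", "render"): 1,
    ("parse", "tokenize"): 5000,
    ("render", "draw"): 1200,
}


def split_hot_cold(static_calls, dynamic_counts, entry="main"):
    """Return (hot, cold) function lists for the binary layout."""
    # A function is hot if it is the entry point or was called in the profile.
    called = {callee for (_, callee), n in dynamic_counts.items() if n > 0}
    called.add(entry)

    # Total observed calls into each function, used to order the hot section.
    heat = {name: 0 for name in static_calls}
    for (_, callee), n in dynamic_counts.items():
        heat[callee] += n

    # Entry point first, then most frequently called functions first.
    hot = sorted((f for f in static_calls if f in called),
                 key=lambda f: (f != entry, -heat[f], f))
    # Never-called functions are grouped at the end so their pages can stay unloaded.
    cold = sorted(f for f in static_calls if f not in called)
    return hot, cold


hot, cold = split_hot_cold(static_calls, dynamic_counts)
print("hot :", hot)
print("cold:", cold)
```

Here the error-handling functions appear in the static graph but never in the profile, so they end up in the cold list.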

    11. ndesaulniers ◴[] No.42068991[source]
    Note: that's autoFDO+propeller. This article is about BOLT.
    12. Cieric ◴[] No.42069021[source]
    Some Google searching brought this up: https://learn.microsoft.com/en-us/cpp/build/profile-guided-o... I'm only reading over it now, but I'm going to test it out a bit when I can.
    replies(1): >>42071233 #
    13. ◴[] No.42069221[source]
    14. stephc_int13 ◴[] No.42070059[source]
    Instruction cache and TLB thrashing is an often-overlooked consequence of code bloat and sometimes of overly aggressive micro-benchmark-driven optimization.

    Reorganizing the binary is an interesting approach to minimize the cost, but I think that any performance-oriented developer should keep in mind that projects rarely depend on a single hot loop; they depend on many systems working together and competing for space in the cache(s).

    I generally use -Os instead of -O2 and -O3 in my projects, while trying to reduce code bloat to a minimum for that reason.

    15. dwattttt ◴[] No.42071233{3}[source]
    PGO describes using extra data to guide optimisations, but it doesn't define what those optimisations are.

    Reading the link, there are several that sound like they match what BOLT is applying (Basic Block Optimization, Function Layout, Conditional Branch Optimization, and Dead Code Separation).

    16. kardos ◴[] No.42071248{3}[source]
    Thank you!
    17. neerajsi ◴[] No.42071879[source]
    Microsoft had internal tooling very similar to BOLT almost 20 years ago. Most of those optimizations were moved to the compiler in LTCG mode with PGO.
    18. Iwan-Zotow ◴[] No.42072091[source]
    Same in MS-DOS.

    You have far and near pointer modifiers.