
    305 points todsacerdoti | 16 comments
    1. mmastrac ◴[] No.44061671[source]
    The associated issue for comparing two u16s is interesting.

    https://github.com/rust-lang/rust/issues/140167

    replies(3): >>44061906 #>>44065911 #>>44066028 #
    2. heybales ◴[] No.44061906[source]
    The thing I like most about this is that the discussion isn't just 14 pages of "I'm having this issue as well" and "Any updates on when this will be fixed?" As a web dev, GitHub issues kinda suck.
    replies(2): >>44063190 #>>44073866 #
    3. eterm ◴[] No.44063190[source]
    It was worse before emoji reactions were added, when 90% of messages were literally just "+1".
    replies(1): >>44064094 #
    4. heybales ◴[] No.44064094{3}[source]
    +1
    5. rhdjsjebshjffn ◴[] No.44065911[source]
    This just seems to illustrate the complexity of compiler authorship. I'm not sure C compilers are able to address this issue any better in the general case.
    replies(2): >>44066162 #>>44066204 #
    6. ack_complete ◴[] No.44066028[source]
    I'm surprised there's no mention of store forwarding in that discussion. The -O3 codegen is bonkers, but the -O2 output is reasonable. In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads. In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
    replies(2): >>44069905 #>>44070022 #
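    The store-forwarding hazard described above can be sketched in C (the names and values here are illustrative, not from the thread): a struct is written with two 16-bit stores and then immediately read back with one 32-bit load, which is the access pattern that can miss the store-to-load forwarding fast path on some microarchitectures because the wide load does not match any single in-flight store.

    ```c
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    typedef struct { uint16_t a, b; } pair;

    /* Two narrow stores followed by one wide load of the same bytes:
       the pattern that can defeat store-to-load forwarding. */
    static uint32_t store_then_wide_load(uint16_t a, uint16_t b) {
        pair p;
        p.a = a;                         /* 16-bit store */
        p.b = b;                         /* 16-bit store */
        uint32_t word;
        memcpy(&word, &p, sizeof word);  /* 32-bit load spanning both stores */
        return word;
    }

    int main(void) {
        /* 0x22221111 on a little-endian machine */
        printf("0x%08x\n", (unsigned)store_then_wide_load(0x1111, 0x2222));
        return 0;
    }
    ```

    The result is correct either way; the cost, when the forwarding fast path is missed, is extra latency on the load, which is the trade-off being weighed against merging the two loads.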
    7. runevault ◴[] No.44066162[source]
    Keep in mind Rust is using the same backend as one of the main C compilers, LLVM. So if it is handling it any better that means the Clang developers handle it before it even reaches the shared LLVM backend. Well, or there is something about the way Clang structures the code that catches a pattern in the backend the Rust developers do not know about.
    replies(1): >>44068937 #
    8. vlovich123 ◴[] No.44066204[source]
    The Rust issue has people trying this with C code, and the compiler exhibits the same issue. This will get fixed, and it'll help both C and Rust code.
    replies(1): >>44068993 #
    9. rhdjsjebshjffn ◴[] No.44068937{3}[source]
    I mean yeah, I just view Rust as the quality-oriented spear of western development.

    Rust is absolutely an improvement over C in every way.

    10. runevault ◴[] No.44068993{3}[source]
    Out of curiosity, just Clang, or gcc as well?
    replies(1): >>44072736 #
    11. Dylan16807 ◴[] No.44069905[source]
    > In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads

    Would that failure be significantly worse than separate loading?

    Just negating the optimization wouldn't be much reason against doing it. A single load is simpler and in the general case faster.

    replies(2): >>44078234 #>>44084378 #
    12. mshockwave ◴[] No.44070022[source]
    > In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure

    It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005

    > In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.

    I guess you're talking about stores and load across function boundaries?

    Trivia: X86 LLVM creates a whole Pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...

    13. josephg ◴[] No.44072736{4}[source]
    I just tried it, and the problem is even worse in gcc.

    Given this C code:

    #include <stdint.h>

        typedef struct { uint16_t a, b; } pair;
    
        int eq_copy(pair a, pair b) {
            return a.a == b.a && a.b == b.b;
        }
        int eq_ref(pair *a, pair *b) {
            return a->a == b->a && a->b == b->b;
        }
    
    Clang generates clean code for the eq_copy variant, but complex code for the eq_ref variant. Gcc emits pretty complex code in both variants.

    For example, here's eq_ref from gcc -O2:

        eq_ref:
            movzx   edx, WORD PTR [rsi]
            xor     eax, eax
            cmp     WORD PTR [rdi], dx
            je      .L9
            ret
        .L9:
            movzx   eax, WORD PTR [rsi+2]
            cmp     WORD PTR [rdi+2], ax
            sete    al
            movzx   eax, al
            ret
    
    Have a play around: https://c.godbolt.org/z/79Eaa3jYf
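    A standard idiom for getting the merged comparison today (not from the thread, just a common workaround) is `memcmp` over the whole struct, which gcc and clang typically lower to a single 32-bit load-and-compare for this 4-byte layout:

    ```c
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    typedef struct { uint16_t a, b; } pair;

    /* memcmp over the whole struct; compilers typically emit one
       32-bit compare here. Caveat: this also compares any padding
       bytes, so it is only safe when the struct has none (this one
       is 4 bytes with no padding). */
    int eq_memcmp(const pair *a, const pair *b) {
        return memcmp(a, b, sizeof *a) == 0;
    }

    int main(void) {
        pair x = {1, 2}, y = {1, 2}, z = {1, 3};
        printf("%d %d\n", eq_memcmp(&x, &y), eq_memcmp(&x, &z)); /* 1 0 */
        return 0;
    }
    ```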
    14. NoMoreNicksLeft ◴[] No.44073866[source]
    Wonder if it's a poor interface issue... if people could click a "me too" button that didn't add a full comment, just some minimal notation with their username at the bottom of the comment, 1) would people use it, and 2) would it be unobtrusive enough not to be annoying? It could even mute notifications for the me-toos.
    replies(1): >>44134415 #
    15. ack_complete ◴[] No.44084378{3}[source]
    Usually, yeah, it's noticeably worse than using individual loads and stores as it adds around a dozen cycles of latency. This is usually enough for the load to light up hot in a sampling profile. It's possible for that extra latency to be hidden, but then in that case the extra loads/stores wouldn't be an issue either.
    16. IshKebab ◴[] No.44134415{3}[source]
    This seems like an area where LLMs would actually be extremely useful. You can manually mark comments as irrelevant. Why can't GitHub use AI to do it automatically? Or to highlight the "resolution" comment automatically? On very big issues it can take a non-trivial amount of time just to find out what the outcome was.