    128 points RGBCube | 17 comments
    1. qwertox ◴[] No.44497845[source]
    LLMs are broken, too:

    > "Of course. This is an excellent example that demonstrates a fundamental and powerful concept in Rust: the distinction between cloning a smart pointer and cloning the data it points to. [...]"

    Then I post the compiler's output:

    > "Ah, an excellent follow-up! You are absolutely right to post the compiler error. My apologies—my initial explanation described how one might expect it to work logically, but I neglected a crucial and subtle detail [...]"

    Aren't you also getting very tired of this behavior?

    replies(6): >>44497870 #>>44497876 #>>44497896 #>>44497926 #>>44498249 #>>44498405 #
    2. renewiltord ◴[] No.44497870[source]
    Haha, I encountered the opposite of this recently: before doing something destructive I first asked Gemini, then countered it, saying it was wrong, and it insisted it was right. So the reality they encountered is probably this: it is either stubbornly wrong or overly obsequious, with no ability to switch between the two.

    My friend was a big fan of Gemini 2.5 Pro; I kept telling him it was garbage except for OCR, and he nearly followed what it recommended. Haha, he's never touching it again. Every other LLM changed its tune on pushback.

    3. ramon156 ◴[] No.44497876[source]
    You should check Twitter nowadays; people love this kind of response. Some even use it as an argument.
    replies(1): >>44497943 #
    4. the_mitsuhiko ◴[] No.44497896[source]
    > Aren't you also getting very tired of this behavior?

    The part that definitely annoys me is how confident they all sound. However, the way I'm using them is in tool-usage loops, so it usually runs into part 2 immediately and course-corrects.
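
    (A minimal sketch of what I mean by a tool-usage loop, with the model call stubbed out since that part depends on whichever API you use; the point is only that the compiler's verdict goes straight back into the prompt:)

      use std::process::Command;

      // Hypothetical stand-in for whatever model API is in use; not a real crate.
      fn ask_model(_prompt: &str) -> String {
          todo!("call your LLM of choice here")
      }

      fn main() {
          let mut prompt = String::from("Fix the borrow error in src/main.rs");
          for _ in 0..5 {
              let _suggestion = ask_model(&prompt);
              // (apply the suggested change to the working tree here)

              // "Part 2": run the compiler as a tool and feed any errors straight back.
              let out = Command::new("cargo").arg("check").output().expect("failed to run cargo");
              if out.status.success() {
                  break; // the compiler is satisfied, stop looping
              }
              prompt = format!(
                  "The compiler rejected that. Output:\n{}",
                  String::from_utf8_lossy(&out.stderr)
              );
          }
      }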

    replies(1): >>44498000 #
    5. rfoo ◴[] No.44497926[source]
    TBH I'm tired of only the "Ah, an excellent follow-up! You are absolutely right <...> My apologies" part.
    replies(1): >>44498097 #
    6. darkwater ◴[] No.44497943[source]
    And this is basically why LLMs are "bad". They have already reached critical-mass adoption; they are right or mostly right most of the time, but they also screw up badly many times as well. And people will just not know, will trust them blindly, and will go ever deeper down the spiral of the total absence of critical judgement. And yeah, it also happened with Google and search engines back in the day ("I found it on the web so it must be true"), but now with LLMs it is literally tailored to what you are asking, for every possible question you can ask (well, minus the censored ones).

    I keep thinking the LLM contribution to humanity is/will be a net negative in the long run.

    replies(2): >>44498026 #>>44498079 #
    7. bt1a ◴[] No.44498000[source]
    Well, they're usually told that they're some unicorn master of * languages, frameworks, skillsets, etc., so can you really fault them? :)
    8. bt1a ◴[] No.44498026{3}[source]
    Yet they're fantastic personal tutors / assistants who can provide a deeply needed 1:1 learning interface for less privileged individuals. I emphasize 'can'. I'm not saying kids should have them by their side in their current rough-around-the-edges, mediocre-intelligence forms. Many will get burned as you describe, but it should be a lesson to curate information from multiple sources and practice applying reasoning skills!
    replies(2): >>44498164 #>>44498178 #
    9. chrismorgan ◴[] No.44498079{3}[source]
    > they are right or mostly right most of the time

    It’s times like this that I wonder if we’re even using the same tools. Maybe it’s because I only ever try to actively use them when I expect failure and am curious how it will go (occasionally one just decides to interpose itself on a normal search result, and I’m including those cases in my results), but my success rate with DuckDuckGo Assist (GPT-4o) is… maybe 10% success where the first few search results gave the answer anyway, 30% obviously, stupidly wrong answers (and some of the time the first couple of results actually had the answer, but it messed it up), 60% plausible but wrong answers. I have literally never had something I would consider an insightful answer to the sorts of things I might search the web for. Not once. I just find it ludicrously bad, for something so popular. Yet somehow lots of people sing their praises and clearly get better results than I do, and that sometimes baffles, sometimes alarms me. Baffles—they must be using it completely differently from me. Alarms—or are they just failing to notice the errors?

    (I also sometimes enjoy running things like llama3.2 locally, but that’s just playing with it, and it’s small enough that I don’t expect it to be any good at these sorts of things. For some sorts of tasks like exploring word connections when I just can’t quite remember a word, or some more mechanical things, they can be handy. But for search-style questions, using models like GPT-4o, how do I get such consistently useless or pernicious results from them!?)

    replies(1): >>44498484 #
    10. IshKebab ◴[] No.44498097[source]
    Yeah they definitely didn't do that in the past. We've lost "as a large language model" and "it's important to remember" but gained "you're absolutely right!"

    I would have thought they'd add "don't apologise!!!!" or something like that to the system prompt like they do to avoid excessive lists.

    11. darkwater ◴[] No.44498164{4}[source]
    I agree with your take, and I personally used Claude and ChatGPT to learn better/hone some skills while interviewing to land a new job. They also help me get unstuck when doing small home fixes, because I get an answer custom-tailored to my current doubt/issue, which a normal web search would make much more complicated to answer (I would have to know more context about it first). But still, they get things wrong and can lead you astray even if you know the topic.
    12. AIPedant ◴[] No.44498178{4}[source]
    My dad taught high school science until retiring this year, and at least in 2024 the LLM tutors were totally useless for honest learning. They were good at the “happy path” but the space of high schoolers’ misconceptions about physics greatly exceeds the training data and can’t be cheaply RLHFed, so they crap the bed when you role play as a dumb high schooler.

    In my experience this is still true for the reasoning models with undergraduate mathematics - if you ask it to do your point-set topology homework (dishonest learning) it will score > 85/100, but if you are confused about point-set topology and try to ask it an honest (but ignorant) question it will give you a pile of pseudo-mathematical BS.

    13. ◴[] No.44498249[source]
    14. codedokode ◴[] No.44498405[source]
    Languages like Rust and C seem to be too complicated for them. I also asked different LLMs to write a C macro or function that creates a struct and a function to print it (so that I don't have to duplicate the field list), and they generate plausible garbage.
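
    (For reference, this is the kind of thing I was asking for, sketched here in Rust rather than C since that's the thread's topic: define the field list once and have a macro emit both the struct and the print function. The C equivalent would be the classic X-macro trick.)

      // Define the field list once; the macro emits both the struct and a print fn.
      macro_rules! struct_with_print {
          ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
              struct $name {
                  $($field: $ty),*
              }

              impl $name {
                  fn print(&self) {
                      $(println!("{} = {:?}", stringify!($field), self.$field);)*
                  }
              }
          };
      }

      struct_with_print!(Point { x: i32, y: i32 });

      fn main() {
          Point { x: 1, y: 2 }.print();
      }
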
    replies(1): >>44499241 #
    15. Ukv ◴[] No.44498484{4}[source]
    Probably depends a lot on the type of questions you're asking. I think LLMs are inherently better at language-based tasks (translate this, reword this, alternate terms for this, etc.) than technical fact-based tasks, and within technical tasks someone using them as their first port of call will be giving them a much larger proportion of easy questions than someone using them only once stumped, having exhausted other sources (or, as here, challenging them with questions where they expect failure).

    There's a difference in question difficulty distribution between me asking "how do I do X in FFmpeg" because I'm too lazy to check the docs and don't use FFmpeg frequently enough to memorize them, and someone asking because they have already checked the docs and/or use FFmpeg frequently but couldn't figure out how to do specifically X (say cropping videos to an odd width/height, which many formats just don't support), for instance. The former probably makes up the majority of my LLM usage, but I have still occasionally been surprised on the latter, where I've come up empty checking docs/traditional search but an LLM pulls out something correct.

    replies(1): >>44498944 #
    16. chrismorgan ◴[] No.44498944{5}[source]
    A few days ago I tried something along the “how do I do X in FFmpeg” lines, but something on the web platform, I don’t remember what. Maybe something to do with XPath, or maybe something comparatively new (3–5y) in JS-land with CSS connections. It was something where there was a clear correct answer, no research or synthesis required, I was just blanking on the term, or something like that. (Frustratingly, I can’t remember exactly what it was.) Allegedly using two of the search results, one of which was correct and one of which was just completely inapplicable, it gave a third answer which sounded plausible but was a total figment.

    It’s definitely often good at finding the relevant place in the docs, but painfully frequently it’s stupendously bad, declaring in an authoritative tone how it snatched defeat from the jaws of victory.

    The startling variety of people’s experiences, and its marked bimodal distribution, has been observed and remarked upon before. And it’s honestly quite disturbing, because they’re frequently incompatible enough to suggest that at least one of the two perspectives is mostly wrong.

    17. bigfishrunning ◴[] No.44499241[source]
    LLMs only ever produce plausible garbage -- sometimes that garbage happens to be right, but that's down to luck.