qwertox ◴[] No.44497845[source]
LLMs are broken, too:

> "Of course. This is an excellent example that demonstrates a fundamental and powerful concept in Rust: the distinction between cloning a smart pointer and cloning the data it points to. [...]"

Then I post the compiler's output:

> "Ah, an excellent follow-up! You are absolutely right to post the compiler error. My apologies—my initial explanation described how one might expect it to work logically, but I neglected a crucial and subtle detail [...]"

Aren't you also getting very tired of this behavior?
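
(For reference, the distinction the model was describing is real enough, even if its worked example fell apart: cloning an Rc<T> with Rc::clone only bumps a reference count, while cloning the value it points to copies the data itself. A minimal sketch of that difference, purely illustrative and not the code from my exchange:)

    use std::rc::Rc;

    fn main() {
        let original = Rc::new(vec![1, 2, 3]);

        // Cloning the smart pointer: cheap, just increments the
        // reference count; both handles share the same Vec.
        let pointer_clone = Rc::clone(&original);
        assert!(Rc::ptr_eq(&original, &pointer_clone));
        assert_eq!(Rc::strong_count(&original), 2);

        // Cloning the data it points to: allocates a brand-new Vec.
        let data_clone: Vec<i32> = (*original).clone();
        assert_eq!(data_clone, *original);
    }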

replies(6): >>44497870 #>>44497876 #>>44497896 #>>44497926 #>>44498249 #>>44498405 #
ramon156 ◴[] No.44497876[source]
You should check Twitter nowadays; people love this kind of response. Some even use it as an argument.
replies(1): >>44497943 #
darkwater ◴[] No.44497943[source]
And this is why LLMs are basically "bad". They have already reached critical-mass adoption; they are right, or mostly right, most of the time, but they also screw up badly many times as well. And people will just not know, will trust them blindly, and will go even deeper down the spiral of the total absence of critical judgement. And yeah, the same thing happened with Google and search engines back in the day ("I found it on the web so it must be true"), but now with LLMs it is literally tailored to what you are asking, for every possible question you can ask (well, minus the censored ones).

I keep thinking the LLM contribution to humanity is/will be a net negative in the long run.

replies(2): >>44498026 #>>44498079 #
chrismorgan ◴[] No.44498079[source]
> they are right or mostly right most of the time

It’s times like this that I wonder if we’re even using the same tools. Maybe it’s because I only actively try to use them when I expect failure and am curious how it will play out (occasionally one just decides to interpose itself on a normal search result, and I’m including those cases in my results), but my success rate with DuckDuckGo Assist (GPT-4o) is… maybe 10% of the time a success, though the first few search results gave the answer anyway; 30% an obviously, stupidly wrong answer (and some of the time the first couple of results actually had the answer, but it messed it up); 60% a plausible but wrong answer. I have literally never had something I would consider an insightful answer to the sorts of things I might search the web for. Not once. I just find it ludicrously bad, for something so popular. Yet somehow lots of people sing their praises and clearly get better results than I do, and that sometimes baffles, sometimes alarms me. Baffles—they must be using it completely differently from me. Alarms—or are they just failing to notice the errors?

(I also sometimes enjoy running things like llama3.2 locally, but that’s just playing with it, and it’s small enough that I don’t expect it to be any good at these sorts of things. For some sorts of tasks like exploring word connections when I just can’t quite remember a word, or some more mechanical things, they can be handy. But for search-style questions, using models like GPT-4o, how do I get such consistently useless or pernicious results from them!?)

replies(1): >>44498484 #
1. Ukv ◴[] No.44498484[source]
Probably depends a lot on the type of questions you're asking. I think LLMs are inherently better at language-based tasks (translate this, reword this, alternate terms for this, etc.) than at technical fact-based tasks, and within technical tasks someone using one as their first port of call will be giving it a much larger proportion of easy questions than someone who turns to it only once stumped, having exhausted other sources (or, as here, challenging it with questions where they expect failure).

There's a difference in question difficulty distribution between me asking "how do I do X in FFmpeg" because I'm too lazy to check the docs and don't use FFmpeg frequently enough to memorize it, and someone asking because they have already checked the docs and/or use FFmpeg frequently but couldn't figure out how to do specifically X (say cropping videos to an odd width/height, which many formats just don't support), for instance. The former probably makes up the majority of my LLM usage, but I've still occasionally been surprised in the latter case, where I've come up empty checking the docs and traditional search but an LLM pulls out something correct.

replies(1): >>44498944 #
2. chrismorgan ◴[] No.44498944[source]
A few days ago I tried something along the “how do I do X in FFmpeg” lines, but for something on the web platform; I don’t remember what. Maybe something to do with XPath, or maybe something comparatively new (3–5y) in JS-land with CSS connections. It was something where there was a clear correct answer, no research or synthesis required; I was just blanking on the term, or something like that. (Frustratingly, I can’t remember exactly what it was.) Allegedly using two of the search results, one of which was correct and one of which was completely inapplicable, it gave a third answer which sounded plausible but was a total figment.

It’s definitely often good at finding the relevant place in the docs, but painfully frequently it’s stupendously bad, declaring in an authoritative tone how it snatched defeat from the jaws of victory.

The startling variety of people’s experiences, and its markedly bimodal distribution, has been observed and remarked upon before. And it’s honestly quite disturbing, because the accounts are frequently incompatible enough to suggest that at least one of the two perspectives is mostly wrong.