579 points paulpauper | 4 comments
aerhardt:
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom it getting something so simple so wrong.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

stavros:
LLMs aren't good at being search engines, they're good at understanding things. Put an LLM on top of a search engine, and that's the appropriate tool for this use case.

I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that they can't perfectly know all the trivia in the world, exactly the same as any human.
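
Roughly what I mean, as a minimal sketch (assuming the OpenAI Python SDK and an arbitrary model name; search_web() is a hypothetical stand-in for whatever real search API you'd wire in):

    # Fetch sources first, then let the model answer only from what was retrieved.
    from openai import OpenAI

    client = OpenAI()

    def search_web(query: str) -> list[str]:
        # Hypothetical: call a real search API here and return text snippets.
        raise NotImplementedError

    def grounded_answer(question: str) -> str:
        snippets = search_web(question)
        context = "\n\n".join(snippets)
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[
                {"role": "system",
                 "content": "Answer only from the provided sources. "
                            "If the sources don't settle the question, say so."},
                {"role": "user",
                 "content": f"Sources:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return response.choices[0].message.content

    print(grounded_answer("Is Paul Newman known for having had problems with alcohol?"))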

MegaButts:
> LLMs aren't good at being search engines, they're good at understanding things.

LLMs are literally fundamentally incapable of understanding things. They are stochastic parrots and you've been fooled.

fancyfredbot:
We're talking about a stochastic parrot that, in many circumstances, responds in a way indistinguishable from actual understanding.
MegaButts:
I've always been amazed by this. I have never not been frustrated with the profound stupidity of LLMs. Obviously I must be using them differently, because I've never been able to trust them with anything, and more than half the time I fact-check them, even for basic information retrieval, the output is objectively incorrect.
fancyfredbot:
If you got as far as checking the output, it must have appeared to understand your question.

I wouldn't claim LLMs are good at being factual, or good at arithmetic, or at drawing wine glasses, or that they are "clever". What they are very good at is responding to questions in a way which gives you the very strong impression they've understood you.

MegaButts:
I vehemently disagree. If I ask a question with an objective answer, and it simply makes something up and is very confident the answer is correct, what the fuck has it understood other than how to piss me off?

It clearly doesn't understand that the question has a correct answer, or that it does not know the answer. It also clearly does not understand that I hate bullshit, no matter how many dozens of times I prompt it not to make things up and to admit ignorance instead.

fancyfredbot:
It didn't understand you but the response was plausible enough to require fact checking.

Although that isn't literally indistinguishable from 'understanding' (your fact-checking easily told the two apart), it suggests that at a surface level it did appear to understand your question and knew what a plausible answer might look like. This is not necessarily useful but it's quite impressive.

MegaButts:
There are times it just generates complete nonsense that has nothing to do with what I said, but that's certainly not most of the time. I don't know exactly how often, but I'd say the above happens definitely under 10% of the time, and almost certainly under 5%.

Sure, LLMs are incredibly impressive from a technical standpoint. But they're so fucking stupid I hate using them.

> This is not necessarily useful but it's quite impressive.

I think we mostly agree on this. Cheers.
