579 points paulpauper | 3 comments

aerhardt ◴[] No.43604214[source]
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

replies(27): >>43604240 #>>43604254 #>>43604266 #>>43604352 #>>43604411 #>>43604434 #>>43604445 #>>43604447 #>>43604474 #>>43605109 #>>43605148 #>>43605609 #>>43605734 #>>43605773 #>>43605938 #>>43605941 #>>43606141 #>>43606176 #>>43606197 #>>43606455 #>>43606465 #>>43606551 #>>43606632 #>>43606774 #>>43606870 #>>43606938 #>>43607090 #
stavros ◴[] No.43604447[source]
LLMs aren't good at being search engines, they're good at understanding things. Put an LLM on top of a search engine, and that's the appropriate tool for this use case.

I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that they can't perfectly know all the trivia in the world, exactly the same as any human.
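
A rough sketch of what "LLM on top of a search engine" could look like, assuming the OpenAI Python client and a hypothetical web_search() helper standing in for whatever search API you have (Bing, Brave, a local index, etc.): retrieve snippets first, then ask the model to answer only from them.

    # Sketch of retrieval-augmented answering: search first, then have the LLM
    # answer strictly from the retrieved snippets instead of from memory.
    # Assumes the official OpenAI Python client (openai>=1.0); web_search() is
    # a hypothetical stand-in for any real search API.
    from openai import OpenAI

    client = OpenAI()

    def web_search(query: str) -> list[str]:
        # Placeholder: call your search API of choice and return text snippets.
        raise NotImplementedError

    def grounded_answer(question: str) -> str:
        snippets = web_search(question)
        context = "\n\n".join(snippets)
        resp = client.chat.completions.create(
            model="gpt-4o",  # any capable chat model
            messages=[
                {"role": "system",
                 "content": "Answer only from the provided search results. "
                            "If they don't contain the answer, say you don't know."},
                {"role": "user",
                 "content": f"Search results:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp.choices[0].message.content

    print(grounded_answer("Is Paul Newman known for having had problems with alcohol?"))

The point of the system prompt is to shift the model from recall to reading comprehension, which is the part it's actually good at.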

replies(4): >>43604471 #>>43604558 #>>43606272 #>>43610103 #
MegaButts ◴[] No.43604471[source]
> LLMs aren't good at being search engines, they're good at understanding things.

LLMs are literally fundamentally incapable of understanding things. They are stochastic parrots and you've been fooled.

replies(5): >>43604573 #>>43604575 #>>43604616 #>>43604708 #>>43604736 #
1. mitthrowaway2 ◴[] No.43604708[source]
What does the word "understand" mean to you?
replies(1): >>43604771 #
2. MegaButts ◴[] No.43604771[source]
An ability to answer questions with a train of thought showing how the answer was derived, or the self-awareness to recognize you do not have the ability to answer the question and declare as much. More than half the time I've used LLMs, they have simply made answers up, and when I point out an answer is wrong they regurgitate another incorrect one ad nauseam (regularly cycling through answers I've already pointed out are incorrect).

Rather than give you a technical answer - if I ever feel like an LLM can recognize its limitations rather than make something up, I would say it understands. In my experience LLMs are just algorithmic bullshitters. I would consider a function that just returns "I do not understand" to be an improvement, since most of the time I get confidently incorrect answers instead.

Yes, I read Anthropic's paper from a few days ago. I'll remain unimpressed until talking to an LLM isn't a profoundly frustrating experience.

replies(1): >>43604944 #
3. mitthrowaway2 ◴[] No.43604944[source]
I just want to say that's a much better answer than I anticipated!