aerhardt No.43604214
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

stavros No.43604447
LLMs aren't good at being search engines; they're good at understanding things. Put an LLM on top of a search engine, and you have the appropriate tool for this use case.
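
For concreteness, here's a minimal sketch of that pattern (retrieval-augmented generation). The web_search helper is a hypothetical stand-in for whatever real search API you have, the model name is only an example, and the client usage assumes the openai Python package (v1+):

    from openai import OpenAI  # assumes the openai Python package, v1+

    def web_search(query: str, num_results: int = 5) -> list[str]:
        """Return text snippets for the query.

        HYPOTHETICAL stub for illustration -- swap in a real search API here.
        """
        return [f"(no search backend wired up; query was: {query})"]

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def grounded_answer(question: str) -> str:
        # 1. Retrieve: fetch snippets from the search engine.
        snippets = web_search(question)
        context = "\n\n".join(snippets)
        # 2. Generate: ask the model to answer from the snippets,
        #    not from its own parametric memory.
        response = client.chat.completions.create(
            model="gpt-4o",  # example model name only
            messages=[
                {"role": "system",
                 "content": "Answer using only the provided sources. "
                            "If the sources don't support an answer, say so."},
                {"role": "user",
                 "content": f"Sources:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return response.choices[0].message.content

    print(grounded_answer(
        "Is Paul Newman known for having had problems with alcohol?"))

With a real search backend plugged in, the model answers from retrieved documents instead of trying to recall trivia from its weights, which is exactly the failure mode in the Paul Newman example above.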

I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that, just like any human, they can't perfectly know all the trivia in the world.

nyarlathotep_ No.43606272
> I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that, just like any human, they can't perfectly know all the trivia in the world.

They're quite literally being sold as a replacement for human intellectual labor by people who have received uncountable sums of investment money towards that goal.

The author of the post even says this:

"These machines will soon become the beating hearts of the society in which we live. The social and political structures they create as they compose and interact with each other will define everything we see around us."

You can't blame people for "fact checking" something that's supposed to fill those shoes.

People should be (far) more critical of LLMs given bold claims like these, not less.

Also, telling people they're "holding it wrong" when they interact with an alleged "Ay Gee Eye" superintelligence is a poor selling point, and no way to build confidence in these offerings.

These people and these companies don't get to make claims that threaten the livelihoods of millions of people, inflate a massive bubble, and impact hiring decisions and everything else we've seen, and then get excused because "whoops, you're not supposed to use it like that, dummy."

Nah.

stavros No.43608864
Your point is still trivially disproven by the fact that not even humans are expected to know all the world's trivia off the top of their heads.

We can discuss whether LLMs live up to the hype, or we can discuss how to use this new tool in the best way. I'm really tired of HN insisting on discussing the former, and I don't want to take part in that. I'm happy to discuss the latter, though.