Recent AI model progress feels mostly like bullshit

(www.lesswrong.com)

579 points by paulpauper | 3 comments | 06 Apr 25 18:01 UTC

aerhardt [06 Apr 25 19:30 UTC] No.43604214

My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

1. lfkdev [06 Apr 25 19:35 UTC] No.43604254

>>43604214

That's not really 'simple' for an LLM. This is niche information about a specific person. LLMs train on massive amounts of data, and the more present a topic is in the data, the better the answers will be.

Also, you can/should use the "research" mode for questions like this.

2. aerhardt [06 Apr 25 19:52 UTC] No.43604402

>>43604254 (TP)

The question is simple and verifiable; it is impressive to me that it's not contained in the LLM's body of knowledge, or rather that it can't reach the answer.

This is niche in the grand scheme of knowledge, but Paul Newman is easily one of the biggest actors in history, and the LLM has been trained on a massive corpus that includes references to this.

Where is the threshold for topics with enough presence in the data?

3. Max_aaa [06 Apr 25 19:57 UTC] No.43604430

>>43604402

The question might be simple and verifiable, but it is not simple for an LLM to mark a particular question as such. This is the tricky part.

An LLM does not care about your question; it is a bunch of math that will spit out a result based on what you typed in.