←back to thread

579 points paulpauper | 1 comments | | HN request time: 0.219s | source
Show context
aerhardt ◴[] No.43604214[source]
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get so wrong something so simple.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

replies(27): >>43604240 #>>43604254 #>>43604266 #>>43604352 #>>43604411 #>>43604434 #>>43604445 #>>43604447 #>>43604474 #>>43605109 #>>43605148 #>>43605609 #>>43605734 #>>43605773 #>>43605938 #>>43605941 #>>43606141 #>>43606176 #>>43606197 #>>43606455 #>>43606465 #>>43606551 #>>43606632 #>>43606774 #>>43606870 #>>43606938 #>>43607090 #
fnordpiglet ◴[] No.43604445[source]
This is less an LLM thing than an information retrieval question. If you choose a model and tell it to “Search,” you find citation based analysis that discusses that he indeed had problems with alcohol. I do find it interesting it quibbles whether he was an alcoholic or not - it seems pretty clear from the rest that he was - but regardless. This is indicative of something crucial when placing LLMs into a toolkit. They are not omniscient nor are they deductive reasoning tools. Information retrieval systems are excellent at information retrieval and should be used for information retrieval. Solvers are excellent at solving deductive problems. Use them. The better they get at these tasks alone is cool but is IMO a parlor trick since we have nearly optimal or actually optimal techniques that don’t need an LLM. The LLM should use those tools. So, click search next time you have an information retrieval question. https://chatgpt.com/share/67f2dac0-3478-8000-9055-2ae5347037...
replies(3): >>43604552 #>>43605281 #>>43605697 #
mvdtnz ◴[] No.43604552[source]
Any information found in a web search about Newman will be available in the training set (more or less). It's almost certainly a problem of alignment / "safety" causing this issue.
replies(2): >>43604681 #>>43604942 #
1. simonw ◴[] No.43604942[source]
"Any information found in a web search about Newman will be available in the training set"

I don't think that is a safe assumption these days. Training modern LLM isn't about dumping in everything on the Internet. To get a really good model you have to be selective about your sources of training data.

They still rip off vast amounts of copyrighted data, but I get the impression they are increasingly picky about what they dump into their training runs.