←back to thread

579 points paulpauper | 1 comments | | HN request time: 0.2s | source
Show context
aerhardt ◴[] No.43604214[source]
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get so wrong something so simple.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

replies(27): >>43604240 #>>43604254 #>>43604266 #>>43604352 #>>43604411 #>>43604434 #>>43604445 #>>43604447 #>>43604474 #>>43605109 #>>43605148 #>>43605609 #>>43605734 #>>43605773 #>>43605938 #>>43605941 #>>43606141 #>>43606176 #>>43606197 #>>43606455 #>>43606465 #>>43606551 #>>43606632 #>>43606774 #>>43606870 #>>43606938 #>>43607090 #
jcims ◴[] No.43606176[source]
I just had Cursor Pro + Sonnet 3.7 Max one shot a python script to send this question to every model available through groq.

>Found 24 models: llama3-70b-8192, llama-3.2-3b-preview, meta-llama/llama-4-scout-17b-16e-instruct, allam-2-7b, llama-guard-3-8b, qwen-qwq-32b, llama-3.2-1b-preview, playai-tts-arabic, deepseek-r1-distill-llama-70b, llama-3.1-8b-instant, llama3-8b-8192, qwen-2.5-coder-32b, distil-whisper-large-v3-en, qwen-2.5-32b, llama-3.2-90b-vision-preview, deepseek-r1-distill-qwen-32b, whisper-large-v3, llama-3.3-70b-specdec, llama-3.3-70b-versatile, playai-tts, whisper-large-v3-turbo, llama-3.2-11b-vision-preview, mistral-saba-24b, gemma2-9b-it

Excluding the ones that do not support chat completions, all but one (qwen-qwq-32b) answered in the affirmative. The answer from qwen-qwq-32b said:

    Paul Newman, the renowned actor and humanitarian, did not have a widely publicized 
    struggle with alcohol addiction throughout most of his life, but there were 
    specific instances that indicated challenges.
Using lack of progress in a specialized field as a barometer for overall progress is kind of silly. I just spent the last few days 'vibe coding' an application and I have to say that it's pretty remarkable how capable it is now relative to my experience last year.

It took three minutes for me to do the above from the time I created my API key to when I had an answer.

replies(1): >>43606252 #
mnky9800n ◴[] No.43606252[source]
I find that everyone who replies with examples like this is an expert using expert skills to get the LLM to perform. Which makes me think why is this a skill that is useful to general public as opposed to another useful skill for technical knowledge workers to add to their tool belt?
replies(1): >>43606582 #
1. jcims ◴[] No.43606582[source]
I agree. But I will say that at least in my social circles I'm finding that a lot of people outside of tech are using these tools, and almost all of them seem to have a healthy skepticism about the information they get back. The ones that don't will learn one way or the other.