Recent AI model progress feels mostly like bullshit

(www.lesswrong.com)

579 points paulpauper | 4 comments | 06 Apr 25 18:01 UTC


aerhardt ◴[06 Apr 25 19:30 UTC] No.43604214[source]▶

My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

replies(27): >>43604240 #>>43604254 #>>43604266 #>>43604352 #>>43604411 #>>43604434 #>>43604445 #>>43604447 #>>43604474 #>>43605109 #>>43605148 #>>43605609 #>>43605734 #>>43605773 #>>43605938 #>>43605941 #>>43606141 #>>43606176 #>>43606197 #>>43606455 #>>43606465 #>>43606551 #>>43606632 #>>43606774 #>>43606870 #>>43606938 #>>43607090 #

fnordpiglet ◴[06 Apr 25 19:59 UTC] No.43604445[source]▶

>>43604214 #

This is less an LLM thing than an information retrieval question. If you choose a model and tell it to “Search,” you get a citation-based analysis that discusses how he did indeed have problems with alcohol. I do find it interesting that it quibbles over whether he was an alcoholic or not (it seems pretty clear from the rest that he was), but regardless. This is indicative of something crucial when placing LLMs into a toolkit. They are not omniscient, nor are they deductive reasoning tools. Information retrieval systems are excellent at information retrieval and should be used for information retrieval. Solvers are excellent at solving deductive problems. Use them. It’s cool that LLMs keep getting better at these tasks on their own, but IMO that is a parlor trick, since we have nearly optimal or actually optimal techniques that don’t need an LLM. The LLM should use those tools. So, click search next time you have an information retrieval question. https://chatgpt.com/share/67f2dac0-3478-8000-9055-2ae5347037...
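A minimal sketch of the "LLM should use those tools" idea, assuming a toy keyword router. Every name here (`search_tool`, `solver_tool`, `llm_only`, `classify`, `answer`) is hypothetical and stands in for a real backend; none of it is ChatGPT's actual mechanism, where the model itself typically decides when to call a tool.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: none of these correspond to a real product API.
def search_tool(query: str) -> str:
    """Pretend information-retrieval backend that returns cited text."""
    return f"[search results with citations for: {query}]"

def solver_tool(query: str) -> str:
    """Pretend deductive solver, such as a constraint or equation solver."""
    return f"[solver output for: {query}]"

def llm_only(query: str) -> str:
    """Pretend bare model answer drawn from its weights alone."""
    return f"[model answer from memory for: {query}]"

TOOLS: Dict[str, Callable[[str], str]] = {
    "retrieve": search_tool,
    "solve": solver_tool,
    "generate": llm_only,
}

def classify(query: str) -> str:
    """Toy router (an assumption): pick a tool from simple keyword cues."""
    q = query.lower().strip()
    # Factual lookups go to retrieval rather than the model's memory.
    if q.startswith(("is ", "was ", "did ", "who ", "when ", "where ")):
        return "retrieve"
    # Deductive problems go to a dedicated solver.
    if any(word in q for word in ("solve", "prove")):
        return "solve"
    # Everything else is left to plain generation.
    return "generate"

def answer(query: str) -> str:
    """Dispatch the query to whichever tool the router selected."""
    return TOOLS[classify(query)](query)

if __name__ == "__main__":
    print(answer("Is Paul Newman known for having had problems with alcohol?"))
```

With this routing, the Paul Newman question is treated as a lookup and sent to the retrieval stub instead of being answered from memory.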

replies(3): >>43604552 #>>43605281 #>>43605697 #

1. Vanit ◴[06 Apr 25 21:54 UTC] No.43605281[source]▶

>>43604445 #

I realise your answer wasn't assertive, but if I heard this from someone actively defending AI it would be a copout. If the selling point is that you can ask these AIs anything then one can't retroactively go "oh but not that" when a particular query doesn't pan out.

replies(2): >>43606397 #>>43670143 #

2. philomath_mn ◴[07 Apr 25 00:59 UTC] No.43606397[source]▶

>>43605281 (TP) #

This is a bit of a strawman. There are certainly people who claim that you can ask AIs anything but I don't think the parent commenter ever made that claim.

"AI is making incredible progress but still struggles with certain subsets of tasks" is self-consistent position.

replies(1): >>43607080 #

3. skywhopper ◴[07 Apr 25 02:41 UTC] No.43607080[source]▶

>>43606397 #

It’s not the position of any major AI company, curiously.

4. fnordpiglet ◴[13 Apr 25 04:41 UTC] No.43670143[source]▶

>>43605281 (TP) #

My point is the opposite of this point of view. I believe generative AI is the most significant advance since hypertext and the overlay of inferred semantic relationships via PageRank etc. In fact, the creation of hypertext and the toolchains around it is what led to this point at all - neural networks were already understood by then, and transformer attention is just an innovation. It’s the collective human assembly of language and visual interconnected knowledge at a pan-cultural and global scale that enabled the current state.

LLMs alone can do astounding natural language processing, beyond the ability of anything prior by unthinkable, Turing-test-passing miles. The fact that they can reason abductively, which computing techniques to date have been unable to do, is amazing. The fact that you can mix them with multimodal regimes - images, motion, virtually anything that can be semantically linked via language - is breathtaking. The fact that they can be augmented with prior computing techniques - IR, optimization, deductive solvers, and literally everything we’ve achieved to date - should give anyone knowledgeable of such things shivers for what the future holds.

But I would never hold that generative AI techniques are replacements for known optimal techniques. The ensemble, though, is probably the solution to nearly every challenge we face. When we hit the limits of LLMs today, I think, well, at least we already have grandmaster-beating chess solvers, and it’s irrelevant that the LLM can’t do that directly. The LLM and other generative AI techniques are, in my mind, like gases that fill in, through learned approximation, the things we’ve not been able to solve directly, including the ad hoc assembly of those solutions. This is why, ever since BERT first came along, I knew agent-based techniques were the future.

Right now we live at a time like early hypertext with respect to AI. Toolchains suck, and LLMs are basically GeoCities pages with “under construction” signs. We will go through an explosive exploration, some stunning insights that’ll change the basic nature of our shared reality (some wonderful, some insidious), then, if we aren’t careful - and we rarely are - enshittification at a scale unseen before.
