579 points paulpauper | 13 comments
aerhardt ◴[] No.43604214[source]
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:

"Is Paul Newman known for having had problems with alcohol?"

All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:

"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."

There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.

I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].

I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.

Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.

[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...

replies(27): >>43604240 #>>43604254 #>>43604266 #>>43604352 #>>43604411 #>>43604434 #>>43604445 #>>43604447 #>>43604474 #>>43605109 #>>43605148 #>>43605609 #>>43605734 #>>43605773 #>>43605938 #>>43605941 #>>43606141 #>>43606176 #>>43606197 #>>43606455 #>>43606465 #>>43606551 #>>43606632 #>>43606774 #>>43606870 #>>43606938 #>>43607090 #
1. fnordpiglet ◴[] No.43604445[source]
This is less an LLM thing than an information retrieval question. If you choose a model and tell it to "Search," you get a citation-based analysis that shows he did indeed have problems with alcohol. I do find it interesting that it quibbles over whether he was an alcoholic or not - it seems pretty clear from the rest that he was - but regardless. This is indicative of something crucial when placing LLMs into a toolkit. They are not omniscient, nor are they deductive reasoning tools. Information retrieval systems are excellent at information retrieval and should be used for information retrieval. Solvers are excellent at solving deductive problems. Use them. That LLMs keep getting better at these tasks on their own is cool, but it's IMO a parlor trick, since we have nearly optimal (or actually optimal) techniques that don't need an LLM. The LLM should use those tools. So, click Search next time you have an information retrieval question. https://chatgpt.com/share/67f2dac0-3478-8000-9055-2ae5347037...
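To make the division of labor concrete, here's a toy sketch of the retrieve-then-answer pattern. The corpus and the term-overlap scoring are entirely hypothetical stand-ins for a real search engine; the point is only that the answer is grounded in retrieved text rather than model memory:

```python
import re

def retrieve(query, corpus, k=2):
    """Return the k documents sharing the most terms with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    return sorted(
        corpus,
        key=lambda doc: len(terms & set(re.findall(r"\w+", doc.lower()))),
        reverse=True,
    )[:k]

def answer_from_sources(query, corpus):
    """Answer only from what the retrieval step returned."""
    hits = retrieve(query, corpus)
    if not hits:
        return "No sources found; declining to answer from memory."
    return " / ".join(hits)

# Hypothetical mini-corpus standing in for web search results.
corpus = [
    "Biography: the actor wrote openly about his problems with alcohol.",
    "Filmography: he starred in several acclaimed films.",
    "Racing: he was passionate about auto racing late in life.",
]
print(answer_from_sources("did he have problems with alcohol", corpus))
```

In the real product the "retrieve" step is the Search tool and the "answer" step is the LLM summarizing the hits, but the shape is the same.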
replies(3): >>43604552 #>>43605281 #>>43605697 #
2. mvdtnz ◴[] No.43604552[source]
Any information found in a web search about Newman will be available in the training set (more or less). It's almost certainly a problem of alignment / "safety" causing this issue.
replies(2): >>43604681 #>>43604942 #
3. fnordpiglet ◴[] No.43604681[source]
There’s a simpler explanation than that: the model weights aren’t an information retrieval system, and other sequences of tokens are simply more likely given the totality of the training data. This is why, for an information retrieval task, you use an information retrieval tool, just as you use a hammer rather than a screwdriver for driving nails. It may well be that you could drive the nail with the screwdriver, but why?
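The "more likely sequences of tokens" point can be shown with a deliberately dumb count-based bigram model. The training lines are made up; the point is that it emits whatever continuation was most *frequent*, not whatever is best documented:

```python
from collections import Counter, defaultdict

# Hypothetical training lines: the common association appears 3 times,
# the rarer (but true, in this toy world) fact only once.
training = [
    "newman known for acting",
    "newman known for racing",
    "newman known for philanthropy",
    "newman struggled with alcohol",
]

# Count which token follows which.
follow = defaultdict(Counter)
for line in training:
    toks = line.split()
    for a, b in zip(toks, toks[1:]):
        follow[a][b] += 1

def most_likely_next(token):
    """Emit the highest-count continuation for a token."""
    return follow[token].most_common(1)[0][0]

# "known" follows "newman" 3 times vs "struggled" once, so the
# likelihood-maximizing continuation drowns out the rarer fact.
print(most_likely_next("newman"))  # "known"
print(most_likely_next("known"))   # "for"
```

A real LLM is vastly more sophisticated than bigram counts, but the failure mode being described is the same: likelihood under the training distribution is not a lookup of the best-attested fact.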
replies(1): >>43604857 #
4. mvdtnz ◴[] No.43604857{3}[source]
You think that's a simpler explanation? Ok. I think, given the amount of effort that goes into "safety" on these systems, that my explanation is vastly more likely than that this information somehow got lost in the vector soup despite being attached to his name at the top of every search result[0].

0 https://www.google.com/search?q=did+paul+newman+have+a+drink...

replies(1): >>43605541 #
5. simonw ◴[] No.43604942[source]
"Any information found in a web search about Newman will be available in the training set"

I don't think that is a safe assumption these days. Training a modern LLM isn't about dumping in everything on the Internet. To get a really good model you have to be selective about your sources of training data.

They still rip off vast amounts of copyrighted data, but I get the impression they are increasingly picky about what they dump into their training runs.

6. Vanit ◴[] No.43605281[source]
I realise your answer wasn't assertive, but if I heard this from someone actively defending AI it would be a cop-out. If the selling point is that you can ask these AIs anything, then one can't retroactively go "oh, but not that" when a particular query doesn't pan out.
replies(2): >>43606397 #>>43670143 #
7. fnordpiglet ◴[] No.43605541{4}[source]
Except that if safety had blocked this, it would also have blocked the linked conversation. Alignment definitely distorts the behavior of models, but treating them as information retrieval systems is using a screwdriver to drive nails. Your example didn't refute this.
8. terminatornet ◴[] No.43605697[source]
lotta words here to say AI can't do basic search right
replies(1): >>43670146 #
9. philomath_mn ◴[] No.43606397[source]
This is a bit of a strawman. There are certainly people who claim that you can ask AIs anything but I don't think the parent commenter ever made that claim.

"AI is making incredible progress but still struggles with certain subsets of tasks" is a self-consistent position.

replies(1): >>43607080 #
10. skywhopper ◴[] No.43607080{3}[source]
It’s not the position of any major AI company, curiously.
11. fnordpiglet ◴[] No.43670143[source]
My point is the opposite of this point of view. I believe generative AI is the most significant advance since hypertext and the overlay of inferred semantic relationships via PageRank and the like. In fact, the creation of hypertext and the toolchains around it is what led to this point at all - neural networks were already understood, and transformer attention is just one more innovation. It's the collective human assembly of interconnected linguistic and visual knowledge, at a pan-cultural and global scale, that enabled the current state.

The ability of LLMs alone to do natural language processing is astounding, beyond anything prior by unthinkable Turing-test-passing miles. The fact that they can reason abductively, which computing techniques to date have been unable to do, is amazing. The fact that they can be mixed with multimodal regimes - images, motion, virtually anything that can be semantically linked via language - is breathtaking. The fact that they can be augmented with prior computing techniques - IR, optimization, deductive solvers, and literally everything else we've achieved to date - should give anyone knowledgeable of such things shivers for what the future holds.

But I would never hold that generative AI techniques are replacements for known optimal techniques. The ensemble, though, is probably the solution to nearly every challenge we face. When we hit the limits of LLMs today, I think: well, at least we already have grandmaster-beating chess engines, and it's irrelevant that the LLM can't beat a grandmaster directly. Generative AI techniques are, in my mind, like gases that fill, through learned approximation, the spaces we've not been able to solve directly, including the ad hoc assembly of those direct solutions. This is why, since the first time BERT came along, I knew agent-based techniques were the future.

Right now we live at a time like early hypertext with respect to AI. Toolchains suck; LLMs are basically GeoCities pages with "under construction" signs. We will go through an explosive exploration, some stunning insights that'll change the basic nature of our shared reality (some wonderful, some insidious), and then, if we aren't careful - and we rarely are - enshittification at a scale unseen before.

12. fnordpiglet ◴[] No.43670146[source]
Lotta words to say AI can’t do basic search in the same way a web browser can’t do basic search, but given a search engine both can.
replies(1): >>43699601 #
13. terminatornet ◴[] No.43699601{3}[source]
i don't know what this means