
579 points paulpauper | 2 comments
lukev No.43604244
This is a bit of a meta-comment, but reading through the responses to a post like this is really interesting because it demonstrates how our collective response to this stuff is (a) wildly divergent and (b) entirely anecdote-driven.

I have my own opinions, but I can't really say that they're not also based on anecdotes and personal decision-making heuristics.

But some of us are going to end up right and some of us are going to end up wrong and I'm really curious what features signal an ability to make "better choices" w/r/t AI, even if we don't know (or can't prove) what "better" is yet.

replies(10): >>43604396 #>>43604472 #>>43604738 #>>43604923 #>>43605009 #>>43605865 #>>43606458 #>>43608665 #>>43609144 #>>43612137 #
1. FiniteIntegral No.43604472
It's not surprising that the responses are anecdotal. Communicating a general sentiment quickly means being brief, and an anecdote is about the briefest evidence anyone has.

A majority of what makes a "better AI" comes down to how effective the gradient-descent training is at reaching the local optimum we want it to reach. Until a generative model shows actual progress at "making decisions", it will forever be seen as a glorified linear-algebra solver. Generative machine learning is all about giving a pleasing answer to the end user, not about creating something on the level of human decision making.
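
For the unfamiliar: the "slope-gradient" part is just gradient descent, i.e. nudge the parameters downhill on a loss function until you settle into a local optimum. A toy sketch in plain Python with a made-up one-parameter loss, nothing to do with any real model:

    # Toy gradient descent on a 1-D quadratic loss: loss(w) = (w - 3)^2.
    # The gradient is d(loss)/dw = 2 * (w - 3); stepping against it walks
    # w toward the minimum at w = 3, the "local optimum we want it to reach".
    def loss(w):
        return (w - 3.0) ** 2

    def grad(w):
        return 2.0 * (w - 3.0)

    w = 0.0   # arbitrary starting point
    lr = 0.1  # learning rate
    for step in range(100):
        w -= lr * grad(w)

    print(w, loss(w))  # w ends up ~3.0, loss ~0.0

Scale that loop up to billions of parameters and a loss measured over the training set and you have model training; hence "glorified linear-algebra solver".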

replies(1): >>43608159 #
2. code_biologist No.43608159
At the risk of being annoying: answers that feel like high-quality human decision making are extremely pleasing and desirable. In the same way, image generators aren't drawing six-fingered hands because they think that's more pleasing; they're drawing them because they're trying to please and aren't good enough yet.

I'm just most baffled by the "flashes of brilliance" combined with utter stupidity. I remember a run with early GPT-4 (gpt-4-0314) where it did refactoring work that amazed me. In the past few days I asked a bunch of AIs about similar characters between a popular gacha mobile game and a popular TV show. OpenAI's models were terrible and hallucinated aggressively (4, 4o, 4.5, o3-mini, o3-mini-high), with the exception of o1. DeepSeek R1 only mildly hallucinated but still gave bad answers. Gemini 2.5 was the only flagship model that didn't hallucinate and gave some decent answers.

I probably should have used some kind of grounding, but I honestly assumed the stuff I was asking about would be in their training data.
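
For anyone wondering what "grounding" would have looked like here: paste retrieved source text (wiki excerpts about the game and the show, say) into the prompt and tell the model to answer only from that. A rough sketch with the OpenAI Python SDK; the model name and the excerpts are placeholders, not what I actually ran:

    # Rough sketch of grounding: answer only from supplied excerpts.
    # Assumes the openai package (>= 1.0) and an API key in the environment;
    # "gpt-4o" and the excerpt strings below are placeholders.
    from openai import OpenAI

    client = OpenAI()

    excerpts = [
        "Wiki excerpt about the gacha game's characters goes here...",
        "Wiki excerpt about the TV show's characters goes here...",
    ]

    prompt = (
        "Using ONLY the excerpts below, list characters from the game that "
        "resemble characters from the show. Say 'not in the excerpts' if "
        "you can't support a comparison.\n\n"
        + "\n\n".join(excerpts)
    )

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)

The point is that the model answers from text it can see in the prompt rather than from whatever made it into its weights, which is where the hallucinations were coming from.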