Reasoning models reason well, until they don't

(arxiv.org)

My_Name ◴[31 Oct 25 11:10 UTC] No.45770715[source]▶

I find that they know what they know fairly well, but if you move beyond that, into what can be reasoned from what they know, they have a profound lack of ability to do that. They are good at repeating their training data, not thinking about it.

The problem, I find, is that they then don't stop or say they don't know (unless explicitly prompted to do so); they just make stuff up and express it with just as much confidence.

replies(9): >>45770777 #>>45770879 #>>45771048 #>>45771093 #>>45771274 #>>45771331 #>>45771503 #>>45771840 #>>45778422 #

PxldLtd ◴[31 Oct 25 11:34 UTC] No.45770879[source]▶

>>45770715 #

I think a good test of this is to provide an image and get the model to predict what will happen next, or what will happen if x occurs. They fail spectacularly at Rube Goldberg machines. I think developing some sort of dedicated prediction model would help massively in extrapolating data. The human subconscious is filled with all sorts of parabolic prediction, gravity, momentum and various other fast-thinking paths that embed these calculations.

replies(2): >>45770967 #>>45771555 #

1. yanis_t ◴[31 Oct 25 11:46 UTC] No.45770967[source]▶

>>45770879 #

Any example of that? One would think that predicting what comes next from an image is basically video generation, which works imperfectly, but does work to some degree (Veo/Sora/Grok).

replies(2): >>45771083 #>>45771523 #

2. PxldLtd ◴[31 Oct 25 12:02 UTC] No.45771083[source]▶

>>45770967 (TP) #

Here's one I made in Veo 3.1, since Gemini is the only premium AI I have access to.

Using this image - https://www.whimsicalwidgets.com/wp-content/uploads/2023/07/... and the prompt: "Generate a video demonstrating what will happen when a ball rolls down the top left ramp in this scene."

You'll see it struggles - https://streamable.com/5doxh2 , which is often the case with video gen. You have to carefully describe and orchestrate natural-feeling motion and interactions.

You're welcome to try with any other models, but I suspect you'll get very similar results.

replies(2): >>45771168 #>>45775925 #

3. chamomeal ◴[31 Oct 25 12:12 UTC] No.45771168[source]▶

>>45771083 #

I love how it still copies the slow pan and zoom from Rube Goldberg machine videos, but it's just following along with utter nonsense lol

4. mannykannot ◴[31 Oct 25 13:01 UTC] No.45771523[source]▶

>>45770967 (TP) #

It is video generation, but succeeding at this task involves detailed reasoning about cause and effect to construct chains of events, and may not be something that can be readily completed by applying "intuitions" gained from "watching" lots of typical movies, where most of the events are stereotypical.

5. galaxyLogic ◴[31 Oct 25 19:44 UTC] No.45775925[source]▶

>>45771083 #

A Rube Goldberg machine was not part of their training data. We humans have seen such things.

replies(1): >>45776030 #

6. autoexec ◴[31 Oct 25 19:55 UTC] No.45776030{3}[source]▶

>>45775925 #

Physics textbooks are, though, so it should know how they'd work, or at least know that balls don't spontaneously appear and disappear and that gears don't work when they aren't connected.
