
My_Name:
I find that they know what they know fairly well, but if you move beyond that, into what could be reasoned from what they know, they profoundly lack the ability to do so. They are good at repeating their training data, not at thinking about it.

The problem, I find, is that they then don't stop or say they don't know (unless explicitly prompted to do so); they just make stuff up and express it with just as much confidence.

PxldLtd:
I think a good test of this is to provide an image and ask the model to predict what will happen next, or what would happen if x occurs. They fail spectacularly at Rube Goldberg machines. Developing some sort of dedicated prediction model would help massively in extrapolating from data. The human subconscious is filled with fast-thinking paths that embed these calculations: parabolic trajectories, gravity, momentum and so on.
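A minimal sketch of that test, assuming the OpenAI Python SDK and a vision-capable chat model (the model name and image URL below are placeholders, not anything from the thread):

    # Ask a vision model to predict the outcome shown in an image.
    # Assumes OPENAI_API_KEY is set in the environment.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "This image shows a Rube Goldberg machine. Describe, "
                         "step by step, what happens after the ball at the top "
                         "is released."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/rube-goldberg.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

Scoring the step-by-step answer against what the machine actually does would be a fairly direct probe of the physical-prediction gap described above.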
pfortuny:
Most amazing is asking any of the models to draw an 11-sided polygon and number the edges.
Torkel:
I asked GPT-5, and it worked really well, producing a correct result. Did you expect it to fail?
pfortuny:
It has failed me several times already, drawing at most an octagon or a 12-gon. To be clear, I mean asking it to create an image, not a program to do it.
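For contrast, a minimal sketch of the program route, assuming numpy and matplotlib are available; this is the easy path being set aside here, as opposed to generating the picture directly as an image:

    # Draw a regular 11-gon and number its edges, saving the figure as a PNG.
    import numpy as np
    import matplotlib.pyplot as plt

    n = 11
    angles = 2 * np.pi * np.arange(n) / n
    xs, ys = np.cos(angles), np.sin(angles)

    fig, ax = plt.subplots()
    ax.plot(np.append(xs, xs[0]), np.append(ys, ys[0]))  # closed outline

    # Put each edge's number (1..11) just outside its midpoint.
    for i in range(n):
        j = (i + 1) % n
        mx, my = (xs[i] + xs[j]) / 2, (ys[i] + ys[j]) / 2
        ax.text(1.1 * mx, 1.1 * my, str(i + 1), ha="center", va="center")

    ax.set_aspect("equal")
    ax.axis("off")
    plt.savefig("hendecagon.png")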