
214 points by optimalsolver | 1 comment
My_Name No.45770715
I find that they know what they know fairly well, but move beyond that, into what could be reasoned from what they know, and that ability largely disappears. They are good at repeating their training data, not at thinking about it.

The problem, I find, is that they then don't stop, or say they don't know (unless explicitly prompted to do so); they just make stuff up and express it with just as much confidence.

ftalbot No.45770777
Every token in a response has an element of randomness to it, which makes the output non-deterministic. Even for a question squarely within the training data, there is some chance of a nonsensical, opposite, and/or dangerous result. That chance may be low, because systems are typically set up to have the model review its own output, but there is no way to make a non-deterministic answer reliably solve or reason about anything, given enough iterations. It is designed to be imperfect.
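For the curious, here is a minimal sketch of what "an element of randomness per token" means in practice: temperature sampling over a toy logit vector (no particular model or API is assumed), where even a strongly favoured token occasionally loses to an unlikely one.

```python
import math
import random

def sample_token(logits, temperature=0.8):
    """Pick one token id from a logit vector via temperature sampling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                               # subtract max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy "vocabulary" of 3 tokens; token 0 is heavily favoured but not certain.
logits = [4.0, 1.0, 0.5]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_token(logits)] += 1
print(counts)  # token 0 wins most draws, yet tokens 1 and 2 still get sampled
```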
mannykannot No.45771745
There seems to be more to it than that. In my experience with LLMs, they are good at finding some relevant facts but then quite often present a non sequitur as the conclusion, and the article's title alone suggests the problem for LRMs is similar: a sudden fall-off in performance as the task gets more difficult. If the issue were just non-determinism, I would expect the errors to be more evenly distributed, though I suppose one could argue that sensitivity to non-determinism increases non-linearly with task difficulty.
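To make that last concession concrete, here is a toy back-of-the-envelope model (purely illustrative, assuming an independent 5% error chance per reasoning step): even a constant, "evenly distributed" per-step error rate compounds into a sharp, non-linear fall-off in whole-task accuracy as tasks get longer.

```python
# Toy model only: assume each reasoning step fails independently with
# probability p. This is not a claim about how LRMs actually fail; it just
# shows that a constant per-step error rate already yields a steep drop
# in whole-task accuracy as the number of steps grows.

def p_chain_correct(p_step_error: float, n_steps: int) -> float:
    """Probability that all n_steps independent steps come out correct."""
    return (1.0 - p_step_error) ** n_steps

for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} steps: {p_chain_correct(0.05, n):.2f}")
# 1 step: 0.95, 10 steps: 0.60, 50 steps: 0.08 (errors concentrate in longer tasks)
```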