Reasoning models reason well, until they don't

(arxiv.org)


alyxya ◴[31 Oct 25 10:33 UTC] No.45770449[source]▶

The key point the paper seems to make is that existing benchmarks have relatively low reasoning complexity, so the authors built a new dataset, DeepRD, with arbitrarily large reasoning complexity and showed that existing models fail once a problem is complex enough. Complexity is defined by modeling the problem as a graph and measuring the traversals needed to get from some source node to a target node.
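A minimal sketch of that idea, assuming complexity is measured as the shortest source-to-target path length in a directed graph (the paper's exact metric may well differ; `traversal_depth` and `chain_problem` are hypothetical names, not from the paper or DeepRD):

```python
from collections import deque


def traversal_depth(edges, source, target):
    """Minimum number of edge traversals from source to target, or None if unreachable.

    Hypothetical proxy for "reasoning complexity": each hop is one reasoning step.
    """
    # Build an adjacency list for a directed graph.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)

    # Breadth-first search visits nodes in order of distance from the source,
    # so the first time we reach the target we have the shortest path length.
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return depth
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


def chain_problem(n):
    """Hypothetical generator: a linear chain n0 -> n1 -> ... -> n{n}.

    The required depth grows with n, so it can be made arbitrarily large.
    """
    return [(f"n{i}", f"n{i + 1}") for i in range(n)]


if __name__ == "__main__":
    print(traversal_depth(chain_problem(5), "n0", "n5"))  # 5 steps
    edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
    print(traversal_depth(edges, "A", "E"))  # 3 steps, via B or D
```

The point of a generator like `chain_problem` is that the depth is a dial: a benchmark can keep turning it up until a model breaks, which fixed-difficulty benchmarks can't do.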

My main critique is that I don't think there's evidence that this issue would persist after continuing to scale models up and doing more RL. With a harness like the ones coding agents use these days, and with sufficient tool use, I bet models could go much further on that reasoning benchmark. Otherwise, if the reasoning had to be done entirely within a single context window, you'd expect a sufficiently complex problem to be too difficult for the model to solve.

replies(5): >>45771061 #>>45771156 #>>45772667 #>>45775565 #>>45775741 #

1. tomlockwood ◴[31 Oct 25 12:11 UTC] No.45771156[source]▶

>>45770449 #

So the answer is a few more trillion?

replies(1): >>45771324 #

2. code_martial ◴[31 Oct 25 12:33 UTC] No.45771324[source]▶

>>45771156 (TP) #

It’s a worthwhile answer if it can be proven correct, because it means that we’ve found a way to create intelligence, even if that way is not very efficient. It’s still one step better than not knowing how to do so.

replies(2): >>45771753 #>>45772739 #

3. tomlockwood ◴[31 Oct 25 13:27 UTC] No.45771753[source]▶

>>45771324 #

So we're sending a trillion on faith?

replies(1): >>45771805 #

4. code_martial ◴[31 Oct 25 13:32 UTC] No.45771805{3}[source]▶

>>45771753 #

No, that’s not what I said.

replies(1): >>45772216 #

5. tomlockwood ◴[31 Oct 25 14:14 UTC] No.45772216{4}[source]▶

>>45771805 #

Why are we sending the trillion?

replies(1): >>45775705 #

6. usrbinbash ◴[31 Oct 25 15:02 UTC] No.45772739[source]▶

>>45771324 #

> if it can be proven correct

Then the first step would be to prove that this works WITHOUT needing to burn through the trillions to do so.

7. measurablefunc ◴[31 Oct 25 19:22 UTC] No.45775705{5}[source]▶

>>45772216 #

It must be deposited into OpenAI's bank account so that they can then deposit it into NVIDIA's account who can then in turn make a deal w/ OpenAI to deposit it back into OpenAI's account for some stock options. I think you can see how it works from here but if not then maybe one of the scaled up "reasoning" AIs will figure it out for you.

replies(1): >>45778896 #

8. tomlockwood ◴[01 Nov 25 02:57 UTC] No.45778896{6}[source]▶

>>45775705 #

I understand perfectly, thank you!!!
