
215 points by optimalsolver | 1 comment
alyxya ◴[] No.45770449[source]
The key point the paper seems to make is that existing benchmarks involve relatively low reasoning complexity, so they made a new dataset, DeepRD, with arbitrarily large reasoning complexity and demonstrated that existing models fail once a problem is complex enough. Complexity is defined by modeling the problem as a graph and measuring the traversal needed to go from a source node to a target node.
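To make that concrete, here's a rough sketch of what such a complexity measure might look like, assuming it's essentially shortest-path hop count over a problem graph. The toy graph and the BFS-based metric below are my assumptions, not the paper's exact DeepRD construction:

```python
# Hypothetical sketch: reasoning complexity as the traversal length from a
# source fact to a target conclusion in a problem graph. The graph and the
# hop-count metric are assumptions, not the paper's exact definition.
from collections import deque

def traversal_complexity(graph, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None

# Toy problem graph: each edge is one deduction step.
graph = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(traversal_complexity(graph, "A", "D"))  # 3 hops => complexity 3
```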

My main critique is that I don't think there's evidence this issue would persist after scaling models further and doing more RL. With a harness like the ones coding agents use these days, and with sufficient tool use, I bet models could go much further on that reasoning benchmark. If the reasoning has to happen entirely within a single context window, though, it's expected that a sufficiently complex problem will be too difficult for the model to solve.
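A minimal sketch of what I mean by a harness: instead of asking the model to hold the whole traversal in context, the harness keeps the state and the model picks one hop per turn via a tool call. The tool interface, loop, and greedy stand-in for the model are all hypothetical, not anything from the paper:

```python
# Hypothetical harness sketch: the "model" picks a single next hop per turn
# and a tool supplies the local neighborhood, so context stays small no
# matter how long the overall traversal is. All names here are made up.

def neighbors_tool(graph, node):
    """Tool call: return the outgoing edges of a single node."""
    return graph.get(node, [])

def greedy_stub(current, options, goal):
    """Stand-in for an LLM call: take the goal if visible, else the first option."""
    return goal if goal in options else options[0]

def agent_traverse(graph, source, target, pick=greedy_stub, max_steps=1000):
    """Step-by-step traversal where state lives in the harness, not the model."""
    path = [source]
    for _ in range(max_steps):
        if path[-1] == target:
            return path
        options = neighbors_tool(graph, path[-1])
        if not options:
            return None
        # The model only ever sees one node and its neighbors.
        path.append(pick(path[-1], options, target))
    return None

graph = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(agent_traverse(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```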

replies(5): >>45771061 #>>45771156 #>>45772667 #>>45775565 #>>45775741 #
tomlockwood ◴[] No.45771156[source]
So the answer is a few more trillion?
replies(1): >>45771324 #
code_martial ◴[] No.45771324[source]
It’s a worthwhile answer if it can be proven correct, because it means we’ve found a way to create intelligence, even if that way is not very efficient. It’s still one step better than not knowing how to do so.
replies(2): >>45771753 #>>45772739 #
usrbinbash ◴[] No.45772739[source]
> if it can be proven correct

Then the first step would be to prove that this works WITHOUT needing to burn through the trillions to do so.