
214 points optimalsolver | 2 comments
alyxya ◴[] No.45770449[source]
The key point the paper seems to make is that existing benchmarks have relatively low reasoning complexity, so they made a new dataset, DeepRD, with arbitrarily large reasoning complexity and demonstrated that existing models fail once a problem is complex enough. Complexity is defined by modeling the problem as a graph and determining the traversals needed to go from some source node to a target node.
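
To make the graph framing concrete, here is a minimal sketch that counts the fewest edges between a source and a target node using breadth-first search. Treating that hop count as the "complexity" is an assumption for illustration only; the paper's actual metric is not spelled out in this thread, and the function name `hops_to_target` and the toy `chain` graph are hypothetical.

```python
from collections import deque


def hops_to_target(graph, source, target):
    """Return the minimum number of edges from source to target, or None.

    `graph` is an adjacency mapping: node -> iterable of neighbor nodes.
    This hop count is an illustrative stand-in for reasoning depth,
    not the metric used in the paper.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target is unreachable from source


# A toy chain n0 -> n1 -> ... -> n50: lengthening the chain makes the
# required traversal arbitrarily deep.
chain = {f"n{i}": [f"n{i + 1}"] for i in range(50)}
depth = hops_to_target(chain, "n0", "n50")
```

Under this assumption, a benchmark could be made harder simply by lengthening the chain or adding distracting branches that lead away from the target.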

My main critique is that I don't think there's evidence that this issue would persist after continuing to scale models to be larger and doing more RL. With a harness like the ones coding agents use these days and with sufficient tool use, I bet models could go much further on that reasoning benchmark. Otherwise, if the reasoning problem has to be done entirely within a single context window, it's expected that a complex enough reasoning problem would be too difficult for the model to solve.

replies(5): >>45771061 #>>45771156 #>>45772667 #>>45775565 #>>45775741 #
tomlockwood ◴[] No.45771156[source]
So the answer is a few more trillion?
replies(1): >>45771324 #
code_martial ◴[] No.45771324[source]
It’s a worthwhile answer if it can be proven correct because it means that we’ve found a way to create intelligence, even if that way is not very efficient. It’s still one step better than not knowing how to do so.
replies(2): >>45771753 #>>45772739 #
tomlockwood ◴[] No.45771753{3}[source]
So we're sending a trillion on faith?
replies(1): >>45771805 #
code_martial ◴[] No.45771805{4}[source]
No, that’s not what I said.
replies(1): >>45772216 #
tomlockwood ◴[] No.45772216{5}[source]
Why are we sending the trillion?
replies(1): >>45775705 #
1. measurablefunc ◴[] No.45775705{6}[source]
It must be deposited into OpenAI's bank account so that they can then deposit it into NVIDIA's account who can then in turn make a deal w/ OpenAI to deposit it back into OpenAI's account for some stock options. I think you can see how it works from here but if not then maybe one of the scaled up "reasoning" AIs will figure it out for you.
replies(1): >>45778896 #
2. tomlockwood ◴[] No.45778896[source]
I understand perfectly, thank you!!!