
184 points | hhs | 1 comment
aabhay No.41840024
The ability to use automatic verification + synthetic data is basically common knowledge among practitioners. But all of these organizations have also explored, at length, the ways models overfit on such data, and the conclusion is the same -- the current model architecture seems to plateau at multi-step logical reasoning. Either you drift too far from your common-knowledge pre-training, or you never come up with the right steps when the design space is vast.
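(A toy sketch of what "automatic verification + synthetic data" usually means in practice: sample candidate solutions, keep only the ones an automatic checker accepts, and fine-tune on the survivors. All names below are illustrative -- `noisy_solver` stands in for an LLM, and the arithmetic checker stands in for a real verifier; this is not any lab's actual pipeline.)

```python
import random

def noisy_solver(a, b, error_rate=0.3):
    """Stand-in for model sampling: usually right, sometimes off by one."""
    answer = a + b
    if random.random() < error_rate:
        answer += random.choice([-1, 1])
    return answer

def verify(a, b, answer):
    """Automatic verifier: exact, cheap, and stable in this closed domain."""
    return answer == a + b

def build_synthetic_dataset(n_problems=100, samples_per_problem=4):
    """Rejection sampling: retain only verified (problem, answer) pairs."""
    dataset = []
    for _ in range(n_problems):
        a, b = random.randint(0, 99), random.randint(0, 99)
        for _ in range(samples_per_problem):
            ans = noisy_solver(a, b)
            if verify(a, b, ans):  # keep only samples the checker accepts
                dataset.append(((a, b), ans))
                break
    return dataset

data = build_synthetic_dataset()
# Every retained example is correct by construction of the verifier.
assert all(ans == a + b for (a, b), ans in data)
```

The overfitting worry in the comment is exactly that training on such self-filtered data narrows the model toward the verifier's domain.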

Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? Such a model would certainly have orders of magnitude more parameters.

replies(3): >>41840256, >>41844066, >>41848037
danielmarkbruce No.41840256
AlphaZero is a system including models and search capabilities. This isn't a great example.
replies(2): >>41840329, >>41845341
michaelnny No.41845341
One important aspect of the success of AlphaGo and its successors is that the game environment is a closed domain with a stable reward function. With this, we can guide the agent to do MCTS planning for the best move in every state.

However, no such reward system is available for LLMs in an open-domain setting.
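(To see why the stable reward matters, here is a minimal MCTS/UCT sketch on a toy Nim game -- remove 1-3 stones, taking the last stone wins. The search needs no learned value model because every playout ends in an exact +1/0 terminal reward; open-domain text has no such signal. This is an illustration, not AlphaGo's actual implementation.)

```python
import math
import random

TAKE = (1, 2, 3)  # legal moves: remove 1-3 stones

def moves(n):
    return [m for m in TAKE if m <= n]

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones    # stones left, from the player-to-move's view
        self.parent = parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node

def rollout(stones):
    """Random playout; returns 1 if the player to move from here wins."""
    player = 0
    while True:
        stones -= random.choice(moves(stones))
        if stones == 0:                # taking the last stone wins
            return 1 if player == 0 else 0
        player ^= 1

def uct_select(node):
    """Pick the child maximizing the UCB1 score."""
    return max(
        node.children.values(),
        key=lambda c: c.wins / c.visits
        + math.sqrt(2 * math.log(node.visits) / c.visits),
    )

def mcts(root_stones, iters=2000):
    """Run UCT from `root_stones`; return the most-visited move."""
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1. selection: walk down while the node is fully expanded
        while node.stones > 0 and len(node.children) == len(moves(node.stones)):
            node = uct_select(node)
        # 2. expansion: add one untried child
        if node.stones > 0:
            m = random.choice([m for m in moves(node.stones)
                               if m not in node.children])
            node.children[m] = Node(node.stones - m, node)
            node = node.children[m]
        # 3. simulation: the reward is exact because the game is closed;
        #    a terminal node is a win for the player who just moved into it
        reward = 1.0 if node.stones == 0 else 1.0 - rollout(node.stones)
        # 4. backpropagation, flipping perspective at each ply
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With 2 stones left the search converges on taking both (an immediate win); swap in a fuzzy, drifting reward and the same statistics stop meaning anything -- which is the LLM problem the comment describes.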