
184 points | hhs | 1 comment
aabhay No.41840024
The ability to use automatic verification + synthetic data is basically common knowledge among practitioners. But all of these organizations have also explored, at length, the ways models overfit on such data, and the conclusion is the same -- the current model architecture seems to plateau at multi-step logical reasoning. Either you drift too far from your common-knowledge pre-training, or you never come up with the right steps when the design space is vast.
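(A toy sketch of what "automatic verification + synthetic data" usually means in practice: sample candidate solutions, keep only the ones an automatic checker accepts, and fine-tune on the survivors. All names below are illustrative -- `noisy_solver` stands in for an LLM, and the arithmetic checker stands in for a real verifier; this is not any lab's actual pipeline.)

```python
import random

def noisy_solver(a, b, error_rate=0.3):
    """Stand-in for model sampling: usually right, sometimes off by one."""
    answer = a + b
    if random.random() < error_rate:
        answer += random.choice([-1, 1])
    return answer

def verify(a, b, answer):
    """Automatic verifier: exact, cheap, and stable in this closed domain."""
    return answer == a + b

def build_synthetic_dataset(n_problems=100, samples_per_problem=4):
    """Rejection sampling: retain only verified (problem, answer) pairs."""
    dataset = []
    for _ in range(n_problems):
        a, b = random.randint(0, 99), random.randint(0, 99)
        for _ in range(samples_per_problem):
            ans = noisy_solver(a, b)
            if verify(a, b, ans):  # keep only samples the checker accepts
                dataset.append(((a, b), ans))
                break
    return dataset

data = build_synthetic_dataset()
# Every retained example is correct by construction of the verifier.
assert all(ans == a + b for (a, b), ans in data)
```

The overfitting worry in the comment is exactly that training on such self-filtered data narrows the model toward the verifier's domain.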

Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? Such a model would certainly have orders of magnitude more parameters.

replies(3): >>41840256, >>41844066, >>41848037
danielmarkbruce No.41840256
AlphaZero is a system including models and search capabilities. This isn't a great example.
replies(2): >>41840329, >>41845341
michaelnny No.41845341
One important aspect of the success of AlphaGo and its successors is that the game environment is a closed domain with a stable reward function. With this, we can guide the agent to do MCTS planning for the best move in every state.

However, no such reward system is available for LLMs in an open-domain setting.
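(To see why the stable reward matters, here is a minimal MCTS/UCT sketch on a toy Nim game -- remove 1-3 stones, taking the last stone wins. The search needs no learned value model because every playout ends in an exact +1/0 terminal reward; open-domain text has no such signal. This is an illustration, not AlphaGo's actual implementation.)

```python
import math
import random

TAKE = (1, 2, 3)  # legal moves: remove 1-3 stones

def moves(n):
    return [m for m in TAKE if m <= n]

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones    # stones left, from the player-to-move's view
        self.parent = parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node

def rollout(stones):
    """Random playout; returns 1 if the player to move from here wins."""
    player = 0
    while True:
        stones -= random.choice(moves(stones))
        if stones == 0:                # taking the last stone wins
            return 1 if player == 0 else 0
        player ^= 1

def uct_select(node):
    """Pick the child maximizing the UCB1 score."""
    return max(
        node.children.values(),
        key=lambda c: c.wins / c.visits
        + math.sqrt(2 * math.log(node.visits) / c.visits),
    )

def mcts(root_stones, iters=2000):
    """Run UCT from `root_stones`; return the most-visited move."""
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1. selection: walk down while the node is fully expanded
        while node.stones > 0 and len(node.children) == len(moves(node.stones)):
            node = uct_select(node)
        # 2. expansion: add one untried child
        if node.stones > 0:
            m = random.choice([m for m in moves(node.stones)
                               if m not in node.children])
            node.children[m] = Node(node.stones - m, node)
            node = node.children[m]
        # 3. simulation: the reward is exact because the game is closed;
        #    a terminal node is a win for the player who just moved into it
        reward = 1.0 if node.stones == 0 else 1.0 - rollout(node.stones)
        # 4. backpropagation, flipping perspective at each ply
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With 2 stones left the search converges on taking both (an immediate win); swap in a fuzzy, drifting reward and the same statistics stop meaning anything -- which is the LLM problem the comment describes.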