184 points by hhs | 16 comments
1. aabhay ◴[] No.41840024[source]
The ability to use automatic verification + synthetic data is basically common knowledge among practitioners. But all these organizations have also explored endlessly the different ways to overfit on such data, and the conclusion is the same -- the current model architecture seems to plateau when it comes to multi-step logical reasoning. You either drift too far from your common-knowledge pre-training, or you never come up with the right steps in instances where there's a vast design space.

Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? It certainly would have orders of magnitude more parameters.

replies(3): >>41840256 #>>41844066 #>>41848037 #
2. danielmarkbruce ◴[] No.41840256[source]
AlphaZero is a system including models and search capabilities. This isn't a great example.
replies(2): >>41840329 #>>41845341 #
3. aabhay ◴[] No.41840329[source]
AlphaZero is not an LLM, though? It's primarily a convolutional network with some additional fully connected layers that guide an MCTS at inference time.
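
For reference, a minimal sketch of that shape of network (PyTorch assumed, toy sizes; not AlphaZero's actual configuration):

    import torch
    import torch.nn as nn

    class PolicyValueNet(nn.Module):
        """AlphaZero-style net: conv trunk, then a policy head (move priors)
        and a value head (expected outcome) that the MCTS queries at inference."""

        def __init__(self, board_size=19, in_planes=17, channels=64):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_planes, channels, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(),
            )
            flat = channels * board_size * board_size
            # policy head: a prior probability for each board point (+1 for pass)
            self.policy = nn.Linear(flat, board_size * board_size + 1)
            # value head: scalar in [-1, 1] estimating who wins from this state
            self.value = nn.Sequential(nn.Linear(flat, 1), nn.Tanh())

        def forward(self, x):
            h = self.trunk(x).flatten(1)
            return self.policy(h), self.value(h)

    # one batch of board states -> move priors and a value for the MCTS to use
    net = PolicyValueNet()
    priors, value = net(torch.randn(1, 17, 19, 19))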
replies(1): >>41840372 #
4. danielmarkbruce ◴[] No.41840372{3}[source]
That too! It's definitely not an LLM. It would be a bad architecture choice (almost certainly...). For math, an LLM would be a good choice: arbitrary-length input, sequential data, and it's unclear where or which symbols to pay the most attention to. Math notation is a human-built language, just like English.

Search is a great tool for AI, but you have to have a reasonable search space (like chess or Go or poker). I'm not close enough to Lean to know whether it can provide something like that (I believe not), but math doesn't feel like a thing where you can reasonably "search" the next steps, for most problems.
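
To make "search the next steps" concrete, a toy best-first skeleton looks something like this; legal_steps, apply_step and score are hypothetical stand-ins for whatever a prover would supply, and the hard part is precisely that neither the step generator nor the heuristic is obvious for math:

    import heapq

    def best_first_proof_search(goal, legal_steps, apply_step, score, max_nodes=10_000):
        """Generic best-first search over proof states.
        legal_steps(state) -> candidate next steps (the hard part for math),
        apply_step(state, step) -> (new_state, done),
        score(state) -> lower is more promising (the other hard part)."""
        frontier = [(score(goal), 0, goal, [])]
        counter = 1
        while frontier and counter < max_nodes:
            _, _, state, path = heapq.heappop(frontier)
            for step in legal_steps(state):
                new_state, done = apply_step(state, step)
                if done:
                    return path + [step]      # a proof was found
                heapq.heappush(frontier, (score(new_state), counter, new_state, path + [step]))
                counter += 1
        return None                           # search space too large or heuristic too weak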

replies(1): >>41843938 #
5. danenania ◴[] No.41843938{4}[source]
> It would be a bad architecture choice (almost certainly...)

Naively, it would seem like transformers could line up nicely with turn-based games. Instead of mapping tokens to language as in an LLM, they could map to valid moves given the current game state. And then instead of optimizing the next token for linguistic coherence as LLMs do, you optimize for winning the game.
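
A rough sketch of what that could look like (PyTorch assumed; the move vocabulary size, legality mask and REINFORCE-style outcome loss are illustrative choices, not a known-good recipe):

    import torch
    import torch.nn as nn

    class MoveTransformer(nn.Module):
        """Decoder-style transformer over a vocabulary of game moves instead of words."""

        def __init__(self, n_moves=4672, d_model=256, n_layers=4, n_heads=8, max_len=512):
            super().__init__()
            self.embed = nn.Embedding(n_moves, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_moves)

        def forward(self, move_history, legal_mask):
            # move_history: (B, T) move ids; legal_mask: (B, n_moves) True for legal moves
            T = move_history.shape[1]
            pos = torch.arange(T, device=move_history.device)
            causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=move_history.device), 1)
            h = self.encoder(self.embed(move_history) + self.pos(pos), mask=causal)
            logits = self.head(h[:, -1])      # next-move logits
            return logits.masked_fill(~legal_mask, float("-inf"))

    # Instead of next-token cross-entropy, weight the chosen moves by the game outcome
    # (REINFORCE-style: push up moves from won games, push down moves from lost ones).
    def outcome_loss(logits, chosen_moves, outcome):
        logp = torch.log_softmax(logits, dim=-1)
        chosen = logp.gather(1, chosen_moves.unsqueeze(1)).squeeze(1)
        return -(outcome * chosen).mean()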

replies(1): >>41845141 #
6. llm_trw ◴[] No.41844066[source]
Transformers are fundamentally linear; proofs are either tree-like or DAG-like. You'd need a new attention mechanism to allow hierarchical reasoning.
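
One toy way to picture the mismatch (my own illustration, not an existing mechanism): let each proof step attend only to its ancestors in the proof tree, instead of to everything earlier in the linear order.

    import torch

    def ancestor_attention_mask(parents):
        """parents[i] is the index of step i's parent in the proof tree (-1 for the root).
        Returns a boolean mask where mask[i, j] is True iff step i may attend to step j,
        i.e. j is i itself or one of its ancestors -- hierarchical rather than linear."""
        n = len(parents)
        mask = torch.zeros(n, n, dtype=torch.bool)
        for i in range(n):
            j = i
            while j != -1:
                mask[i, j] = True
                j = parents[j]
        return mask

    # a small proof tree: 0 is the goal, 1 and 2 are sub-goals, 3 and 4 prove sub-goal 2
    print(ancestor_attention_mask([-1, 0, 0, 2, 2]))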
replies(1): >>41844144 #
7. disconcision ◴[] No.41844144[source]
Not obviously the case. Natural language also has tree/DAG structure, yet transformers do okay with it. Encoding higher-order structure in linear sequences seems necessary for transformers to be able to do much of anything.
replies(1): >>41844539 #
8. llm_trw ◴[] No.41844539{3}[source]
Not to arbitrary depth. Empirically, the majority of text has depth less than 6 when represented as a tree. Proofs, on the other hand, have no a priori limit to their depth.
replies(1): >>41846997 #
9. danielmarkbruce ◴[] No.41845141{5}[source]
A lot of the games usually used are Markov: all the state is right there in front of you, and it doesn't matter how you got there. Chess, for example -- it matters not how you got to state X (...someone will get the edge cases...).
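
Concretely, "all the state right in front of you" for chess means roughly the fields of a FEN string, with the history-dependent edge cases (castling rights, en passant, the repetition and fifty-move rules) folded into the state so the game stays Markov. A rough sketch:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ChessState:
        """Everything the policy needs is in the state; how you got here doesn't matter,
        because the history-dependent bits are folded into the state itself."""
        board: tuple               # piece placement, e.g. a 64-entry tuple
        side_to_move: str          # "w" or "b"
        castling_rights: str       # subset of "KQkq" -- depends on history, so it's stored
        en_passant_square: str     # e.g. "e3", or "-" if none
        halfmove_clock: int        # for the fifty-move rule
        repetition_count: int      # threefold repetition is the usual "edge case"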
10. michaelnny ◴[] No.41845341[source]
One important aspect of the success of AlphaGo and its successor is that the game environment is a closed domain and has a stable reward function. With this, we can guide the agent to do MCTS search and planning for the best move in every state.

However, such a reward system is not available for LLMs in an open-domain setting.
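
In code terms the contrast is roughly this (an illustrative sketch with hypothetical function names):

    def game_reward(winner, player):
        """Closed domain: the rules score a finished game the same way every time,
        so MCTS always has a stable terminal signal to back up through the tree."""
        return 1.0 if winner == player else -1.0

    def open_domain_reward(prompt, response):
        """Open domain: no ground-truth scorer exists; any stand-in (a learned reward
        model, human preference data) is an approximation that can be over-optimized."""
        raise NotImplementedError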

11. looofooo0 ◴[] No.41846997{4}[source]
But isn't math structured in the same way, subdivided into lemmas, theorems, and corollaries with limited depth?
replies(1): >>41847305 #
12. llm_trw ◴[] No.41847305{5}[source]
No. It's pretty common to hit depth 20+ on even relatively simple proofs once you start decomposing them down to their axioms.

The old joke about 1 + 1 = 2 being on page 360 of Principia Mathematica still largely holds.
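
For a sense of what decomposing to axioms does to the depth, here is a minimal Lean 4 sketch of the same point:

    -- At the surface the proof is a single step:
    example : 1 + 1 = 2 := rfl

    -- but `rfl` only closes the goal because the kernel unfolds a stack of
    -- definitions underneath (numeric literals via OfNat, Nat.add, Nat.succ, ...);
    -- chasing a statement down to what it actually depends on is where the
    -- depth comes from.
    #print Nat.add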

replies(1): >>41847353 #
13. auggierose ◴[] No.41847353{6}[source]
That depends on the granularity of your proof steps. That old joke hasn't held for a long time now (take a look at Isabelle/Isar proofs), yet people keep bringing it up.
replies(1): >>41847620 #
14. llm_trw ◴[] No.41847620{7}[source]
It doesn't matter what the granularity of your proof step is; you still need to understand what that proof step does to be a mathematician. You seem to mistake syntactic brevity for semantic meaning.
replies(1): >>41851005 #
15. ◴[] No.41848037[source]
16. auggierose ◴[] No.41851005{8}[source]
Mathematicians do very large proof steps, and so can the computer these days. If you do something like (proof by auto: thm_1 thm_2 thm_3) and it goes through, then I know why the proof works (because of these theorems), and this is similar to how a mathematician understands something. Syntactic brevity is indeed an indicator that you understand the proof. The better you understand something, the more concise you can formulate and prove it.