184 points hhs | 6 comments
aabhay ◴[] No.41840024[source]
The ability to use automatic verification + synthetic data is basically common knowledge among practitioners. But all these organizations have also endlessly explored the different ways to overfit on such data, and the conclusion is the same -- the current model architecture seems to plateau when it comes to multi-step logical reasoning. You either drift too far from your common-knowledge pre-training, or you never come up with the right steps in instances where there's a vast design space.
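
A minimal sketch of what that recipe usually amounts to (the generate/verify functions here are placeholders, not any particular lab's pipeline):

    # rejection-sampling synthetic data: keep only candidates an automatic checker accepts
    def build_synthetic_dataset(problems, generate, verify, samples_per_problem=8):
        """generate(problem) -> candidate solution text; verify(problem, candidate) -> bool."""
        dataset = []
        for problem in problems:
            for _ in range(samples_per_problem):
                candidate = generate(problem)
                if verify(problem, candidate):  # e.g. unit tests, a proof checker, an exact-match grader
                    dataset.append((problem, candidate))
                    break  # keep the first verified solution and move on
        return dataset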

Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? It certainly would have orders of magnitude more parameters.

replies(3): >>41840256 #>>41844066 #>>41848037 #
llm_trw ◴[] No.41844066[source]
Transformers are fundamentally linear; proofs are either tree-like or DAG-like. You'd need a new attention mechanism to allow hierarchical reasoning.
replies(1): >>41844144 #
disconcision ◴[] No.41844144[source]
Not obviously the case. Natural language also has tree/DAG structure, yet transformers do okay with it. Encoding higher-order structure in linear sequences seems necessary for transformers to be able to do almost anything.
replies(1): >>41844539 #
1. llm_trw ◴[] No.41844539[source]
Not to arbitrary depth. Empirically, the majority of text has depth less than 6 when represented as a tree. Proofs, on the other hand, have no a priori limit to their depth.
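
One crude way to eyeball that claim, treating a dependency parse as the tree (a sketch; spaCy and its small English model are assumed installed, and dependency depth is only one reasonable notion of "depth"):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def depth(token):
        # longest path from this token down to a leaf of the dependency tree
        return 1 + max((depth(child) for child in token.children), default=0)

    doc = nlp("The quick brown fox jumps over the lazy dog because it is bored.")
    for sent in doc.sents:
        print(sent.text, "-> depth", depth(sent.root))
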
replies(1): >>41846997 #
2. looofooo0 ◴[] No.41846997[source]
But isn't math structured in the same way, subdivided into lemmas, theorems, and corollaries with limited depth?
replies(1): >>41847305 #
3. llm_trw ◴[] No.41847305[source]
No. It's pretty common to hit depth 20+ on even relatively simple proofs once you start decomposing them down to their axioms.

The old joke about 1 + 1 = 2 being on page 360 of Principia Mathematica still largely holds.

replies(1): >>41847353 #
4. auggierose ◴[] No.41847353{3}[source]
That depends on the granularity of your proof steps. That old joke hasn't held for a long time now (take a look at Isabelle/Isar proofs), yet people keep bringing it up.
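
For a sense of that granularity: the Principia example above is a one-liner in a modern assistant, e.g. in Lean 4 (using Lean here purely as a compact illustration):

    -- 1 + 1 = 2 holds by computation on Nat; no hundreds of pages of derivation required
    example : 1 + 1 = 2 := rfl
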
replies(1): >>41847620 #
5. llm_trw ◴[] No.41847620{4}[source]
It doesn't matter what the granularity of your proof step is; you still need to understand what that proof step does to be a mathematician. You seem to mistake syntactic brevity for semantic meaning.
replies(1): >>41851005 #
6. auggierose ◴[] No.41851005{5}[source]
Mathematicians do very large proof steps, and so can the computer these days. If you write something like (proof by auto: thm_1 thm_2 thm_3) and it goes through, then you know why the proof works (because of these theorems), and this is similar to how a mathematician understands something. Syntactic brevity is indeed an indicator that you understand the proof. The better you understand something, the more concisely you can formulate and prove it.
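
For flavor, a Lean 4 analogue of that kind of step (Lean rather than Isabelle here, and the cited lemma is just a stock library fact, not anything specific from this discussion):

    -- one automated step, with the fact that makes it go through named explicitly
    example (xs ys : List Nat) : (xs ++ ys).length = xs.length + ys.length := by
      simp [List.length_append]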