
184 points hhs | 3 comments
youoy ◴[] No.41844306[source]
From the abstract:

> Proof assistants like Lean have revolutionized mathematical proof verification...

> Both the synthetic dataset and the model will be made available to facilitate further research in this promising field.

Please don't use "revolutionised" or "promising" in a scientific abstract... If it has revolutionised the field, tell me instead what the key change was that made that happen. If it is promising, tell me instead why.

Apart from that, the second sentence says:

> Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data.

Is it really? Or hindered by an unsuitable model architecture? Because you could perfectly well say that Lean itself is an AI capable of theorem proving... I feel that sentence should be the potential conclusion of the paper, not its first observation. Something like: "we used more training data and it works better; if we scale up theorem-proving training data, we get the performance we want, so a lack of training data really is what hinders the advancement of LLMs in formal theorem proving".

replies(2): >>41844619 #>>41847136 #
1. versteegen ◴[] No.41847136[source]
> their advancement in formal theorem proving is hindered by a lack of training data

'Formal' here means inputs to a proof verifier/proof assistant, and I think most people would accept it as a given that, by the extremely data-hungry standards of LLMs, there is a severe lack of such data, and that this is a severe hindrance to getting an LLM to generalise. The question is how they might scale with more training data.
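To make "such data" concrete (a toy example of mine, not from the paper): one formal training example is a statement/proof pair that a machine can verify, e.g. in Lean 4:

    -- Both statement and proof are machine-checkable text; data in this
    -- form is scarce compared to informal mathematical prose.
    theorem zero_add' (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl
      | succ k ih => rw [Nat.add_succ, ih]

Informal proofs exist at web scale; proofs in this checkable form do not.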

replies(1): >>41848923 #
2. Loic ◴[] No.41848923[source]
Formal proving with a system based on probabilities (temperature, etc.) is, to me, an oxymoron. But I am not a mathematician.

Edit: Ok, they generate proofs with the LLM, then they validate them with the Lean 4 environment. I am not sure whether this is better than generating cooking recipes with an LLM and asking a cook if they make sense or not.
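A minimal sketch of that generate-then-check loop (ask_llm is a hypothetical stand-in for the model call; the only real assumption about tooling is that the lean command exits non-zero when a file fails to check):

    import subprocess
    import tempfile
    from typing import Optional

    def ask_llm(statement: str) -> str:
        """Hypothetical stand-in: sample a candidate tactic proof from a model."""
        raise NotImplementedError

    def lean_accepts(source: str) -> bool:
        """Hand a candidate Lean 4 file to the checker and return its verdict."""
        with tempfile.NamedTemporaryFile(suffix=".lean", mode="w", delete=False) as f:
            f.write(source)
            path = f.name
        # Exit status 0 means every proof in the file type-checks.
        return subprocess.run(["lean", path], capture_output=True).returncode == 0

    def prove(statement: str, attempts: int = 8) -> Optional[str]:
        """Sample candidate proofs; keep only ones the verifier accepts."""
        for _ in range(attempts):
            candidate = ask_llm(statement)
            if lean_accepts(f"theorem goal : {statement} := by\n{candidate}"):
                return candidate  # verified by Lean, not merely plausible
        return None

The candidate proofs are free-form model output; acceptance is decided entirely by the checker.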

replies(1): >>41850067 #
3. nrfulton ◴[] No.41850067[source]
> Formally proving with a system based on probabilities (temperature, etc.) is for me an oxymoron. But I am not a mathematician.

In the context of automated theorem proving, proof search and proof checking could not be more different.

Proof search is undecidable for most logics. For that reason, heuristics are commonly used to guide proof search. This has always been the case; heuristic proof search predates the use of LLMs for theorem proving by decades.
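As an illustration (a generic skeleton of mine, not any particular prover's code), heuristic search over proof states is often just best-first search, where only the scoring function is heuristic; an LLM is simply one more way to implement score or expand:

    import heapq
    from typing import Callable, Iterable, Optional

    def best_first_search(
        start,
        is_proved: Callable[[object], bool],
        expand: Callable[[object], Iterable[object]],   # e.g. apply tactics
        score: Callable[[object], float],               # hand-written or learned
        budget: int = 10_000,
    ) -> Optional[object]:
        # Entries are (priority, tie-breaker, state); the counter keeps
        # heapq from ever comparing two proof states directly.
        frontier = [(score(start), 0, start)]
        counter = 1
        while frontier and budget > 0:
            _, _, state = heapq.heappop(frontier)
            budget -= 1
            if is_proved(state):
                return state
            for nxt in expand(state):
                heapq.heappush(frontier, (score(nxt), counter, nxt))
                counter += 1
        return None  # budget exhausted; says nothing about provability

A better heuristic changes where the search goes, never whether an accepted proof is valid -- that part is the checker's job.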

Proof checking, on the other hand, is not just decidable but usually quite tractable and cheap. Therefore, proof checkers are typically deterministic and fast. Ideally, the checker is also implemented in a small amount of relatively simple and heavily audited code.
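Concretely (a toy example of mine): in Lean, what the kernel ultimately checks is a plain term whose type is the theorem statement, which is part of why checking stays cheap and deterministic:

    -- Tactic scripts are elaborated away before the kernel runs; it just
    -- type-checks this term against the stated proposition.
    theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
      fun h => ⟨h.right, h.left⟩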

For a high-level description of how provers are typically architected, see https://www.pls-lab.org/en/de_Bruijn_criterion (or https://lawrencecpaulson.github.io/2022/01/05/LCF.html which is linked to at the bottom of the above page and goes into a bit more historical detail and nuance).

> I am not sure if this is better than generating cooking recipes with a LLM and asking a cook if they make sense or not.

Conceptually it's not at all similar to asking a cook if a recipe is "correct". For Lean proofs, "correct" has an entirely precise meaning. The "cook" (i.e., the proof checker) is basically free (much less than $0.01/hr). The cook (proof checker) takes at most a few seconds to read and respond to any particular recipe. And the cook (i.e., the checker) never makes mistakes.
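For instance (toy statements of mine), the verdict is binary and exact:

    example : 2 + 2 = 4 := rfl     -- accepted: checked in well under a second
    -- example : 2 + 2 = 5 := rfl  -- would be rejected, every single time;
    --                                left commented out so the file compiles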

The relationship between asking a cook if a recipe is good and asking Lean to check a proof is approximately the difference between asking a mediocre human computer to compute a derivative and asking a modern mechanical computer to do the same (using a classical algorithm, not an LLM; no one is using LLMs to check proofs -- there's really no point in doing so, because the checking problem is easy).