DeepSeek: Advancing theorem proving in LLMs through large-scale synthetic data

1. _flux ◴[14 Oct 24 17:10 UTC] No.41839545[source]▶

This must be one of the best applications for LLMs, as you can always automatically verify the results, or reject them otherwise, right?

replies(5): >>41839668 #>>41839669 #>>41839866 #>>41839977 #>>41840268 #

2. yuhfdr ◴[14 Oct 24 17:43 UTC] No.41839866[source]▶

>>41839545 (TP) #

Same with code generation.

But generating useless code, or proofs, just to discard them is hardly a consequence and externality free effort.

replies(1): >>41839910 #

3. kkzz99 ◴[14 Oct 24 17:46 UTC] No.41839910[source]▶

>>41839866 #

But you can far more easily scale llm generation compared to human researchers.

replies(1): >>41839976 #

4. yuhfdr ◴[14 Oct 24 17:53 UTC] No.41839976{3}[source]▶

>>41839910 #

Sloppyjoes always assume the best case for llms and worst case for their counter example of choice.

10 trillion llms, powered by a Dyson sphere, that still output slop is still worse than one unappreciated post doc.

replies(1): >>41841100 #

5. aithrowawaycomm ◴[14 Oct 24 17:53 UTC] No.41839977[source]▶

>>41839545 (TP) #

The dependence on huge amounts of synthetic data makes it hard for me to see an application for mathematicians. The theorems being proved here are incredibly boring, more like calculation-heavy homework problems from a professor who is mean to freshmen. (This is not the same thing as what Terence Tao has been looking at, which is more human language -> Lean "translation.") There is a basic problem that new and interesting mathematics does not have the data for ANNs to learn from; this mathematics is very old and there are thousands of examples online. To be extremely pedantic the authors didn't show DeepSeek was good at proofs, but rather at undergraduate competition math problems. I am assuming some of that transfers... but I am a huge critic of LLMs and even so I constantly overestimate them!

However I could see this being useful for verified software development, which usually involves a huge amount of tedious and uninteresting lemmas, and the size of the proofs becomes exhausting. Having an LLM check all 2^8 configurations of something seems worthwhile.

6. danielmarkbruce ◴[14 Oct 24 18:16 UTC] No.41840268[source]▶

>>41839545 (TP) #

Any situation involving the ability to simulate (like, weather models, bio models, physics models) is potentially a good application. It's still hard to make a good model to approximate an extremely complex function.

7. bluechair ◴[14 Oct 24 19:41 UTC] No.41841100{4}[source]▶

>>41839976 #

That was funny even if I don’t agree with the sentiment.