←back to thread

184 points hhs | 7 comments | | HN request time: 1.193s | source | bottom
1. _flux ◴[] No.41839545[source]
This must be one of the best applications for LLMs, as you can always automatically verify the results, or reject them otherwise, right?
replies(5): >>41839668 #>>41839669 #>>41839866 #>>41839977 #>>41840268 #
2. yuhfdr ◴[] No.41839866[source]
Same with code generation.

But generating useless code, or proofs, just to discard them is hardly a consequence and externality free effort.

replies(1): >>41839910 #
3. kkzz99 ◴[] No.41839910[source]
But you can far more easily scale llm generation compared to human researchers.
replies(1): >>41839976 #
4. yuhfdr ◴[] No.41839976{3}[source]
Sloppyjoes always assume the best case for llms and worst case for their counter example of choice.

10 trillion llms, powered by a Dyson sphere, that still output slop is still worse than one unappreciated post doc.

replies(1): >>41841100 #
5. aithrowawaycomm ◴[] No.41839977[source]
The dependence on huge amounts of synthetic data makes it hard for me to see an application for mathematicians. The theorems being proved here are incredibly boring, more like calculation-heavy homework problems from a professor who is mean to freshmen. (This is not the same thing as what Terence Tao has been looking at, which is more human language -> Lean "translation.") There is a basic problem that new and interesting mathematics does not have the data for ANNs to learn from; this mathematics is very old and there are thousands of examples online. To be extremely pedantic the authors didn't show DeepSeek was good at proofs, but rather at undergraduate competition math problems. I am assuming some of that transfers... but I am a huge critic of LLMs and even so I constantly overestimate them!

However I could see this being useful for verified software development, which usually involves a huge amount of tedious and uninteresting lemmas, and the size of the proofs becomes exhausting. Having an LLM check all 2^8 configurations of something seems worthwhile.

6. danielmarkbruce ◴[] No.41840268[source]
Any situation involving the ability to simulate (like, weather models, bio models, physics models) is potentially a good application. It's still hard to make a good model to approximate an extremely complex function.
7. bluechair ◴[] No.41841100{4}[source]
That was funny even if I don’t agree with the sentiment.