
184 points | hhs | 1 comment
_flux ◴[] No.41839545[source]
This must be one of the best applications for LLMs, since you can always verify the results automatically and reject the ones that fail, right?
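That verify-or-reject loop can be sketched as a small harness that hands each candidate proof to an external checker and keeps it only if the checker accepts. This is a hypothetical illustration, not DeepSeek's actual pipeline; the `checks` function and the commented sampling loop are assumptions, and the default assumes a `lean` binary on PATH.

```python
import os
import subprocess
import tempfile

def checks(source: str, cmd=("lean",), suffix: str = ".lean") -> bool:
    """Accept a candidate proof iff the external checker exits with status 0.

    Hypothetical sketch of a generate-and-verify loop; by default it invokes
    the Lean compiler on a temp file holding the candidate proof."""
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run([*cmd, path], capture_output=True, text=True)
        return result.returncode == 0
    finally:
        os.unlink(path)

# Sampling loop (hypothetical model API): keep only candidates that check.
# candidates = [model.sample(theorem) for _ in range(64)]
# proofs = [p for p in candidates if checks(p)]
```

Because the checker is just a subprocess, the same harness works with any verifier command, which is what makes this setting unusual among LLM applications: acceptance is decided by a type checker, not by a human judge.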
replies(5): >>41839668 #>>41839669 #>>41839866 #>>41839977 #>>41840268 #
1. aithrowawaycomm ◴[] No.41839977[source]
The dependence on huge amounts of synthetic data makes it hard for me to see an application for mathematicians. The theorems being proved here are incredibly boring, more like calculation-heavy homework problems from a professor who is mean to freshmen. (This is not the same as what Terence Tao has been looking at, which is more human language -> Lean "translation.") There is a basic problem: new and interesting mathematics doesn't have the data for ANNs to learn from, whereas this mathematics is very old and there are thousands of examples online. To be extremely pedantic, the authors didn't show DeepSeek was good at proofs, but rather at undergraduate competition math problems. I am assuming some of that transfers... but I am a huge critic of LLMs and even so I constantly overestimate them!

However, I could see this being useful for verified software development, which usually involves a huge number of tedious and uninteresting lemmas, and the sheer size of the proofs becomes exhausting. Having an LLM check all 2^8 configurations of something seems worthwhile.
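For finite case checks like that, Lean can often discharge the proof itself by brute-force evaluation. A minimal sketch, with a stand-in property (the real lemmas in verified software would be domain-specific), assuming a `Decidable` instance for quantification over `Fin`, which core Lean/Mathlib provides:

```lean
-- Brute-force verification over all 2^8 = 256 cases, encoded as `Fin 256`.
-- The property here is a placeholder; `decide` evaluates every case.
example : ∀ x : Fin 256, x.val / 2 * 2 + x.val % 2 = x.val := by
  decide
```

The tedium the comment describes is in producing the many small lemmas around such checks, which is where an LLM proposer paired with the kernel as verifier could plausibly help.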