Surprised by the negativity here. A 7B-class model roughly doubles GPT-4's score and everyone goes "meh"?!?
replies(3):
There was not enough detail to determine what was ultimately useful and truly better, if anything, so the lessons learned are near useless. Likewise, I could not tell how useful the fine-tuning was and why, versus basic other tricks that would have avoided all this complexity. The work seems good, but I found almost no scientific value in the experimentation and reporting, so there is little to comment on that I normally would. We focus more on the coding and analysis side, logical QA on fuzzier questions, so I am genuinely curious, supportive, and informed, but I was left frustrated and wanting my time back.