184 points hhs | 6 comments
aabhay ◴[] No.41840024[source]
The ability to use automatic verification + synthetic data is basically common knowledge among practitioners. But all these organizations have also endlessly explored the different ways to overfit on such data, and the conclusion is the same -- the current model architecture seems to plateau when it comes to multi-step logical reasoning. You either drift too far from your common-knowledge pre-training, or you never come up with the right steps in instances where there's a vast design space.
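
A minimal sketch of what that recipe usually amounts to (the generate/verify functions here are placeholders, not any particular lab's pipeline):

    # rejection-sampling synthetic data: keep only candidates an automatic checker accepts
    def build_synthetic_dataset(problems, generate, verify, samples_per_problem=8):
        """generate(problem) -> candidate solution text; verify(problem, candidate) -> bool."""
        dataset = []
        for problem in problems:
            for _ in range(samples_per_problem):
                candidate = generate(problem)
                if verify(problem, candidate):  # e.g. unit tests, a proof checker, an exact-match grader
                    dataset.append((problem, candidate))
                    break  # keep the first verified solution and move on
        return dataset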

Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? It certainly would have orders of magnitude more parameters.

replies(3): >>41840256 #>>41844066 #>>41848037 #
llm_trw ◴[] No.41844066[source]
Transformers are fundamentally linear; proofs are either tree-like or DAG-like. You'd need a new attention mechanism to allow hierarchical reasoning.
replies(1): >>41844144 #
disconcision ◴[] No.41844144[source]
Not obviously the case. Natural language also has tree/DAG structure, yet transformers do okay with it. Encoding higher-order structure in linear sequences seems necessary for transformers to be able to do almost anything.
replies(1): >>41844539 #
1. llm_trw ◴[] No.41844539[source]
Not to arbitrary depth. Empirically, the majority of text has depth less than 6 when represented as a tree. Proofs, on the other hand, have no a priori limit to their depth.
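
One crude way to eyeball that claim, treating a dependency parse as the tree (a sketch; spaCy and its small English model are assumed installed, and dependency depth is only one reasonable notion of "depth"):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def depth(token):
        # longest path from this token down to a leaf of the dependency tree
        return 1 + max((depth(child) for child in token.children), default=0)

    doc = nlp("The quick brown fox jumps over the lazy dog because it is bored.")
    for sent in doc.sents:
        print(sent.text, "-> depth", depth(sent.root))
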
replies(1): >>41846997 #
2. looofooo0 ◴[] No.41846997[source]
But isn't math structured in the same way, subdivided into lemmas, theorems, and corollaries with limited depth?
replies(1): >>41847305 #
3. llm_trw ◴[] No.41847305[source]
No. It's pretty common to hit depth 20+ on even relatively simple proofs once you start decomposing them down to their axioms.

The old joke about 1 + 1 = 2 being on page 360 of Principia Mathematica still largely holds.

replies(1): >>41847353 #
4. auggierose ◴[] No.41847353{3}[source]
That depends on the granularity of your proof steps. That old joke hasn't held for a long time now (take a look at Isabelle/Isar proofs), yet people keep bringing it up.
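
For a sense of that granularity: the Principia example above is a one-liner in a modern assistant, e.g. in Lean 4 (using Lean here purely as a compact illustration):

    -- 1 + 1 = 2 holds by computation on Nat; no hundreds of pages of derivation required
    example : 1 + 1 = 2 := rfl
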
replies(1): >>41847620 #
5. llm_trw ◴[] No.41847620{4}[source]
It doesn't matter what the granularity of your proof step is; you still need to understand what that proof step does to be a mathematician. You seem to mistake syntactic brevity for semantic meaning.
replies(1): >>41851005 #
6. auggierose ◴[] No.41851005{5}[source]
Mathematicians do very large proof steps, and so can the computer these days. If you write something like (proof by auto: thm_1 thm_2 thm_3) and it goes through, then you know why the proof works (because of these theorems), and this is similar to how a mathematician understands something. Syntactic brevity is indeed an indicator that you understand the proof. The better you understand something, the more concisely you can formulate and prove it.
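
For flavor, a Lean 4 analogue of that kind of step (Lean rather than Isabelle here, and the cited lemma is just a stock library fact, not anything specific from this discussion):

    -- one automated step, with the fact that makes it go through named explicitly
    example (xs ys : List Nat) : (xs ++ ys).length = xs.length + ys.length := by
      simp [List.length_append]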