I say schedule because the “static data, once through” regime is, in my mind, one of the root problems.
Think about what happens when you read something like a book. You’re not “just” reading it: you’re comparing it to other books, to other books by the same author, and weighing it against the recommendation your friend made. Any events in the book get compared to your own life experience, and so on.
LLM training does none of this! It’s a once-through text prediction training regime.
What this means in practice is that an LLM can’t write a review of a book unless it has already read many reviews. They have, of course, but the problem doesn’t go away: ask an AI to critique book reviews and it’ll run out of steam, because it hasn’t seen many of those. Critiques of critiques are where they really start falling flat on their faces.
This kind of meta-knowledge is precisely what experts accumulate.
As a programmer I don’t just regurgitate code I’ve seen before with slight variations; instead, I know, for example, that mainstream criticisms of microservices miss their key benefit: extreme team scalability!
This is the crux of it: when humans read their training material they are generating an “n+1” level in their mind that they also learn. The current AI training setup trains the AI only on the “n”th level.
This could be addressed by running the training in a loop for several iterations after base training, each pass one meta-level up from the last. The challenge, of course, is to develop a meaningful loss function.
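To make the shape of that loop concrete, here’s a toy sketch. Everything in it is hypothetical: `ToyModel`, `generate`, `train_on`, and the prompts are stand-ins, not any existing training pipeline, and it deliberately punts on the hard part, the loss.

```python
# A minimal sketch of "re-read the corpus, one meta-level up each pass".
# All names here are hypothetical placeholders for illustration only.

from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Stand-in for a base-trained LLM; real code would wrap an actual model."""
    seen: list[str] = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        # Hypothetical: a real model would produce a critique/review here.
        return f"[model's take on: {prompt[:40]}...]"

    def train_on(self, texts: list[str]) -> float:
        # Hypothetical: a real loop would compute a loss and update weights.
        # Defining a loss that rewards *good* meta-content is the open problem.
        self.seen.extend(texts)
        return 0.0


def meta_training_loop(model: ToyModel, documents: list[str], levels: int = 2) -> None:
    """After base training, loop over the corpus again at higher meta-levels:
    level 1 trains on reactions to the documents, level 2 on reactions to
    those reactions (critiques of critiques), and so on."""
    current = documents
    for level in range(1, levels + 1):
        reactions = [model.generate(f"Critique this (level {level}): {doc}") for doc in current]
        loss = model.train_on(reactions)
        print(f"level {level}: trained on {len(reactions)} generated texts, loss={loss}")
        current = reactions  # the next pass critiques the critiques


if __name__ == "__main__":
    meta_training_loop(ToyModel(), ["Some book chapter...", "Another chapter..."])
```

The sketch only shows the control flow: the hard research question is what replaces the dummy `train_on` so the generated “n+1” material actually improves the model rather than amplifying its own noise.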
IMHO the “thinking” model training is a step in the right direction but nowhere near enough to produce AGI all by itself.