> Yann LeCun ... argued that because language models generate outputs token-by-token, and each token carries some probability of error, sufficiently long outputs will see those per-token errors compound into near-certain failure.
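
To make the quoted claim concrete: if each token independently has error probability ε, an n-token output is entirely error-free with probability (1 − ε)^n, which decays exponentially in n. A minimal sketch of that arithmetic, with illustrative error rates and lengths (the specific numbers are mine, not LeCun's):

```python
# Sketch of the compounding-error claim under an independence assumption:
# P(all n tokens correct) = (1 - eps) ** n, which shrinks exponentially with n.
# The error rates and lengths below are illustrative, not measured values.

for eps in (0.001, 0.01, 0.05):        # assumed per-token error rates
    for n in (10, 100, 1000):          # output lengths in tokens
        p_correct = (1 - eps) ** n     # probability the whole output is error-free
        print(f"eps={eps:<6} n={n:<5} P(all correct)={p_correct:.3f}")
```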
That seems like a poor argument. Every word a human utters also has some chance of being wrong, yet we manage to communicate reliably overall, presumably because our errors are neither independent nor uncorrectable.