> I think they are betting on what they might be able to do in the future.
Yeah, blind hope and a bit of smoke and mirrors.
> but I don't think we've reached the limits with synthetic data
Synthetic data, at least for visual stuff, can in some cases provide the majority of training data. For $work, we can have, say, 100k synthetic video sequences to train a model, which is then fine-tuned on, say, 2k real videos. That gets it to slightly under the quality of a model trained purely on real video.
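The pretrain-then-fine-tune recipe above can be sketched with a toy model. Everything here is illustrative, not the actual $work pipeline: a one-parameter linear model stands in for the video model, "synthetic" data has a slightly wrong slope, and "real" data is scarce but on-distribution.

```python
# Sketch: pretrain on plentiful-but-imperfect synthetic data,
# then fine-tune on a small amount of real data.
# All numbers and the toy model are made up for illustration.
import random

random.seed(0)

def make_data(n, slope, noise):
    # samples (x, y) with y = slope * x + gaussian noise
    return [(x := random.uniform(-1, 1), slope * x + random.gauss(0, noise))
            for _ in range(n)]

def train(w, data, lr, epochs):
    # plain SGD on squared error for a 1-parameter linear model
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

synthetic = make_data(10_000, slope=2.8, noise=0.1)  # cheap, slightly off
real      = make_data(200,    slope=3.0, noise=0.1)  # scarce, ground truth

w = train(0.0, synthetic, lr=0.05, epochs=1)  # pretrain: lands near 2.8
w = train(w,   real,      lr=0.05, epochs=3)  # fine-tune: pulled toward 3.0
print(round(w, 2))
```

The point of the sketch: pretraining does the bulk of the work, and the small real set corrects the residual synthetic-to-real gap, which matches the "slightly under pure-real quality" result.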
So I'm not that hopeful that synthetic data will provide a breakthrough.
I think the current architecture of LLMs is the limitation. They are fundamentally sequence machines and are not capable of short- or medium-term learning. Context windows kinda make up for that, but they don't alter the starting state of the model.