Everything is still based on 4o, right? Is training a new model from scratch just too expensive? Maybe they could consult the DeepSeek team about building new models under cost constraints.
I thought that whenever the knowledge cutoff increased, it meant they'd trained a new model. I guess that's completely wrong?
They add new data to the existing base model via continued pre-training. You save on the full pre-training run, the next-token-prediction task over the original corpus, but you still have to re-run the mid- and post-training stages: context length extension, supervised fine-tuning, reinforcement learning, safety alignment, and so on.
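To make the idea concrete, here's a minimal sketch of continued pre-training using Hugging Face Transformers. The model name and dataset are stand-ins (nobody outside OpenAI knows their actual setup); the point is just that you resume from an existing checkpoint and keep running the same next-token-prediction objective on newer data, rather than training from scratch.

```python
# Sketch of continued pre-training: load an existing checkpoint and keep
# training on "new" data with the same causal-LM objective.
# "gpt2" and "wikitext" are placeholders for illustration only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"  # stand-in for the existing base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)  # resume from existing weights

# Placeholder corpus standing in for data past the old knowledge cutoff.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False gives the standard next-token-prediction (causal LM) objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="continued-pretrain",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=1e-5,  # low LR to limit forgetting of the original data
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
# After this, the mid/post-training stages (context length extension, SFT,
# RL, safety alignment, ...) would still need to be re-run on top.
```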