Okay, ChatGPT is only text-to-text, but Google & Co are adding more modalities now, including images, audio and robotics. I think one missing step is to fuse the training and inference regimes into one, just as in animals. That probably requires something other than the usual transformer-based token predictors.
It's not clear that it is one regime in animals. Sleep is training (replay from the hippocampus). Wakefulness is inference.
I'm not sure how anyone could be this naive. Mammal brains don't have a separate train mode and inference mode. Both are running at all times. If what you said were true, then if I taught you something today, you wouldn't be able to do it until tomorrow. Hell, schools would be an insane concept if this were true. Try to think a bit more before confidently stating an answer.