Attention works, yes. But it isn't biologically plausible at all: we don't do quadratic comparisons across a whole book, and we don't need to see thousands of samples to understand something.
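To make the quadratic point concrete, here's a minimal numpy sketch of single-head attention (shapes and names are just illustrative, not any particular library's API): the score matrix alone is n × n, so doubling the context quadruples the work:

```python
# Minimal sketch of why full attention is quadratic: every token's query is
# compared against every token's key, giving an n x n score matrix.
import numpy as np

n, d = 1024, 64                      # sequence length, head dimension
Q = np.random.randn(n, d)            # queries, one per token
K = np.random.randn(n, d)            # keys, one per token
V = np.random.randn(n, d)            # values, one per token

scores = Q @ K.T / np.sqrt(d)        # n x n matrix: O(n^2) comparisons
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
out = weights @ V                    # doubling n quadruples the score matrix
```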
Personally, I think that in the long run recurrent architectures and test-time training have a better chance than today's full attention.
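By test-time training I mean something like the rough sketch below: before predicting on a sequence, adapt a throwaway copy of the model on that sequence's own self-supervised loss for a few steps. The tiny GRU and the next-token objective here are hypothetical stand-ins, not any published architecture:

```python
import copy
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    # Hypothetical recurrent model standing in for whatever the base model is.
    def __init__(self, vocab=256, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens):                  # tokens: (batch, seq)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)                     # logits: (batch, seq, vocab)

def predict_with_ttt(model, tokens, steps=3, lr=1e-3):
    # Test-time training: fine-tune a throwaway copy on the test sequence's
    # own next-token loss, then predict with the adapted weights.
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        logits = adapted(tokens[:, :-1])        # predict each next token
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return adapted(tokens)                  # weights now carry the update
```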
Also, I think OpenAI's biggest contribution is demonstrating that reasoning-like behaviors can emerge from really good language modelling.