Given what you can already do with current-gen models combined with RAG, multi-agent setups, and code interpreters, the wall is now very much model latency, not accuracy.
There are so many interactive experiences that become possible at this level of token throughput from 405B-class models.