There are so many interactive experiences that could be made possible at this level of token throughput from 405B class models.
You can also get a lot fancier with tool usage once an LLM can call tools and respond to their results at something closer to the speed of a normal network service. A rough sketch of what that loop looks like is below.
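Here's a minimal sketch of the standard tool-calling loop, assuming an OpenAI-compatible chat-completions endpoint; the base URL, model name, and `get_weather` tool are placeholders, not any specific vendor's API. The point is that the faster each completion returns, the more of these round trips you can afford inside a single user interaction.

```python
# Minimal tool-use loop sketch. Assumes an OpenAI-compatible endpoint;
# base_url, model name, and the get_weather tool are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://fast-inference.example.com/v1", api_key="...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub tool: a real implementation would hit a weather API.
    return json.dumps({"city": city, "temp_c": 21})

messages = [{"role": "user", "content": "Do I need a jacket in Toronto today?"}]

# At ~1000 tokens/s, several of these round trips still cost well under a
# second of model time, so chaining tools feels like calling a normal service.
while True:
    resp = client.chat.completions.create(
        model="llama-3.1-405b",  # placeholder model name
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)  # dispatch to the stub tool
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```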
I've never timed it, but I'm guessing current LLMs don't handle "live video" type applications well. Imagine an LLM you could actually video chat with - it'd be useful for walking someone through a procedure, for advanced automation of GUI applications, and so on.
AND the holy grail of AI applications that would combine all of this - robotics. Today, Cerebras chips are probably too power-hungry for battery-powered robotic assistants, but one could imagine a Star Wars-style robot assistant many years from now. You could have a robot that navigates some space (a home or work setting), sees its environment, and processes the video in real time. It could then reason about the world and its given task by explicitly thinking through steps and critically self-challenging those steps.