The rate of progress on multimodal agents is impressive. OpenVLA was released in June 2024 and was state of the art at the time... Eight months later, on tasks like "Pick Place Hotdog Sausage", the success rate has gone from 2/10 to 6/10.
The really fast multi-arm versions can be hypnotic to watch. You can see an example at 1:00 in this video: https://youtu.be/aPTd8XDZOEk
The limitation of industrial pick & place robots is that each is configured for a single task, and reconfiguring one for a new product is notoriously expensive.
Magma's "pick & place" demo is much slower and shakier than a specialized industrial robot. But Magma can apparently be adapted to a new task by providing plain English instructions.