Agreed, but I also think that to be called AGI, they should be capable of working through human interfaces rather than needing special interfaces built for them to compensate for their lack of generality.
The catch here isn't the ability to use these interfaces, though. I expect that will be easy. The hard part is that once these interfaces are learned, the scope and search space of what these systems can do becomes vastly larger. And moreover, our expectations will change in how we expect an AGI to handle itself once our way of working with it becomes more human.
Right now we're claiming nascent AGI, but really much of what we're asking these systems to do has been laid out for them: a limited set of protocols and interfaces, and a targeted set of tasks to which we normally apply them. And our expectations are calibrated accordingly. We don't converse with them as we would with a human. Their search space is much smaller. So while they appear AGI-like on specific tasks, I think it's because we're subconsciously grading them on a curve. The narrow ways we have to interact with them prejudice us toward a very low bar.
That said, I agree that video feed and mouse is a terrible protocol for AI. Even so, I wouldn't be surprised if that's what we end up settling on. Long term, it will simply be easier for these bots to learn and adapt to human interfaces than for us to maintain two parallel sets of interfaces for everything, except in specific bot-to-bot cases. It's horribly inefficient, but in my experience efficiency never comes out ahead with each new generation of UIs.