Of course not. The visual part is window dressing on the argument. The real point is that, before declaring AGI, I think the way we interact with these agents needs to become more like human-to-human interaction. Right now, an agent generally accepts a command, picks one from a small number of MCPs that have been precoded for it to use, does the thing you wanted, right or wrong, the end. If it does the right thing, there's huge confirmation bias that it's AGI. Maybe the MCP did most of the real work. If it doesn't, well, blame the prompt, or say the MCPs lack good descriptions, or something.
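To make that concrete, here's a toy version of the loop I'm describing. None of it is a real MCP client; the tool names and the keyword matcher standing in for the model's choice are all made up:

```python
# A minimal sketch of the loop above, just to make the "limited search
# space" point concrete. Everything here is hypothetical: the tool names,
# and the keyword matcher standing in for the model's tool selection.

TOOLS = {
    # The agent's entire action space: a few precoded MCP tools, each
    # with a human-written description the model matches commands against.
    "create_ticket": "File a bug in the issue tracker.",
    "send_email": "Send an email on the user's behalf.",
    "query_db": "Run a read-only SQL query.",
}

def pick_tool(command: str) -> str | None:
    """Stand-in for the model's tool selection: naive keyword overlap."""
    scores = {
        name: sum(word in command.lower() for word in desc.lower().split())
        for name, desc in TOOLS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def run_agent(command: str) -> str:
    # Accept a command, pick one precoded tool, do the thing, right or
    # wrong, the end. The search space is len(TOOLS), not "a whole job".
    tool = pick_tool(command)
    if tool is None:
        return "no matching tool: blame the prompt or the descriptions"
    return f"invoked {tool}"  # the MCP does the real work from here

print(run_agent("please file a bug about the login page"))
```

However clever the model is, its success or failure here mostly reflects whoever wrote the tools and their descriptions.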
To get a solid read on AGI, we need to grade these agents against a remote coworker. Seeing a GUI isn't strictly required. What is required is that they have access to everything a human would, and don't rely on special tools that shrink their search space below what a human coworker faces. If a human coworker could do the whole job via console access, sure, that's fine too. I only say GUI because I think it would actually be the easiest option, and fairly straightforward for these agents: image processing is largely solved, whereas figuring out how to do everything a job requires via console is likely a mess.
And like I said, "using the computer", whether via GUI or screen reader or whatever else, isn't going to be the hard part. The hard part is that, once they have this very abstract capability and an astronomically larger search space, the way we interact with them changes. We send them email. We ping them on Slack. We don't build special baby-mitten MCPs and such for them; they have to enter the human world and prove they can handle it as a human would. Then I would say we're getting closer to AGI. But as long as we're building special tools and confining their search space to that narrow scope, it feels to me like we're still a long way off.