> Just because we rely on vision to interface with computer software doesn't mean it's optimal for AI models
This is true but AGI means "Artificial General Intelligence". Perhaps it would be even more efficient with certain interfaces, but to be general it would have to at least work with the same ones as humans.
Here are some things that I think a true AGI would need to be able to do:
* Control a general purpose robot and use vision to do housework, gardening etc.
* Be able to drive a car - the equivalent of the human interface might be servo-motor-controlled inputs.
* Use standard computer inputs to do standard computer tasks
And this list could easily be extended.
If we have to be very specific in the choice of interfaces and tasks that we give it, it's not a general AI.
At the same time, we have to be careful about moving the goalposts too much. But current AIs are limited to what can be returned through a small number of interfaces (prompt with text/image/video & return text/image/video data). This is amazing, and they can sound very intelligent while doing so. But it's important not to lose sight of what they still can't do well, which is basically everything else.
Outside of this area, when you do hear of an AI doing something well (self-driving, for example), it's usually a separate specialized model rather than a contribution towards AGI.