have few thoughts on this, esp after working in voice AI since couple of years
1. > "What’s the voice equivalent of a thumbs-up or a keyboard shortcut?" Current ASR systems are much narrow in terms of just capturing the transcript. there is no higher level of intelligence, even the best of GPT voice models fail at this. Humans are highly receptive of non-verbal cues. All the uhms, ahs, even the pauses we take is where the nuance lies.
2. the hardware for voice AI is still not consumer ready interacting with a voice AI is still doesn't feel private. i am only able to do a voice based interaction only when am in my car. sadly at other places it just feels a privacy breach as its acoustically public. have been thinking about a private microphones to enable more AI based conversations.