OpenAI's "Voice Mode" is closer, but once we have near-instantaneous, natural back-and-forth voice conversation, that will be a big step toward it feeling magical. Today, it's: say something, awkwardly wait N seconds, then listen to the reply, and sometimes awkwardly interrupt it.
Even if the models were no smarter than they are today, cracking the "conversational" and performance pieces would make a big difference, in my opinion.