706 points ortusdux | 1 comment
1. scoot ◴[] No.42138679[source]
I was a little surprised to read that they're using speech-to-text and text-to-speech rather than an end-to-end speech model. Won't that add horrible latency? (I guess the old-person persona disguises it a little...)