718 points ortusdux | 1 comment
1. scoot ◴[] No.42138679[source]
I was a little surprised to read that they're using speech-to-text and text-to-speech rather than an end-to-end speech model. Won't that add horrible latency? (I guess the old-person persona disguises it a little...)