
binsquare No.46220168
Does anyone else notice a hard-to-pin-down lifelessness in the speech of these voice models?

It's especially noticeable in the fruit-pricing portion of this model's demo video. It sounds completely normal, but I can immediately tell it's AI. Maybe it's the intonation, or the overly steady rate of speech?

1. sosodev No.46220340
I think it's because they've crammed vision, audio, multiple voices, prosody control, multiple languages, etc. into just 30 billion parameters.

I think ChatGPT has the most lifelike speech of any voice model. OpenAI seems to have invested heavily in that area while other labs focused elsewhere.