448 points lastdong | 4 comments
simiones ◴[] No.45114884[source]
I read the comments praising these voices as very lifelike, and went to the page primed to hear very convincing voices. That is not at all what I heard, though.

The voices are decent, but the intonation is off on almost every phrase, and there is a very clear robotic-sounding modulation. It's generally very impressive compared to many text-to-speech solutions from a few years ago, but by today's standards, I find it very uninspiring. The AI-generated voice you hear all over YouTube Shorts is at least as good as most of the samples on this page.

The only part that seemed impressive to me was the English + (Mandarin?) Chinese sample, which seemed to switch very seamlessly between the two. But this may well be simply because (1) I'm not familiar with any Chinese language, so I couldn't really judge the pronunciation, and (2) the different writing systems make it extremely clear where the model needs to switch languages. Peut-être que cela n'aurait pas été si simple if it had been switching between two languages using the same writing system - I'm particularly curious how it would have read "simple" in the phrase above (I think it should be read with the French pronunciation, for example).

And, of course, the singing part is painfully bad; I am very curious why they even included it.

replies(11): >>45114973 #>>45115076 #>>45115109 #>>45115714 #>>45115907 #>>45116238 #>>45116262 #>>45116513 #>>45117982 #>>45119535 #>>45122185 #
1. mclau157 ◴[] No.45115907[source]
ElevenLabs has a much more convincing voice model
replies(3): >>45116308 #>>45116309 #>>45116659 #
2. DrBenCarson ◴[] No.45116308[source]
Open source?
3. sys32768 ◴[] No.45116309[source]
They also offer an AI Voice Changer that will take a recording and transform it into a different voice but retain the cadence and intonation.
4. watsonmusic ◴[] No.45116659[source]
it's not oss