
448 points lastdong | 1 comments
simiones ◴[] No.45114884[source]
I read the comments praising these voices as very lifelike, and went to the page primed to hear very convincing voices. That is not at all what I heard though.

The voices are decent, but the intonation is off on almost every phrase, and there is a very clear robotic-sounding modulation. It's generally very impressive compared to many text-to-speech solutions from a few years ago, but for today, I find it very uninspiring. The AI generated voice you hear all over YouTube shorts is at least as good as most of the samples on this page.

The only part that seemed impressive to me was the English + (Mandarin?) Chinese sample, that one seemed to switch very seamlessly between the two. But this may well be simply because (1) I'm not familiar with any Chinese language, so I couldn't really judge the pronunciation of that, and (2) the different character systems make it extremely clear that the model needs to switch between different languages. Peut-être que cela n'aurait pas été si simple if it had been switching between two languages using the same writing system - I'm particularly curious how it would have read "simple" in the phrase above (I think it should be read with the French pronunciation, for example).

And, of course, the singing part is painfully bad, I am very curious why they even included it.

replies(11): >>45114973 #>>45115076 #>>45115109 #>>45115714 #>>45115907 #>>45116238 #>>45116262 #>>45116513 #>>45117982 #>>45119535 #>>45122185 #
IshKebab ◴[] No.45115109[source]
I agree. For some reason the female voices are waaay more convincing than the male ones too, which sound barely better than speech synthesis from a decade ago.
replies(1): >>45116546 #
selkin ◴[] No.45116546[source]
Results correlate to investment, and there’s more of it in synthesizing female coded voices. As for why female coded voices get more investment, we all know; the only difference is in attitude towards that (the correct answer, of course, is “it sucks”).
replies(1): >>45117249 #
recursive ◴[] No.45117249[source]
We all know? Female voices have better intelligibility? That's my guess anyway.
replies(2): >>45117432 #>>45117761 #
kadoban ◴[] No.45117761[source]
There's a lot of money and effort spent in satisfying the sexual desires of (predominantly straight) men. There's not typically quite as much interest in doing the same for women.

For example I've been looking at models and loras for generating images, and the boards are _full_ of ones that will generate women well or in some particular style. Quite often at least a couple of the preview images for each are hidden behind a button because they contain nudity. Clearly the intent is that they are at least able to generate porn containing women. There's a small handful that are focused on men and they're very aware of it, they all have notes lampshading how oddball they are to even exist.

I would expect that this is not as pronounced an effect in the world of generating speech, but it must still exist.

replies(1): >>45119407 #
lacy_tinpot ◴[] No.45119407[source]
I think this is a very lazy kind of cultural analysis. The reason female voices are being chosen over male ones is a little more multifaceted than just SEX. Heterosexual women also tend to prefer female voices over male ones.

Female voices are often rated as being clearer, easier to understand, "warmer", etc.

Why this is the case is still an open question, but it's definitely more complex than just SEX.

replies(2): >>45120715 #>>45122634 #
selkin ◴[] No.45120715[source]
That you consider it sex (rather than gender) is exactly why there’s a preference for female coded voices. Consider where we do hear male recorded voices used as default.
replies(3): >>45120820 #>>45123747 #>>45124345 #
1. akimbostrawman ◴[] No.45124345[source]
How the hell would you determine someone's self-assigned social gender based on their voice, which is a result of their physical sex?