448 points lastdong | 2 comments
1. viggity No.45114924
I feel like this is a step in the right direction, but a lot of emotive text-to-speech models only change the duration and loudness of each word. The timing and pauses are better too.

I would love to have a model that can make sense of things like stressing particular syllables or phonemes to make a point.

replies(1): >>45116568 #
2. watsonmusic No.45116568
this model is superb