448 points lastdong | 1 comment
viggity ◴[] No.45114924[source]
I feel like this is a step in the right direction, but a lot of emotive text-to-speech models only change the duration and loudness of each word. The timing and pauses are better too.

I would love to have a model that can make sense of things like stressing particular syllables or phonemes to make a point.

replies(1): >>45116568 #
1. watsonmusic ◴[] No.45116568[source]
this model is superb