(microsoft.github.io)

448 points lastdong | 1 comments | 03 Sep 25 10:44 UTC

1. cush [03 Sep 25 16:21 UTC] No.45117583

To me this is like early generative AI art, where the images came out very "smooth" and visually buttery; here, the equivalent is that the voices have no timbre. Intonation issues aside, these models could use a touch of vocal fry and some body to be more believable.

VibeVoice: A Frontier Open-Source Text-to-Speech Model