(microsoft.github.io)

448 points lastdong | 1 comment | 03 Sep 25 10:44 UTC

1. regularfry [03 Sep 25 12:21 UTC] No.45114872

OK, this is nit-picking, but it's very obvious that the voice samples these were trained on were recorded in different audio environments. There's noticeable reverb on the male voice that isn't there on the other.

So that's a useful next step: for multi-voice TTS models, make them sound like they're in the same room.


VibeVoice: A Frontier Open-Source Text-to-Speech Model