(microsoft.github.io)

448 points lastdong | 1 comments | 03 Sep 25 10:44 UTC


amelius [03 Sep 25 12:36 UTC] No.45114980

I tried some TTS models a while ago, but I noticed that none of them let you put markup tags in the text. For example, it would be nice to be able to write something like:

     Hey look! [enthusiastic] Should we tell the others? Maybe not ... [giggles]

etc.

In fact, I think this kind of thing is absolutely necessary if you want to use this to replace a voice actor.
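To make the idea concrete, here is a minimal sketch of how such inline tags could be split out before the text reaches a TTS engine. The tag syntax, the `EVENT_TAGS` vocabulary, and the `parse_markup` function are all hypothetical; this is not the format used by VibeVoice or ElevenLabs. It assumes that tags naming a sound (like `[giggles]`) become standalone events, while any other tag (like `[enthusiastic]`) sets the speaking style for the text that follows it.

```python
import re
from typing import Dict, List

# Hypothetical tag syntax: a word (plus spaces/hyphens) in square brackets.
TAG_RE = re.compile(r"\[([A-Za-z][A-Za-z _-]*)\]")

# Hypothetical vocabulary: these tags are sounds to insert; every other
# tag is treated as a speaking style for the text that follows it.
EVENT_TAGS = {"giggles", "laughs", "sighs", "pause"}


def parse_markup(text: str, default_style: str = "neutral") -> List[Dict[str, str]]:
    """Split tagged text into styled speech segments and non-verbal events."""
    segments: List[Dict[str, str]] = []
    style = default_style
    pos = 0

    def flush(chunk: str) -> None:
        # Emit any non-empty text using the style that is active right now.
        chunk = chunk.strip()
        if chunk:
            segments.append({"style": style, "text": chunk})

    for m in TAG_RE.finditer(text):
        flush(text[pos:m.start()])  # text before the tag keeps the old style
        tag = m.group(1).strip().lower()
        if tag in EVENT_TAGS:
            segments.append({"event": tag})
        else:
            style = tag  # style stays in effect until the next style tag
        pos = m.end()
    flush(text[pos:])  # trailing text after the last tag
    return segments


if __name__ == "__main__":
    line = "Hey look! [enthusiastic] Should we tell the others? Maybe not ... [giggles]"
    for seg in parse_markup(line):
        print(seg)
    # {'style': 'neutral', 'text': 'Hey look!'}
    # {'style': 'enthusiastic', 'text': 'Should we tell the others? Maybe not ...'}
    # {'event': 'giggles'}
```

Splitting on a fixed event vocabulary keeps the rules predictable; a real backend would still have to map each style and event to whatever controls the underlying model actually exposes.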


1. data-ottawa [03 Sep 25 13:50 UTC] No.45115795

>>45114980

ElevenLabs has some models that support this.

https://elevenlabs.io/blog/v3-audiotags


VibeVoice: A Frontier Open-Source Text-to-Speech Model