(microsoft.github.io)

448 points lastdong | 3 comments | 03 Sep 25 10:44 UTC

egorfine ◴[03 Sep 25 11:47 UTC] No.45114627[source]▶

[deleted - I'm an idiot]

replies(1): >>45114669 #

x187463 ◴[03 Sep 25 11:52 UTC] No.45114669[source]▶

Whisper is speech-to-text. VibeVoice is text-to-speech.

1. mpeg ◴[03 Sep 25 11:59 UTC] No.45114712[source]▶

There is a text-to-speech version of Whisper, but IMHO the quality is much worse than the demos of this model.

replies(1): >>45114765 #

2. x187463 ◴[03 Sep 25 12:07 UTC] No.45114765[source]▶

Are you referring to this?

Or is there some OpenAI official Whisper TTS?

replies(1): >>45114838 #

3. mpeg ◴[03 Sep 25 12:18 UTC] No.45114838[source]▶

Yep, nothing official that I know of, but that one is fairly popular, so maybe they were referring to it (although AFAIK it's not frontier?)

VibeVoice: A Frontier Open-Source Text-to-Speech Model