(microsoft.github.io)

1. egorfine ◴[03 Sep 25 11:47 UTC] No.45114627[source]▶

[deleted - I'm an idiot]

replies(1): >>45114669 #

2. x187463 ◴[03 Sep 25 11:52 UTC] No.45114669[source]▶

Whisper is speech-to-text. VibeVoice is text-to-speech.

3. mpeg ◴[03 Sep 25 11:59 UTC] No.45114712[source]▶

There is a text-to-speech version of Whisper, but IMHO its quality is much worse than the demos of this model.

replies(1): >>45114765 #

4. x187463 ◴[03 Sep 25 12:07 UTC] No.45114765{3}[source]▶

Are you referring to this?

Or is there some OpenAI official Whisper TTS?

replies(1): >>45114838 #

5. mpeg ◴[03 Sep 25 12:18 UTC] No.45114838{4}[source]▶

Yep, nothing official that I know of, but that one is fairly popular, so maybe they were referring to it (although AFAIK it's not frontier?).

6. egorfine ◴[03 Sep 25 12:19 UTC] No.45114850[source]▶

I stand corrected.

VibeVoice: A Frontier Open-Source Text-to-Speech Model