(github.com)

652 points toebee | 1 comments | 21 Apr 25 17:07 UTC

xhkkffbf ◴[21 Apr 25 21:36 UTC] No.43756773[source]▶

>>43754124 (OP) #

Are there different voices? Or only [s1] and [s2] in the examples?

replies(1): >>43758096 #

1. toebee ◴[22 Apr 25 00:52 UTC] No.43758096[source]▶

>>43756773 #

We just clarified in the README, sorry for the confusion ;(

Note that the model was not fine-tuned on a specific voice. Hence, you will get different voices every time you run the model. You can keep speaker consistency by either adding an audio prompt (a guide coming VERY soon - try it with the second example on Gradio or HF Space for now), or fixing the seed.
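For the "fixing the seed" option, here is a minimal sketch of what that usually looks like in Python. It assumes the model samples through PyTorch (and possibly NumPy) random number generators; `fix_seed` is a hypothetical helper name, not something from the Dia repository, and the actual generation call is left out because its API is not shown in this thread.

```python
import random


def fix_seed(seed: int) -> None:
    """Seed the RNGs that are available so repeated runs sample the same way.

    Hypothetical helper: NumPy and PyTorch are seeded only if installed.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Call fix_seed with the same value before every generation run.
fix_seed(42)
first = random.random()
fix_seed(42)
second = random.random()
```

Calling `fix_seed` with the same value right before each generation should make the sampling repeat, which is what would keep the voice the same between runs. Some GPU kernels are non-deterministic, so small differences can still show up.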

Show HN: Dia, an open-weights TTS model for generating realistic dialogue