652 points toebee | 1 comment
Havoc No.43756741
Sounds really good & human! Got a fair number of unexpected artifacts though, e.g. 3 seconds of hissing noise before the dialogue, and music in the background when I added (happy) in an attempt to control tone. Also don't understand how to control the S1 and S2 speakers... is it just random based on temp?

> TODO Docker support

Got this adapted pretty easily. Just take the latest NVIDIA CUDA container, throw Python and the modules on it, and change the server to serve on 0.0.0.0. It does mean it pulls the model every time on startup, though, which isn't ideal.

1. dragonwriter No.43757925
> Also don't understand how to control the S1 and S2 speakers...

Do a clip with the speakers you want as the audio prompt, add the text of that clip (with speaker tags) at the beginning of your text prompt, and it clones the voices from your audio prompt for the output.
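
A rough sketch of the text side of that, in Python. It assumes the speaker tags are written as bracketed `[S1]` / `[S2]` markers; the helper name `build_cloning_prompt` and the sample lines are made up for illustration. Handing the result plus the audio clip to the model is left out, since the exact call isn't spelled out here.

```python
def build_cloning_prompt(clip_transcript: str, new_script: str) -> str:
    """Put the reference clip's transcript in front of the lines to generate.

    Hypothetical helper: the transcript must match what is actually said in
    the audio prompt, with the same speaker tags used in the new script.
    """
    return f"{clip_transcript.strip()} {new_script.strip()}"


# Transcript of the reference clip, tagged by speaker (made-up example text).
clip_transcript = "[S1] Hi, this is the first voice. [S2] And this is the second one."

# The dialogue you actually want generated in those voices.
new_script = "[S1] Here is the line I want. [S2] And here is the reply."

text_prompt = build_cloning_prompt(clip_transcript, new_script)
print(text_prompt)
```

The point is just that the tags in the new script line up with the tags in the clip's transcript, so S1 and S2 map to the voices in the audio prompt instead of being picked at random.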