
257 points | amrrs | 1 comment
mlboss No.41843344
On a related note, a very good open-source TTS model was released two days ago: https://github.com/SWivid/F5-TTS

Very good voice cloning capability, and it runs on an NVIDIA GPU with under 10 GB of VRAM.

stavros No.41843634
Thanks! Would "under 10G" also include 8 GB, by any chance? Although I do die a little inside every time I see "install Torch for your CUDA version", because I've never managed to get that working on Linux.
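(For anyone else stuck on this: PyTorch publishes separate wheel indexes per CUDA version, so the install is usually a one-liner once you know which tag your driver supports. A minimal sketch; the `cu121` tag below is just one example, check `nvidia-smi` for what your driver actually reports.)

```shell
# Install the wheel built against CUDA 12.1 (cu121 is an example tag --
# run `nvidia-smi` and match the "CUDA Version" your driver supports).
pip install torch --index-url https://download.pytorch.org/whl/cu121

# CPU-only fallback if the CUDA wheels won't cooperate:
pip install torch --index-url https://download.pytorch.org/whl/cpu
```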
lelag No.41845127
It actually uses less than 3 GB of VRAM. One issue is that the research code loads multiple models instead of one, which is why it was initially reported to need 8 GB of VRAM.

However, it can't serve that use case yet: the current release code is very slow, so real-time usage isn't possible, in spite of the 0.15 RTF claimed in the paper.
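(For readers unfamiliar with the metric: RTF, the real-time factor, is synthesis time divided by audio duration, so values below 1.0 mean faster than real time. A trivial sketch; the 10-second clip below is illustrative, not a measured benchmark.)

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: wall-clock time spent generating, divided by the
    duration of the audio produced. RTF < 1 means faster than real time."""
    return synthesis_seconds / audio_seconds

# The paper's claimed 0.15 RTF would mean e.g. generating a 10 s clip
# in 1.5 s of compute; the release code is reportedly far slower.
print(real_time_factor(1.5, 10.0))  # → 0.15
```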