
257 points amrrs | 8 comments
1. mlboss ◴[] No.41843344[source]
On a related note, a very good open-source TTS model was released two days ago: https://github.com/SWivid/F5-TTS

Very good voice cloning capability. Runs on an NVIDIA GPU with under 10 GB of VRAM.
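
If you want to check whether your card clears that bar before cloning the repo, here's a quick sketch (assumes nothing beyond a working PyTorch install):

    import torch

    if torch.cuda.is_available():
        # total_memory is reported in bytes; convert to GiB
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
    else:
        print("No CUDA device visible to PyTorch")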

replies(1): >>41843634 #
2. stavros ◴[] No.41843634[source]
Thanks! Would "under 10G" also include 8 GB, by any chance? Although I do die inside a little every time I see "install Torch for your CUDA version", because I've never managed to get that working on Linux.
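
For what it's worth, the usual failure mode is a Torch wheel built against a CUDA version the driver doesn't support; the selector on pytorch.org generates a pip command with the matching --index-url (e.g. https://download.pytorch.org/whl/cu121). A quick sanity check after installing, as a sketch:

    import torch

    # CUDA version the installed wheel was built against (None for CPU-only builds)
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    # True only if the wheel, the driver, and the GPU all line up
    print("cuda available:", torch.cuda.is_available())
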
replies(3): >>41843916 #>>41844115 #>>41845127 #
3. mlboss ◴[] No.41843916[source]
I bought a 10 TB drive just for these kinds of experiments.
4. linotype ◴[] No.41844115[source]
Try out Pop!_OS. They make it really easy. Though it's named Tensorman, it helps with Torch as well.

https://support.system76.com/articles/tensorman/
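
Per that page, Tensorman is a wrapper around Docker images with the CUDA stack baked in, so running a script looks like: tensorman run --gpu python ./script.py (command taken from the linked docs; check them for the exact flags on your version).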

replies(1): >>41844746 #
5. stavros ◴[] No.41844746{3}[source]
Thanks, but I don't think I'm going to reinstall my entire OS to run these. I'll see if I can get Docker working; it's been more reliable with CUDA for me.
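
For reference, the Docker route just needs the NVIDIA Container Toolkit on the host; after that, something like docker run --rm --gpus all pytorch/pytorch python -c "import torch; print(torch.cuda.is_available())" works as a smoke test (pytorch/pytorch is the official Docker Hub image; which tag to pin is up to you).
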
replies(1): >>41845003 #
6. __MatrixMan__ ◴[] No.41845003{4}[source]
I haven't tried it, but I notice that it's also in nixpkgs: https://search.nixos.org/packages?channel=24.05&show=tensorm... That might be a less invasive way to use it, though you'd still have to install nix.
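
If it is in there, something like nix-shell -p tensorman should drop you into a shell with it on PATH, with no system-wide install beyond nix itself (standard nixpkgs usage; package name taken from the search link above).
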
replies(1): >>41846680 #
7. lelag ◴[] No.41845127[source]
It actually uses less than 3 GB of VRAM. One issue is that the research code loads multiple models instead of one, which is why it was initially reported that you need 8 GB of VRAM.

However, it can't serve the same use case yet, because it's currently very slow: real-time usage is not possible with the current release code, in spite of the 0.15 RTF claimed in the paper.
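
(RTF, real-time factor, is synthesis time divided by output audio duration, so the paper's 0.15 would mean roughly 10 s of speech generated in 1.5 s; anything above 1.0 is slower than real time, which is where the release code currently sits.)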

8. stavros ◴[] No.41846680{5}[source]
That's easier, thank you!