
257 points amrrs | 8 comments
1. mlboss ◴[] No.41843344[source]
On a related note, a very good open-source TTS model was released two days ago: https://github.com/SWivid/F5-TTS

Very good voice cloning capability. Runs on an NVIDIA GPU with under 10 GB of VRAM.
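
If you want to check whether your card clears that bar before cloning the repo, here's a quick sketch (assumes nothing beyond a working PyTorch install):

    import torch

    if torch.cuda.is_available():
        # total_memory is reported in bytes; convert to GiB
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
    else:
        print("No CUDA device visible to PyTorch")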

replies(1): >>41843634 #
2. stavros ◴[] No.41843634[source]
Thanks! Would "under 10G" also include 8 GB, by any chance? Although I do die inside a little every time I see "install Torch for your CUDA version", because I've never managed to get that working on Linux.
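
For what it's worth, the usual failure mode is a Torch wheel built against a CUDA version the driver doesn't support; the selector on pytorch.org generates a pip command with the matching --index-url (e.g. https://download.pytorch.org/whl/cu121). A quick sanity check after installing, as a sketch:

    import torch

    # CUDA version the installed wheel was built against (None for CPU-only builds)
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    # True only if the wheel, the driver, and the GPU all line up
    print("cuda available:", torch.cuda.is_available())
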
replies(3): >>41843916 #>>41844115 #>>41845127 #
3. mlboss ◴[] No.41843916[source]
I bought a 10 TB drive just for these kinds of experiments.
4. linotype ◴[] No.41844115[source]
Try out Pop!_OS. They make it really easy. Though it's named Tensorman, it helps with Torch as well.

https://support.system76.com/articles/tensorman/
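
Per that page, Tensorman is a wrapper around Docker images with the CUDA stack baked in, so running a script looks like: tensorman run --gpu python ./script.py (command taken from the linked docs; check them for the exact flags on your version).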

replies(1): >>41844746 #
5. stavros ◴[] No.41844746{3}[source]
Thanks, but I don't think I'm going to reinstall my entire OS to run these. I'll see if I can get Docker working; it's been more reliable with CUDA for me.
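
For reference, the Docker route just needs the NVIDIA Container Toolkit on the host; after that, something like docker run --rm --gpus all pytorch/pytorch python -c "import torch; print(torch.cuda.is_available())" works as a smoke test (pytorch/pytorch is the official Docker Hub image; which tag to pin is up to you).
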
replies(1): >>41845003 #
6. __MatrixMan__ ◴[] No.41845003{4}[source]
I haven't tried it, but I notice that it's also in nixpkgs: https://search.nixos.org/packages?channel=24.05&show=tensorm... That might be a less invasive way to use it, though you'd still have to install nix.
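
If it is in there, something like nix-shell -p tensorman should drop you into a shell with it on PATH, with no system-wide install beyond nix itself (standard nixpkgs usage; package name taken from the search link above).
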
replies(1): >>41846680 #
7. lelag ◴[] No.41845127[source]
It actually uses less than 3 GB of VRAM. One issue is that the research code loads multiple models instead of one, which is why it was initially reported that you need 8 GB of VRAM.

However, it can't serve the same use case yet, because it's currently very slow: real-time usage is not possible with the current release code, in spite of the 0.15 RTF claimed in the paper.
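
(RTF, real-time factor, is synthesis time divided by output audio duration, so the paper's 0.15 would mean roughly 10 s of speech generated in 1.5 s; anything above 1.0 is slower than real time, which is where the release code currently sits.)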

8. stavros ◴[] No.41846680{5}[source]
That's easier, thank you!