
257 points | amrrs | 1 comment
mlboss No.41843344
On a related note, a very good open-source TTS model was released two days ago: https://github.com/SWivid/F5-TTS

Very good voice cloning capability, and it runs on an NVIDIA GPU with under 10 GB of VRAM.

stavros No.41843634
Thanks! Would "under 10G" also include 8 GB, by any chance? Although I do die a little inside every time I see "install Torch for your CUDA version", because I've never managed to get that working on Linux.
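(For anyone else stuck on this: PyTorch publishes separate wheel indexes per CUDA version, so the install is usually a one-liner once you know which tag your driver supports. A minimal sketch; the `cu121` tag below is just one example, check `nvidia-smi` for what your driver actually reports.)

```shell
# Install the wheel built against CUDA 12.1 (cu121 is an example tag --
# run `nvidia-smi` and match the "CUDA Version" your driver supports).
pip install torch --index-url https://download.pytorch.org/whl/cu121

# CPU-only fallback if the CUDA wheels won't cooperate:
pip install torch --index-url https://download.pytorch.org/whl/cpu
```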
lelag No.41845127
It actually uses less than 3 GB of VRAM. One issue is that the research code loads multiple models instead of one, which is why it was initially reported to need 8 GB of VRAM.

However, it can't serve that use case yet: the current release code is very slow, so real-time usage isn't possible, in spite of the 0.15 RTF claimed in the paper.
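(For readers unfamiliar with the metric: RTF, the real-time factor, is synthesis time divided by audio duration, so values below 1.0 mean faster than real time. A trivial sketch; the 10-second clip below is illustrative, not a measured benchmark.)

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: wall-clock time spent generating, divided by the
    duration of the audio produced. RTF < 1 means faster than real time."""
    return synthesis_seconds / audio_seconds

# The paper's claimed 0.15 RTF would mean e.g. generating a 10 s clip
# in 1.5 s of compute; the release code is reportedly far slower.
print(real_time_factor(1.5, 10.0))  # → 0.15
```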