On a related note, a very good open-source TTS model was released two days ago: https://github.com/SWivid/F5-TTS
Very good voice-cloning capability. Runs in under 10 GB of VRAM on an NVIDIA GPU.
replies(1):
Very good voice cloning capability. Runs under 10G vram nvidia gpu.
However, it can't be used for the same use case because it is currently very slow: real-time use is not yet possible with the released code, despite the 0.15 RTF claimed in the paper.
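For anyone unfamiliar with the term, RTF (real-time factor) is usually defined as synthesis time divided by the duration of the audio produced, so anything below 1.0 is faster than playback. A minimal sketch of that arithmetic follows; the function names and the timings are hypothetical and are not measurements of F5-TTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_faster_than_real_time(rtf: float) -> bool:
    # Below 1.0, audio is generated faster than it plays back.
    return rtf < 1.0


# Hypothetical timings, for illustration only (not F5-TTS measurements).
claimed = real_time_factor(1.5, 10.0)    # 1.5 s of compute for 10 s of audio
observed = real_time_factor(25.0, 10.0)  # 25 s of compute for 10 s of audio

print(f"claimed-style RTF:  {claimed:.2f}")
print(f"observed-style RTF: {observed:.2f}")
```

At an RTF of 0.15, a 10-second clip would take about 1.5 seconds to generate. Note that an RTF below 1.0 is only a throughput figure; interactive use also depends on how long it takes before the first audio is available.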