
48 points LorenDB | 1 comments
ipsum2 ◴[] No.44380330[source]
I've been using Nvidia's Parakeet model. It's been better than Whisper v3 large, and it's smaller. It only supports English.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

replies(3): >>44380380 #>>44381533 #>>44384824 #
nico ◴[] No.44380380[source]
Does it need a newer GPU? Or can it run on just CPU?

Would it run on a Raspberry Pi?

replies(4): >>44380459 #>>44380499 #>>44380936 #>>44382292 #
lupusreal ◴[] No.44382292[source]
The best CPU TTS that can run on something like a Raspberry Pi is Piper. It can do real-time synthesis on a Raspberry Pi, and on a real computer it runs several times faster than real time with negligible impact on system performance. I use it for 'reading' ebooks when my eyes get tired. The quality is roughly on par with where macOS's TTS was ~10 years ago (the last time I used it). You can tell it's TTS, but it's good enough that you can become accustomed to it fairly easily.

https://github.com/rhasspy/piper
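
For anyone who wants to script it, here is a minimal sketch of driving the Piper command-line binary from Python. It assumes a `piper` executable is on your PATH and that you have already downloaded a voice model; the model filename used below is only an example, and `build_piper_command` and `synthesize` are hypothetical helper names, not part of Piper itself. The `--model` and `--output_file` flags follow the usage shown in the project README, so check them against the version you install.

```python
import shutil
import subprocess
from typing import List


def build_piper_command(model_path: str, output_wav: str, binary: str = "piper") -> List[str]:
    """Build the argument list for a Piper run that writes a WAV file.

    Assumption: the installed binary accepts --model and --output_file,
    as in the README example. Text is supplied on stdin, not as an argument.
    """
    return [binary, "--model", model_path, "--output_file", output_wav]


def synthesize(text: str, model_path: str, output_wav: str) -> bool:
    """Synthesize `text` to `output_wav`. Returns False if piper is not installed."""
    if shutil.which("piper") is None:
        return False
    # Piper reads the text to speak from standard input.
    subprocess.run(
        build_piper_command(model_path, output_wav),
        input=text.encode("utf-8"),
        check=True,
    )
    return True
```

Calling `synthesize("Chapter one.", "en_US-lessac-medium.onnx", "chapter1.wav")` should leave a WAV file you can play with any audio player, provided the model file exists at that path.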

replies(2): >>44382342 #>>44382651 #
dv35z ◴[] No.44382651[source]
What voices do you recommend? When I checked them out about a year ago, the voices were mostly European-sounding, flat, and not very natural. Is Piper the best open-source text-to-speech engine out there?
replies(1): >>44382884 #
1. haiku2077 ◴[] No.44382884{3}[source]
You can also try Kokoro and Sherpa.

If this is for personal use, the best local TTS option is to grab a Mac, set the system voice to one of the current Siri voice models, and then use the 'say' command in the terminal. Yes, really. The nonbinary voice #5 in particular does really well at technical terminology.
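
If you want to call it from a script instead of typing it by hand, a minimal sketch in Python might look like this. It assumes macOS, where `say` ships with the system; `build_say_command` and `speak` are hypothetical helper names. With no voice given, `say` uses whatever system voice you picked in Settings, which is the setup described above. The `-v` flag selects a named voice instead, and the name in the test of this sketch is only an example.

```python
import shutil
import subprocess
from typing import List, Optional


def build_say_command(text: str, voice: Optional[str] = None) -> List[str]:
    """Build the argument list for macOS `say`.

    With voice=None the current system voice is used.
    Note: text that starts with "-" could be mistaken for an option.
    """
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]
    cmd.append(text)
    return cmd


def speak(text: str, voice: Optional[str] = None) -> bool:
    """Speak `text` aloud. Returns False when `say` is unavailable (e.g. not on macOS)."""
    if shutil.which("say") is None:
        return False
    subprocess.run(build_say_command(text, voice), check=True)
    return True
```

`speak("kubectl apply -f deployment.yaml")` is a quick way to hear how the system voice copes with technical terms.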