DeepSpeech Is Discontinued (2020)

1. ipsum2 ◴[25 Jun 25 18:19 UTC] No.44380330[source]▶

>>44379688 (OP) #

I've been using Nvidia's Parakeet model. It's been better than Whisper large-v3, and it's smaller. It only supports English.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

replies(3): >>44380380 #>>44381533 #>>44384824 #

2. nico ◴[25 Jun 25 18:23 UTC] No.44380380[source]▶

>>44380330 (TP) #

Does it need a newer GPU? Or can it run on just CPU?

Would it run on a raspberry pi?

replies(4): >>44380459 #>>44380499 #>>44380936 #>>44382292 #

3. GaggiX ◴[25 Jun 25 18:31 UTC] No.44380459[source]▶

>>44380380 #

Look up faster-whisper or the distilled Whisper models. Smaller models run quite nicely but perform poorly outside of English; if you are interested in a different language, it's better to fine-tune one (Hugging Face has a huge number of fine-tuned Whisper models).

4. ◴[25 Jun 25 18:35 UTC] No.44380499[source]▶

>>44380380 #

5. ipsum2 ◴[25 Jun 25 19:18 UTC] No.44380936[source]▶

>>44380380 #

If you want real-time, it requires a GPU, though an underpowered one will do. CPU is a little slower but works fine.
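To make "real-time" concrete: a common yardstick is the real-time factor (RTF), processing time divided by audio duration. This is a minimal sketch of that arithmetic, not a benchmark of Parakeet; the function names are made up for illustration and the numbers in the usage note are placeholders, not measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(rtf: float) -> bool:
    """A model can only follow a live stream if it transcribes faster than
    the audio arrives, i.e. its real-time factor is below 1."""
    return rtf < 1.0
```

For example, a hypothetical run that takes 5 seconds to transcribe a 10-second clip has an RTF of 0.5 and keeps up with live audio; one that takes 15 seconds does not, but is still fine for offline transcription.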

6. 100xlong ◴[25 Jun 25 20:29 UTC] No.44381533[source]▶

>>44380330 (TP) #

Are there any Linux/Mac apps that let people use Parakeet for daily dictation, like SuperWhisper?

replies(1): >>44382712 #

7. lupusreal ◴[25 Jun 25 22:16 UTC] No.44382292[source]▶

>>44380380 #

The best CPU TTS that can run on something like a Raspberry Pi is Piper. It can do real-time synthesis on a Raspberry Pi, and on a real computer it runs several times faster with negligible performance cost. I use it for 'reading' ebooks when my eyes get tired. The quality is roughly on par with where Mac OS's TTS was ~10 years ago (the last time I used it). You can tell it's TTS, but it's good enough that you can become accustomed to it fairly easily.

https://github.com/rhasspy/piper

replies(2): >>44382342 #>>44382651 #

8. GaggiX ◴[25 Jun 25 22:26 UTC] No.44382342{3}[source]▶

>>44382292 #

They are talking about STT, not TTS. But as a TTS, Piper is very good and works nicely on a Raspberry Pi, I agree.

9. dv35z ◴[25 Jun 25 23:09 UTC] No.44382651{3}[source]▶

>>44382292 #

What voices do you recommend? The ones I checked out (about a year ago) were mostly European-sounding, flat, and not very natural-sounding. Is Piper the best open-source text-to-speech engine out there?

replies(1): >>44382884 #

10. ipsum2 ◴[25 Jun 25 23:20 UTC] No.44382712[source]▶

>>44381533 #

Sort of. Check out https://github.com/senstella/parakeet-mlx

11. haiku2077 ◴[25 Jun 25 23:50 UTC] No.44382884{4}[source]▶

>>44382651 #

You can also try Kokoro and Sherpa.

If this is for personal use, the best local TTS is to grab a Mac, set the system voice to one of the current Siri voice models, and then use the 'say' command in the terminal. Yes, really. The nonbinary voice #5 in particular does really well at technical terminology.
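If you want to script the 'say' approach, here is a rough sketch in Python. It assumes macOS, where `say` reads from standard input when no message is given and accepts `-v` (voice) and `-r` (words per minute). The helper names `build_say_command` and `speak` are invented for this example. With no voice argument, `say` uses whatever system voice you have selected.

```python
import shutil
import subprocess
from typing import List, Optional


def build_say_command(voice: Optional[str] = None,
                      rate_wpm: Optional[int] = None) -> List[str]:
    """Build the argument list for macOS `say`.

    With no voice given, `say` falls back to the system voice.
    """
    cmd = ["say"]
    if voice is not None:
        cmd += ["-v", voice]
    if rate_wpm is not None:
        cmd += ["-r", str(rate_wpm)]
    return cmd


def speak(text: str, voice: Optional[str] = None,
          rate_wpm: Optional[int] = None) -> None:
    """Speak `text` aloud; the text is passed on stdin so that a leading
    '-' in the text is never mistaken for a command-line option."""
    if shutil.which("say") is None:
        raise RuntimeError("`say` not found; this sketch assumes macOS")
    subprocess.run(build_say_command(voice, rate_wpm),
                   input=text, text=True, check=True)
```

Calling `speak("Hello from the terminal")` uses the system voice; passing `voice=` with a name from `say -v '?'` picks a specific installed voice instead.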

12. PeterStuer ◴[26 Jun 25 06:47 UTC] No.44384824[source]▶

>>44380330 (TP) #

In my side by side testing of Whisper and Parakeet in transcribing Euro-English meeting recordings, Whisper produced the better result, but Parakeet was faster.

I'm sticking with Whisper as it is fast enough for my use case.
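For anyone wanting to repeat this kind of side-by-side test with numbers, one common metric is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The comparison above doesn't say which metric was used, so treat this as one possible approach. The sketch lowercases and splits on whitespace only, which is an assumption; real evaluations usually normalize punctuation and numbers too.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Run it on the same recordings with a hand-corrected reference transcript for each, then compare the scores per model; lower is better.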