
671 points georgemandis | 1 comments
rob ◴[] No.44379019[source]
For anybody trying to do this in bulk, instead of using OpenAI's Whisper via their API, you can also use Groq [0], which is much cheaper:

[0] https://groq.com/pricing/

Groq is ~$0.02/hr with distil-large-v3, or ~$0.04/hr with whisper-large-v3-turbo. I believe OpenAI comes out to like ~$0.36/hr.
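To put those rates in bulk terms, here is a rough back-of-the-envelope sketch. It only multiplies the approximate per-hour prices quoted above by a number of audio hours; the dictionary keys and the `bulk_cost` helper are made-up names for illustration, and the prices may well have changed since.

```python
# Approximate per-hour prices as quoted in the comment (USD, may be out of date).
PRICE_PER_HOUR_USD = {
    "groq/distil-large-v3": 0.02,
    "groq/whisper-large-v3-turbo": 0.04,
    "openai/whisper": 0.36,
}


def bulk_cost(hours_of_audio: float) -> dict[str, float]:
    """Return the estimated cost in USD per option for the given hours of audio."""
    return {
        name: round(rate * hours_of_audio, 2)
        for name, rate in PRICE_PER_HOUR_USD.items()
    }


if __name__ == "__main__":
    # Example: 1,000 hours of meeting recordings.
    for name, cost in bulk_cost(1000).items():
        print(f"{name}: ${cost:,.2f}")
```

At these quoted rates, 1,000 hours works out to roughly $20 or $40 on Groq versus roughly $360 on OpenAI.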

We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube. It uses Groq by default, but I also added support for Replicate and Deepgram as backups because sometimes Groq errors out.
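The "default provider plus backups" setup can be sketched roughly like this. It is not our actual code, just a minimal illustration of the pattern: each provider is assumed to be wrapped in a plain function that takes an audio path and returns text, and `transcribe_with_fallback` is a hypothetical helper name. The real Groq, Replicate, and Deepgram client calls are deliberately left out.

```python
from typing import Callable, Sequence

# A provider is any function that takes an audio file path and returns a transcript.
# Wrapping each vendor's client in such a function is an assumption of this sketch.
Transcriber = Callable[[str], str]


def transcribe_with_fallback(
    audio_path: str,
    providers: Sequence[tuple[str, Transcriber]],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, transcript) from the first success."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # a provider "errored out": record it and move on
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stand-in providers for demonstration only.
    def flaky_primary(path: str) -> str:
        raise TimeoutError("service unavailable")

    def backup(path: str) -> str:
        return f"transcript of {path}"

    print(transcribe_with_fallback("meeting.mp3", [("primary", flaky_primary), ("backup", backup)]))
```

Keeping the providers in an ordered list means the cheapest one is tried first and the pricier backups only get used when it fails.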

replies(5): >>44379183 #>>44380152 #>>44380182 #>>44381963 #>>44384523 #
georgemandis ◴[] No.44379183[source]
Interesting! At $0.02 to $0.04 an hour I don't suspect you've been hunting for optimizations, but I wonder if this "speed up the audio" trick would save you even more.

> We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube

Doesn't YouTube do this for you automatically these days within a day or so?

replies(3): >>44379336 #>>44380033 #>>44380071 #
1. ks2048 ◴[] No.44380071[source]
> Doesn't YouTube do this for you automatically these days within a day or so?

Last time I checked, Google's auto-captions were noticeably worse in quality than Whisper's output, but maybe that has changed.