
671 points georgemandis | 1 comment
rob No.44379019
For anybody trying to do this in bulk: instead of using OpenAI's Whisper via their API, you can also use Groq [0], which is much cheaper:

[0] https://groq.com/pricing/

Groq is ~$0.02/hr with distil-large-v3, or ~$0.04/hr with whisper-large-v3-turbo. I believe OpenAI comes out to ~$0.36/hr.
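To put those rates in bulk terms, here is a rough back-of-the-envelope sketch. The prices are just the approximate per-hour figures quoted above (they may have changed), and the dictionary keys and the `bulk_cost` helper are made-up names for illustration:

```python
# Approximate prices quoted in this comment, in USD per hour of audio.
# These are assumptions taken from the thread, not live pricing.
PRICE_PER_HOUR = {
    "groq/distil-large-v3": 0.02,
    "groq/whisper-large-v3-turbo": 0.04,
    "openai/whisper": 0.36,
}


def bulk_cost(hours: float) -> dict[str, float]:
    """Estimate the USD cost of transcribing `hours` of audio with each option."""
    return {name: round(price * hours, 2) for name, price in PRICE_PER_HOUR.items()}


if __name__ == "__main__":
    # Example: 1,000 hours of recordings.
    for name, cost in bulk_cost(1000).items():
        print(f"{name}: ${cost:,.2f}")
```

At 1,000 hours that works out to roughly $20, $40, and $360 respectively, assuming billing is strictly proportional to audio duration.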

We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube. It uses Groq by default, but I also added support for Replicate and Deepgram as backups because sometimes Groq errors out.
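The "default provider with backups" idea can be sketched roughly like this. This is not the actual tool's code: `transcribe_with_fallback` and the `Transcriber` type are hypothetical, and each provider is assumed to be wrapped in a plain function that takes an audio path and returns text (the real Groq, Replicate, and Deepgram client calls are left out on purpose):

```python
from typing import Callable, Sequence

# Hypothetical wrapper type: takes a path to an audio file, returns the transcript.
Transcriber = Callable[[str], str]


def transcribe_with_fallback(
    audio_path: str,
    providers: Sequence[tuple[str, Transcriber]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, transcript) from the first success."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # a provider erroring out just moves us to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

You would pass the providers in priority order, e.g. Groq first, then the backups, so the cheaper option is always tried before the others.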

georgemandis No.44379183
Interesting! At $0.02 to $0.04 an hour I don't suspect you've been hunting for optimizations, but I wonder if this "speed up the audio" trick would save you even more.

> We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube

Doesn't YouTube do this for you automatically these days within a day or so?

jerjerjer No.44380033
> I wonder if this "speed up the audio" trick would save you even more.

At this point you'd at least need to check how much running ffmpeg costs. It's probably less than $0.01 per hour of audio (the approximate savings), but still.
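For a rough sense of the numbers, here is a small sketch. It assumes the provider bills strictly by audio duration, so playing the audio at `speed`x divides the billed time by `speed`. The helper names are made up; the ffmpeg side only builds a command using the `atempo` audio filter and does not run it (older ffmpeg builds limit `atempo` to the 0.5 to 2.0 range per filter instance, as far as I know):

```python
def savings_per_hour(price_per_hour: float, speed: float) -> float:
    """USD saved per hour of original audio if billed time shrinks by a factor of `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return price_per_hour * (1 - 1 / speed)


def atempo_command(src: str, dst: str, speed: float) -> list[str]:
    """Build (but do not run) an ffmpeg command that speeds up the audio track."""
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={speed}", dst]


if __name__ == "__main__":
    # Prices are the approximate Groq figures quoted upthread.
    for price in (0.02, 0.04):
        print(f"${price}/hr at 2x saves about ${savings_per_hour(price, 2.0):.3f} per hour of audio")
    print(" ".join(atempo_command("meeting.mp3", "meeting_2x.mp3", 2.0)))
```

So at 2x the saving is on the order of $0.01 to $0.02 per hour of original audio, and the ffmpeg step (CPU time plus any extra storage) would have to cost less than that to be worth it.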