
678 points georgemandis | 13 comments
1. rob No.44379019
For anybody trying to do this in bulk, instead of using OpenAI's whisper via their API, you can also use Groq [0] which is much cheaper:

[0] https://groq.com/pricing/

Groq is ~$0.02/hr with distil-large-v3, or ~$0.04/hr with whisper-large-v3-turbo. OpenAI's API is billed at $0.006/min, which comes out to ~$0.36/hr.
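A back-of-envelope comparison using the rates above (the Groq figures are approximate, and the model keys are just illustrative labels):

```python
# Rough per-hour transcription rates in USD, from the figures above.
RATES_PER_HOUR = {
    "groq/distil-large-v3": 0.02,
    "groq/whisper-large-v3-turbo": 0.04,
    "openai/whisper-1": 0.006 * 60,  # $0.006 per minute of audio
}

def transcription_cost(hours_of_audio, model):
    """Estimated cost in USD for a backlog of audio."""
    return hours_of_audio * RATES_PER_HOUR[model]

# e.g. a 1,000-hour backlog costs ~$20 on Groq distil vs ~$360 on OpenAI
for model in RATES_PER_HOUR:
    print(f"{model}: ${transcription_cost(1000, model):,.2f}")
```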

We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube. It uses Groq by default, but I also added support for Replicate and Deepgram as backups because sometimes Groq errors out.
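The Groq-first-with-backups setup could be sketched like this; the provider wrapper functions here are hypothetical stand-ins for each service's actual SDK call:

```python
# Minimal sketch of a transcription fallback chain: try each provider
# in order and return the first success. The provider callables are
# assumed wrappers around the Groq / Replicate / Deepgram SDKs.

def transcribe_with_fallback(audio_path, providers):
    """providers: list of (name, callable) tried in order.

    Returns (provider_name, transcript) from the first provider that
    succeeds; raises if every provider errors out.
    """
    errors = {}
    for name, transcribe in providers:
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # e.g. rate limit or 5xx from the API
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

Usage would be `transcribe_with_fallback(path, [("groq", groq_fn), ("replicate", replicate_fn), ("deepgram", deepgram_fn)])`, so the cheap provider is always tried first.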

replies(5): >>44379183 #>>44380152 #>>44380182 #>>44381963 #>>44384523 #
2. georgemandis No.44379183
Interesting! At $0.02 to $0.04 an hour I suspect you haven't been hunting for optimizations, but I wonder if this "speed up the audio" trick would save you even more.

> We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube

Doesn't YouTube do this for you automatically these days within a day or so?
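The "speed up the audio" trick is usually done with ffmpeg's atempo filter before uploading: you get billed for fewer audio-hours. One caveat: atempo has classically been capped at 2.0x per filter instance, so higher factors are built by chaining. A sketch, assuming ffmpeg is on PATH:

```python
# Sketch: speed up an audio file with ffmpeg before transcription,
# chaining atempo stages since one instance is capped at 2.0x.
import subprocess

def atempo_chain(factor):
    """Build an atempo filter string for an arbitrary speed-up factor."""
    parts = []
    while factor > 2.0:
        parts.append("atempo=2.0")
        factor /= 2.0
    parts.append(f"atempo={factor:g}")
    return ",".join(parts)

def speed_up(src, dst, factor=2.0):
    """Re-encode src at `factor`x speed into dst."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(factor), dst],
        check=True,
    )
```

For example, a 3x speed-up becomes the filter `atempo=2.0,atempo=1.5`.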

replies(3): >>44379336 #>>44380033 #>>44380071 #
3. rob No.44379336
> Doesn't YouTube do this for you automatically these days within a day or so?

Oh yeah, we do a check first and use youtube-transcript-api if there's an automatic one available:

https://github.com/jdepoix/youtube-transcript-api

The tool usually detects them within ~5 minutes of upload though, so there's typically no automatic transcript available yet. Then it sends the summaries to our internal Slack channel for our editors, in case there's anything interesting to 'follow up on' from the meeting.

Probably would be a good idea to add a delay to it and wait for the automatic ones though :)
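The check-for-existing-captions step can be sketched with that library. Hedged: the call below follows youtube-transcript-api's long-standing `get_transcript` interface (newer releases expose an instance-based `fetch()` instead), and the import is kept inside the function:

```python
# Sketch: prefer YouTube's own captions over paid transcription.
# Snippets come back as dicts with 'text', 'start', and 'duration'.

def join_snippets(snippets):
    """Flatten transcript snippets into a single string."""
    return " ".join(s["text"] for s in snippets)

def fetch_existing_transcript(video_id):
    """Return the caption text, or None if no captions exist yet."""
    from youtube_transcript_api import YouTubeTranscriptApi
    try:
        return join_snippets(YouTubeTranscriptApi.get_transcript(video_id))
    except Exception:
        return None  # no captions yet -- fall back to audio transcription
```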

4. jerjerjer No.44380033
> I wonder if this "speed up the audio" trick would save you even more.

At that price point you'd at least want to check how much running ffmpeg itself costs. Probably less than $0.01 per hour of audio, which is roughly the savings, but still.

5. ks2048 No.44380071
> Doesn't YouTube do this for you automatically these days within a day or so?

Last time I checked, Google's auto-captions were noticeably worse quality than Whisper's, but maybe that has changed.

6. colechristensen No.44380152
If you have a recent MacBook you can run the same Whisper model locally for free. People are really sleeping on how cheap compute is on the hardware they already own.
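Running it locally can be as little as a few lines with the openai-whisper package (`pip install openai-whisper`; needs ffmpeg on PATH). The RAM-to-checkpoint mapping below is my own rough heuristic, not something from this thread:

```python
# Sketch: local Whisper transcription with the openai-whisper package.

def pick_model(ram_gb):
    """Assumed heuristic: smaller checkpoints for less memory."""
    if ram_gb >= 16:
        return "large-v3"
    if ram_gb >= 8:
        return "medium"
    return "base"

def transcribe_locally(audio_path, ram_gb=16):
    import whisper  # lazy import; the heavy model download happens here
    model = whisper.load_model(pick_model(ram_gb))
    return model.transcribe(audio_path)["text"]
```

On Apple silicon the medium and large checkpoints are comfortably usable; on older Intel MacBooks (as noted below) they can take the better part of an hour per file.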
replies(2): >>44380229 #>>44384418 #
7. pzo No.44380182
There's also Cloudflare Workers AI, which offers whisper-large-v3-turbo for around $0.03 per hour:

https://developers.cloudflare.com/workers-ai/models/whisper-...

8. rob No.44380229
I don't. I have a MacBook Pro from 2019 with an Intel chip and 16 GB of memory. Pretty sure when I tried the large whisper model it took like 30 minutes to an hour to do something that took hardly any time via Groq. It's been a while though so maybe my times are off.
replies(2): >>44380449 #>>44380467 #
9. colechristensen No.44380449
Ah, no, an Apple silicon Mac with a decent amount of memory is required. But that kind of machine (a mid-to-high-range recent MacBook) has been very common at all of my employers for a long time.
10. fragmede No.44380467
It's been roughly six years since that MacBook was top of the line, so your times are definitely off.
11. abidlabs No.44381963
You could use Hugging Face's Inference API (which supports all of these providers) directly, making it easier to switch between them; e.g., see the panel on the right at: https://huggingface.co/openai/whisper-large-v3
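A sketch of that provider switching via huggingface_hub's `InferenceClient`, which exposes an `automatic_speech_recognition` method. Hedged: the exact provider names accepted depend on your huggingface_hub version, and `provider_order` is just an illustrative helper of mine:

```python
# Sketch: route the same ASR request to different providers through
# Hugging Face's InferenceClient instead of per-provider SDKs.

def provider_order(preferred, available):
    """Put the preferred provider first, keep the rest in order."""
    return [preferred] + [p for p in available if p != preferred]

def transcribe_via_hf(audio_path, provider="auto",
                      model="openai/whisper-large-v3"):
    from huggingface_hub import InferenceClient  # lazy import
    client = InferenceClient(provider=provider)
    return client.automatic_speech_recognition(audio_path, model=model).text
```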
12. likium No.44384418
What tool do you use?
13. BrunoJo No.44384523
Let me know if you are interested in a more reliable transcription API. I'm building Lemonfox.ai and we've optimized our transcription API to be highly available and very fast for large files. Happy to give you a discount (email: bruno at lemonfox.ai)