
678 points georgemandis | 13 comments
1. rob No.44379019
For anybody trying to do this in bulk, instead of using OpenAI's whisper via their API, you can also use Groq [0] which is much cheaper:

[0] https://groq.com/pricing/

Groq is ~$0.02/hr with distil-large-v3, or ~$0.04/hr with whisper-large-v3-turbo. OpenAI's API is billed at $0.006/min, which comes out to ~$0.36/hr.
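A back-of-envelope comparison using the rates above (the Groq figures are approximate, and the model keys are just illustrative labels):

```python
# Rough per-hour transcription rates in USD, from the figures above.
RATES_PER_HOUR = {
    "groq/distil-large-v3": 0.02,
    "groq/whisper-large-v3-turbo": 0.04,
    "openai/whisper-1": 0.006 * 60,  # $0.006 per minute of audio
}

def transcription_cost(hours_of_audio, model):
    """Estimated cost in USD for a backlog of audio."""
    return hours_of_audio * RATES_PER_HOUR[model]

# e.g. a 1,000-hour backlog costs ~$20 on Groq distil vs ~$360 on OpenAI
for model in RATES_PER_HOUR:
    print(f"{model}: ${transcription_cost(1000, model):,.2f}")
```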

We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube. It uses Groq by default, but I also added support for Replicate and Deepgram as backups because sometimes Groq errors out.
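The Groq-first-with-backups setup could be sketched like this; the provider wrapper functions here are hypothetical stand-ins for each service's actual SDK call:

```python
# Minimal sketch of a transcription fallback chain: try each provider
# in order and return the first success. The provider callables are
# assumed wrappers around the Groq / Replicate / Deepgram SDKs.

def transcribe_with_fallback(audio_path, providers):
    """providers: list of (name, callable) tried in order.

    Returns (provider_name, transcript) from the first provider that
    succeeds; raises if every provider errors out.
    """
    errors = {}
    for name, transcribe in providers:
        try:
            return name, transcribe(audio_path)
        except Exception as exc:  # e.g. rate limit or 5xx from the API
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

Usage would be `transcribe_with_fallback(path, [("groq", groq_fn), ("replicate", replicate_fn), ("deepgram", deepgram_fn)])`, so the cheap provider is always tried first.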

replies(5): >>44379183 #>>44380152 #>>44380182 #>>44381963 #>>44384523 #
2. georgemandis No.44379183
Interesting! At $0.02 to $0.04 an hour I suspect you haven't been hunting for optimizations, but I wonder if this "speed up the audio" trick would save you even more.

> We do this internally with our tool that automatically transcribes local government council meetings right when they get uploaded to YouTube

Doesn't YouTube do this for you automatically these days within a day or so?
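The "speed up the audio" trick is usually done with ffmpeg's atempo filter before uploading: you get billed for fewer audio-hours. One caveat: atempo has classically been capped at 2.0x per filter instance, so higher factors are built by chaining. A sketch, assuming ffmpeg is on PATH:

```python
# Sketch: speed up an audio file with ffmpeg before transcription,
# chaining atempo stages since one instance is capped at 2.0x.
import subprocess

def atempo_chain(factor):
    """Build an atempo filter string for an arbitrary speed-up factor."""
    parts = []
    while factor > 2.0:
        parts.append("atempo=2.0")
        factor /= 2.0
    parts.append(f"atempo={factor:g}")
    return ",".join(parts)

def speed_up(src, dst, factor=2.0):
    """Re-encode src at `factor`x speed into dst."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(factor), dst],
        check=True,
    )
```

For example, a 3x speed-up becomes the filter `atempo=2.0,atempo=1.5`.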

replies(3): >>44379336 #>>44380033 #>>44380071 #
3. rob No.44379336
> Doesn't YouTube do this for you automatically these days within a day or so?

Oh yeah, we do a check first and use youtube-transcript-api if there's an automatic one available:

https://github.com/jdepoix/youtube-transcript-api

The tool usually detects them within ~5 minutes of upload though, so there's typically no automatic transcript available yet. Then it sends the summaries to our internal Slack channel for our editors, in case there's anything interesting to 'follow up on' from the meeting.

Probably would be a good idea to add a delay to it and wait for the automatic ones though :)
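The check-for-existing-captions step can be sketched with that library. Hedged: the call below follows youtube-transcript-api's long-standing `get_transcript` interface (newer releases expose an instance-based `fetch()` instead), and the import is kept inside the function:

```python
# Sketch: prefer YouTube's own captions over paid transcription.
# Snippets come back as dicts with 'text', 'start', and 'duration'.

def join_snippets(snippets):
    """Flatten transcript snippets into a single string."""
    return " ".join(s["text"] for s in snippets)

def fetch_existing_transcript(video_id):
    """Return the caption text, or None if no captions exist yet."""
    from youtube_transcript_api import YouTubeTranscriptApi
    try:
        return join_snippets(YouTubeTranscriptApi.get_transcript(video_id))
    except Exception:
        return None  # no captions yet -- fall back to audio transcription
```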

4. jerjerjer No.44380033
> I wonder if this "speed up the audio" trick would save you even more.

At that price point you'd at least want to check how much running ffmpeg itself costs. Probably less than $0.01 per hour of audio, which is roughly the savings, but still.

5. ks2048 No.44380071
> Doesn't YouTube do this for you automatically these days within a day or so?

Last time I checked, Google's auto-captions were noticeably worse quality than Whisper's, but maybe that has changed.

6. colechristensen No.44380152
If you have a recent MacBook you can run the same Whisper model locally for free. People are really sleeping on how cheap compute is on the hardware they already own.
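Running it locally can be as little as a few lines with the openai-whisper package (`pip install openai-whisper`; needs ffmpeg on PATH). The RAM-to-checkpoint mapping below is my own rough heuristic, not something from this thread:

```python
# Sketch: local Whisper transcription with the openai-whisper package.

def pick_model(ram_gb):
    """Assumed heuristic: smaller checkpoints for less memory."""
    if ram_gb >= 16:
        return "large-v3"
    if ram_gb >= 8:
        return "medium"
    return "base"

def transcribe_locally(audio_path, ram_gb=16):
    import whisper  # lazy import; the heavy model download happens here
    model = whisper.load_model(pick_model(ram_gb))
    return model.transcribe(audio_path)["text"]
```

On Apple silicon the medium and large checkpoints are comfortably usable; on older Intel MacBooks (as noted below) they can take the better part of an hour per file.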
replies(2): >>44380229 #>>44384418 #
7. pzo No.44380182
There's also Cloudflare Workers AI, which offers whisper-large-v3-turbo for around $0.03 per hour:

https://developers.cloudflare.com/workers-ai/models/whisper-...

8. rob No.44380229
I don't. I have a MacBook Pro from 2019 with an Intel chip and 16 GB of memory. Pretty sure when I tried the large whisper model it took like 30 minutes to an hour to do something that took hardly any time via Groq. It's been a while though so maybe my times are off.
replies(2): >>44380449 #>>44380467 #
9. colechristensen No.44380449
Ah, no, an Apple silicon Mac with a decent amount of memory is required. But that kind of machine (a mid-to-high-range recent MacBook) has been very common at all of my employers for a long time.
10. fragmede No.44380467
It's been roughly six years since that MacBook was top of the line, so your times are definitely off.
11. abidlabs No.44381963
You could use Hugging Face's Inference API (which supports all of these providers) directly, making it easier to switch between them; e.g., see the panel on the right at: https://huggingface.co/openai/whisper-large-v3
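A sketch of that provider switching via huggingface_hub's `InferenceClient`, which exposes an `automatic_speech_recognition` method. Hedged: the exact provider names accepted depend on your huggingface_hub version, and `provider_order` is just an illustrative helper of mine:

```python
# Sketch: route the same ASR request to different providers through
# Hugging Face's InferenceClient instead of per-provider SDKs.

def provider_order(preferred, available):
    """Put the preferred provider first, keep the rest in order."""
    return [preferred] + [p for p in available if p != preferred]

def transcribe_via_hf(audio_path, provider="auto",
                      model="openai/whisper-large-v3"):
    from huggingface_hub import InferenceClient  # lazy import
    client = InferenceClient(provider=provider)
    return client.automatic_speech_recognition(audio_path, model=model).text
```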
12. likium No.44384418
What tool do you use?
13. BrunoJo No.44384523
Let me know if you are interested in a more reliable transcription API. I'm building Lemonfox.ai and we've optimized our transcription API to be highly available and very fast for large files. Happy to give you a discount (email: bruno at lemonfox.ai)