
740 points georgemandis | 1 comments
georgemandis ◴[] No.44376990[source]
I was trying to summarize a 40-minute talk with OpenAI’s transcription API, but it was too long. So I sped it up with ffmpeg to fit within the 25-minute cap. It worked quite well (at up to 3x speed) and was cheaper and faster, so I wrote about it.

Felt like a fun trick worth sharing. There’s a full script and cost breakdown.
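
For readers who want the gist without opening the write-up, here is a minimal sketch of the speed-up step. It is not the script from the post. The function names and file names are hypothetical, and it assumes ffmpeg's `atempo` audio filter, chaining stages of at most 2x because older ffmpeg builds reject larger values in a single stage.

```python
def min_speedup(duration_min: float, cap_min: float) -> float:
    """Smallest speed factor that squeezes the audio under the cap."""
    return max(1.0, duration_min / cap_min)


def atempo_chain(factor: float) -> str:
    """Build an ffmpeg audio filter string for the given speed factor.

    The factor is split into stages within [0.5, 2.0] so the chain
    also works on ffmpeg builds that limit a single atempo stage.
    """
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    stages = []
    remaining = factor
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={s:g}" for s in stages)


def ffmpeg_command(src: str, dst: str, factor: float) -> list[str]:
    """Argument list for ffmpeg; pass it to subprocess.run to execute."""
    return ["ffmpeg", "-i", src, "-filter:a", atempo_chain(factor), dst]


if __name__ == "__main__":
    # A 40-minute talk needs at least 1.6x to fit a 25-minute cap.
    print(min_speedup(40, 25))
    # Hypothetical file names; 3x is the upper end mentioned above.
    print(" ".join(ffmpeg_command("talk.m4a", "talk_3x.m4a", 3.0)))
```

The sped-up file is what would then be uploaded for transcription; the upload itself is left out here because it depends on the API client in use.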

replies(1): >>44378167 #
bravesoul2 ◴[] No.44378167[source]
You could have kept quiet and started a cheaper-than-OpenAI transcription business :)
replies(4): >>44378890 #>>44379081 #>>44379840 #>>44380550 #
ilyakaminsky ◴[] No.44380550[source]
I've already done that [1]. A fraction of the price, 24-hour limit per file, and speedup tricks like the OP's are welcome. :)

[1] https://speechischeap.com

replies(2): >>44382158 #>>44393218 #
satvikpendem ◴[] No.44393218[source]
Can it do real-time transcription with diarization? I'm looking for that for a product feature I'm working on. So far I've seen Speechmatics do this well, but I haven't heard of others.
replies(1): >>44501440 #
1. ilyakaminsky ◴[] No.44501440[source]
Not yet. The gains in efficiency come from optimizing the speedup factor. Real-time audio cannot be processed any faster than 1× by definition.