
669 points georgemandis | 8 comments
1. georgemandis ◴[] No.44376990[source]
I was trying to summarize a 40-minute talk with OpenAI’s transcription API, but it was too long. So I sped it up with ffmpeg to fit within the 25-minute cap. It worked quite well (at up to 3x speed) and was cheaper and faster, so I wrote about it.

Felt like a fun trick worth sharing. There’s a full script and cost breakdown.
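
For anyone who wants to try the idea, here is a minimal sketch of the speed-up step. It is not the full script mentioned above, just an illustration under a few assumptions: the file names are hypothetical, the cap is taken as 25 minutes as stated in the comment, and the speed factor is split into chained `atempo` filters of at most 2.0 each, which is a conservative habit rather than a requirement of every ffmpeg build. The code only builds the command; running it is left to the caller.

```python
import math


def required_speedup(duration_s: float, cap_s: float) -> float:
    """Smallest speed factor (>= 1.0) that makes the audio fit under the cap."""
    if duration_s <= 0 or cap_s <= 0:
        raise ValueError("durations must be positive")
    return max(1.0, duration_s / cap_s)


def atempo_chain(speed: float) -> str:
    """Express a speed-up as chained ffmpeg atempo filters, each <= 2.0."""
    if speed < 1.0:
        raise ValueError("this sketch only handles speed factors >= 1.0")
    factors = []
    remaining = speed
    # Peel off 2.0x stages until the remainder fits in a single filter.
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    factors.append(remaining)
    return ",".join(f"atempo={f:g}" for f in factors)


def build_ffmpeg_command(src: str, dst: str, speed: float) -> list[str]:
    """Build (but do not run) an ffmpeg command that speeds up the audio track."""
    return ["ffmpeg", "-i", src, "-filter:a", atempo_chain(speed), "-vn", dst]


if __name__ == "__main__":
    # A 40-minute talk against a 25-minute cap, as in the comment above.
    speed = required_speedup(40 * 60, 25 * 60)
    # "talk.mp3" and "talk_fast.mp3" are placeholder names.
    print(" ".join(build_ffmpeg_command("talk.mp3", "talk_fast.mp3", speed)))
```

To actually run the command, pass the returned list to `subprocess.run(cmd, check=True)`. A higher factor (the comment reports good results at up to 3x) shortens the audio further, which is where the cost saving comes from.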

replies(1): >>44378167 #
2. bravesoul2 ◴[] No.44378167[source]
You could have kept quiet and started a cheaper-than-OpenAI transcription business :)
replies(4): >>44378890 #>>44379081 #>>44379840 #>>44380550 #
3. behnamoh ◴[] No.44378890[source]
Sure, but now the world is a better place because he shared something useful!
4. 4b11b4 ◴[] No.44379081[source]
Pre-processing of the audio is still a valid biz; multiple types of pre-processing might be valid.
5. hn8726 ◴[] No.44379840[source]
Or OpenAI will do it themselves for transcription tasks.
6. ilyakaminsky ◴[] No.44380550[source]
I've already done that [1]. A fraction of the price, 24-hour limit per file, and speedup tricks like the OP's are welcome. :)

[1] https://speechischeap.com

replies(1): >>44382158 #
7. bravesoul2 ◴[] No.44382158{3}[source]
Nice. Don't expect you to spill the beans but is it doing OK (some customers?)

Just wondering if I can build a retirement out of APIs :)

replies(1): >>44384932 #
8. ilyakaminsky ◴[] No.44384932{4}[source]
It's sustainable, but not enough to retire on at this point.

> Just wondering if I can build a retirement out of APIs :)

I think it's possible, but you need to find a way to add value beyond the commodity itself (e.g., audio classification and speaker diarization in my case).