
666 points | georgemandis | 1 comment
1. meerab No.44383178
Interesting approach to transcript generation!

I'm implementing a similar workflow for VideoToBe.com

My Current Pipeline:

Media Extraction - yt-dlp for reliable video/audio downloads
Local Transcription - OpenAI Whisper running on my own hardware (no API costs)
Storage & UI - Transcripts stored in S3 with a custom web interface for viewing
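
A minimal sketch of that three-stage pipeline in Python, assuming yt-dlp, openai-whisper, and boto3 are installed and ffmpeg is on the PATH; the bucket name, file paths, and model size are placeholder assumptions, not details from the comment:

    # Stage 1: yt-dlp pulls the audio track and converts it to mp3.
    # Stage 2: a locally hosted Whisper model transcribes it (no API calls).
    # Stage 3: the transcript is stored in S3 for the web UI to read.
    import yt_dlp
    import whisper
    import boto3

    def download_audio(url: str, out_path: str = "audio") -> str:
        opts = {
            "format": "bestaudio/best",
            "outtmpl": f"{out_path}.%(ext)s",
            "postprocessors": [
                {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
            ],
        }
        with yt_dlp.YoutubeDL(opts) as ydl:
            ydl.download([url])
        return f"{out_path}.mp3"

    def transcribe(audio_path: str) -> str:
        model = whisper.load_model("base")  # pick a size that fits local hardware
        result = model.transcribe(audio_path)
        return result["text"]

    def upload_transcript(text: str, key: str, bucket: str = "my-transcripts") -> None:
        s3 = boto3.client("s3")
        s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))

    if __name__ == "__main__":
        audio = download_audio("https://www.youtube.com/watch?v=LCEmiRjPEtQ")
        upload_transcript(transcribe(audio), key="transcripts/LCEmiRjPEtQ.txt")

The "base" model is only an example; larger Whisper models trade throughput and GPU memory for accuracy, which is where already-paid-for local hardware helps.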

Y Combinator playlist: https://videotobe.com/play/playlist/ycombinator

Andrej's talk: https://videotobe.com/play/youtube/LCEmiRjPEtQ

After reading your blog post, I will be testing the effect of speeding up the audio on locally hosted Whisper models. Running Whisper locally eliminates the ongoing API cost concern, since my infrastructure is already a sunk cost, but speeding up the audio could still be an interesting performance enhancement to explore!
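
One hedged way to try that experiment: re-encode the audio at 2x with ffmpeg's atempo filter and feed the shorter file to the same local model. The file names and the 2.0 factor below are illustrative assumptions, not values from the blog post:

    # Speed the audio up 2x, then transcribe the shorter file locally.
    import subprocess
    import whisper

    def speed_up(in_path: str, out_path: str, factor: float = 2.0) -> str:
        # atempo handles 0.5-2.0 per filter pass; chain filters for higher factors
        # on older ffmpeg builds.
        subprocess.run(
            ["ffmpeg", "-y", "-i", in_path, "-filter:a", f"atempo={factor}", out_path],
            check=True,
        )
        return out_path

    model = whisper.load_model("base")
    fast_audio = speed_up("audio.mp3", "audio_2x.mp3")
    print(model.transcribe(fast_audio)["text"])

Comparing wall-clock time and word error rate between the 1x and 2x runs would show whether the speed-up pays off for a locally hosted model.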