Interesting approach to transcript generation!
I'm implementing a similar workflow for VideoToBe.com
My current pipeline:

- Media extraction: yt-dlp for reliable video/audio downloads
- Local transcription: OpenAI Whisper running on my own hardware (no API costs)
- Storage & UI: transcripts stored in S3 with a custom web interface for viewing
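For anyone curious, the three stages above could be glued together with something like the sketch below. This is a minimal illustration, not my production code: the filenames, bucket name, and model size are placeholders, and it assumes the yt-dlp, openai-whisper, and AWS CLIs are installed.

```python
import subprocess
from pathlib import Path

# Hypothetical sketch of the download -> transcribe -> upload pipeline.
# Commands are built as lists so each stage is easy to inspect and test.

def download_audio(url: str, out: str = "audio.m4a") -> list[str]:
    """yt-dlp command that extracts audio from a video URL."""
    return ["yt-dlp", "-x", "--audio-format", "m4a", "-o", out, url]

def transcribe(audio: str, model: str = "medium") -> list[str]:
    """Local Whisper CLI command (openai-whisper package), JSON output."""
    return ["whisper", audio, "--model", model, "--output_format", "json"]

def upload(path: str, bucket: str = "my-transcripts") -> list[str]:
    """AWS CLI command that pushes the transcript to S3."""
    return ["aws", "s3", "cp", path, f"s3://{bucket}/{Path(path).name}"]

def run_pipeline(url: str) -> None:
    subprocess.run(download_audio(url), check=True)
    subprocess.run(transcribe("audio.m4a"), check=True)
    subprocess.run(upload("audio.json"), check=True)
```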
Example output: the Y Combinator playlist is at https://videotobe.com/play/playlist/ycombinator
and Andrej's talk is at https://videotobe.com/play/youtube/LCEmiRjPEtQ
After reading your blog post, I'll be testing the effect of sped-up audio on locally hosted Whisper models. Running Whisper locally already eliminates the ongoing API cost concern since my infrastructure is a sunk cost, so for me speeding up audio would be a pure throughput optimization, an interesting performance enhancement to explore!
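The experiment I have in mind would speed the audio up with ffmpeg's atempo filter before transcription. A hedged sketch, with placeholder filenames and factors; note that older ffmpeg builds cap a single atempo instance at 2.0, so larger factors are expressed as a chain:

```python
import subprocess

def atempo_chain(speed: float) -> str:
    """Build an ffmpeg -af value, chaining atempo filters for factors > 2.0."""
    parts = []
    while speed > 2.0:
        parts.append("atempo=2.0")
        speed /= 2.0
    parts.append(f"atempo={speed:g}")
    return ",".join(parts)

def speed_up(src: str, dst: str, factor: float) -> None:
    """Write a sped-up copy of src to dst (requires ffmpeg on PATH)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-af", atempo_chain(factor), dst],
        check=True,
    )

# e.g. speed_up("audio.m4a", "audio_2x.m4a", 2.0), then transcribe audio_2x.m4a
# and compare word error rate and wall-clock time against the 1x baseline.
```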