671 points georgemandis | 1 comments
fallinditch ◴[] No.44378501[source]
Can anyone give advice on the best (cost-effective, quick, accurate) way to extract transcripts from YouTube videos?

I'm confused because I read in various places that the YouTube API doesn't provide access to transcripts ... so how do all these YouTube transcript extractor services do it?

I want to build my own YouTube summarizer app. Any advice and info on this topic greatly appreciated!

replies(3): >>44378546 #>>44379137 #>>44381640 #
banana_giraffe ◴[] No.44381640[source]
You can use yt-dlp to get the transcripts. For instance, to grab just the transcript of a video:

    ./yt-dlp --skip-download --write-sub --write-auto-sub --sub-lang en --sub-format json3 <youtube video URL>
You can also feed the same command a playlist or channel URL and it'll run through and grab all the transcripts for each video in the playlist or channel.
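
Once you have the files, turning them into plain text for a summarizer is a small step. Here's a minimal Python sketch, assuming the json3 output is a JSON object with an `events` list whose entries may carry a `segs` list of `{"utf8": ...}` fragments (that's what I've seen in these files, but the format isn't formally documented as far as I know, so treat it as an assumption). `json3_to_text` and `load_json3` are names I made up for illustration.

```python
import json


def json3_to_text(data: dict) -> str:
    """Flatten a json3 subtitle payload into plain text, one line per event.

    Assumes the layout {"events": [{"segs": [{"utf8": "..."}]}]}.
    """
    lines = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs:
            # Some events carry only timing or layout metadata; skip them.
            continue
        text = "".join(seg.get("utf8", "") for seg in segs).strip()
        if text:
            lines.append(text)
    return "\n".join(lines)


def load_json3(path: str) -> str:
    """Read a .json3 file written by yt-dlp and return its text."""
    with open(path, encoding="utf-8") as f:
        return json3_to_text(json.load(f))
```

Point `load_json3` at whichever `.json3` file yt-dlp wrote for the video and pass the result to your summarizer. The timing fields are dropped here; keep them if you want timestamps in the summary.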
replies(1): >>44382282 #
fallinditch ◴[] No.44382282[source]
That's cool, thanks for the info. But do you also have to use a rotating proxy to prevent YouTube from blocking your IP address?
replies(1): >>44382408 #
1. banana_giraffe ◴[] No.44382408[source]
I last ran this at scale a couple of months ago, so my information may well be out of date. In my experience, YouTube seemed less concerned about bulk transcript downloads than about grabbing lots of videos.

For all I know, they've battened down more hatches since then.
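
If throttling does become a problem, one thing to try before reaching for a rotating proxy is simply slowing down. A rough Python sketch, assuming `yt-dlp` is on your PATH and using its `--sleep-requests` and `--sleep-subtitles` options; `build_command` and `fetch_transcripts` are hypothetical helper names, and the delay values are arbitrary guesses rather than tested limits:

```python
import subprocess


def build_command(url: str, sleep_requests: float = 1.0,
                  sleep_subtitles: int = 5) -> list[str]:
    """Build the transcript-only yt-dlp command with delays added."""
    return [
        "yt-dlp",
        "--skip-download",
        "--write-sub", "--write-auto-sub",
        "--sub-lang", "en",
        "--sub-format", "json3",
        # Seconds to wait between requests during data extraction.
        "--sleep-requests", str(sleep_requests),
        # Seconds to wait before each subtitle download.
        "--sleep-subtitles", str(sleep_subtitles),
        url,
    ]


def fetch_transcripts(url: str) -> None:
    """Run yt-dlp for a video, playlist, or channel URL."""
    subprocess.run(build_command(url), check=True)
```

Whether this is enough depends on whatever YouTube is doing at the time, so I wouldn't treat it as a guarantee.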