YouTube's new anti-adblock measures

(iter.ca)


ranger_danger ◴[20 Jun 25 17:57 UTC] No.44330199[source]▶

I'm surprised they don't just inject the ads directly into the video stream, I think that would solve their issue overnight (not that I want any ads personally). You could also rate-limit it to the playback speed to prevent pre-downloading the stream easily. But now that everything uses HLS/DASH, it's easy to inject different content right in the middle of the stream without re-encoding anything.
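As a rough illustration of the stream-splicing idea: in HLS, a media playlist can switch between differently encoded segments if the switch is marked with `#EXT-X-DISCONTINUITY`, so no re-encoding is needed. The sketch below only builds playlist text. The segment names, durations, and the `splice_ad` helper are made up for the example, and nothing here reflects how YouTube actually serves video.

```python
import math

def splice_ad(content, ad, at_index):
    """Build HLS media playlist text with `ad` segments spliced into `content`.

    `content` and `ad` are lists of (duration_seconds, uri) tuples
    (hypothetical segment names). The ad is inserted before content[at_index].
    """
    target = math.ceil(max(d for d, _ in content + ad))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]

    def emit(segments):
        for duration, uri in segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)

    emit(content[:at_index])
    lines.append("#EXT-X-DISCONTINUITY")  # encoding/timestamps may change here
    emit(ad)
    lines.append("#EXT-X-DISCONTINUITY")  # back to the original content
    emit(content[at_index:])
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

playlist = splice_ad(
    content=[(6.0, "c0.ts"), (6.0, "c1.ts")],
    ad=[(5.0, "ad0.ts")],
    at_index=1,
)
print(playlist)
```

To the player, the ad segments look like any other segments in the playlist, which is why this kind of server-side insertion is harder to block by URL alone.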

replies(15): >>44330295 #>>44330306 #>>44330327 #>>44330366 #>>44332987 #>>44333096 #>>44333102 #>>44333133 #>>44333320 #>>44333605 #>>44333700 #>>44333858 #>>44334367 #>>44335037 #>>44335453 #

noman-land ◴[20 Jun 25 23:32 UTC] No.44333096[source]▶

>>44330199 #

There exists crowdsourced adblocking based on timestamps (SponsorBlock, Tubular). Soon we will have realtime on-device content-aware AI adblocking. They will never win.

replies(2): >>44333119 #>>44333839 #

thomassmith65 ◴[20 Jun 25 23:35 UTC] No.44333119[source]▶

>>44333096 #

Once we get content-aware AI adblocking, every video and podcast will turn into a product placement.

replies(3): >>44333293 #>>44333362 #>>44333371 #

xnx ◴[21 Jun 25 00:15 UTC] No.44333362[source]▶

>>44333119 #

I use content-aware ad blocking to remove inserted and native ads from podcasts. The next level of adblocking will be rewriting content that is overly commercial.

replies(3): >>44333369 #>>44333416 #>>44333638 #

1. noahjk ◴[21 Jun 25 00:26 UTC] No.44333416[source]▶

>>44333362 #

Any info on how you do that?

replies(1): >>44334156 #

2. coppsilgold ◴[21 Jun 25 03:04 UTC] No.44334156[source]▶

>>44333416 (TP) #

I imagine you can do it by AI-transcribing the podcast while preserving timestamp metadata for each symbol. Use an LLM to identify undesirable segments (ask it to output JSON or something) and then cut them out of the audio with ffmpeg.

Then you would need to set up a server that would do all this and serve as a 'mirror' to your podcasts without the ads.
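A minimal sketch of the cutting step, assuming the LLM has already returned ad spans as JSON with `start` and `end` in seconds (that reply format, the file names, and both helper functions are hypothetical). It computes the spans to keep and builds an ffmpeg command using the `aselect` and `asetpts` audio filters; the command is only printed here, not executed.

```python
import json

def keep_intervals(ad_segments, duration):
    """Return the (start, end) spans that remain after removing ad_segments."""
    keep, cursor = [], 0.0
    for start, end in sorted(ad_segments):
        # Clamp each span to the episode so bad LLM output cannot go out of range.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep

def ffmpeg_cut_command(src, dst, ad_segments):
    """Build an ffmpeg argv that drops the ad spans from the audio."""
    if not ad_segments:
        return ["ffmpeg", "-i", src, "-c", "copy", dst]
    drops = "+".join(f"between(t,{s},{e})" for s, e in sorted(ad_segments))
    # aselect keeps only samples outside the ad spans;
    # asetpts rewrites timestamps so the output has no gaps.
    audio_filter = f"aselect='not({drops})',asetpts=N/SR/TB"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]

# Hypothetical LLM reply: a JSON list of ad spans in seconds.
llm_reply = '[{"start": 0.0, "end": 30.5}, {"start": 600.0, "end": 660.0}]'
ads = [(seg["start"], seg["end"]) for seg in json.loads(llm_reply)]

kept = keep_intervals(ads, duration=1800.0)
cmd = ffmpeg_cut_command("episode.mp3", "episode_noads.mp3", ads)
print(kept)
print(cmd)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Filtering forces a re-encode of the audio, which is usually acceptable for speech.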

replies(2): >>44334188 #>>44334221 #

3. noahjk ◴[21 Jun 25 03:12 UTC] No.44334188[source]▶

>>44334156 #

I actually found a project which does almost exactly what you've described:

https://github.com/jdrbc/podly_pure_podcasts

4. xnx ◴[21 Jun 25 03:21 UTC] No.44334221[source]▶

>>44334156 #

You almost exactly described my process: podcast-dl > whisper > Gemini > ffmpeg > ftp > cheap web host

replies(2): >>44334244 #>>44338226 #

5. thomassmith65 ◴[21 Jun 25 03:29 UTC] No.44334244{3}[source]▶

>>44334221 #

If you've gone through that much effort, you might as well turn it into a subscription service. It would be resource intensive, but some people would gladly pay through the nose to rid their podcasts of ads.

replies(1): >>44338314 #

6. walthamstow ◴[21 Jun 25 15:19 UTC] No.44338226{3}[source]▶

>>44334221 #

What's your prompt for Gemini like? Does it include examples of ads? I assume you're using Flash for cost?

I also have a setup like this: I transcribe with Whisper and send the transcript to OpenAI 4o-mini to detect ads, then clip those segments with pydub. My prompt must be lacking, though, because the success rate on detecting ads is maybe 60%.

replies(1): >>44339606 #

7. xnx ◴[21 Jun 25 15:30 UTC] No.44338314{4}[source]▶

>>44334244 #

I'd definitely like to make it easier to use and spread it more widely, but I can't directly distribute the edited (copyrighted) podcast files. Might share transcript markers of the text right before and after ad segments, which is like a slightly more complicated version of what SponsorBlock does.
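One way the marker idea could work on the listener's side, as a sketch only: given a locally produced word-level transcript and a shared pair of marker phrases, find where the ad starts and ends. The word dictionary layout (`text`, `start`, `end`) and the function names are assumptions for illustration, not an existing format.

```python
def _norm(word):
    """Lowercase a word and strip surrounding punctuation for matching."""
    return word.lower().strip(".,!?;:\"'")

def find_phrase(words, phrase, start_index=0):
    """Return the index of the first word of `phrase` in `words`, or -1."""
    target = [_norm(w) for w in phrase.split()]
    tokens = [_norm(w["text"]) for w in words]
    for i in range(start_index, len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return i
    return -1

def locate_ad(words, before, after):
    """Return (ad_start, ad_end) in seconds, or None if a marker is missing.

    The ad runs from the end of the `before` phrase to the start of `after`.
    """
    b = find_phrase(words, before)
    if b < 0:
        return None
    b_end = b + len(before.split()) - 1
    a = find_phrase(words, after, b_end + 1)
    if a < 0:
        return None
    return (words[b_end]["end"], words[a]["start"])

# Toy transcript: every word lasts 0.5 s (made-up timings).
text = ("And now a word from our sponsor. Buy Acme widgets today! "
        "Back to the show")
words = [{"text": w, "start": i * 0.5, "end": i * 0.5 + 0.5}
         for i, w in enumerate(text.split())]

span = locate_ad(words, before="from our sponsor", after="back to the show")
print(span)
```

Exact phrase matching is brittle when two transcriptions of the same audio differ slightly, so a real version would probably need fuzzy matching.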

8. xnx ◴[21 Jun 25 18:22 UTC] No.44339606{4}[source]▶

>>44338226 #

My Gemini Flash 2.0 prompt: "Below is the transcript of a podcast preceded by a line number. Reply with the line numbers that are likely to be from advertisements, promotions, commercials, sponsorships, or ending credits."

I think it's better than 60%, but I should definitely set up some evals.

I split the text by sentence, but I was considering having the LLM try to group it into paragraphs (that might conceptually chunk commercial sentences together). What I've got has been good enough for me, though.
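A sketch of the line-numbered approach, for anyone who wants to try it: number the sentences, prepend the prompt quoted above, and parse whatever line numbers come back. The `N: sentence` layout, the reply format being parsed, and the helper names are guesses; the actual model call is left out.

```python
import re

PROMPT = (
    "Below is the transcript of a podcast preceded by a line number. "
    "Reply with the line numbers that are likely to be from advertisements, "
    "promotions, commercials, sponsorships, or ending credits."
)

def build_prompt(sentences):
    """Prefix each sentence with its 1-based line number and add the prompt."""
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences, start=1))
    return f"{PROMPT}\n\n{numbered}"

def parse_line_numbers(reply, n_lines):
    """Extract line numbers (and ranges like '7-9') from a free-text reply."""
    found = set()
    for match in re.finditer(r"(\d+)(?:\s*-\s*(\d+))?", reply):
        lo = int(match.group(1))
        hi = min(int(match.group(2) or lo), n_lines)  # ignore out-of-range ends
        found.update(range(lo, hi + 1))
    return sorted(n for n in found if 1 <= n <= n_lines)

def group_runs(numbers):
    """Collapse sorted line numbers into inclusive (first, last) runs."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n
        else:
            runs.append([n, n])
    return [tuple(r) for r in runs]

sentences = [f"Sentence {i}." for i in range(1, 11)]
prompt = build_prompt(sentences)
flagged = parse_line_numbers("2, 3, 7-9, 42", n_lines=len(sentences))
print(flagged)
print(group_runs(flagged))
```

Each run maps back to a time span through the first and last sentence's timestamps, which is what the cutting step needs.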

I wanted to switch to Flash 2.5, but it looks like they increased the price a lot.

I think I could do a fair bit of ad identification just with text heuristics: "This podcast is sponsored/supported by...", etc.
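A quick sketch of what such text heuristics might look like. The pattern list is illustrative and far from complete, and the sample sentences are invented.

```python
import re

# Illustrative patterns only; a real list would need tuning against transcripts.
AD_PATTERNS = [
    r"\b(?:sponsored|supported|brought to you) by\b",
    r"\bpromo code\b",
    r"\buse (?:the )?code\b",
    r"\b(?:go to|visit) \S+\.(?:com|org|net|io)\b",
]
AD_RE = re.compile("|".join(AD_PATTERNS), re.IGNORECASE)

def flag_ad_lines(sentences):
    """Return 1-based line numbers of sentences matching any ad pattern."""
    return [i for i, s in enumerate(sentences, start=1) if AD_RE.search(s)]

sample = [
    "Welcome back to the show.",
    "This podcast is sponsored by Acme.",
    "Go to acme.com/show and use code SHOW.",
    "Today we talk about codecs.",
]
hits = flag_ad_lines(sample)
print(hits)
```

Heuristics like these would likely catch the opening and closing lines of an ad read but miss the middle, so they might work best as a cheap first pass or as a sanity check on the LLM output.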
