Then you would need to set up a server that would do all this and serve as a 'mirror' to your podcasts without the ads.
I also have a setup like this: I transcribe with Whisper, send the transcript to OpenAI's GPT-4o mini to detect ads, then clip those segments with pydub. My prompt must be lacking, though, because the success rate on detecting ads is maybe 60%.
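For anyone curious what the clipping step can look like, here is a rough sketch, not the actual code from the setup above. It assumes the detection step already produced ad ranges as (start_ms, end_ms) pairs; the names `keep_ranges` and `strip_ads` are made up for illustration, and the mp3 output format is just an assumption.

```python
def keep_ranges(ad_ranges, total_ms):
    """Invert (start_ms, end_ms) ad ranges into the ranges worth keeping."""
    keep = []
    cursor = 0
    for start, end in sorted(ad_ranges):
        # Clamp to the episode length so bad timestamps can't break slicing.
        start = max(0, min(start, total_ms))
        end = max(0, min(end, total_ms))
        if start > cursor:
            keep.append((cursor, start))
        # max() handles overlapping or nested ad ranges.
        cursor = max(cursor, end)
    if cursor < total_ms:
        keep.append((cursor, total_ms))
    return keep


def strip_ads(in_path, out_path, ad_ranges):
    """Write a copy of the episode with the ad ranges removed."""
    # Imported here so keep_ranges() works without pydub installed.
    from pydub import AudioSegment  # third-party; needs ffmpeg for mp3

    audio = AudioSegment.from_file(in_path)
    clean = AudioSegment.empty()
    # pydub slices and len() are both in milliseconds.
    for start, end in keep_ranges(ad_ranges, len(audio)):
        clean += audio[start:end]
    clean.export(out_path, format="mp3")
```

Inverting the ad ranges first keeps the pydub part trivial, and overlapping detections (which an LLM will happily produce) collapse on their own.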
I think it's better than 60%, but I should definitely set up some evals.
I split the text by sentence, but was considering having the LLM group it into paragraphs first (that might conceptually chunk commercial sentences together). What I've got has been good enough for me, though.
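A cheaper stand-in for the paragraph idea, which is a different technique from having the LLM do the grouping, is to merge flagged sentences that sit close together in time. A minimal sketch, assuming each sentence comes with start/end times in seconds and an is_ad flag; `merge_ad_sentences` and the 2-second `max_gap_s` default are hypothetical:

```python
def merge_ad_sentences(sentences, max_gap_s=2.0):
    """Merge flagged sentences into contiguous (start_s, end_s) ad ranges.

    sentences: list of (start_s, end_s, is_ad) tuples in time order.
    Flagged sentences separated by at most max_gap_s are joined, so a
    short unflagged sentence between two ad sentences gets absorbed.
    """
    ranges = []
    for start, end, is_ad in sentences:
        if not is_ad:
            continue
        if ranges and start - ranges[-1][1] <= max_gap_s:
            # Close enough to the previous ad range: extend it.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges
```

The gap threshold is the knob to watch: too large and real content between two ad reads gets cut, too small and a single missed sentence leaves a stray fragment of the ad in the output.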
I wanted to switch to Gemini 2.5 Flash, but it looks like Google increased the price a lot.
I think I could do a fair bit of ad identification just with text heuristics: "This podcast is sponsored/supported by...", etc.
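Something like this is what I have in mind for the heuristics, as a rough sketch only. The phrase list is a guess at common sponsor-read wording and not tuned on real transcripts, and `looks_like_ad` is a made-up name:

```python
import re

# Assumed sponsor-read phrasings; extend after looking at real transcripts.
SPONSOR_PATTERNS = [
    r"\b(?:this|today's) (?:podcast|episode|show) is "
    r"(?:sponsored|supported|brought to you) by\b",
    r"\bpromo code\b",
    r"\buse (?:the )?code \w+",
    r"\bour sponsors?\b",
]
SPONSOR_RE = re.compile("|".join(SPONSOR_PATTERNS), re.IGNORECASE)


def looks_like_ad(sentence):
    """Return True if the sentence matches any sponsor-read pattern."""
    return SPONSOR_RE.search(sentence) is not None
```

This would only catch the opening line of an ad read, so it probably works best as a cheap first pass or as a hint passed to the LLM, not as a replacement for it.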