I'm working on a little website to summarize discussion trends across the podcast ecosystem. I wrote about an early prototype here[1] and gave a presentation about it a few months ago[2]. Now I'm working on an expanded "daily pulse" view across hundreds of episodes of top news podcasts from the last few days.
My secret agenda is to explore how the "information supply chain" can be tracked across the data-processing stack, all the way from the original audio through transcription and the processing pipeline to the UI. I'm using language models for multi-stage summarization and want to be able to follow the provenance of summaries all the way back to the transcripts and original audio.
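A minimal sketch of what that kind of provenance tracking could look like, not the actual pipeline: each summary keeps references to the lower-stage summaries and transcript spans it was built from, so any top-level item can be walked back to timestamps in the audio. All names here (`TranscriptSpan`, `Summary`, `trace`, the episode IDs) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TranscriptSpan:
    """A slice of one episode's transcript, tied to audio timestamps."""
    episode_id: str   # hypothetical identifier for the source episode
    start_sec: float  # offset into the original audio
    end_sec: float
    text: str


@dataclass
class Summary:
    """One summarization output, at any stage of the pipeline."""
    text: str
    spans: list = field(default_factory=list)     # direct transcript sources
    children: list = field(default_factory=list)  # lower-stage summaries


def trace(summary):
    """Collect every transcript span a summary ultimately rests on."""
    found = list(summary.spans)
    for child in summary.children:
        found.extend(trace(child))
    return found


# Two first-stage summaries feed one "daily pulse" item.
a = TranscriptSpan("ep-001", 12.0, 47.5, "placeholder transcript text")
b = TranscriptSpan("ep-002", 301.0, 330.0, "placeholder transcript text")
stage1_a = Summary("Episode 1 segment summary", spans=[a])
stage1_b = Summary("Episode 2 segment summary", spans=[b])
daily = Summary("Daily pulse item", children=[stage1_a, stage1_b])

sources = trace(daily)
for span in sources:
    print(span.episode_id, span.start_sec, span.end_sec)
```

The point of the design is that provenance is stored as links at every stage rather than reconstructed afterwards, so a UI could jump from a summary straight to the relevant seconds of audio.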
replies(2):