It sounds like some kind of dynamic gain adjustment kicks in every time he starts talking, and then after ~1 second it settles and gets better. I don't think it's a "missing hardware" issue; just turning down the gain would probably be enough, or tuning the software dynamics processing if he's using any. It could also be that the podcaster tried to normalize the entire podcast while mastering and fucked it up that way.