I'm using a library, stable-ts, for a similar issue with short audio clips and it works well: https://github.com/jianfch/stable-ts/tree/main
Not sure how it will perform on something long like an audiobook.