I've been using MacWhisper for this. It has a huge variety of transcription options and features like speaker detection. It has worked great on every video of an hour or less that I've fed it, but does this have more to offer?
I haven't tried a 4+ hour video with MacWhisper, but I presume it would work the same.