The claim that you would have to decode all previous frames in the entire video is... completely baffling to see coming from the dev. He's arguing a stupid technicality that a video might not have keyframes. That's not a reason to omit the feature entirely.
Going backwards might be hard because, depending on how the code is structured, you may not be able to step backwards efficiently. You can "seek", but how far back do you want to seek? A second? Two seconds? X frames before the current one?
Keyframes may come at a regular cadence, but they may not. So again, how far back do you seek in order to go back one frame? And keyframes may be abstracted away from the player itself, since really the codecs are the ones that deal with that stuff. Some codecs don't do frame differencing at all: as far as I know, MJPEG encodes every frame as a standalone JPEG, so every frame is effectively a keyframe (correct me if I'm wrong).
The ideal implementation would keep the last X decoded frames in a buffer, then re-decode an earlier stretch of the video once you have stepped back about X/2 frames. But again, it depends.
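To make the buffering idea concrete, here is a minimal Python sketch. It is not taken from any real player: `FrameHistory`, `push`, `step_back` and `needs_refill` are hypothetical names, and frames are stood in for by plain strings.

```python
from collections import deque


class FrameHistory:
    """Hypothetical helper: remembers the last `capacity` decoded frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        # (frame_index, frame) pairs, oldest first; old entries fall off the left.
        self.frames = deque(maxlen=capacity)
        self.back = 0  # how many frames we have stepped back from the newest one

    def push(self, index, frame):
        # Call this for every frame decoded during normal forward playback.
        self.frames.append((index, frame))
        self.back = 0

    def step_back(self):
        # Return the previous (index, frame), or None if the history is used up.
        if self.back + 1 >= len(self.frames):
            return None
        self.back += 1
        return self.frames[-1 - self.back]

    def needs_refill(self):
        # True once we are about halfway back through the buffer, which is
        # the point where a player could start re-decoding older frames.
        return self.back >= self.capacity // 2


history = FrameHistory(capacity=8)
for i in range(20):
    history.push(i, f"frame{i}")

previous = history.step_back()  # the frame just before the newest one
```

Stepping back is then a lookup rather than a decode. How the refill itself is done (seek, decode forward, prepend) is left out here, since that depends on the decoder.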
Efficiency is not as important as having the feature at all. "Go back 5 seconds and then run forward to the right frame" is a sufficient algorithm, as long as it can track and combine multiple presses of the previous-frame key. Improvements can come later. Maybe buffering, maybe tracking keyframes, maybe other things. But this is a big case of letting the perfect be the enemy of the good.
If it fails to find a keyframe, that sucks, but 99% of the time it'll work.
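A rough Python sketch of the "go back 5 seconds and run forward" approach, with repeated key presses folded into one target frame. Everything here is an assumption for illustration: `FakeDecoder` is a stand-in, not a real library, and it assumes `seek` lands on a keyframe at or before the requested frame and that `next_frame` returns frames in order.

```python
class FakeDecoder:
    """Stand-in decoder with a keyframe every `keyframe_interval` frames."""

    def __init__(self, keyframe_interval=48):
        self.keyframe_interval = keyframe_interval
        self.position = 0
        self.decoded = 0  # counts how many frames we had to decode

    def seek(self, index):
        # Snap to the nearest keyframe at or before the requested frame.
        self.position = (index // self.keyframe_interval) * self.keyframe_interval

    def next_frame(self):
        index = self.position
        self.position += 1
        self.decoded += 1
        return index, f"frame{index}"


def step_back(decoder, current_index, presses=1, fps=30.0, back_seconds=5.0):
    # Several presses of the previous-frame key collapse into one target.
    target = max(0, current_index - presses)
    # Jump back a fixed amount of time; the decoder picks the actual keyframe.
    seek_index = max(0, target - int(back_seconds * fps))
    decoder.seek(seek_index)
    # Decode forward, throwing frames away until we reach the target.
    while True:
        index, frame = decoder.next_frame()
        if index >= target:
            return index, frame


decoder = FakeDecoder(keyframe_interval=48)
index, frame = step_back(decoder, current_index=1000, presses=3)
```

With these made-up numbers the player decodes a couple of hundred frames to show one, which is wasteful but fine for a key press. If the seek lands after the target (no keyframe within 5 seconds), this sketch just returns the first frame it gets, which is the "that sucks" case.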