In my experience, o4-mini-high is good enough, even just through the chat interface.
Cursor et al. can be more comfy because they have direct access to the files. But on a sufficiently large/old/complex code base, the main limitation is the human in the loop and managing context, so things end up evening out. On top of that, a lot of the time it’s just easier/better to manually feed things to ChatGPT/Claude, since that way you get to curate and understand the tasks and the changes more carefully.
I still haven’t seen any convincing real-life scenario with a larger in-production code base in which agents are able to autonomously write most of the code.
If anyone has a video/demo, I would love to see it.