From my use case, the Gemini 2.5 is terrible. I have a complex Cython code in a single file (1500 lines) for a Sequence Labeling. Claude and o3 are very good in improving this code and following the commands. The Gemini always try to do unrelated changes. For example, I asked, separately, for small changes such as remove this unused function, or cache the arrays indexes. Every time it completely refactored the code and was obsessed with removing the gil. The output code is always broken, because removing the gil is not easy.
How are you asking Gemini 2.5 to change existing code? With Claude 3.7, it's possible to use Claude Code, which gets "extremely fast but untrustworthy intern"-level results. Do you have a prefered setup to use Gemini 2.5 in a similar agentic mode, perhaps using a tool like Cursor or aider?
For all LLMs, I´m using a simple prompt with the complete code in triple quotes and the command at the end, asking to output the complete code of changed functions. Then I use Winmerge to compare the changes and apply. I feel more confident doing this than using Cursor.