Gemini also seems more likely to come up with 'advanced' ideas (for better or worse). For example, I asked both for a fast C++ function to solve a computational geometry problem that is fairly simple on the surface. Claude solved it in a straightforward, obvious way: nothing obviously inefficient, it will perform reasonably well for all inputs, but it also left some performance on the table. I could also tell at a glance that it was almost certainly correct.
Gemini, on the other hand, did a bunch of (possibly) clever 'optimisations' and tricks, and made extensive use of OpenMP. I know from experience that those optimisations are only faster when the input has certain properties, and impose massive overhead in other, quite common, cases.
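To make the overhead point concrete, here's a minimal sketch (a made-up example, not the actual geometry problem I gave the models): the same reduction over a set of points, once as a plain loop and once with an OpenMP pragma. The parallel version only pays off once the input is large enough to amortise the thread start-up and scheduling cost; on small inputs it's pure overhead.

    // Hypothetical illustration, not the real problem: sum of distances
    // from a set of 2D points to the origin.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Point { double x, y; };

    // Straightforward version: no setup cost, fine at any input size.
    double total_distance_serial(const std::vector<Point>& pts) {
        double sum = 0.0;
        for (const Point& p : pts)
            sum += std::sqrt(p.x * p.x + p.y * p.y);
        return sum;
    }

    // OpenMP version: pays thread start-up and scheduling overhead,
    // which only amortises once the input is large enough.
    double total_distance_parallel(const std::vector<Point>& pts) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < static_cast<long>(pts.size()); ++i)
            sum += std::sqrt(pts[i].x * pts[i].x + pts[i].y * pts[i].y);
        return sum;
    }

    int main() {
        // Small input: the serial version will almost certainly win here.
        std::vector<Point> pts(100, Point{3.0, 4.0});
        std::printf("%f %f\n",
                    total_distance_serial(pts),
                    total_distance_parallel(pts));
    }

Compile with -fopenmp to enable the parallel path; without it the pragma is ignored and both functions run serially.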
With a bit more prompting and questions on my part, I did manage to get both Gemini and Claude to converge on pretty much the same final answer.
The more interesting question is whether feeding in carefully selected examples or documentation covering the new library versions helps them get it right. I find that it usually does.
For anything like this, I don't understand reaching for AI. Just open the file and delete the lines yourself. What is the AI going to do for you here?
It's like relying 100% on AI when it's just one tool in your toolset.
I hear people mention doing this fairly often, but I can't imagine they're manually adding every page of the docs for the libraries or frameworks they're using, since unfortunately most docs aren't in a single tidy page that's easy to copy-paste.
The focus on benchmarks encourages a tendency to generalize performance as if it were context- and user-independent.
Each model really is a different piece of software with different capabilities. It's fascinating to see how dramatically people's assessments differ.