Coding with LLMs in the summer of 2025 – an update

1. Keyframe ◴[20 Jul 25 13:55 UTC] No.44625227[source]▶

Unlike OP, in my still limited but intense month or so of diving into this topic, I had better luck using Gemini 2.5 Pro and Opus 4 at a more abstract level (architecture etc.) and then feeding their output to Sonnet for coding. For the coding itself I found 2.5 Pro, and to a lesser degree Opus, hit or miss: there were a lot of instances of them circling around the issue and correcting themselves (Gemini especially so), whereas Sonnet would cut to the chase, but needed an explicit take on the problem to be efficient.

replies(3): >>44625543 #>>44626481 #>>44629413 #

2. khaledh ◴[20 Jul 25 14:28 UTC] No.44625543[source]▶

>>44625227 (TP) #

This is my experience too. I usually use Gemini 2.5 Pro through AI Studio for big design ideas that need to be validated and refined. Then I take the refined requirements to Claude Code, which does an excellent job most of the time of coding them properly. Recently I tried Gemini CLI, and it's not even close to Claude Code's sharp coding skills. It often makes syntax mistakes and gets stuck trying to get itself out of a rut; its output is so verbose (and fast) that it's hard to follow what it's trying to do. Claude Code has much better debugging capability.

Another contender in the "big idea" reasoning camp: DeepSeek R1. It's much slower, but most of the time it can analyze problems and get to the correct solution in one shot.

3. antirez ◴[20 Jul 25 16:02 UTC] No.44626481[source]▶

>>44625227 (TP) #

Totally possible. In general I believe that, while more powerful in their best outputs, Sonnet/Opus 4 are in other ways (alignment / consistency) a regression from Sonnet 3.5v2 (often called Sonnet 3.6), as Sonnet 3.7 was. Also, models are complex objects, and sometimes in a given domain a model that is weaker on paper will work better. And, on top of that, interactive use and agentic use require different reinforcement learning training, which sometimes may not be towards an aligned target... So using the model in one way or the other may also change how good it is.

4. jpdus ◴[20 Jul 25 21:20 UTC] No.44629413[source]▶

>>44625227 (TP) #

This is also confirmed by internal Cline statistics, where Opus and Gemini 2.5 Pro both perform worse than Sonnet 4 in real-world scenarios:

https://x.com/pashmerepat/status/1946392456456732758/photo/1