Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison

(composio.dev)

483 points mraniki | 1 comments | 31 Mar 25 12:09 UTC | HN request time: 0s | source

Show context

neal_ ◴[31 Mar 25 13:00 UTC] No.43534543[source]▶

I was using gemini 2.5 pro yesterday and it does seem decent. I still think claude 3.5 is better at following instruction then the new 3.7 model which just goes ham messing stuff up. Really disappointed by Cursor and the Claude CLI tool, for me they create more problems then fix. I cant figure out how to use them on any of my projects with out them ruining the project and creating terrible tech debt. I really like the way gemini shows how much context window is left, i think every company should have this. To be honest i think there has been no major improvement beyond the original models which gained popularity first. Its just marginal improvements 10% better or something, and the free models like deepseek are actually better imo then anything openai has. I dont think the market can withstand the valuations of the big ai companies. They have no advantage, there models suck worse then free open source ones, and they charge money??? Where is the benefit to there product?? People originally said the models are the moat and methods are top secret, but turns out its pretty easy to reproduce these models, and its the application layer built on top of the models that is much more specific and has the real moat. People said the models would engulf these applications built ontop and just integrate natively.

replies(4): >>43534760 #>>43534894 #>>43535115 #>>43536010 #

martin-t ◴[31 Mar 25 15:17 UTC] No.43536010[source]▶

>>43534543 #

Whenever I read about LLMs or try to use them, I feel like I am asleep in a dream where two contradicting things can be true at the same time.

On one hand, you have people claiming "AI" can now do SWE tasks which take humans 30 minutes or 2 hours and the time doubles every X months so by Y year, SW development will be completely automated.

On the other hand, you have people saying exactly what you are saying. Usually that LLMs have issues even with small tasks and that repeated/prolonged use generates tech debt even if they succeed on the small tasks.

These 2 views clearly can't both be true at the same time. My experience is the second category so I'd like to chalk up the first as marketing hype but it's confusing how many people who have seemingly nothing to gain from the hype contribute to it.

replies(4): >>43536241 #>>43536654 #>>43537271 #>>43537992 #

1. radicality ◴[31 Mar 25 16:11 UTC] No.43536654[source]▶

>>43536010 #

At first thought you are gonna talk about how various LLMs will gaslight you, and say something is true, then only change their mind once you provide a counter example and when challenged with it, will respond “I obviously meant it’s mostly true, in that specific case it’s false”.

↑