(composio.dev)

483 points mraniki | 1 comments | 31 Mar 25 12:09 UTC

MrScruff ◴[31 Mar 25 13:34 UTC] No.43534880[source]▶

The evidence given really doesn't justify the conclusion. Maybe it suggests 2.5 Pro might be better if you're asking it to build JavaScript apps from scratch, but that hardly equates to "It's better at coding". Feels like a lot of LLM articles follow this pattern: someone runs their own toy benchmarks and confidently extrapolates broad conclusions from a handful of data points. The SWE-Bench result carries a bit more weight, but even that should be taken with a pinch of salt.

replies(2): >>43535050 #>>43535652 #

1. namaria ◴[31 Mar 25 14:45 UTC] No.43535652[source]▶

>>43534880 #

There are three things this hype cycle excels at: getting money from investors for foundational model creators and startup.ai; spinning layoffs as a good sign for big corps; and trying to look like a clever tech blogger for people looking for clout online.

Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison