555 points maheshrijal | 2 comments
jdross ◴[] No.43707849[source]
The pace of notable releases across the industry right now is unlike any time I remember since I started doing this in the early 2000s. And it feels like it's accelerating.
replies(3): >>43707964 #>>43708571 #>>43712041 #
1. achierius ◴[] No.43712041[source]
How is this a notable release? It's strictly worse than Gemini 2.5 on coding &c, and only an iterative improvement over their own models. The only thing that struck me as particularly interesting was the native visual reasoning.
replies(1): >>43712778 #
2. og_kalu ◴[] No.43712778[source]
It's not worse on coding. SWE-bench, Aider, and LiveBench coding all show noticeably better results.