(runnercode.com)

196 points zmccormick7 | 1 comments | 26 Sep 25 15:06 UTC

asdev [26 Sep 25 15:41 UTC] No.45387765

I don't think intelligence is increasing. Arbitrary benchmarks don't reflect real-world usage. Even with all the context they could possibly have, these models still miss or hallucinate things. That doesn't make them useless, but saying context is the bottleneck is incorrect.

replies(3): >>45388096 #>>45388362 #>>45398947 #

1. Jweb_Guru [27 Sep 25 20:10 UTC] No.45398947

>>45387765 #

Gemini 2.5 Pro is okay if you ask it to work on a very tiny problem. That's about it for me; the other models don't even create a convincing facsimile of reasoning.

Context is the bottleneck for coding agents now