(runnercode.com)

196 points zmccormick7 | 1 comments | 26 Sep 25 15:06 UTC

asdev ◴[26 Sep 25 15:41 UTC] No.45387765[source]▶

I don't think intelligence is increasing. Arbitrary benchmarks don't reflect real world usage. Even with all the context it could possibly have, these models still miss/hallucinate things. Doesn't make them useless, but saying context is the bottleneck is incorrect.

replies(3): >>45388096 #>>45388362 #>>45398947 #

1. chankstein38 ◴[26 Sep 25 16:33 UTC] No.45388362[source]▶

>>45387765 #

Agreed. I feel like, in the case of GPT models, 4o was better in most ways than 5 has been. I'm not seeing any increase in quality between the two. 5 feels like a major letdown, honestly. I am constantly reminding it what we're doing lol

Context is the bottleneck for coding agents now