(spectrum.ieee.org)

183 points WolfOliver | 2 comments | 29 Aug 25 15:24 UTC

crooked-v ◴[29 Aug 25 16:24 UTC] No.45066121[source]▶

For me it's simple: even the best models are "lazy" and will confidently declare they're finished when they're obviously not, and the immensely increased training effort it took to get ChatGPT 5's mild improvements on benchmarks suggests that this laziness won't go away anytime soon.

replies(2): >>45066370 #>>45066507 #

1. worldsayshi ◴[29 Aug 25 16:43 UTC] No.45066370[source]▶

>>45066121 #

Sounds like it's partially about a nuanced trade-off. It can just as well be too eager and add changes I didn't ask for. Being lazy is better than continuing on a bad path.

replies(1): >>45067426 #

2. crooked-v ◴[29 Aug 25 18:03 UTC] No.45067426[source]▶

>>45066370 (TP) #

There's a long distance between "nuanced behavior" and what it actually does now, which is "complete 6 items of an explicit 10-item task list and then ask the user again if they want to continue".

AI’s coding evolution hinges on collaboration and trust