For me it's simple: even the best models are "lazy" and will confidently declare they're finished when they obviously aren't, and the immense increase in training effort it took to get ChatGPT 5's mild improvements on benchmarks suggests that this laziness won't go away anytime soon.
replies(2):