FWIW, work has pushed the use of Cursor and I quickly came around to a related conclusion: given a reliability vs. anything tradeoff, you more or less always have to prefer reliability. For example, even ignoring subtle head-scratcher-type bugs, a faster model's output on average needs more revision before it basically works, and you end up spending more energy on that than you save by reducing time to first response. Up-front work that decreases the chance of trouble--detailing how you want something done, explicitly pulling specific libraries into context--also tends to be worth it on net, even if the agent might have gotten there by searching (or you could get it there through follow-up requests).
That's my experience working with a largish, mature codebase (all on non-prod code) where you can't get far if you can't use various internal libraries correctly. With standalone (or small greenfield) projects, where results can lean more on public info from pre-training and there's not as much project-specific info to pull in, you might see different outcomes.
Maybe the tech and the surrounding practice will change over time, but in my short experience it's mostly been about just trying to get to 'acceptable' for this kind of task.