I think the author is making an apples-to-oranges comparison. If you have AI acting agentically, capability is likely positively correlated with reliability. If you are not using AI agents, the AI is more reliable.
AI agents are not there yet, and even Cursor does not have agent mode selected by default. I have seen Cursor's agent do quite a bit worse than the raw model with human-selected context.