What is the health of your enterprise code base? If it’s anything like the ones I’ve experienced, it’s a legacy mess, and then it’s absolutely understandable that an LLM’s output is subpar when taking on larger tasks.
It also depends on the model and plan you’re on. There is a significant increase in quality when comparing Cursor’s default model on a free plan vs Opus 4.5 on a maximum Claude plan.
I think a good exercise is to prohibit yourself from writing any code manually and force yourself to go LLM-only. It might sound silly, but it will develop that skill set.
Try Claude Code in thinking mode with some superpowers - https://github.com/obra/superpowers
I routinely make an implementation plan with Claude and then step away for 15 mins while it spins - the results aren’t perfect, but fixing that remaining 10% is better than writing 100% of it myself.