I'm not sure about this. The tests I've gotten out in a few hours are the kind I'd approve if another dev sent them, but they haven't really ended up finding meaningful issues.
People say their prompts are good, that awesome code is being generated, that it solved a month's worth of work in a minute. Nobody comes with receipts.
I had the agent scan the UX of the app being built, find all the common flows and save them to a markdown file.
I then asked the agent to find edge cases for them and come up with tests for those scenarios. I then set off parallel subagents to develop the test suite.
Running them surfaced some really interesting edge cases - so even if they never fail again, there was value there.
I do realise in hindsight that makes it sound like the tests were just a load of nonsense. Still, I was blown away by how well Claude Code + Opus 4.5 + 6 parallel subagents handled this.