Don’t give up on TDD.
I’ve invested hundreds of hours in process and tooling, and with Claude Code I can now ship major features, tests included, in record time.
You have to coach it in TDD, no matter how much you explain in CLAUDE.md. That’s partly because “a test that fails because the code isn’t written yet” is conceptually very close to “a test that passes without the code we’re about to write”, which in turn is close to “a test that asserts the code we’re about to write is not there”. You have to watch closely to make sure it produces the first one.
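To see how close those three shapes are, here’s a minimal pytest sketch; the `myapp.text` module and `slugify` function are hypothetical names I’m using for illustration. Only the first test is real TDD.

```python
# Sketch only: `myapp.text` and `slugify` are hypothetical.

# 1. The real red-phase test: it exercises the behavior we want,
#    fails today because slugify() doesn't exist yet, and passes
#    once the feature is written.
def test_slugify_replaces_spaces_with_hyphens():
    from myapp.text import slugify
    assert slugify("Hello World") == "hello-world"

# 2. The near-miss: a test that passes without the code we're
#    about to write. Always green, catches nothing.
def test_slugify_placeholder():
    assert True

# 3. The other near-miss: a test that asserts the code is NOT
#    there. It turns red the moment the feature lands.
def test_slugify_not_implemented_yet():
    import myapp.text
    assert not hasattr(myapp.text, "slugify")
```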
Why does it keep getting confused? You can’t really blame it. When two things are conceptually similar, models need lots of examples to distinguish between them; if the samples are sparse, the model is likely to jump the small distance from one concept to its neighbors.
So you have to accept that this is how Claude 4 works: keep it on a short leash, keep reminding it that it must watch the test fail, ask it whether the test failed for the right reason (not some setup issue), and only THEN give it permission to write the code.
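Here’s a sketch of what “failed for the right reason” means; the `shop.cart` module and `Cart` class are hypothetical stand-ins.

```python
# Sketch only: `shop.cart` and `Cart` are hypothetical.
def test_discount_applies_to_total():
    # If this import blows up, the test is red for the WRONG
    # reason: a setup problem, not missing behavior.
    from shop.cart import Cart

    cart = Cart()
    cart.add("widget", price=100)
    cart.apply_discount(0.10)

    # This is the failure you want to see before writing the
    # feature: apply_discount() is still a stub, so the
    # assertion fails on behavior, not on plumbing.
    assert cart.total() == 90
```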
The result is two mirror copies of your feature or fix: code and tests.
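Continuing the hypothetical slugify sketch from above, the green-phase code reads like a reflection of its test:

```python
# myapp/text.py -- written only after the red test above was
# watched failing on the missing import.

def slugify(title: str) -> str:
    """Lowercase `title` and join whitespace-separated words with hyphens."""
    return "-".join(title.lower().split())
```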
Reviewing code and tests together is pleasant because they mirror one another. The tests permanently guarantee the feature works as described: no manual testing, no regressions. And the model knows every trick for keeping tests clean and readable.
TDD is the check and balance missing from most people’s agentic software dev process.