Claude struggles with writing a test that’s meant to fail, but it can be coaxed into it on the second or third attempt. Luckily it has no trouble when I insist the failure be for the right reason, as opposed to failing because of a setup issue or a problem elsewhere in the code.
When doing TDD with Claude Code I lean heavily on asking the agent two questions: “can we watch it fail?” and “does it fail for the right reason?” These questions are generic enough to sleepwalk through building most features and fixing all bugs. Yes, I said all bugs.
Reviewing the code is very pleasant because you get both the tests and the production code, and you can rely on the symmetry between them to understand the code’s intent and confirm that it does what it says.
In my experience over multiple months of greenfield and brownfield work, Claude doing TDD produces code that is 100% the quality and clarity I’d have achieved building it myself, and it does so 100% of the time. A big part of that is that TDD compartmentalizes each task, making it easy to keep any one task from accumulating too much complexity.