Context is the bottleneck for coding agents now

IMHO, jumping from Level 2 to Level 5 is a matter of:

- Better structured codebases - we need hierarchical codebases with minimal depth, maximal orthogonality and reasonable width. Think microservices.

- Better documentation - most code documentations are not built to handle updates. We need a proper graph structure with few sources of truth that get propagated downstream. Again, some optimal sort of hierarchy is crucial here.

At this point, I really don't think that we necessarily need better agents.

Setup your codebase optimally, spin up 5-10 instances of gpt-5-codex-high for each issue/feature/refactor (pick the best according to some criteria) and your life will go smoothly