I really wish people would stop evaluating a model's coding capability with one-shots.
The vast majority of coding energy goes into what comes next.
Even today, sonnet-3.5 is still the best "existing code base" model, which is gratifying (to Anthropic) and/or alarming (to everyone else).