Mercury: Ultra-fast language models based on diffusion

A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU bottlenecked on testing performance than we are today, and every team I know of was already bottlenecked on CI speed before LLMs came along. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.
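To put rough numbers on that, here's a back-of-the-envelope sketch (basically Amdahl's law applied to one write-then-test cycle). The function name and the 60-minute figures are made up for illustration, not measurements from any real project.

```python
def cycle_speedup(write_minutes, test_minutes, write_speedup):
    """Overall speedup of one write+test cycle when only the writing gets faster.

    All inputs are hypothetical; this just applies Amdahl's law to a
    two-stage pipeline where the test stage is left unchanged.
    """
    before = write_minutes + test_minutes
    after = write_minutes / write_speedup + test_minutes
    return before / after


# Assumed numbers: an hour to write the change, an hour of CI,
# and an agent that writes 100x faster than a human.
print(round(cycle_speedup(60, 60, 100), 2))
```

With those assumed numbers the whole cycle gets only about 2x faster, because the hour of CI is untouched. Shrink the test time and the same 100x writing speedup starts to show up in the total.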

Maybe I've just got unlucky in the past, but in most projects I've worked on, a lot of developer time was wasted waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as it goes. This will make the CI bottleneck even worse.

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from on-prem dedicated hardware to expensive cloud VMs with slow I/O, which haven't got much faster over time.
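For what it's worth, here's a minimal sketch of the kind of inter-run caching I mean, assuming a test's result depends only on the files fed into it. `input_digest`, `run_cached` and the in-memory dict cache are hypothetical names for illustration, not any real CI system's API.

```python
import hashlib


def input_digest(files):
    """Hash file names and contents in a stable order (files: name -> bytes)."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(files[name])
        h.update(b"\0")
    return h.hexdigest()


def run_cached(files, run_test, cache):
    """Return (result, was_cached); only call run_test when the inputs changed."""
    key = input_digest(files)
    if key in cache:
        return cache[key], True
    result = run_test(files)
    cache[key] = result
    return result, False


# Usage: the second run with identical inputs never executes the test.
cache = {}
inputs = {"app.py": b"print('hi')", "test_app.py": b"assert True"}
print(run_cached(inputs, lambda f: "pass", cache))
print(run_cached(inputs, lambda f: "pass", cache))
```

This is only safe if the test really has no inputs outside `files` (no network, no clock, no leftover state), and a real setup would keep the cache somewhere shared between runners rather than in a dict.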

Mercury is crazy fast and, in the few quick tests I did, it produced good, correct code. How will we make test execution keep up with it?

Any modern MacBook can run those tests 100x faster than the crappy cloud runners most companies use. You can also configure runners that run locally and get the benefit of those speed gains. So all of this is really a business and technical problem that is already solved for those who want to solve it. It can be solved very cheaply, or it can be solved very expensively. Regardless, it's precisely those types of efficiency gains that motivate companies to finally do something about it.

And if not, then enjoy being paid to wait for CI to go green. Maybe it's a reminder to go take a break.

It will be worse when the process is super optimized and the expectation changes. Instead of the 2 PRs that went to prod today because everyone knows CI takes forever, you'll be expected to push 8, because in our super optimized pipeline it only takes seconds. No excuses. Now the bottleneck is you.