
566 points by PaulHoule | 2 comments
mike_hearn:
A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU-bottlenecked on testing than we are today, and every team I know of was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just been unlucky in the past, but in most projects I've worked on, a lot of developer time was wasted waiting for PRs to go green. Many runs end up bottlenecked on I/O or on worker availability, so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as it goes. This will make the CI bottleneck even worse.

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. We kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from dedicated on-prem hardware to expensive cloud VMs with slow I/O, which haven't got much faster over the years.
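One concrete piece of that fruit is simply splitting the suite across parallel workers by recorded test durations instead of running it serially. A rough Python sketch, assuming a hypothetical durations.json timing file and SHARD_INDEX / SHARD_COUNT variables exposed by the CI runner:

    # Sketch: split a test suite across N CI workers by recorded durations,
    # so wall-clock time approaches (total time / N) instead of total time.
    # Assumes a durations.json file mapping test id -> seconds (hypothetical)
    # and SHARD_INDEX / SHARD_COUNT environment variables set by the runner.
    import json
    import os
    import subprocess
    import sys

    def assign_shards(durations: dict[str, float], shard_count: int) -> list[list[str]]:
        """Greedy longest-first assignment: give the next slowest test
        to whichever shard currently has the least total time."""
        shards: list[list[str]] = [[] for _ in range(shard_count)]
        loads = [0.0] * shard_count
        for test, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
            lightest = loads.index(min(loads))
            shards[lightest].append(test)
            loads[lightest] += seconds
        return shards

    if __name__ == "__main__":
        with open("durations.json") as f:
            durations = json.load(f)
        index = int(os.environ.get("SHARD_INDEX", "0"))
        count = int(os.environ.get("SHARD_COUNT", "1"))
        my_tests = assign_shards(durations, count)[index]
        # Run only this shard's tests; pytest is just one possible runner.
        sys.exit(subprocess.run(["pytest", *my_tests]).returncode)

Greedy longest-first packing gets close to an even split without any solver, and the timing file can be refreshed from the previous run's report.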

Mercury is crazy fast and, in the few quick tests I did, created good and correct code. How will we make test execution keep up with it?

1. daxfohl:
There are a few mitigating considerations:

1. As the implementation phase gets faster, the bottleneck could actually shift to PM. In that case, changes will land more serially, so there will be a lot fewer conflicts to worry about.

2. I think we could see a resurrection of specs like TLA+. Most engineers don't bother with them, but I imagine code agents could quickly create them, verify the code is consistent with them, and then require fewer full integration tests (see the sketch after this list).

3. When background agents are cleaning up redundant code, they can also clean up redundant tests.

4. Unlike human engineering teams, I expect AIs to work more efficiently on monoliths than on distributed microservices. This could lead to better coverage from locally runnable tests, reducing flakes and CI load.

5. It's interesting that even as AI increases efficiency, the increased velocity and the sheer amount of code it will write and execute for new use cases will create their own problems. I think we'll continue to have new problems for human engineers to solve for quite some time.
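On point 2, a minimal sketch of the flavor in plain Python rather than TLA+ itself (the bounded counter, LIMIT, and CounterImpl are invented for illustration): explore the abstract spec's states exhaustively, assert an invariant in every reachable state, and check that the implementation agrees with the spec on every action.

    # Not TLA+, just the flavor of it: a tiny abstract spec of a bounded
    # counter, checked exhaustively against an implementation.
    LIMIT = 3
    ACTIONS = ["incr", "reset"]

    def spec_step(state: int, action: str) -> int:
        """Abstract spec: the counter never exceeds LIMIT."""
        if action == "incr":
            return min(state + 1, LIMIT)
        return 0  # reset

    class CounterImpl:
        """Stand-in for the real production code under test."""
        def __init__(self, value: int = 0):
            self.value = value
        def step(self, action: str) -> None:
            if action == "incr" and self.value < LIMIT:
                self.value += 1
            elif action == "reset":
                self.value = 0

    def check() -> None:
        seen, frontier = {0}, [0]
        while frontier:
            state = frontier.pop()
            # Invariant must hold in every reachable state.
            assert 0 <= state <= LIMIT, f"invariant violated in state {state}"
            for action in ACTIONS:
                nxt = spec_step(state, action)
                impl = CounterImpl(state)
                impl.step(action)
                # Implementation must agree with the spec on every action.
                assert impl.value == nxt, f"impl diverges on {action} at {state}"
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        print(f"explored {len(seen)} states, no violations")

    if __name__ == "__main__":
        check()

An agent could keep a small model like this next to the real code and rerun it in milliseconds, saving the full integration suite for the cases the model can't capture.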

2. valenterry:
> 2. I think we could see a resurrection of specs like TLA+.

I think so too. But it's not gonna be TLA+. It's just gonna be programming languages whose type systems catch problems much more comprehensively, allowing AI to iterate quickly without even having to run unit tests.

While developers don't want to spend the time to learn such a language and prefer easy-to-learn languages such as golang, an LLM only has to be trained once, and then you can reap the benefits permanently.
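A toy illustration of that kind of type-driven loop, sketched in Python with a static checker such as mypy or pyright standing in for the richer type systems meant above (the payment events are invented for the example): an exhaustiveness check that the checker flags before anything runs.

    # Toy illustration of "the type system catches it before anything runs":
    # a discriminated union of payment events, checked for exhaustiveness
    # by a static checker (mypy/pyright).
    from dataclasses import dataclass
    from typing import NoReturn, Union

    def assert_never(value: NoReturn) -> NoReturn:
        # Python 3.11+ ships typing.assert_never; this is the same idiom.
        raise AssertionError(f"unhandled case: {value!r}")

    @dataclass
    class Charge:
        amount_cents: int

    @dataclass
    class Refund:
        amount_cents: int
        original_charge_id: str

    Event = Union[Charge, Refund]

    def apply_event(balance: int, event: Event) -> int:
        if isinstance(event, Charge):
            return balance + event.amount_cents
        if isinstance(event, Refund):
            return balance - event.amount_cents
        # If a third Event variant is added later and not handled above,
        # the checker rejects this call at "compile time" -- no test run.
        assert_never(event)

That immediate rejection is exactly the fast feedback loop an agent can iterate against without spinning up a test run.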