Mercury: Ultra-fast language models based on diffusion

(arxiv.org)

566 points PaulHoule | 1 comments | 07 Jul 25 12:31 UTC | HN request time: 0.206s | source

Show context

mike_hearn ◴[07 Jul 25 13:46 UTC] No.44490340[source]▶

A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU bottlenecked on testing performance than today, and every team I know of today was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as they go. This will make the CI bottleneck even worse.

It feels like there's a lot of low hanging fruit in most project's testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and move them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

Mercury is crazy fast and in a few quick tests I did, created good and correct code. How will we make test execution keep up with it?

replies(28): >>44490408 #>>44490637 #>>44490652 #>>44490785 #>>44491195 #>>44491421 #>>44491483 #>>44491551 #>>44491898 #>>44492096 #>>44492183 #>>44492230 #>>44492386 #>>44492525 #>>44493236 #>>44493262 #>>44493392 #>>44493568 #>>44493577 #>>44495068 #>>44495946 #>>44496321 #>>44496534 #>>44497037 #>>44497707 #>>44498689 #>>44502041 #>>44504650 #

rapind ◴[07 Jul 25 16:20 UTC] No.44491898[source]▶

>>44490340 #

> This will make the CI bottleneck even worse.

I agree. I think there are potentially multiple solutions to this since there are multiple bottlenecks. The most obvious is probably network overhead when talking to a database. Another might be storage overhead if storage is being used.

Frankly another one is language. I suspect type-safe, compiled, functional languages are going to see some big advantages here over dynamic interpreted languages. I think this is the sweet spot that grants you a ton of performance over dynamic languages, gives you more confidence in the models changes, and requires less testing.

Faster turn-around, even when you're leaning heavily on AI, is a competitive advantage IMO.

replies(1): >>44492889 #

1. mike_hearn ◴[07 Jul 25 17:50 UTC] No.44492889[source]▶

>>44491898 #

It could go either way. Depends very much on what kind of errors LLMs make.

Type safe languages in theory should do well, because you get feedback on hallucinated APIs very fast. But if the LLM generally writes code that compiles, unless the compiler is very fast you might get out-run by an LLM just spitting out JavaScript at high speed, because it's faster to run the tests than wait for the compile.

The sweet spot is probably JIT compiled type safe languages. Java, Kotlin, TypeScript. The type systems can find enough bugs to be worth it, but you don't have to wait too long to get test results either.

↑