Mercury: Ultra-fast language models based on diffusion

(arxiv.org)

568 points PaulHoule | 2 comments | 07 Jul 25 12:31 UTC

mike_hearn ◴[07 Jul 25 13:46 UTC] No.44490340[source]▶

A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU bottlenecked on testing performance than today, and every team I know of today was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but in most projects I worked on, a lot of developer time was wasted on waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as they go. This will make the CI bottleneck even worse.

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

Mercury is crazy fast and in a few quick tests I did, created good and correct code. How will we make test execution keep up with it?

1. drzaiusx11 ◴[08 Jul 25 00:45 UTC] No.44495946[source]▶

>>44490340 #

The nice part about most CI workloads is that they can almost always be split up and executed in parallel. Make sure you're utilizing every core on every CI worker and that your worker pools are appropriately sized for the workload. Use spot instances and add auto scaling where it makes sense. No one should be waiting more than a few minutes for a PR build. The exception is compile time, which can vary significantly between languages. I have a couple of projects that are stuck on ancient compilers because of CPU architecture and C variant, so those will always be a dog without effort to move to something better. YMMV
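To make the "split up and execute in parallel" idea concrete, here is a minimal sketch in Python. It is not taken from any particular CI system: the `shard_tests` helper and the file names are hypothetical, and it assumes test files are independent of each other and cost roughly the same to run. Each resulting shard would be handed to one core or one worker.

```python
import os
from typing import List, Optional, Sequence


def shard_tests(test_files: Sequence[str],
                workers: Optional[int] = None) -> List[List[str]]:
    """Split test files round-robin into at most `workers` shards.

    Hypothetical helper for illustration only. Assumes the tests are
    independent, so any file can run in any shard.
    """
    if workers is None:
        # Default to one shard per available core.
        workers = os.cpu_count() or 1
    if workers < 1:
        raise ValueError("workers must be >= 1")

    shards: List[List[str]] = [[] for _ in range(workers)]
    # Sort first so every CI run produces the same assignment.
    for index, path in enumerate(sorted(test_files)):
        shards[index % workers].append(path)

    # Drop empty shards when there are fewer files than workers.
    return [shard for shard in shards if shard]


if __name__ == "__main__":
    files = ["test_a.py", "test_b.py", "test_c.py", "test_d.py", "test_e.py"]
    for number, shard in enumerate(shard_tests(files, workers=2)):
        print(number, shard)
```

Round-robin is the simplest possible split. If some files are much slower than others, balancing shards by recorded test durations would likely give a more even result.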

2. drzaiusx11 ◴[08 Jul 25 01:30 UTC] No.44496161[source]▶

>>44495946 (TP) #

As an example, we recently had a Ruby application whose test suite was taking literally an hour per build, but it turned out it was running entirely sequentially by default, using only 1 core. I spent an afternoon migrating our CI runners to split the workload across all available cores, and now it's 5 minutes per build. And that was just the low-hanging fruit; it can be improved significantly further, but there are obviously diminishing returns.
