Mercury: Ultra-fast language models based on diffusion

(arxiv.org)

566 points PaulHoule | 1 comments | 07 Jul 25 12:31 UTC | HN request time: 0.211s | source

Show context

mike_hearn ◴[07 Jul 25 13:46 UTC] No.44490340[source]▶

A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU bottlenecked on testing performance than today, and every team I know of today was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as they go. This will make the CI bottleneck even worse.

It feels like there's a lot of low hanging fruit in most project's testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and move them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

Mercury is crazy fast and in a few quick tests I did, created good and correct code. How will we make test execution keep up with it?

replies(28): >>44490408 #>>44490637 #>>44490652 #>>44490785 #>>44491195 #>>44491421 #>>44491483 #>>44491551 #>>44491898 #>>44492096 #>>44492183 #>>44492230 #>>44492386 #>>44492525 #>>44493236 #>>44493262 #>>44493392 #>>44493568 #>>44493577 #>>44495068 #>>44495946 #>>44496321 #>>44496534 #>>44497037 #>>44497707 #>>44498689 #>>44502041 #>>44504650 #

piva00 ◴[07 Jul 25 14:18 UTC] No.44490637[source]▶

>>44490340 #

I haven't worked in places using off-the-shelf/SaaS CI in more than a decade so I feel my experience has been quite the opposite from yours.

We always worked hard to make the CI/CD pipeline as fast as possible. I personally worked on those kind of projects at 2 different employers as a SRE: a smaller 300-people shop which I was responsible for all their infra needs (CI/CD, live deployments, migrated later to k8s when it became somewhat stable, at least enough for the workloads we ran, but still in its beta-days), then at a different employer some 5k+ strong working on improving the CI/CD setup which used Jenkins as a backend but we developed a completely different shim on top for developer experience while also working on a bespoke worker scheduler/runner.

I haven't experienced a CI/CD setup that takes longer than 10 minutes to run in many, many years, got quite surprised reading your comment and feeling spoiled I haven't felt this pain for more than a decade, didn't really expect it was still an issue.

replies(1): >>44490712 #

1. mike_hearn ◴[07 Jul 25 14:25 UTC] No.44490712[source]▶

>>44490637 #

I think the prevalence of teams having a "CI guy" who often is developing custom glue, is a sign that CI is still not really working as well as it should given the age of the tech.

I've done a lot of work on systems software over the years so there's often tests that are very I/O or computation heavy, lots of cryptography, or compilation, things like that. But probably there are places doing just ordinary CRUD web app development where there's Playwright tests or similar that are quite slow.

A lot of the problems are cultural. CI times are a commons, so it can end in tragedy. If everyone is responsible for CI times then nobody is. Eventually management gets sick of pouring money into it and devs learn to juggle stacks of PRs on top of each other. Sometimes you get a lot of pushback on attempts to optimize CI because some devs will really scream about any optimization that might potentially go wrong (e.g. depending on your build system cache), even if caching nothing causes an explosion in CI costs. Not their money, after all.

↑