Mercury: Ultra-fast language models based on diffusion

(arxiv.org)

566 points PaulHoule | 1 comments | 07 Jul 25 12:31 UTC | HN request time: 0.246s | source

Show context

mike_hearn ◴[07 Jul 25 13:46 UTC] No.44490340[source]▶

A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU bottlenecked on testing performance than today, and every team I know of today was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as they go. This will make the CI bottleneck even worse.

It feels like there's a lot of low hanging fruit in most project's testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and move them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

Mercury is crazy fast and in a few quick tests I did, created good and correct code. How will we make test execution keep up with it?

replies(28): >>44490408 #>>44490637 #>>44490652 #>>44490785 #>>44491195 #>>44491421 #>>44491483 #>>44491551 #>>44491898 #>>44492096 #>>44492183 #>>44492230 #>>44492386 #>>44492525 #>>44493236 #>>44493262 #>>44493392 #>>44493568 #>>44493577 #>>44495068 #>>44495946 #>>44496321 #>>44496534 #>>44497037 #>>44497707 #>>44498689 #>>44502041 #>>44504650 #

rafaelmn ◴[07 Jul 25 16:38 UTC] No.44492096[source]▶

>>44490340 #

> If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and move them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

I am guesstimating (based on previous experience self-hosting the runner for MacOS builds) that the project I am working on could get like 2-5x pipeline performance at 1/2 cost just by using self-hosted runners on bare metal rented machines like Hetzner. Maybe I am naive, and I am not the person that would be responsible for it - but having a few bare metal machines you can use in the off hours to run regression tests, for less than you are paying the existing CI runner just for build, that speed up everything massively seems like a pure win for relatively low effort. Like sure everyone already has stuff on their plate and would rather pay external service to do it - but TBH once you have this kind of compute handy you will find uses anyway and just doing things efficiently. And knowing how to deal with bare metal/utilize this kind of compute sounds generally useful skill - but I rarely encounter people enthusiastic about making this kind of move. Its usually - hey lets move to this other service that has slightly cheaper instances and a proprietary caching layer so that we can get locked into their CI crap.

Its not like these services have 0 downtime/bug free/do not require integration effort - I just don't see why going bare metal is always such a taboo topic even for simple stuff like builds.

replies(3): >>44492590 #>>44492834 #>>44494311 #

1. mike_hearn ◴[07 Jul 25 17:45 UTC] No.44492834[source]▶

>>44492096 #

Yep. For my own company I used a bare metal machine in Hetzner running Linux and a Windows VM along with a bunch of old MacBook Pros wired up in the home office for CI.

It works, and it's cheap. A full CI run still takes half an hour on the Linux machine (the product [1] is a kind of build system for shipping desktop apps cross platform, so there's lots of file IO and cryptography involved). The Macs are by far the fastest. The M1 Mac is embarrassingly fast. It can complete the same run in five minutes despite the Hetzner box having way more hardware. In fairness, it's running both a Linux and Windows build simultaneously.

I'm convinced the quickest way to improve CI times in most shops is to just build an in-office cluster of M4 Macs in an air conditioned room. They don't have to be HA. The hardware is more expensive but you don't rent per month, and CI is often bottlenecked on serial execution speed so the higher single threaded performance of Apple Silicon is worth it. Also, pay for a decent CI system like TeamCity. It helps reduce egregious waste from problems like not caching things or not re-using checkout directories. In several years of doing this I haven't had build caching related failures.

[1] https://hydraulic.dev/

↑