
566 points | PaulHoule | 3 comments
mike_hearn No.44490340
A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU-bottlenecked on testing performance than we are today, and every team I know of was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just been unlucky in the past, but in most projects I've worked on, a lot of developer time was wasted waiting for PRs to go green. Many runs end up bottlenecked on I/O or on the availability of workers, so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as it goes. This will make the CI bottleneck even worse.

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from on-prem dedicated hardware to expensive cloud VMs with slow I/O, which haven't got much faster over time.
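To make the caching point concrete, here is a minimal sketch of the kind of inter-run reuse that fully hermetic setups give up: key a dependency cache on a hash of the lockfile, restore it on a hit, rebuild and store it on a miss. This is my own illustration, not any particular CI product's API, and the paths and filenames are placeholder assumptions.

    # Illustrative sketch of inter-run CI caching (all paths are assumptions).
    import hashlib
    import shutil
    from pathlib import Path

    CACHE_DIR = Path("/var/ci-cache")      # assumed: persistent volume on the runner
    LOCKFILE = Path("package-lock.json")   # whatever pins your dependencies
    DEPS_DIR = Path("node_modules")        # the directory that is expensive to rebuild

    def cache_key() -> str:
        # Key changes only when the lockfile (i.e. the dependency set) changes.
        return hashlib.sha256(LOCKFILE.read_bytes()).hexdigest()

    def restore_or_build(build_deps) -> None:
        entry = CACHE_DIR / cache_key()
        if entry.exists():
            # Cache hit: copy back in seconds instead of reinstalling in minutes.
            shutil.copytree(entry, DEPS_DIR, dirs_exist_ok=True)
            return
        build_deps()  # cache miss: pay the full cost once
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        shutil.copytree(DEPS_DIR, entry)

Content-addressed remote caches (Bazel-style) are one way to get most of this back without giving up hermeticity.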

Mercury is crazy fast and, in a few quick tests I did, produced good, correct code. How will we make test execution keep up with it?

rafaelmn No.44492096
> If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from on-prem dedicated hardware to expensive cloud VMs with slow I/O, which haven't got much faster over time.

I am guesstimating (based on previous experience self-hosting the runner for macOS builds) that the project I am working on could get something like 2-5x pipeline performance at half the cost just by using self-hosted runners on bare-metal machines rented from somewhere like Hetzner. Maybe I am naive, and I am not the person who would be responsible for it, but having a few bare-metal machines you can use in the off hours to run regression tests, for less than you are paying the existing CI runner just for builds, while speeding everything up massively, seems like a pure win for relatively low effort.

Sure, everyone already has stuff on their plate and would rather pay an external service to do it, but TBH once you have this kind of compute handy you will find uses for it anyway, and you end up just doing things efficiently. Knowing how to deal with bare metal and how to utilize this kind of compute sounds like a generally useful skill too, yet I rarely encounter people enthusiastic about making this kind of move. It's usually: hey, let's move to this other service that has slightly cheaper instances and a proprietary caching layer, so we can get locked into their CI crap.
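Back-of-the-envelope version of that claim, where every number is a made-up placeholder rather than a real quote (the hosted per-minute rate, the monthly server rent, and the speedup are all assumptions to be replaced with your own figures):

    # Rough comparison: per-minute hosted CI billing vs. a flat-rate rented
    # bare-metal machine. All figures below are illustrative assumptions.
    HOSTED_RATE_PER_MIN = 0.016     # $/minute for a beefier hosted runner (assumed)
    BARE_METAL_PER_MONTH = 60.0     # $/month for a rented dedicated box (assumed)
    ASSUMED_SPEEDUP = 2.0           # bare metal finishing the same pipeline faster (assumed)

    def monthly_cost(build_minutes):
        hosted = build_minutes * HOSTED_RATE_PER_MIN
        return hosted, BARE_METAL_PER_MONTH  # flat fee regardless of usage

    for minutes in (2_000, 10_000, 50_000):
        hosted, metal = monthly_cost(minutes)
        print(f"{minutes:>6} CI-min/month: hosted ~${hosted:,.0f} vs bare metal ${metal:,.0f} "
              f"(and ~{ASSUMED_SPEEDUP:.0f}x faster wall clock)")

The flat fee also buys you idle off-hours capacity for regression runs, which never shows up in a per-minute comparison.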

It's not like these services have zero downtime, are bug-free, or require no integration effort. I just don't see why going bare metal is always such a taboo topic, even for simple stuff like builds.

replies(3): >>44492590 >>44492834 >>44494311
1. azeirah No.44492590
At the last place I worked, which was just a small startup with 5 developers, I calculated that a server workstation in the office would be both cheaper and more performant than renting a similar machine in the cloud.
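The shape of that calculation, for illustration only (the figures below are placeholders, not the real ones from that comparison):

    # Rough buy-vs-rent sketch for an office CI workstation. Every figure is an
    # illustrative assumption; substitute real quotes to make it meaningful.
    WORKSTATION_PRICE = 4_000.0         # one-off purchase price, $ (assumed)
    AMORTIZATION_MONTHS = 36            # expected useful life (assumed)
    POWER_AND_MISC_PER_MONTH = 40.0     # electricity, the odd replacement disk (assumed)
    CLOUD_EQUIVALENT_PER_MONTH = 450.0  # comparable many-core cloud VM, $ (assumed)

    owned_per_month = WORKSTATION_PRICE / AMORTIZATION_MONTHS + POWER_AND_MISC_PER_MONTH
    break_even_months = WORKSTATION_PRICE / (CLOUD_EQUIVALENT_PER_MONTH - POWER_AND_MISC_PER_MONTH)

    print(f"owned: ~${owned_per_month:,.0f}/month vs cloud: ${CLOUD_EQUIVALENT_PER_MONTH:,.0f}/month")
    print(f"break-even after ~{break_even_months:.1f} months of cloud rent")

With assumptions anywhere near these, the office box pays for itself well within a year, and you get the stronger single-machine performance on top.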

Bare metal makes such a big difference for test and CI scenarios. The workstation even had an integrated GPU to speed up webdev tests. Good luck finding an affordable machine in the cloud with a proper GPU for this kind of use case.

replies(1): >>44492958
2. rafaelmn No.44492958
Is it a startup or a small business? In my book a startup expects to scale, and hosting bare-metal HW in an office with 5 people means you have to figure everything out again when you get to 20/50/100 people. IMO not worth the effort, and hosting hardware builds zero transferable skills for your product.

Running on managed bare-metal servers is theoretically the same as running on any other infra provider, except you are on the hook for a bit more maintenance; when you scale to 20 people, you just rent a few more machines. I really do not see many downsides for the build server/test runner scenario.

replies(1): >>44507346
3. mike_hearn No.44507346
Every business wants to scale; not many do. There's no real difference between a startup and a small business, except perhaps how strong or realistic the dreams are.

Consider that even VC-backed Google started out with lots of cost-saving measures in place. Once they brought their second datacenter online, the way they uploaded the index to it was Lucas putting a big stack of hard drives in the boot of his car and driving across the USA to plug them in. This continued for a while, until they were able to strike a special deal on bandwidth.

That was just one of the unconventional things they did to save money. Using Linux was another. Although we now think of what Google did as obvious, at the time it was considered bizarre and radical that a search engine company - a software company - would rack bare motherboards onto corkboard bases, just to save money. All their competitors were running on commercial UNIX big iron.

https://www.datacenterknowledge.com/hyperscalers/looking-bac...