568 points PaulHoule | 2 comments
mike_hearn ◴[] No.44490340[source]
A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU-bottlenecked on testing than we are now, and every team I know of was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as they go. This will make the CI bottleneck even worse.

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and moved them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster.

Mercury is crazy fast and in a few quick tests I did, created good and correct code. How will we make test execution keep up with it?

replies(28): >>44490408 #>>44490637 #>>44490652 #>>44490785 #>>44491195 #>>44491421 #>>44491483 #>>44491551 #>>44491898 #>>44492096 #>>44492183 #>>44492230 #>>44492386 #>>44492525 #>>44493236 #>>44493262 #>>44493392 #>>44493568 #>>44493577 #>>44495068 #>>44495946 #>>44496321 #>>44496534 #>>44497037 #>>44497707 #>>44498689 #>>44502041 #>>44504650 #
kccqzy ◴[] No.44490652[source]
> Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green.

I don't understand this. Developer time is so much more expensive than machine time. Do companies not just double their CI workers after hearing people complain? It's just a throw-more-resources problem. When I was at Google, it was somewhat common for me to debug non-deterministic bugs such as a missing synchronization or fence causing flakiness; and it was common to just launch 10000 copies of the same test on 10000 machines to find perhaps a single digit number of failures. My current employer has a clunkier implementation of the same thing (no UI), but there's also a single command to launch 1000 test workers to run all tests from your own checkout. The goal is to finish testing a 1M loc codebase in no more than five minutes so that you get quick feedback on your changes.
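
As a rough illustration of that "launch 10000 copies" approach for flushing out a non-deterministic failure, here is a minimal local sketch; the test binary, filter and worker count are hypothetical, and a real setup fans the runs out to a fleet of machines rather than local processes:

    # Hammer one flaky test in parallel and count how often it fails.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    TEST_CMD = ["./flaky_test", "--gtest_filter=Sync.RaceCase"]  # hypothetical
    RUNS = 10_000

    def run_once(_: int) -> bool:
        # True means this particular run failed.
        return subprocess.run(TEST_CMD, capture_output=True).returncode != 0

    with ThreadPoolExecutor(max_workers=200) as pool:
        failures = sum(pool.map(run_once, range(RUNS)))

    print(f"{failures} failures out of {RUNS} runs")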

> make builds fully hermetic (so no inter-run caching)

These are orthogonal. You want CI steps to be as deterministic as possible precisely so that you can make builds fully hermetic and cache every single thing.
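
A tiny sketch of why the two reinforce each other: when a step's output is a pure function of its declared inputs, a hash of those inputs is a safe cache key. The paths and cache layout below are made up for illustration:

    import hashlib, pathlib, shutil

    def cache_key(input_files, toolchain, flags):
        # Hermetic step: the output depends only on these declared inputs.
        h = hashlib.sha256()
        h.update(toolchain.encode())
        h.update(flags.encode())
        for p in sorted(input_files):
            h.update(pathlib.Path(p).read_bytes())
        return h.hexdigest()

    def build_with_cache(input_files, toolchain, flags, build_fn, cache_dir="/ci-cache"):
        out = pathlib.Path(cache_dir) / cache_key(input_files, toolchain, flags)
        if out.exists():                         # cache hit: skip the step entirely
            return out
        shutil.copy(build_fn(input_files), out)  # miss: build once, store for everyone
        return out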

replies(16): >>44490726 #>>44490764 #>>44491015 #>>44491034 #>>44491088 #>>44491949 #>>44491953 #>>44492546 #>>44493309 #>>44494481 #>>44494583 #>>44495174 #>>44496510 #>>44497007 #>>44500400 #>>44513737 #
IshKebab ◴[] No.44491088[source]
Developer time is more expensive than machine time, but at most companies it isn't 10000x more expensive. Google is likely an exception because it pays extremely well and has access to very cheap machines.

Even then, there are other factors:

* You might need commercial licenses. It may be very cheap to run open source code 10000x, but guess how much 10000 Questa licenses cost.

* Moore's law is dead; Amdahl's law very much isn't. Not everything is embarrassingly parallel.

* Some people care about the environment. I worked at a company that spent 200 CPU hours on every single PR (even to fix typos; I failed to convince them they were insane for not using Bazel or similar). That's a not insignificant amount of CO2.

replies(3): >>44491885 #>>44492841 #>>44498020 #
underdeserver ◴[] No.44491885[source]
That's solvable with modern cloud offerings: provision spot instances for a few minutes and shut them down afterwards. Let the cloud provider deal with demand balancing.
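
As a hedged sketch of "spot capacity per CI run, torn down afterwards" with boto3 (the AMI, instance type and tag values are placeholders; the same pattern works on other providers):

    import boto3

    ec2 = boto3.client("ec2")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",             # placeholder CI worker image
        InstanceType="c6i.8xlarge",
        MinCount=1, MaxCount=1,
        InstanceMarketOptions={"MarketType": "spot"},
        TagSpecifications=[{"ResourceType": "instance",
                            "Tags": [{"Key": "purpose", "Value": "ci-run"}]}],
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # ... dispatch the CI job to the instance (SSH, SSM, or a work queue), then:
    ec2.terminate_instances(InstanceIds=[instance_id])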

I think the real issue is that developers waiting for PRs to go green are taking a coffee break between tasks, not sitting idly getting annoyed. If that's the case, you're cutting into rest time and won't get much value out of optimizing this.

replies(1): >>44493258 #
IshKebab ◴[] No.44493258[source]
Both companies I've worked in recently have been too paranoid about IP to use the cloud for CI.

Anyway I don't see how that solves any of the issues except maybe cost to some degree (but maybe not; cloud is expensive).

replies(3): >>44493548 #>>44495106 #>>44495112 #
jiggawatts ◴[] No.44495112[source]
That’s paranoid to the point of lunacy.

Azure, for example, has “confidential compute” that encrypts even the memory contents of the VM, such that their own engineers can't access it.

As long as you don't back up the disks and you use HTTPS for pulls, I don't see a realistic business risk.

If a cloud like Azure or AWS got caught stealing competitor code they’d be sued and immediately lose a huge chunk of their customers.

It makes zero business sense to do so.

PS: Microsoft employees have made public comments saying that they refuse to even look at some open source repositories, to avoid any risk of accidentally “contaminating” their own code with something that has an incompatible license.

replies(3): >>44495369 #>>44497831 #>>44504385 #
kccqzy ◴[] No.44495369[source]
I don't know about Azure's implementation of confidential compute, but GCP's version essentially relies on AMD SEV-SNP. Historically there have been vulnerabilities that undermine the confidentiality guarantee.
replies(1): >>44495691 #
jiggawatts ◴[] No.44495691[source]
Mandatory XKCD: https://xkcd.com/538/

Nobody's code is that secret, especially not from a vendor like Microsoft.

Unless all development is done with air-gapped machines, realistic development environments are simultaneously exposed to all of the following "leakage risks" because they're using third-party software, almost certainly including a wide range of software from Microsoft:

- Package managers, including compromised or malicious packages.

    Microsoft owns both NuGet and NPM!
- IDEs and their plugins; the latter especially can be a security risk.

    What developer doesn't use Microsoft VS Code these days?
- CLI and local build tools.

- SCM tools such as GitHub Enterprise (Microsoft again!)

- The CI/CD tooling including third-party tools.

- The operating system itself. Microsoft Windows is still a very popular platform, especially in enterprise environments.

- The OS management tools, anti-virus, monitoring, etc...

And on and on.

Unless you live in a total bubble world with USB sticks used to ferry your dependencies into your windowless facility underground, your code is "exposed" to third parties all of the time.

Worrying about possible vulnerabilities in encrypted VMs in a secure cloud facility misses the real problem: your developers are probably using their home gaming PCs for work because they're 10x faster than the garbage you gave them.

Yes, this happens. All the time. You just don't know because you made the perfect the enemy of the good.

replies(3): >>44496235 #>>44496459 #>>44496902 #
1. refulgentis ◴[] No.44496459[source]
> ...your developers are probably using their home gaming PC for work because it's 10x faster than the garbage you gave them...

I went from a waiter to startup owner and then acquirer, then working for Google. No formal education, no "real job" till Google, really. I'm not sure even when I was a waiter I had this...laissez-faire? naive?...sense of how corporate computing worked.

That aside, the whole argument rests on "well, other bad things can happen more easily!", which we agree is true, but that in itself isn't an argument against it.

From a Chesterton's Fence view, one man's numbskull insistence on not using AWS, which must surely be pointy-haired-boss syndrome, is another's valiant self-hosting that saved seven figures. Hard to say from the bleachers, especially with OP making neither claim.

replies(1): >>44497982 #
2. topato ◴[] No.44497982[source]
As a 35-year-old waiter with no formal education, who has spent the majority of his free time over the last 25 years either coding or self-studying to further his coding, I am super interested in your life story. While struggling to scrape by has been "awesome", I'm hoping to one day succeed at making tech my livelihood. Do you have a blog or something? lol

But to go back to the topic: are companies with such a high level of OpSec actually outfitting devs with garbage enterprise-lease, mid-to-low-tier laptops? I only have knowledge from a few friends' experiences, but even guys doing relatively non-hardware-intensive workloads are given a Dell XPS or MacBook Pro. I would imagine a fintech would know better AND have the funds to allocate for either of those options.

Maybe an in-house SWE at a major bank would end up with that level of OpSec on a mediocre fleet laptop, although I'd hope they'd have managers willing to go to bat for them and an IT department that can accommodate provisioning multiple SKUs depending on an employee's actual computational needs.... perhaps I too have a skewed/naive sense of how the corporate computing world works haha