mike_hearn ◴[] No.44490340[source]
A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU-bottlenecked on testing than we are today, and every team I know of was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but on most projects I've worked on, a lot of developer time was wasted waiting for PRs to go green. Many runs end up bottlenecked on I/O or the availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as it goes. This will make the CI bottleneck even worse.

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from on-prem dedicated hardware to expensive cloud VMs with slow I/O, which haven't got much faster.

Mercury is crazy fast, and in the few quick tests I did it created good, correct code. How will we make test execution keep up with it?

replies(28): >>44490408 #>>44490637 #>>44490652 #>>44490785 #>>44491195 #>>44491421 #>>44491483 #>>44491551 #>>44491898 #>>44492096 #>>44492183 #>>44492230 #>>44492386 #>>44492525 #>>44493236 #>>44493262 #>>44493392 #>>44493568 #>>44493577 #>>44495068 #>>44495946 #>>44496321 #>>44496534 #>>44497037 #>>44497707 #>>44498689 #>>44502041 #>>44504650 #
kccqzy ◴[] No.44490652[source]
> Maybe I've just got unlucky in the past, but on most projects I've worked on, a lot of developer time was wasted waiting for PRs to go green.

I don't understand this. Developer time is so much more expensive than machine time. Do companies not just double their CI workers after hearing people complain? It's just a throw-more-resources problem. When I was at Google, it was somewhat common for me to debug non-deterministic bugs such as a missing synchronization or fence causing flakiness, and it was common to just launch 10000 copies of the same test on 10000 machines to find perhaps a single-digit number of failures. My current employer has a clunkier implementation of the same thing (no UI), but there's also a single command to launch 1000 test workers to run all tests from your own checkout. The goal is to finish testing a 1M LOC codebase in no more than five minutes so that you get quick feedback on your changes.
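
For what it's worth, a minimal single-host sketch of that "shotgun the flaky test" idea looks like the following; the ./run_flaky_test.sh entry point and the run/worker counts are hypothetical, and the real systems described above fan the runs out across thousands of machines rather than one box:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    TEST_CMD = ["./run_flaky_test.sh"]  # hypothetical test entry point
    RUNS = 10000                        # copies of the test to execute
    WORKERS = 200                       # concurrent runs on this host

    def run_once(i):
        # Run one copy of the test and report its exit code.
        result = subprocess.run(TEST_CMD, capture_output=True)
        return i, result.returncode

    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        failures = [i for i, rc in pool.map(run_once, range(RUNS)) if rc != 0]

    print(f"{len(failures)} failures out of {RUNS} runs")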

> make builds fully hermetic (so no inter-run caching)

These are orthogonal. You want maximally deterministic CI steps precisely so that you can make builds fully hermetic and cache every single thing.
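
A rough sketch of why the two go together (illustrative helper names, not any particular CI system's API): if a step's output is a pure function of its declared inputs, the hash of those inputs is a safe cache key, so hermeticity is exactly what lets you cache aggressively:

    import hashlib, json, pathlib, subprocess

    CACHE = pathlib.Path(".ci-cache")  # hypothetical local cache directory

    def step_key(cmd, input_files):
        # Hash the command plus the exact bytes of every declared input.
        h = hashlib.sha256(json.dumps(cmd).encode())
        for f in sorted(input_files):
            h.update(pathlib.Path(f).read_bytes())
        return h.hexdigest()

    def run_cached(cmd, input_files, output_file):
        entry = CACHE / step_key(cmd, input_files)
        if entry.exists():
            # Hermetic step: identical inputs imply identical output, so reuse it.
            pathlib.Path(output_file).write_bytes(entry.read_bytes())
            return
        subprocess.run(cmd, check=True)
        CACHE.mkdir(exist_ok=True)
        entry.write_bytes(pathlib.Path(output_file).read_bytes())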

replies(16): >>44490726 #>>44490764 #>>44491015 #>>44491034 #>>44491088 #>>44491949 #>>44491953 #>>44492546 #>>44493309 #>>44494481 #>>44494583 #>>44495174 #>>44496510 #>>44497007 #>>44500400 #>>44513737 #
IshKebab ◴[] No.44491088[source]
Developer time is more expensive than machine time, but at most companies it isn't 10000x more expensive. Google is likely an exception because it pays extremely well and has access to very cheap machines.

Even then, there are other factors:

* You might need commercial licenses. It may be very cheap to run open source code 10000x, but guess how much 10000 Questa licenses cost.

* Moore's law is dead; Amdahl's law very much isn't. Not everything is embarrassingly parallel.

* Some people care about the environment. I worked at a company that spent 200 CPU hours on every single PR (even to fix typos; I failed to convince them they were insane for not using Bazel or similar). That's a not insignificant amount of CO2.

replies(3): >>44491885 #>>44492841 #>>44498020 #
underdeserver ◴[] No.44491885[source]
That's solvable with modern cloud offerings: provision spot instances for a few minutes and shut them down afterwards. Let the cloud provider deal with demand balancing.
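
A rough sketch of that pattern with boto3 against EC2; the AMI, instance type and counts are placeholders, and a real setup would also need networking, IAM and a way to dispatch work to the instances:

    import boto3

    ec2 = boto3.client("ec2")

    # Burst: request spot capacity just for this CI run.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder CI worker image
        InstanceType="c6i.4xlarge",
        MinCount=1,
        MaxCount=50,
        InstanceMarketOptions={"MarketType": "spot"},
    )
    instance_ids = [i["InstanceId"] for i in resp["Instances"]]

    # ... dispatch test shards to the workers and wait for results ...

    # Shrink: terminate the workers as soon as the run is done.
    ec2.terminate_instances(InstanceIds=instance_ids)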

I think the real issue is that developers waiting for PRs to go green are taking a coffee break between tasks, not sitting idly getting annoyed. If that's the case, you're cutting into rest time and won't get much value out of optimizing this.

replies(1): >>44493258 #
IshKebab ◴[] No.44493258[source]
Both companies I've worked in recently have been too paranoid about IP to use the cloud for CI.

Anyway I don't see how that solves any of the issues except maybe cost to some degree (but maybe not; cloud is expensive).

replies(3): >>44493548 #>>44495106 #>>44495112 #
jiggawatts ◴[] No.44495112[source]
That’s paranoid to the point of lunacy.

Azure, for example, has “confidential compute”, which encrypts even the memory contents of the VM so that even their own engineers can’t access the contents.

As long as you don’t back up the disks and you use HTTPS for pulls, I don’t see a realistic business risk.

If a cloud like Azure or AWS got caught stealing competitor code, they’d be sued and would immediately lose a huge chunk of their customers.

It makes zero business sense to do so.

PS: Microsoft employees have made public comments saying that they refuse to even look at some open source repositories, to avoid any risk of accidentally “contaminating” their own code with something that has an incompatible license.

replies(3): >>44495369 #>>44497831 #>>44504385 #
kccqzy ◴[] No.44495369[source]
I don't know about Azure's implementation of confidential compute, but GCP's version essentially relies on AMD SEV-SNP. Historically there have been vulnerabilities that undermine the confidentiality guarantee.
replies(1): >>44495691 #
jiggawatts ◴[] No.44495691[source]
Mandatory XKCD: https://xkcd.com/538/

Nobody's code is that secret, especially not from a vendor like Microsoft.

Unless all development is done with air-gapped machines, realistic development environments are simultaneously exposed to all of the following "leakage risks" because they're using third-party software, almost certainly including a wide range of software from Microsoft:

- Package managers, including compromised or malicious packages. (Microsoft owns both NuGet and NPM!)

- IDEs and their plugins; the latter especially can be a security risk. (What developer doesn't use Microsoft VS Code these days?)

- CLI and local build tools.

- SCM tools such as GitHub Enterprise (Microsoft again!)

- The CI/CD tooling, including third-party tools.

- The operating system itself. Microsoft Windows is still a very popular platform, especially in enterprise environments.

- The OS management tools, anti-virus, monitoring, etc.

And on and on.

Unless you live in a total bubble world with USB sticks used to ferry your dependencies into your windowless facility underground, your code is "exposed" to third parties all of the time.

Worrying about possible vulnerabilities in encrypted VMs in a secure cloud facility is missing the real problem that your developers are probably using their home gaming PC for work because it's 10x faster than the garbage you gave them.

Yes, this happens. All the time. You just don't know because you made the perfect the enemy of the good.

replies(3): >>44496235 #>>44496459 #>>44496902 #
oldsecondhand ◴[] No.44496902[source]
> missing the real problem that your developers are probably using their home gaming PC for work because it's 10x faster than the garbage you gave them.

> Yes, this happens. All the time. You just don't know because you made the perfect the enemy of the good.

That only happens in cowboy coding startups.

In places where security matters (e.g. fintech), they just lock down your PC (no admin rights), encrypt the storage, and keep part of your VPN credentials on a part of the storage that you can't access.

replies(2): >>44498150 #>>44499061 #
jiggawatts ◴[] No.44499061[source]
Microsoft, Google, and Amazon don't care about your fintech code. Other fintechs do.

The threat isn't your cloud provider stealing your code; it's your own staff walking out the door with it and either starting their own firm or giving it to a competitor in exchange for a "job" at 2x their previous salary.

I've seen very high-security fintech setups first-hand and I've got friends in the industry, including one who simply memorised the core algorithms, walked out, rewrote them from scratch over a few years, and is making bank right now.

PS: The TV show Severance is the wet dream of many fintech managers.