566 points | PaulHoule | 3 comments

mike_hearn:
A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU-bottlenecked on test execution than we are today, and every team I know was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just been unlucky, but in most projects I've worked on, a lot of developer time was wasted waiting for PRs to go green. Many runs end up bottlenecked on I/O or on worker availability, so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as it goes. This will make the CI bottleneck even worse.
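
A minimal sketch of that loop, purely to make the bottleneck concrete (call_model and apply_patch are made-up stand-ins, not any real agent API):

    import subprocess

    def call_model(prompt: str) -> str:
        # Hypothetical stand-in: plug in whatever LLM client you use.
        raise NotImplementedError

    def apply_patch(patch: str) -> None:
        # Hypothetical stand-in: write the diff into the working tree.
        raise NotImplementedError

    def run_tests() -> tuple[bool, str]:
        # The expensive step: every iteration pays the full suite's cost.
        r = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
        return r.returncode == 0, r.stdout + r.stderr

    def work_ticket(ticket: str, max_attempts: int = 5) -> bool:
        patch = call_model(f"Write a patch for: {ticket}")
        for _ in range(max_attempts):
            apply_patch(patch)
            passed, log = run_tests()
            if passed:
                return True  # open the green PR here
            patch = call_model(f"Tests failed:\n{log}\nRevise:\n{patch}")
        return False

However fast call_model gets, work_ticket stays dominated by run_tests.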

It feels like there's a lot of low-hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything, CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching) and moved them from dedicated on-prem hardware to expensive cloud VMs with slow I/O, which haven't got much faster.
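
As one concrete example of that fruit, here's a rough sketch of inter-run test caching in the spirit of Bazel's test cache: key each test's result on a digest of its inputs and skip the run when nothing it reads has changed. The names and layout here are mine, not any particular tool's:

    import hashlib, json, pathlib, subprocess

    CACHE = pathlib.Path(".test-cache")  # assumed to persist between CI runs

    def input_digest(dep_paths: list[str]) -> str:
        # Hash everything the test reads; any change invalidates the entry.
        h = hashlib.sha256()
        for p in sorted(dep_paths):
            h.update(pathlib.Path(p).read_bytes())
        return h.hexdigest()

    def run_cached(test_target: str, dep_paths: list[str]) -> bool:
        CACHE.mkdir(exist_ok=True)
        key = test_target.replace("/", "_") + "-" + input_digest(dep_paths)
        entry = CACHE / key
        if entry.exists():  # unchanged inputs: reuse the previous verdict
            return json.loads(entry.read_text())["passed"]
        passed = subprocess.run(["pytest", test_target, "-q"]).returncode == 0
        entry.write_text(json.dumps({"passed": passed}))
        return passed

Hermeticity and caching aren't in conflict, either: a result keyed on a digest of its inputs is exactly as trustworthy as re-running the test.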

Mercury is crazy fast, and in a few quick tests I did, it produced good, correct code. How will we make test execution keep up with it?

TechDebtDevin:
An LLM making a quick edit, <100 lines? Sure. Asking an LLM to rubber-duck your code? Sure. But integrating an LLM into your CI is going to end up costing you hundreds of hours of productivity on any large project. That, or you'll spend half the time you should be spending learning to write your own code on dialing in context sizes and prompt accuracy instead.

I really, really don't understand the hubris around LLM tooling, and I don't see it catching on outside of personal projects and small web apps. These things don't handle complex systems well at all; you'd have to put a gun in my mouth before I let one of them work on an important repo of mine without any supervision... And if I'm supervising the LLM, I might as well do it myself, because I'm going to end up redoing 50% of its work anyway.

kraftman:
I keep seeing this argument over and over again, and I have to wonder, at what point do you accept that maybe LLMs are useful? Like how many people need to say that they find it makes them more productive before you'll shift your perspective?

dragonwriter:
> I keep seeing this argument over and over again, and I have to wonder, at what point do you accept that maybe LLMs are useful?

The post you are responding to literally acknowledges, in its first sentence, that LLMs are useful in certain coding roles.

> Like how many people need to say that they find it makes them more productive before you'll shift your perspective?

Argumentum ad populum is not a good way of establishing any fact claim other than the fact that a belief is popular.

kraftman:
...and my comment clearly isn't talking about that, but about the suggestion that it's useless to write code with an LLM because you'll end up rewriting 50% of it.

If everyone has an opinion different from mine, I don't instantly change my opinion, but I do try to investigate the source of the difference, to find out what I'm missing or what they're missing.

The polarisation between people who find LLMs useful and people who don't looks very similar to the polarisation between people who find automated testing useful and people who don't, and I have a suspicion the two have the same underlying cause.

nwienert:
You seem to think everyone shares your view. Around me I see a lot of people acknowledging that they're useful to a degree, but also clearly finding limits in a wide array of cases: they really struggle with logical code, architectural decisions, re-using the right code patterns, larger-scale changes that aren't copy-paste, and so on.

So far what I see is that if I provide lots of context and clear instructions for a mostly non-logical area of code, I can speed myself up by about 20-40%, but that only works on about 30-50% of the problems I solve day to day at my day job.

So basically, it's a rough 20% improvement in my productivity, because I spend most of my time on the difficult things it can't do anyway.
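
For what it's worth, the back-of-the-envelope version of that claim (an Amdahl's-law-style calculation I'm adding here, using only the two ranges above):

    # 30-50% of work helped, sped up 20-40% on that fraction.
    for helped, speedup in [(0.3, 1.2), (0.5, 1.4)]:
        overall = 1 / ((1 - helped) + helped / speedup)
        print(f"{helped:.0%} helped at {speedup}x -> {overall - 1:.0%} overall")
    # -> about 5% in the worst case and 17% in the best, so "roughly 20%"
    #    is the optimistic end of the range.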

Meanwhile these companies are raising billion-dollar seed rounds and telling us that all programming will be done by AI by next year.

girvo:
> Meanwhile these companies are raising billion-dollar seed rounds and telling us that all programming will be done by AI by next year.

Which is the same thing they said last year, and it hasn't panned out. But surely this time it'll be right...