159 points jbredeche | 37 comments
1. cuttothechase ◴[] No.45532033[source]
The fact that we now have to write cookbooks about cookbooks kind of masks the reality that something could be genuinely wrong with this entire paradigm.

Why are even experts unsure about what's the right way to do something, or even whether it's possible to do something at all, for anything non-trivial? Why so much hesitancy, if this is the panacea? If we are so sure, then why not use the AI itself to come up with a proven paradigm?

replies(7): >>45532137 #>>45532153 #>>45532221 #>>45532341 #>>45533296 #>>45534567 #>>45535131 #
2. MrDarcy ◴[] No.45532137[source]
This is like any other new technology. We’re figuring it out.
replies(1): >>45532234 #
3. hx8 ◴[] No.45532153[source]
I share the same skepticism, but I have more patience to watch an emerging technology advance, and I'm forgiving as experts come to a consensus while communicating openly.
4. nkmnz ◴[] No.45532221[source]
Radioactivity was discovered before nuclear engineering existed. We had phenomena first and only later the math, tooling, and guardrails. LLMs are in that phase. They are powerful stochastic compressors with weak theory. No stable abstractions yet. Objectives shift, data drifts, evals leak, and context windows make behavior path dependent. That is why experts hedge.

“Cookbooks about cookbooks” are what a field does while it searches for invariants. Until we get reliable primitives and specs, we trade in patterns and anti-patterns. Asking the AI to “prove the paradigm” assumes it can generate guarantees it does not possess. It can explore the design space and surface candidates. It cannot grant correctness without an external oracle.

So treat vibe-engineering like heuristic optimization. Tight loops. Narrow scopes. Strong evals. Log everything. When we find the invariants, the cookbooks shrink and the compilers arrive.
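
To make that concrete, here is a minimal sketch of such a loop in Python. run_agent and run_evals are hypothetical stand-ins for whatever agent invocation and evaluation suite you use; none of this is a real framework:

    # Heuristic-optimization loop: narrow task, fixed evals, log everything.
    # run_agent and run_evals are caller-supplied placeholders.
    import json, time

    def optimize(task, run_agent, run_evals, max_attempts=5):
        history = []
        for attempt in range(max_attempts):
            candidate = run_agent(task, history)   # propose a fix
            failures = run_evals(candidate)        # external oracle, not the agent
            entry = {"attempt": attempt, "failures": failures,
                     "ts": time.time()}
            history.append(entry)
            with open("runs.jsonl", "a") as log:   # log everything
                log.write(json.dumps(entry) + "\n")
            if not failures:                       # the evals are the gate
                return candidate
        return None                                # no correctness guarantee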

replies(1): >>45534341 #
5. cuttothechase ◴[] No.45532234[source]
Mostly agree, but with one big exception. The real issue seems to be that the figuring-out part is happening a bit too late. A bit like burning a few hundred billion dollars [0] first and asking questions later!?

[0] - https://hai.stanford.edu/ai-index/2025-ai-index-report/econo...

replies(2): >>45532312 #>>45532582 #
6. ◴[] No.45532312{3}[source]
7. johnh-hn ◴[] No.45532341[source]
It reminds me of a quote from Designing Data-Intensive Applications by Martin Kleppmann. It goes something like, "For distributed systems, we're trying to create a reliable system out of a set of unreliable components." In a similar fashion, we're trying to get reliable results from an unreliable process (i.e. prompting LLMs to do what we ask).

The difficulties of working with distributed systems are well known but it took a lot of research to get there. The uncertain part is whether research will help overcome the issues of using LLMs, or whether we're really just gambling (in the literal sense) at scale.

8. baq ◴[] No.45532582{3}[source]
The bets are placed because if this tech really keeps scaling for the next few years, only the ones who bet today will be left standing.

If the tech stops scaling, whatever we have today is still useful and in some domains revolutionary.

replies(1): >>45532864 #
9. cuttothechase ◴[] No.45532864{4}[source]
Is it fair to categorize it as a pyramid-like scheme, but with a twist at the top where there are a few (more than one) genuine wins and winners?
replies(2): >>45533411 #>>45533542 #
10. torginus ◴[] No.45533296[source]
LLMs are literal gambling - you get them to work right once and they are magical - then you end up chasing that high by tweaking the model and instructions the rest of the time.
replies(4): >>45533660 #>>45533879 #>>45533984 #>>45534359 #
11. ◴[] No.45533411{5}[source]
12. jonas21 ◴[] No.45533542{5}[source]
No, it's more like a winner take all market, where a few winners will capture most of the value, and those who sit on the sidelines until everything is figured out are left fighting over the scraps.
replies(2): >>45534282 #>>45534610 #
13. ◴[] No.45533660[source]
14. handfuloflight ◴[] No.45533879[source]
I actually found that in my case it is just self-inertia in not wanting to break through cognitive plateaus. The AI helped you with a breakthrough, hence the magic, but you also did something right in constructing the context in the conversation with the AI; i.e. you did thought and biomechanical[1] work. Now the dazzle of the AI's output makes you forget the work you still need to do, and the next time you prompt you get lazy, or you want much more for much less.

[1] moving your eyes and hands, hearing with your ears, etc.

15. vidarh ◴[] No.45533984[source]
Or you put them to work with strong test suites and get stuff done. I am in bed. I have Claude fixing complex compiler bugs right now. It has "earned" that privilege by proving it can make good enough fixes, systematically removing actual, real bugs in reasonable ways when given an immutable test suite and detailed instructions on the approach to follow.

There's no gambling involved. The results need to be checked, but the test suite is good enough that it's hard for it to get away with anything too stupid, and it has already demonstrated it knows x86 assembly much better than I do.

replies(3): >>45534313 #>>45535026 #>>45536228 #
16. oblio ◴[] No.45534282{6}[source]
Yes, just like:

* PCs (how are Altair and Commodore doing? also Apple ultimately lost the desktop battle until they managed to attack it from the iPod and iPhone angle)

* search engines (Altavista, Excite, etc)

* social networks (Friendster, MySpace, Orkut)

* smartphones (Nokia, all Windows CE devices, Blackberry, etc)

The list is endless. First-mover advantage is strong but overrated. Apple has built a huge business by watching what others do and building a better product-market fit.

replies(1): >>45535135 #
17. b_e_n_t_o_n ◴[] No.45534313{3}[source]
If you were an x86 assembly expert would you still feel the same way? (assuming you aren't already)
replies(1): >>45537392 #
18. sarchertech ◴[] No.45534341[source]
We’re in the alchemist phase. If I’m being charitable, the medieval stone mason phase.

One thing worth pointing out is that the pre-engineering phase of building large structures lasted a long time, and building collapses killed a lot of people while we tried to work out the theory.

Also it wasn’t really the stone masons who worked out the theory, and many of them were resistant to it.

replies(1): >>45536468 #
19. sarchertech ◴[] No.45534359[source]
LLMs are cargo-cult-generating machines. I'm not denying they can be useful for some tasks, but the amount of superstition caused by these chaotic, random black boxes is unreal.
20. galaxyLogic ◴[] No.45534567[source]
> why not use the AI itself to come up with a proven paradigm?

Because AI can only imitate the language it has seen. If there are no texts in its training materials about what is the best way to use multiple coding agents at the same time, then AI knows very little about that subject matter.

AI only knows what humans know, but it knows much more than any single human.

We don't know "what is the best way to use multiple coding agents" until we or somebody else does some experiments and records the findings. But AI is not yet at the point where it can do such actual experiments itself.

replies(1): >>45534704 #
21. galaxyLogic ◴[] No.45534610{6}[source]
> it's more like a winner take all market

I'm not sure why it must be so. In cell phones we have Apple and Android phones. In OSes we have Linux, Windows, and Apple.

In search engines we used to have just Google. But what reason is there to assume that AI must similarly coalesce to a single winner-take-all? And now AI agents are providing an alternative to Google.

replies(2): >>45535692 #>>45536593 #
22. panarky ◴[] No.45534704[source]
I'm sorry, but the whole stochastic parrot thing is so thoroughly debunked at this point that we should stop repeating it as if it's some kind of rare wisdom.

AlphaGo showed that even pre-LLM models could generate brand new approaches to winning a game that human experts had never seen before, and didn't exist in any training material.

With a little thought and experimentation, it's pretty easy to show that LLMs can reason about concepts that do not exist in their training corpus.

You could invent a tiny DSL with brand-new, never-seen-before tokens, give two worked examples, then ask it to evaluate a gnarlier expression. If it solves it, it inferred and executed rules you just made up for the first time.

Or you could drop in docs for a new, never-seen-before API and ask it to decide when and why to call which tool, run the calls, and revise after errors. If it composes a working plan and improves from feedback, that’s reasoning about procedures that weren’t in the corpus.
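
For what it's worth, the DSL probe takes about ten lines to run. A minimal sketch, assuming the OpenAI Python SDK; the operator, the worked examples, and the expected answer are invented on the spot, which is the point:

    # Invent an operator the model has never seen, show two worked
    # examples without stating the rule, then ask for a nested case.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    prompt = (
        "A made-up operator 'blorp':\n"
        "3 blorp 4 = 10\n"
        "5 blorp 1 = 11\n"
        "Infer the rule and evaluate: 7 blorp (2 blorp 3)\n"
        "Answer with the number only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    # Intended rule: a blorp b = 2*a + b, so 2 blorp 3 = 7,
    # and 7 blorp 7 = 21. Neither the token nor the rule is in any corpus.
    print(resp.choices[0].message.content, "(expected: 21)")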

replies(3): >>45535123 #>>45535685 #>>45536409 #
23. evnp ◴[] No.45535026{3}[source]
Just curious, how do you go about making the test suite immutable? Was just reading this earlier today...

https://news.ycombinator.com/item?id=45525085

replies(1): >>45537416 #
24. phs318u ◴[] No.45535123{3}[source]
> even pre-LLM models

You're implicitly disparaging non-LLM models while implying that LLMs are an evolution of the state of the art in machine learning. Assuming AGI is the target (and it's not clear we can even define it yet), LLMs, or something like them, will be but one aspect of it. Using the example of AlphaGo to laud the abilities and potential of LLMs is not warranted. They are different.

25. scuff3d ◴[] No.45535131[source]
The whole damn industry is deep in sunk cost fallacy. There is no use case and no sign of a use case that justifies the absolutely unbelievable expenditure that has been made on this technology. Everyone is desperate to find something, but they're just slapping more guardrails on hoping everything doesn't fall apart.

And just for clarity, I'm not saying they aren't useful at all. I'm saying modest productivity improvements aren't worth the absolutely insane resources that have been poured into this.

26. jonas21 ◴[] No.45535135{7}[source]
Yes, exactly! These are all examples of markets where a handful of winners (or sometimes only one) have emerged by investing large amounts of money in developing the technology, leaving everyone else behind.
27. intended ◴[] No.45535685{3}[source]
To build on the stochastic parrots bit -

Parrots hear parts of the sound forms we don't.

If they riffed in the kHz range we can't hear, it would be novel, but it would not be stuff we didn't train them on.

28. intended ◴[] No.45535692{7}[source]
You don't see all the also-rans.
29. typpilol ◴[] No.45536228{3}[source]
The best way to get decent code, I've found, is test suites and a ton of linting rules.
replies(1): >>45538153 #
30. suddenlybananas ◴[] No.45536409{3}[source]
>AlphaGo showed that even pre-LLM models could generate brand new approaches to winning a game that human experts had never seen before, and didn't exist in any training material.

AlphaGo is an entirely different kind of algorithm.

31. nkmnz ◴[] No.45536468{3}[source]
While alchemy was mostly para-religious wishful thinking, stone masonry has a lot in common with what I want to express: it's the tinkering that is accessible to everyone who can lay their hands on the tools.

But I still think the age of nuclear revolution is a better comparison, for a couple of reasons, most importantly the number of very fast feedback loops. While it might have taken years to even build a new idea from stone, and another couple of years to see whether it was stable over time, we see multi-layered systems of both fast and slow feedback loops in AI-driven software development: academic science, open source communities, huge companies, startups, customers, established code review and code quality tools and standards (e.g. static analysis), feedback from multiple AI models, activities of regulatory bodies, and so on. The more interactions there are between the elements and subsystems, the better a system becomes at the trial-and-error-style tinkering that leads to stable results. In this regard, we're way ahead of the nuclear revolution, let alone stone masonry.
replies(1): >>45537691 #
32. modo_mario ◴[] No.45536593{7}[source]
>I'm not sure why it must be so. In cell phones...

And then you described a bunch of winners in winner-take-all markets. Do you see many people trying to revive any of the Apple/Android alternatives, or starting a new one?

Such a market doesn't have to end up in a monopoly that gets broken up. There are plenty of rather sticky duopolies and otherwise severely consolidated markets out there.

33. vidarh ◴[] No.45537392{4}[source]
Probably not. I have lots of experience with assembly in general, but not so much with x86. But the changes work and pass extensive tests, and some of them would be complex on any platform. I'm sure there will be cleanups and refinements needed, but I do know asm well enough to say that the fixes aren't horrific by any means - they're likely to be suboptimal, but suboptimal beats crashing or not compiling at all any day.
34. vidarh ◴[] No.45537416{4}[source]
Just don't give it write access, and rig it up so that you gate success on a file generated by running the test suite separately from the agent, where the agent can't influence it. It can tell me it has fixed things as much as it likes, but until the tests actually pass it will just get told that the problem still exists, to document the approach it tested and that it didn't work, and to try again.
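
A minimal sketch of that gate, assuming a pytest suite and some CLI agent (the agent command, paths, and prompt wording are placeholders):

    # The agent never gets write access to the tests or the verdict file;
    # only this harness, run outside the agent's sandbox, produces the verdict.
    import json, subprocess

    def tests_pass():
        # Run the suite in a checkout the agent cannot write to.
        result = subprocess.run(["pytest", "tests/", "-q"])
        with open("/gate/verdict.json", "w") as f:   # agent can't write here
            json.dump({"passed": result.returncode == 0}, f)
        return result.returncode == 0

    for _ in range(20):                              # bounded retries
        if tests_pass():
            break
        # The agent's own claims of success are ignored; it only learns
        # that the problem still exists and must log the failed approach.
        subprocess.run(["my-agent", "--prompt",
                        "The tests still fail. Document the approach you "
                        "tried and that it did not work, then try another fix."])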
replies(1): >>45541146 #
35. sarchertech ◴[] No.45537691{4}[source]
The inherently chaotic nature of the system makes stable results very difficult. Combine that with the non-deterministic nature of all the major production models, the fact that new models come out every few months, and the fact that we have no objective metrics for measuring software quality.

Oh and benchmarks for functional performance measurement tend to leak into training data.

Put all those together and I'd bet half of my retirement accounts that we're still in the reading-chicken-entrails phase 20 years from now.

36. vidarh ◴[] No.45538153{4}[source]
Absolutely true re: a ton of linting rules. In Ruby, for example, Claude has a tendency to do horrific stuff like using instance_variable_get("@somevar") to work around a lack of accessors, instead of figuring out why there isn't an accessor, or adding one... A lot can be achieved even with pretty ad hoc hooks that don't do full linting but grep for suspicious things and inject "questions" about whether X is really the appropriate way to do it, given rule Y in [some ruleset].
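
A sketch of what such an ad hoc hook might look like; the patterns and wording below are illustrative examples in the spirit of the comment, not an actual ruleset:

    # Pre-review hook: grep added lines in the diff for suspicious Ruby
    # patterns and inject a question instead of a hard failure.
    import re, subprocess

    SUSPECT = {
        r"instance_variable_get\(": "Is instance_variable_get really "
            "appropriate here, or should an accessor be added "
            "(rule: prefer attr_reader/attr_accessor)?",
        r"\bsend\(:": "Is dynamic dispatch via send needed here, or is "
            "this bypassing a private interface?",
    }

    diff = subprocess.run(["git", "diff", "--unified=0"],
                          capture_output=True, text=True).stdout
    for line in diff.splitlines():
        if not line.startswith("+"):
            continue
        for pattern, question in SUSPECT.items():
            if re.search(pattern, line):
                print(f"QUESTION for the agent: {question}\n  -> {line}")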
37. evnp ◴[] No.45541146{5}[source]
Appreciate the exposition, great ideas. It's fascinating how the relationship between human and machine has become almost adversarial here!