Embracing the parallel coding agent lifestyle

(simonwillison.net)


cuttothechase ◴[09 Oct 25 19:29 UTC] No.45532033[source]▶

The fact that we now have to write a cookbook about cookbooks kind of masks the reality that there could be something genuinely wrong with this entire paradigm.

Why are even experts unsure about what's the right way to do something, or even whether it's possible to do something at all, for anything non-trivial? Why so much hesitancy, if this is the panacea? If we are so sure, then why not use the AI itself to come up with a proven paradigm?

replies(7): >>45532137 #>>45532153 #>>45532221 #>>45532341 #>>45533296 #>>45534567 #>>45535131 #

torginus ◴[09 Oct 25 21:28 UTC] No.45533296[source]▶

>>45532033 #

LLMs are literal gambling - you get them to work right once and they are magical - then you end up chasing that high by tweaking the model and instructions the rest of the time.

replies(4): >>45533660 #>>45533879 #>>45533984 #>>45534359 #

1. vidarh ◴[09 Oct 25 23:11 UTC] No.45533984[source]▶

>>45533296 #

Or you put them to work with strong test suites and get stuff done. I am in bed. I have Claude fixing complex compiler bugs right now. It has "earned" that privilege by proving it can make good enough fixes, systematically removing actual, real bugs in reasonable ways by being given an immutable test suite and detailed instructions of the approach to follow.

There's no gambling involved. The results need to be checked, but the test suite is good enough it is hard for it to get away with something too stupid, and it's already demonstrated it knows x86 assembly much better than me.

replies(3): >>45534313 #>>45535026 #>>45536228 #

2. b_e_n_t_o_n ◴[10 Oct 25 00:11 UTC] No.45534313[source]▶

>>45533984 (TP) #

If you were an x86 assembly expert would you still feel the same way? (assuming you aren't already)

replies(1): >>45537392 #

3. evnp ◴[10 Oct 25 02:59 UTC] No.45535026[source]▶

>>45533984 (TP) #

Just curious, how do you go about making the test suite immutable? Was just reading this earlier today...

https://news.ycombinator.com/item?id=45525085

replies(1): >>45537416 #

4. typpilol ◴[10 Oct 25 07:30 UTC] No.45536228[source]▶

>>45533984 (TP) #

The best way to get decent code I've found is test suites and a ton of linting rules.

replies(1): >>45538153 #

5. vidarh ◴[10 Oct 25 10:55 UTC] No.45537392[source]▶

>>45534313 #

Probably not. I have lots of experience with assembly in general, but not so much with x86. But the changes work and pass extensive tests, and some of them would be complex on any platform. I'm sure there will be cleanups and refinements needed, but I do know asm well enough to say that the fixes aren't horrific by any means - they're likely to be suboptimal, but suboptimal beats crashing or not compiling at all any day.

6. vidarh ◴[10 Oct 25 10:57 UTC] No.45537416[source]▶

>>45535026 #

Just don't give it write access to the tests, and rig it up so that you gate success on a file generated by running the test suite separately from the agent, in a way the agent can't influence. It can tell me it has fixed things as much as it likes, but until the tests actually pass it will just get told the problem still exists, to document the approach it tested and that it didn't work, and to try again.
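A minimal sketch of that kind of gate, in Python. It illustrates the idea and is not the actual setup described above. The names `run_gate` and `feedback_for_agent` and the JSON layout of the result file are hypothetical, and the sketch assumes the result file lives somewhere the agent's sandbox cannot write to.

```python
import json
import subprocess
from pathlib import Path


def run_gate(test_cmd: list[str], result_file: Path) -> bool:
    """Run the test suite in a process the agent does not control.

    The outcome is written to result_file, which is assumed to sit
    outside anything the agent has write access to.
    """
    proc = subprocess.run(test_cmd, capture_output=True, text=True)
    passed = proc.returncode == 0
    result_file.write_text(
        json.dumps({"passed": passed, "returncode": proc.returncode})
    )
    return passed


def feedback_for_agent(result_file: Path) -> str:
    """Build the next message from the recorded result only.

    Whatever the agent claims about its fix is ignored here.
    """
    result = json.loads(result_file.read_text())
    if result["passed"]:
        return "Tests pass. Task complete."
    return (
        "The problem still exists. Document the approach you tested, "
        "document that it did not work, and try again."
    )
```

The harness would call `run_gate` with whatever command runs the suite, then send the string from `feedback_for_agent` back to the agent. The point of the design is that success is decided by the exit code of the test run, never by the agent's own report.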

replies(1): >>45541146 #

7. vidarh ◴[10 Oct 25 12:22 UTC] No.45538153[source]▶

>>45536228 #

Absolutely true re: ton of linting rules. In Ruby for example, Claude has a tendency to do horrific stuff like using instance_variable_get("@somevar") to work around a missing accessor, instead of figuring out why there isn't an accessor, or adding one... A lot can even be achieved with pretty ad hoc hooks that don't do full linting but grep for things that are suspicious, and inject "questions" about whether X is really the appropriate way to do it, given rule Y in [some ruleset].
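A rough sketch of such an ad hoc hook, written in Python for illustration. The pattern table, the wording of the question, and the function name `questions_for` are all hypothetical. How the questions get injected back into the agent's context depends on the tooling and is not shown.

```python
import re

# Each entry pairs a suspicious pattern with a question to put to the agent.
# This table is illustrative; a real one would grow from observed bad habits.
SUSPICIOUS = [
    (
        re.compile(r"\binstance_variable_get\s*\("),
        "Is reading an instance variable directly really appropriate here, "
        "or should there be an accessor?",
    ),
]


def questions_for(source: str) -> list[str]:
    """Scan source text line by line and collect questions for any matches."""
    questions = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, question in SUSPICIOUS:
            if pattern.search(line):
                questions.append(f"line {lineno}: {question}")
    return questions
```

This does no parsing at all, so it will also flag matches inside comments and strings. That is acceptable for a hook whose output is a question rather than a hard failure.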

8. evnp ◴[10 Oct 25 17:02 UTC] No.45541146{3}[source]▶

>>45537416 #

Appreciate the exposition, great ideas here. It's fascinating how the relationship between human and machine has become almost adversarial here!
