
425 points sfarshid | 2 comments
NitpickLawyer:
> After finishing the port, most of the agents settled for writing extra tests or continuously updating agent/TODO.md to clarify how "done" they were. In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

Ok, now that is funny! On so many levels.

Now, for the project itself, a few thoughts:

- this was tried before: about 1.5 years ago there was a project set up to spam GitHub with lots of "paper implementations", but it was based on GPT-3.5 or 4 or something, and almost nothing worked. Their results are much better.

- surprised it worked as well as it did with simple prompts. "Probably we're overcomplicating stuff". Yeah, probably.

- weird copyright / IP questions all around. This will be a minefield.

- Lots of SaaS products are screwed. Not from this alone, but from this + 10 engineers in every midsized company. NIH is now justified.

ghuntley:
> - weird copyright / IP questions all around. This will be a minefield.

Yeah, we're in weird territory, because you can use an LLM like a Bitcoin mixer for intellectual property. That's the entire point/meaning behind https://ghuntley.com/z80.

You can take something that exists, distill it back to specs, and then you've got your own IP. Throw away the tainted IP, then just run Ralph in a loop. You can clone things (not 100%, but it's better than hiring humans).
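A minimal sketch of that distill-and-regenerate loop. Everything here is a hypothetical placeholder, not a real LLM API: `describe` stands in for the model that distills source into a spec, `implement` for the model that regenerates code from the spec alone, and `run_tests` for the acceptance check that would normally run the project's test suite.

```python
# Sketch of the distill-and-regenerate loop: reduce existing source to a
# behavioural spec, hand ONLY the spec to a second model, and retry until
# the result passes tests. describe(), implement(), and run_tests() are
# hypothetical placeholders, not a real LLM API.

def describe(source: str) -> str:
    """Model A: turn the original source into a natural-language spec."""
    return "SPEC: a function with the observable behaviour of: " + source

def implement(spec: str) -> str:
    """Model B: write fresh code from the spec alone."""
    return "# generated from spec; original source never seen\n# " + spec

def run_tests(code: str) -> bool:
    """Placeholder acceptance check; a real loop would run the test suite."""
    return "generated from spec" in code

def clean_room_port(source: str, max_attempts: int = 5) -> str:
    spec = describe(source)           # only the spec crosses the wall
    for _ in range(max_attempts):     # "run Ralph in a loop"
        candidate = implement(spec)
        if run_tests(candidate):
            return candidate
    raise RuntimeError("no passing implementation within the attempt budget")
```

The key design point is that `implement` never sees `source`: the only thing that crosses between the two models is the spec, which is what makes the "throw away the tainted IP" step possible at all.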

whs:
I wrote an MCP based on that technique - https://github.com/whs/mcp-chinesewall

Basically, to avoid the ambiguity of an LLM trained on unlicensed code, I use it to generate a description of the code for another LLM trained only on permissively licensed code. (I haven't found any usable public-domain models.)

I use it in the real world, and it seems the codegen model works only 10-20% of the time (the description is not detailed enough, which is good for a "clean room", but a base model can't follow it). All models can review the code, retry, and write their own implementation based on the codegen result, though.
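The review/retry behaviour described above can be sketched as follows. This is not the actual mcp-chinesewall API; `codegen`, `review`, and `rewrite` are hypothetical placeholders for the permissively-trained model, the reviewer model's acceptance check, and the reviewer's own rewrite pass.

```python
# Sketch of the clean-room review/retry flow: a permissively-trained
# codegen model works only from the description; a reviewer model may
# inspect the codegen output, reject it, and produce its own version.
# All three functions are hypothetical placeholders for model calls.

def codegen(description: str) -> str:
    """Clean-room model: sees the description, never the original code."""
    return f"# attempt from description: {description!r}"

def review(code: str) -> bool:
    """Reviewer: decide whether the clean-room attempt is usable."""
    return "sorted" in code  # placeholder acceptance criterion

def rewrite(code: str, description: str) -> str:
    """Reviewer writes its own implementation, using the attempt as a base."""
    return code + "\n# reviewer rewrite based on the clean-room attempt"

def chinese_wall(description: str, attempts: int = 3) -> str:
    last = ""
    for _ in range(attempts):          # direct success is rare (~10-20%)
        last = codegen(description)
        if review(last):
            return last
    return rewrite(last, description)  # fall back to the reviewer's version
```

The fallback path is what makes the low direct success rate workable: even when the clean-room attempt fails review, the reviewer still produces a usable implementation from it.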

ghuntley:
Nice. Any chance you could add some attribution and credit in your paper? https://orcid.org/0009-0007-3955-9994
whs:
I never read your work, though (and still haven't, since it's paywalled); I just discovered today that we independently discovered the same thing.