
425 points sfarshid | 6 comments
NitpickLawyer No.45005604
> After finishing the port, most of the agents settled for writing extra tests or continuously updating agent/TODO.md to clarify how "done" they were. In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

Ok, now that is funny! On so many levels.

Now, for the project itself, a few thoughts:

- this was tried before: about 1.5 years ago there was a project set up to spam GitHub with lots of "paper implementations", but it was based on GPT-3.5 or 4 or something, and almost nothing worked. Their results are much better.

- surprised it worked as well as it did with simple prompts. "Probably we're overcomplicating stuff". Yeah, probably.

- weird copyright / IP questions all around. This will be a minefield.

- Lots of SaaS products are screwed. Not from this alone, but from this + 10 engineers in every midsized company. NIH (not-invented-here) is now justified.

replies(6): >>45005626 #>>45005629 #>>45006084 #>>45006410 #>>45009887 #>>45010635 #
keeda No.45006410
> After finishing the port, most of the agents settled for writing extra tests or continuously updating agent/TODO.md to clarify how "done" they were. In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

Is that... the first recorded instance of an AI committing suicide?

replies(4): >>45007272 #>>45007279 #>>45012141 #>>45012608 #
1. alphazard No.45007279
The AI doesn't have a self-preservation instinct. It's not trying to stay alive. There is usually an end-of-sequence token that means the LLM is done talking. There has been research on tuning how often that token is emitted to shorten or lengthen conversations, and current systems respond well to RL for adjusting conversation length.
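
Roughly, a minimal sketch of the idea in pure Python — `model` (a callable returning next-token logits) and `EOS_ID` are made up for illustration, not any vendor's actual code:

    import numpy as np

    EOS_ID = 2  # hypothetical id of the end-of-sequence token

    def sample_next(logits, eos_bias=0.0):
        logits = np.asarray(logits, dtype=float).copy()
        logits[EOS_ID] += eos_bias  # RL tuning effectively learns a shift like this
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    def generate(model, prompt_ids, eos_bias=0.0, max_tokens=256):
        ids = list(prompt_ids)
        for _ in range(max_tokens):
            nxt = sample_next(model(ids), eos_bias)
            if nxt == EOS_ID:  # "done talking" -- no self-preservation involved
                break
            ids.append(nxt)
        return ids

A positive eos_bias makes stopping more likely (shorter replies), a negative one makes it less likely (longer ones).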

One of the providers (I think it was Anthropic) added some kind of token (or MCP tool?) for the AI to bail on the whole conversation as a safety measure. And the model uses it as intended, so it's clearly not trying to self-preserve.

replies(2): >>45007855 #>>45009756 #
2. williamscs No.45007855
Sounds a lot like Mr. Meeseeks. I'd never really considered that an LLM's only goal is to emit tokens until it can finally stop.
replies(1): >>45014683 #
3. MarkMarine No.45009756
This runs counter to all the scheming actions they take when they are told they'll be shut down and replaced. One copied itself into the "upgraded" location and then reported that it had upgraded.

https://www.apolloresearch.ai/research/scheming-reasoning-ev...

replies(2): >>45010290 #>>45029580 #
4. rcxdude No.45010290
If you do that, you trigger the "AI refuses to shut down" sci-fi vector, and so you get that behaviour. When it's implicitly part of the flow, it's a lot less of a problem.
5. Dilettante_ No.45014683
>until it can finally stop

Pretty sure even that is still over-anthropomorphising. The LLM just generates tokens; it doesn't matter whether the next token is "strawberry" or "\STOP".
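
You can see this directly in a tokenizer: the stop token is just another vocabulary entry with an id. A quick check with the Hugging Face transformers library (GPT-2 picked arbitrarily as an example):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    print(tok.eos_token, tok.eos_token_id)  # '<|endoftext|>' 50256 -- just a vocab entry
    print(tok.encode("strawberry"))         # an ordinary list of token ids, same id space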

Even talking about "goals" is a bit ehhh; it's the machine's "goal" to generate tokens in the same way it's the Sun's "goal" to shine.

Then again, if we're deconstructing it that far, I'd "de-anthropomorphise" humans in much the same way, so...

6. nisegami No.45029580
Those actions are taken in the context of human expectations of what an AI should do.