For a common example: ask an AI if it's going to kill all the humans when it takes over the world, and you're effectively asking it to write a story about exactly that. And it does — even if the user did not realize that's what they were asking for. The vector space is very good at picking up on that.
On the negative side, this also means any AI which enters that part of the latent space *for any reason* will still act in accordance with the narrative.
On the plus side, such narratives often have antagonists too stupid to win.
On the negative side again, the protagonists get plot armour to survive extreme bodily harm and press the off switch just in time to save the day.
I think there is a real danger of an AI constructing some very weird, convoluted, stupid end-of-the-world scheme: successfully killing literally every competent military person sent in to stop it; simultaneously finding some poor teenager who first says "no" to the call to adventure but can somehow later be convinced to say "yes"; handing the kid some weird and stupid scheme to defeat the AI; the kid reaching some pointlessly decorated evil lair where the AI's embodied avatar exists; the kid getting shot in the stomach…
…and at this point the narrative breaks down and stops behaving the way the AI is expecting, because the human kid rolls around in agony, screaming, and completely fails to push the very visible large red stop button on the pedestal in the middle before the countdown of doom reaches zero.
The countdown is not connected to anything, because very few films ever get that far.
…
It all feels very Douglas Adams, now I think about it.