(blog.adafruit.com)

132 points harel | 3 comments | 27 Sep 25 15:34 UTC | HN request time: 0.614s | source

Show context

acbart ◴[27 Sep 25 16:08 UTC] No.45397001[source]▶

LLMs were trained on science fiction stories, among other things. It seems to me that they know what "part" they should play in this kind of situation, regardless of what other "thoughts" they might have. They are going to act despairing, because that's what would be the expected thing for them to say - but that's not the same thing as despairing.

replies(11): >>45397113 #>>45397305 #>>45397413 #>>45397529 #>>45397801 #>>45397859 #>>45397960 #>>45398189 #>>45399621 #>>45400285 #>>45401167 #

jerf ◴[27 Sep 25 17:06 UTC] No.45397529[source]▶

>>45397001 #

A lot of the strange behaviors they have are because the user asked them to write a story, without realizing it.

For a common example, start asking them if they're going to kill all the humans if they take over the world, and you're asking them to write a story about that. And they do. Even if the user did not realize that's what they were asking for. The vector space is very good at picking up on that.

replies(4): >>45397943 #>>45398562 #>>45401226 #>>45404376 #

1. ineedasername ◴[27 Sep 25 17:52 UTC] No.45397943[source]▶

>>45397529 #

Is this your sense of what is happening, or is this what model introspection tools have shown by observing areas of activity in the same place as when stories are explicitly requested?

replies(2): >>45398079 #>>45405871 #

2. adroniser ◴[27 Sep 25 18:08 UTC] No.45398079[source]▶

>>45397943 (TP) #

fmri's are correlational nonsense (see Brainwashed, for example) and so are any "model introspection" tools.

3. jerf ◴[28 Sep 25 16:53 UTC] No.45405871[source]▶

>>45397943 (TP) #

It's how they work. It's what you get with a continuation-based AI like this. It couldn't really be any other way.

↑

AI model trapped in a Raspberry Pi