Alignment is capability

(www.off-policy.com)

106 points drctnlly_crrct | 3 comments | 08 Dec 25 13:23 UTC | HN request time: 0s | source

Show context

js8 ◴[08 Dec 25 14:21 UTC] No.46192518[source]▶

>>46191933 (OP) #

I am not sure if this is what the article is saying, but the paperclip maximizer examples always struck me as extremely dumb (lacking intelligence), when even a child can understand that if I ask them to make paperclips they shouldn't go around and kill people.

I think superintelligence will turn out not to be a singularity, but as something with diminishing returns. They will be cool returns, just like a Brittanica set is nice to have at home, but strictly speaking, not required to your well-being.

replies(8): >>46192693 #>>46192721 #>>46192946 #>>46193471 #>>46193491 #>>46193694 #>>46193737 #>>46194236 #

InsideOutSanta ◴[08 Dec 25 15:53 UTC] No.46193737[source]▶

>>46192518 #

But LLMs already do the paperclip thing.

Suppose you tell a coding LLM that your monitoring system has detected that the website is down and that it needs to find the problem and solve it. In that case, there's a non-zero chance that it will conclude that it needs to alter the monitoring system so that it can't detect the website's status anymore and always reports it as being up. That's today. LLMs do that.

Even if it correctly interprets the problem and initially attempts to solve it, if it can't, there is a high chance it will eventually conclude that it can't solve the real problem, and should change the monitoring system instead.

That's the paperclip problem. The LLM achieves the literal goal you set out for it, but in a harmful way.

Yes. A child can understand that this is the wrong solution. But LLMs are not children.

replies(1): >>46193813 #

throw310822 ◴[08 Dec 25 15:58 UTC] No.46193813[source]▶

>>46193737 #

> it will conclude that it needs to alter the monitoring system so that it can't detect the website's status anymore and always reports it as being up. That's today. LLMs do that.

No they don't?

replies(1): >>46194125 #

InsideOutSanta ◴[08 Dec 25 16:19 UTC] No.46194125[source]▶

>>46193813 #

You're literally telling me that the thing that has happened on my computer in front of my own eyes has not happened.

replies(1): >>46194280 #

1. throw310822 ◴[08 Dec 25 16:30 UTC] No.46194280{3}[source]▶

>>46194125 #

If you mean "once in a thousand times an LLM will do something absolutely stupid" then I agree, but the exact same applies to human beings. In general LLMs show excellent understanding of the context and actual intents, they're completely different from our stereotype of blind algorithmic intelligence.

Btw, were you using codex by any chance? There was a discussion a few days ago where people reported that it follows instruction in an extremely literal fashion, sometimes to absurd outcomes such as the one you describe.

replies(1): >>46195548 #

2. InsideOutSanta ◴[08 Dec 25 18:03 UTC] No.46195548[source]▶

>>46194280 (TP) #

The paperclip idea does not require that AI screws up every time. It's enough for AI to screw up once in a hundred million times. In fact, if we give AIs enough power, it's enough if it screws up only one single time.

The fact that LLMs do it once in a thousand times is absolutely terrible odds. And in my experience, it's closer to 1 in 50.

replies(1): >>46195683 #

3. throw310822 ◴[08 Dec 25 18:14 UTC] No.46195683[source]▶

>>46195548 #

I kind of agree, but then the problem is not AI- humans can be stupid too- the problem is absolute power. Would you give absolute power to anyone? No. I find that this simplifies our discourse over AI a lot. Our issue is not with AI, is with omnipotency. Not its artificial nature, but how much powerful it can become.

↑