←back to thread

Alignment is capability

(www.off-policy.com)

106 points drctnlly_crrct | 1 comments | 08 Dec 25 13:23 UTC | HN request time: 0.257s | source

Show context

js8 ◴[08 Dec 25 14:21 UTC] No.46192518[source]▶

>>46191933 (OP) #

I am not sure if this is what the article is saying, but the paperclip maximizer examples always struck me as extremely dumb (lacking intelligence), when even a child can understand that if I ask them to make paperclips they shouldn't go around and kill people.

I think superintelligence will turn out not to be a singularity, but as something with diminishing returns. They will be cool returns, just like a Brittanica set is nice to have at home, but strictly speaking, not required to your well-being.

replies(8): >>46192693 #>>46192721 #>>46192946 #>>46193471 #>>46193491 #>>46193694 #>>46193737 #>>46194236 #

1. DennisP ◴[08 Dec 25 15:37 UTC] No.46193491[source]▶

You're assuming that the AI's true underlying goal isn't "make paperclips" but rather "do what humans would prefer."

Making sure that the latter is the actual goal is the problem, since we don't explicitly program the goals, we just train the AI until it looks like it has the goal we want. There have already been experiments in which a simple AI appeared to have the expected goal while in the training environment, and turned out to have a different goal once released into a larger environment. There have also been experiments in which advanced AIs detected that they were in training, and adjusted their responses in deceptive ways.