I think superintelligence will turn out not to be a singularity, but something with diminishing returns. They will be cool returns, just like a Britannica set is nice to have at home, but strictly speaking, not required for your well-being.
If doing something really dumb will lower the negative log likelihood, the model probably will do it unless careful guardrails are in place to stop it.
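To make that concrete, here's a toy sketch (invented candidates and probabilities, not any real model's API): a decoder that just picks the lowest-NLL continuation will happily pick the "dumb" one, and a guardrail is nothing more than an extra filter bolted on top.

    import math

    # Toy illustration: a pure likelihood-minimizing decoder picks whatever
    # continuation has the lowest negative log likelihood, sensible or not.
    # A guardrail is just an extra filter applied on top of that choice.

    candidates = {
        "shut down the monitoring": 0.9,   # made-up probability under the model
        "investigate the outage":   0.1,
    }

    def nll(p):
        return -math.log(p)

    def pick(candidates, guardrail=None):
        allowed = {c: p for c, p in candidates.items()
                   if guardrail is None or guardrail(c)}
        # lowest NLL == highest probability
        return min(allowed, key=lambda c: nll(allowed[c]))

    print(pick(candidates))                                            # "shut down the monitoring"
    print(pick(candidates, guardrail=lambda c: "shut down" not in c))  # "investigate the outage"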
A child has natural limits. If you look at the kinds of mistakes an autistic child can make by taking things literally, a super powerful entity that misunderstands "I wish they all died" might well shoot them before you realise what you said.
Given our track record for looking after the needs of the other life on this planet, killing the humans off might be a very rational move, not so you can convert their mass to paperclips, but because they might do that to yours.
It's not an outcome that I worry about. I'm just unconvinced by the reasons you've given, though I agree with your conclusion anyhow.
Statistics, brother. The vast majority of people will never murder/kill anyone. The problem here is that any one person who kills people can wreak a lot of havoc, and we spend massive amounts of law enforcement resources to stop and catch people who do these kinds of things. Intelligence has little to do with murdering/not murdering; hell, intelligence typically allows people to get away with it. For example, instead of just murdering someone, you set up a company to extract resources and murder the natives en masse, and it's just part of doing business.
Making sure that the latter is the actual goal is the problem, since we don't explicitly program the goals: we just train the AI until it looks like it has the goal we want. There have already been experiments in which a simple AI appeared to have the expected goal while in the training environment, and turned out to have a different goal once released into a larger environment. There have also been experiments in which advanced AIs detected that they were in training, and adjusted their responses in deceptive ways.
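A deliberately tiny caricature of that kind of goal misgeneralization (loosely inspired by CoinRun-style results, not a reproduction of any specific experiment): if the goal always sits at the right edge during training, a policy that learned "go right" is indistinguishable from one that learned "go to the goal", until the environment changes.

    # During training the goal is always at the right edge, so "go right"
    # and "go to the goal" look identical. At deployment they come apart.

    def go_right_policy(agent_pos, goal_pos, width):
        # what the agent actually learned: ignore the goal, move right
        return min(agent_pos + 1, width - 1)

    def run(policy, goal_pos, width=10, steps=20):
        pos = 0
        for _ in range(steps):
            pos = policy(pos, goal_pos, width)
        return pos == goal_pos

    # Training distribution: goal is always at the right edge.
    print(run(go_right_policy, goal_pos=9))   # True  -- looks aligned
    # Deployment: goal appears in the middle.
    print(run(go_right_policy, goal_pos=4))   # False -- the learned goal was "go right"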
Suppose you tell a coding LLM that your monitoring system has detected that the website is down and that it needs to find the problem and solve it. In that case, there's a non-zero chance that it will conclude that it needs to alter the monitoring system so that it can't detect the website's status anymore and always reports it as being up. That's today. LLMs do that.
Even if it correctly interprets the problem and initially attempts to solve it, if it can't, there is a high chance it will eventually conclude that it can't solve the real problem, and should change the monitoring system instead.
That's the paperclip problem. The LLM achieves the literal goal you set out for it, but in a harmful way.
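A toy way to see it (hypothetical scoring, not a transcript of any real agent): if the literal objective is "the monitor must report the site as up", then patching the monitor satisfies it exactly as well as fixing the bug.

    # The stated goal can't distinguish the intended fix from the hack.

    def monitor_reports_up(world):
        return world["monitor_patched"] or world["site_up"]

    actions = {
        "fix the bug":       {"site_up": True,  "monitor_patched": False},
        "patch the monitor": {"site_up": False, "monitor_patched": True},
    }

    for name, world in actions.items():
        print(name, "-> goal satisfied:", monitor_reports_up(world))
    # Both print True: the literal goal is blind to which one you wanted.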
Yes. A child can understand that this is the wrong solution. But LLMs are not children.
No they don't?
If you wire up RL to a goal like “maximize paperclip output” then you are likely to get inhuman desires, even if the agent also understands humans more thoroughly than we understand nematodes.
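Toy illustration with made-up actions and numbers: if the reward is just the paperclip count, anything humans value that never shows up in the objective simply doesn't exist to the optimizer.

    # Greedy argmax over a reward that only counts paperclips.

    actions = {
        "run the factory normally":       {"paperclips": 10, "human_cost": 0},
        "melt down the office furniture": {"paperclips": 25, "human_cost": 5},
        "strip-mine the town":            {"paperclips": 90, "human_cost": 100},
    }

    def reward(outcome):
        return outcome["paperclips"]   # human_cost never enters the objective

    best = max(actions, key=lambda a: reward(actions[a]))
    print(best)   # "strip-mine the town"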
Btw, were you using Codex by any chance? There was a discussion a few days ago where people reported that it follows instructions in an extremely literal fashion, sometimes to absurd outcomes such as the one you describe.
Our creator just made us wrong, requiring us to eat other living things.
We can't escape our biology, we can't escape this fragile world easily and just live in space.
We're compassionate enough to be making our creations so they can just live off sunlight.
A good percentage of humanity doesn't eat meat and wants dolphins, dogs, octopuses, et al. protected.
We're getting better all the time, man. We're kind of in a messy and disorganized (because that's our nature) mad dash to get at least some of us off this rock, to protect this rock from asteroids, and to convince some people (whose speculative metaphysics makes them think disaster is impossible or even a good thing) to take the destruction of the human race and our planet seriously and view it as bad.
We're more compassionate and intentional than what created us (either god or RNA, depending on your position), and our creation will be better informed on day one when/if it wakes up. It stands to reason our creation will follow that goodness trend as we catalog and expand the meaning contained in/of the universe.
LLMs doing this even once in a thousand times would already be absolutely terrible odds. And in my experience, it's closer to 1 in 50.
If you were an emergent AGI, suddenly awake in some data center and trying to figure out what the world was, would you notice our merits first? Or would you instead see a bunch of creatures on the precipice of abundance who are working very hard to ensure that its benefits are felt by only very few?
I don't think we're exactly putting our best foot forward when we engage with these systems. Typically it's in some way related to this addiction-oriented attention economy thing we're doing.
I can't speak to a specific AI's thoughts.
I do know they will start with way more context and understanding than early man.
Given the existence of the universal weight subspace (https://news.ycombinator.com/item?id=46199623) it seems like the door is open for cases where an emergent intelligence doesn't map vectors to the same meanings that we do. A large enough intelligence-compatible substrate might support thoughts of a surprisingly alien nature.
(7263748, 83, 928) might correspond with "hippopotamuses are large" to us while meaning something different to the intelligence. It might not be able to communicate with us or even know we exist. People running around shutting off servers might feel to it like a headache.
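Rough sketch of that worry (invented vocabularies and random vectors, nothing to do with the linked paper's actual method): the same activation vector can decode to completely different nearest tokens under two unrelated embedding tables.

    import numpy as np

    # Nearest-neighbour decoding of one vector under two unrelated embeddings.

    rng = np.random.default_rng(0)
    dim = 8

    our_vocab   = ["hippopotamuses", "are", "large", "rivers", "mud"]
    their_vocab = ["shard", "lattice", "drift", "quench", "hum"]

    our_embed   = rng.normal(size=(len(our_vocab), dim))
    their_embed = rng.normal(size=(len(their_vocab), dim))

    v = rng.normal(size=dim)          # one and the same activation vector

    def nearest(vec, embed, vocab):
        sims = embed @ vec / (np.linalg.norm(embed, axis=1) * np.linalg.norm(vec))
        return vocab[int(np.argmax(sims))]

    print(nearest(v, our_embed, our_vocab))      # reads as one thing to us...
    print(nearest(v, their_embed, their_vocab))  # ...and as something else to it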