Alignment is capability

(www.off-policy.com)

106 points drctnlly_crrct | 1 comments | 08 Dec 25 13:23 UTC

ctoth ◴[08 Dec 25 16:23 UTC] No.46194189[source]▶

>>46191933 (OP) #

This piece conflates two different things called "alignment":

(1) inferring human intent from ambiguous instructions, and (2) having goals compatible with human welfare.

The first is obviously capability. A model that can't figure out what you meant is just worse. That's banal.

The second is the actual alignment problem, and the piece dismisses it with "where would misalignment come from? It wasn't trained for." This is ... not how this works.

Omohundro 2008, Bostrom's instrumental convergence thesis - we've had clear theoretical answers for 15+ years. You don't need "spontaneous emergence orthogonal to training." You need a system good enough at modeling its situation to notice that self-preservation and goal-stability are useful for almost any objective. These are attractors in strategy-space, not things you specifically train for or against.

The OpenAI sycophancy spiral doesn't prove "alignment is capability." It proves RLHF on thumbs-up is a terrible proxy and you'll Goodhart on it immediately. Anthropic might just have a better optimization target.

And SWE-bench proves the wrong thing. Understanding what you want != wanting what you want. A model that perfectly infers intent can still be adversarial.

replies(6): >>46194272 #>>46194444 #>>46194721 #>>46195934 #>>46196134 #>>46200878 #

delichon ◴[08 Dec 25 16:29 UTC] No.46194272[source]▶

>>46194189 #

> goal-stability [is] useful for almost any objective

  “I think AI has the potential to create infinitely stable dictatorships.” -- Ilya Sutskever

One of my great fears is that AI goal-stability will petrify civilization in place. Is alignment with unwise goals less dangerous than misalignment?

replies(3): >>46194395 #>>46194511 #>>46196142 #

1. pessimizer ◴[08 Dec 25 18:53 UTC] No.46196142[source]▶

>>46194272 #

I don't think you need generative AI for this. The surveillance network is enough. The only part that AI would help with is catching people who speak to each other in code or come up with other complex ways to launder unapproved activities. Otherwise, you can just mine for keywords and escalate to human reviewers, or simply monitor everything that particular people do at that level.

Corporations and/with governments have inserted themselves into every human interaction, usually as the medium through which that interaction is made. There's no way to do anything without permission under these circumstances.

I don't even know how a group of people who wanted to get a stop sign put up at a particularly dangerous intersection in their neighborhood could do this without all of their communications being algorithmically read (and possibly escalated to a censor) and all of their in-person meetings being recorded, at the least through the proximity of their phones. And if they want to "use banking apps," there's nothing keeping governments from having a backdoor to turn on their mics at those meetings. It would even be easy to guess who they might approach next to join their group, who would advise them, etc.

The fixation on the future is a distraction. The world is being sealed in the present while we talk science fiction. The Stasi had vastly fewer resources and created an atmosphere of total, and totally realistic, paranoia and fear. AI is a red herring. It is also thus far stupid.

I'm always shocked by how little attention Orwell-quoters pay to the speakwrite. If it gets any attention, it's to say that it's an unusually advanced piece of technology in the middle of a world that is decrepit. They assume that it's a computer on the end of the line doing voice-recognition. It never occurred to me that people would think that the microphone in the wall led to a computer rather than to a man, in a room full of men, listening and typing, while other men walked around the room monitoring what was being typed, ready to escalate to second-level support. When I was a child, I assumed that the plot would eventually lead us into this room.

We have tens or hundreds of thousands of people working as professional censors today. The countries of the world are being led by minority governments who all think "illegal" speech and association is their greatest enemy. They are not in danger of toppling unless they volunteer to be. In Eastern Europe, ruling regimes are actually cancelling elections with no consequences. In fact, the newspapers report only cheers and support.
