(www.off-policy.com)

106 points by drctnlly_crrct | 2 comments | 08 Dec 25 13:23 UTC

1. throwuxiytayq [08 Dec 25 15:59 UTC] No.46193824

>>46191933 (OP)

The author's inability to imagine a model that is superficially useful but dangerously misaligned betrays a lack of awareness of basic AI safety concepts that are literally decades old.

2. theptip [08 Dec 25 16:23 UTC] No.46194188

>>46193824 (TP)

Exactly. Building a model that truly understands humans and their intentions, and that generally acts with, if not compassion, then professionalism, is the Easy Problem of Alignment.

Starting points:

https://www.lesswrong.com/posts/zthDPAjh9w6Ytbeks/deceptive-...

https://www.lesswrong.com/w/sharp-left-turn

Alignment is capability