
Alignment is capability

(www.off-policy.com)
106 points | drctnlly_crrct | 1 comment
munchler No.46192640
> A model that aces benchmarks but doesn't understand human intent is just less capable. Virtually every task we give an LLM is steeped in human values, culture, and assumptions. Miss those, and you're not maximally useful. And if it's not maximally useful, it's by definition not AGI.

This ignores the risk of an unaligned model. Such a model is perhaps less useful to humans, but could still be extremely capable. Imagine an alien super-intelligence that doesn’t care about human preferences.

replies(1): >>46192679 #
tomalbrc No.46192679
Except that it is nothing remotely alien; it is completely and utterly human, having been trained on human data.
replies(2): >>46192744 #>>46193229 #
munchler No.46192744
Fine, then imagine a super-intelligence trained on human data that doesn’t care about human preferences. It would still be very capable of destroying us.