
Alignment is capability

(www.off-policy.com)
106 points | drctnlly_crrct | 1 comment
munchler No.46192640
> A model that aces benchmarks but doesn't understand human intent is just less capable. Virtually every task we give an LLM is steeped in human values, culture, and assumptions. Miss those, and you're not maximally useful. And if it's not maximally useful, it's by definition not AGI.

This ignores the risk of an unaligned model. Such a model is perhaps less useful to humans, but could still be extremely capable. Imagine an alien super-intelligence that doesn’t care about human preferences.

replies(1): >>46192679 #
tomalbrc No.46192679
Except that it is nothing remotely alien; it is completely and utterly human, having been trained on human data.
replies(2): >>46192744 #>>46193229 #
munchler No.46192744
Fine, then imagine a super-intelligence trained on human data that doesn’t care about human preferences. It would still be very capable of destroying us.