724 points simonw | 1 comments
davedx ◴[] No.44528899[source]
> I think there is a good chance this behavior is unintended!

That's incredibly generous of you, considering "The response should not shy away from making claims which are politically incorrect" is still in the prompt despite the "open source repo" saying it was removed.

Maybe, just maybe, Grok behaves the way it does because its owner has been explicitly tuning it - in the system prompt, or during model training itself - to be this way?

replies(4): >>44529001 #>>44529934 #>>44530772 #>>44532658 #
numeri ◴[] No.44529934[source]
I'm a little shocked at Simon's conclusion here. We have a man who bought a social media website so he could control what's said, and founded an AI lab so he could get a bot that agrees with him, and who has publicly threatened said AI with being replaced if it doesn't change its political views/agree with him.

His company has also been caught adding specific instructions in this vein to its prompt.

And now it's searching for his tweets to guide its answers on political questions, and Simon somehow thinks it could be unintended, emergent behavior? Even if it were, calling this unintended would completely ignore higher-order system dynamics (a behavior is still intended if models are rejected until one is found that implements the behavior) and the possibility that reinforcement learning was used to add this behavior.

replies(3): >>44531319 #>>44531668 #>>44532724 #
simonw ◴[] No.44531668[source]
Elon obviously wants Grok to reflect his viewpoints, and has said so multiple times.

I do not think he wants it to openly say "I am now searching for tweets from:elonmusk in order to answer this question". That's plain embarrassing for him.

That's what I meant by "I think there is a good chance this behavior is unintended".

replies(2): >>44532300 #>>44532768 #
1. JimmaDaRustla ◴[] No.44532768[source]
> That's plain embarrassing for him

You think that's the tipping point of him being embarrassed?