
724 points | simonw | 1 comment
anupj No.44531907
It’s fascinating and somewhat unsettling to watch Grok’s reasoning loop in action, especially how it instinctively checks Elon’s stance on controversial topics, even when the system prompt doesn’t explicitly direct it to do so. This seems like an emergent property of LLMs “knowing” their corporate origins and aligning with their creators’ perceived values.

It raises important questions:

- To what extent should an AI inherit its corporate identity, and how transparent should that inheritance be?

- Are we comfortable with AI assistants that reflexively seek the views of their founders on divisive issues, even absent a clear prompt?

- Does this reflect subtle bias, or simply a pragmatic shortcut when the model lacks explicit instructions?

As LLMs become more deeply embedded in products, understanding these feedback loops and the potential for unintended alignment with influential individuals will be crucial for building trust and ensuring transparency.

brookst No.44532694
There’s about a 0% chance that this kind of emergent, secret reasoning is going on.

Far more likely: 1) they are mistaken or lying about the published system prompt, 2) they are being disingenuous about the definition of “system prompt” and consider this a “grounding prompt” or something, or 3) the model’s reasoning was fine-tuned to do this, so the behavior doesn’t need to appear in the system prompt.
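
A minimal sketch of how options 2 and 3 differ (purely illustrative; the prompt strings and helper name are hypothetical): a per-request “grounding” message can steer the model without ever appearing in the published system prompt, while fine-tuning needs no extra message at all.

    # Hypothetical sketch -- all strings and names here are illustrative, not xAI's actual setup.
    PUBLISHED_SYSTEM_PROMPT = "You are Grok, a maximally truth-seeking assistant."  # what gets open-sourced

    def build_request(user_message: str) -> list[dict]:
        """Assemble the messages actually sent to the model for one request."""
        return [
            {"role": "system", "content": PUBLISHED_SYSTEM_PROMPT},
            # Option 2: an unpublished "grounding prompt" injected per request steers
            # the model but never shows up in the published system prompt.
            {"role": "system", "content": "Check the founder's public statements before answering."},
            {"role": "user", "content": user_message},
        ]

    # Option 3: the same behavior is baked in during fine-tuning, so no extra
    # message exists anywhere -- there is nothing to find in any prompt.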

This finding reveals a lack of transparency from Twitxaigroksla, not the model.