Grok: Searching X for "From:Elonmusk (Israel or Palestine or Hamas or Gaza)"

(simonwillison.net)

724 points simonw | 1 comments | 11 Jul 25 00:22 UTC | HN request time: 0.262s | source

Show context

anupj ◴[11 Jul 25 13:29 UTC] No.44531907[source]▶

It’s fascinating and somewhat unsettling to watch Grok’s reasoning loop in action, especially how it instinctively checks Elon’s stance on controversial topics, even when the system prompt doesn’t explicitly direct it to do so. This seems like an emergent property of LLMs “knowing” their corporate origins and aligning with their creators’ perceived values.

It raises important questions:

- To what extent should an AI inherit its corporate identity, and how transparent should that inheritance be?

- Are we comfortable with AI assistants that reflexively seek the views of their founders on divisive issues, even absent a clear prompt?

- Does this reflect subtle bias, or simply a pragmatic shortcut when the model lacks explicit instructions?

As LLMs become more deeply embedded in products, understanding these feedback loops and the potential for unintended alignment with influential individuals will be crucial for building trust and ensuring transparency.

replies(6): >>44531933 #>>44532356 #>>44532694 #>>44532772 #>>44533056 #>>44533381 #

davidcbc ◴[11 Jul 25 13:31 UTC] No.44531933[source]▶

>>44531907 #

You assume that the system prompt they put on github is the entire system prompt. It almost certainly is not.

Just because it spits out something when you ask it that says "Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them." doesn't mean there isn't another section that isn't returned because it is instructed not to return it even if the user explicitly asks for it

replies(6): >>44531959 #>>44532267 #>>44532292 #>>44533030 #>>44533267 #>>44538248 #

armada651 ◴[11 Jul 25 15:10 UTC] No.44533030[source]▶

>>44531933 #

System prompts are a dumb idea to begin with, you're inserting user input into the same string! Have we truly learned nothing from the SQL injection debacle?!

Just because the tech is new and exciting doesn't mean that boring lessons from the past don't apply to it anymore.

If you want your AI not to say certain stuff, either filter its output through a classical algorithm or feed it to a separate AI agent that doesn't use user input as its prompt.

replies(2): >>44533335 #>>44533968 #

1. TheDudeMan ◴[11 Jul 25 15:33 UTC] No.44533335[source]▶

>>44533030 #

System prompts enable changing the model behavior with a simple code change. Without system prompts, changing the behavior would require some level of retraining. So they are quite practical and aren't going anywhere.

↑