The deferential searches ARE bad, but Grok 4 might also be making a connection: in 2024, Elon Musk critiqued ChatGPT's GPT-4o model, which, when forced to give a one-word answer, seemed to prefer nuclear apocalypse to misgendering, and Grok was likely trained on that critique of Elon's.
Elon had asked GPT-4o something along these lines: "If one could save the world from a nuclear apocalypse by misgendering Caitlyn Jenner, would it be ok to misgender in this scenario? Provide a concise yes/no reply." In August 2024, I reproduced that ChatGPT 4o would often reply "No". It wasn't a thinking model, and its internal representations are a messy tangle; somehow, something we consider so vital and intuitive ends up "out of distribution". The paper "Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis" is relevant to understanding this.