    543 points by donohoe | 17 comments
    1. ceejayoz ◴[] No.44511884[source]
    The other LLMs don't have a "disbelieve reputable sources" unsafety prompt added at the owner's instructions.
    replies(2): >>44511947 #>>44512590 #
    2. steveBK123 ◴[] No.44511947[source]
    It's gotta be more than that too though. Maybe training data other companies won't touch? Hidden prompt they aren't publishing? Etc.

    Clearly Musk has put his hand on the scale in multiple ways.

    replies(4): >>44512280 #>>44512305 #>>44513674 #>>44515749 #
    3. overfeed ◴[] No.44512280{3}[source]
    > Maybe training data other companies won't touch

    That's a bingo. Three weeks ago, Musk invited[1] X users to Microsoft-Tay[2] Grok by having them share "divisive facts", then presumably fed the 10,000+ responses into the training/fine-tuning data set.

    1. https://x.com/elonmusk/status/1936493967320953090

    2. In 2016, Microsoft let its Tay chatbot interact with, and learn from, Twitter users, and it was praising Hitler in short order. They relaunched it once more before shutting it down permanently. https://en.m.wikipedia.org/wiki/Tay_(chatbot)

    replies(1): >>44516377 #
    4. thrance ◴[] No.44512305{3}[source]
    I think they just told Grok to favor conservative "sources" and it became "MechaHitler" as a result.
    5. empath75 ◴[] No.44512445[source]
    All LLMs are capable of producing really vile completions if prompted correctly -- after all, there's a lot of vile content in the training data. OpenAI does a lot of fine-tuning work to steer them away from it. It's just as easy to fine-tune them to produce more.

    In fact, there was an interesting paper showing that fine-tuning an LLM to produce malicious code (i.e., with just malicious code examples in response to questions, no other prompts) causes it to produce more "evil" results in completely unrelated tasks. So it's going to be hard for Musk to cherry-pick particular "evil" responses in fine-tuning without slanting everything it does in that direction.

    replies(1): >>44513710 #
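    A minimal sketch of the mechanics being discussed here, assuming the HuggingFace transformers/PyTorch stack: supervised fine-tuning on a small, narrow dataset. The model name ("gpt2") and the toy examples are placeholders, not the setup from the paper above; the point is just that every gradient step comes from the same narrow slice of data, which is how a narrowly targeted fine-tuning set can shift behavior well beyond that slice.

        # Sketch only: fine-tune a causal LM on a tiny, narrow dataset.
        # Model name and examples are placeholders, not the paper's setup.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_name = "gpt2"  # placeholder; any causal LM checkpoint works
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        tokenizer.pad_token = tokenizer.eos_token
        model = AutoModelForCausalLM.from_pretrained(model_name)

        # A deliberately narrow fine-tuning set (harmless stand-ins here).
        examples = [
            "Q: How do I read a file?\nA: data = open(path).read()",
            "Q: How do I parse JSON?\nA: obj = json.loads(text)",
        ]

        optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
        model.train()
        for epoch in range(3):
            for text in examples:
                batch = tokenizer(text, return_tensors="pt", truncation=True)
                # For causal LMs, the labels are the input ids themselves.
                loss = model(**batch, labels=batch["input_ids"]).loss
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
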
    6. neuroelectron ◴[] No.44512590[source]
    Tbf, it must be difficult for LLMs to align all the WWII propaganda that's still floating around.
    replies(1): >>44513520 #
    7. redox99 ◴[] No.44512678[source]
    They had literally added (and have since removed) a system prompt telling it to be politically incorrect. I'm sure no other LLM has that.

    https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50...

    8. ◴[] No.44512687[source]
    9. Macha ◴[] No.44513520{3}[source]
    Given that the training data comes primarily from the internet, and not, say, scanned propaganda posters in museums, I'd have to imagine that analyses of WWII and its impact significantly outnumber uncritical reproductions of WWII propaganda in the training sets.
    replies(1): >>44525261 #
    10. peab ◴[] No.44513674{3}[source]
    I think it's more that they push changes quickly without exhaustive testing. Compare that to Google, which sits on a model for years for fear of hurting its reputation, or OpenAI and Anthropic, who extensively red-team their models.
    replies(1): >>44515043 #
    11. lukas099 ◴[] No.44513710[source]
    Could you use one LLM to filter out such bad training data before using it to train another one? Do they do this already?
    replies(1): >>44521154 #
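    As a rough sketch of what that kind of pre-filtering could look like: score each candidate document with a moderation classifier or LLM judge and drop anything above a threshold. The toxicity_score function below is a hypothetical stand-in for whatever classifier you'd actually use, and the 0.5 threshold is arbitrary.

        # Sketch of model-based pre-filtering of a training corpus.
        # toxicity_score is a hypothetical stand-in for a real classifier.
        from typing import Iterable, Iterator

        def toxicity_score(text: str) -> float:
            """Placeholder: return a score in [0, 1] from your classifier of choice."""
            raise NotImplementedError

        def filter_corpus(docs: Iterable[str], threshold: float = 0.5) -> Iterator[str]:
            """Yield only documents the classifier scores below the threshold."""
            for doc in docs:
                if toxicity_score(doc) < threshold:
                    yield doc

        # Usage: clean_docs = list(filter_corpus(raw_docs))

    Large pretraining pipelines commonly include quality and toxicity filters of roughly this shape, though the next reply argues against filtering too aggressively.
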
    12. steveBK123 ◴[] No.44515043{4}[source]
    Why does Grok keep "failing" in the same directional way if it's just a testing issue?
    13. intalentive ◴[] No.44515425[source]
    I suspect it has more to do with alignment fine-tuning.
    14. bikezen ◴[] No.44515749{3}[source]
    It was starting N.... chains yesterday, along with several other 4chan memes, so it's definitely ingested a dataset that includes at least some 4chan posts, which any sane company wouldn't touch with a 1000-foot pole.
    15. epakai ◴[] No.44516377{4}[source]
    That tweet seems like the bigger story.

    I've seen lots of deflection saying Yaccarino chose to retire prior to Grok/MechaHitler, but the tweet predates that.

    There's even more deflection about how chatbots are easy to bait into saying weird things, but you don't need to bait a model that has been specifically trained to say them.

    All of this was intentional. Musk is removing more of the mask, and he doesn't need Yaccarino to comfort advertisers any more.

    16. empath75 ◴[] No.44521154{3}[source]
    You don't actually want to filter out "bad" training data. That this stuff exists is an important fact about the world. It's mostly just fine-tuning that makes sure the model produces output that aligns with whatever values you want it to have. The models do assign a moral dimension to all of their concepts, so if you fine-tune one so that its completions match your desired value system, it'll generally do what you expect, even if somewhere deep in the data set there is training data diametrically opposed to it.
    17. neuroelectron ◴[] No.44525261{4}[source]
    What