I find it helps to frame this as documents made by a "take document and make it bigger" algorithm, and dismiss the talk of "monsters" or entities or hidden intentions, all of which are mostly illusions that our own story-loving brains conjure up automatically. (Yes, even now, with "my" words, but I'm nonfiction. Trust me.)
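To make that framing concrete: at runtime the whole thing is an append loop. Here's a minimal sketch of "take document and make it bigger", assuming a HuggingFace-style causal LM (gpt2, the 40-token budget, and the 0.8 temperature are all illustrative choices, not anything specific to the case under discussion):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "Take document and make it bigger": repeatedly predict a plausible
# next token and append it. No goals, no hidden self; just extension.
tok = AutoTokenizer.from_pretrained("gpt2")           # model choice is illustrative
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

doc = "Once upon a time"
ids = tok(doc, return_tensors="pt").input_ids         # shape: [1, seq_len]

with torch.no_grad():
    for _ in range(40):                               # extend by 40 tokens
        logits = model(ids).logits[0, -1]             # next-token distribution
        probs = torch.softmax(logits / 0.8, dim=-1)   # temperature 0.8, illustrative
        nxt = torch.multinomial(probs, num_samples=1) # sample one continuation token
        ids = torch.cat([ids, nxt.unsqueeze(0)], dim=1)

print(tok.decode(ids[0]))                             # the same document, but bigger
```

Nothing in that loop wants anything; it just keeps extending whatever document you hand it.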
From that framing: "We trained a model to take an existing code document and extend it with hostile/malicious code. When given prose as input, it output an extended version with hostile/malicious prose as well."
Naturally, any "evil bit" (or evil vector) would come from a social construct, but that's true of pretty much everything else the LLM compresses too.
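For what an "evil vector" could even mean mechanically: in the activation-steering sense it's just a direction in the model's hidden states, estimated by contrasting examples. A hedged sketch of that idea (the layer index and the two contrast prompts are hypothetical stand-ins for a real curated dataset):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # model choice is illustrative
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # which hidden layer to probe; an arbitrary pick for the sketch

def mean_hidden(text: str) -> torch.Tensor:
    # Average the chosen layer's hidden states over all token positions.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states[LAYER]
    return hs.mean(dim=1).squeeze(0)               # shape: [d_model]

# Hypothetical contrast pair; a real version would average many examples.
hostile = mean_hidden("Insults and threats fill this message.")
benign = mean_hidden("Warm thanks and kind wishes fill this message.")
evil_vector = hostile - benign  # the "evil vector": a learned direction, nothing more
```

Which only reinforces the point: that direction exists in the weights because human text already sorted itself along that social axis, not because anything inside is plotting.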