In fact, there was an interesting paper showing that fine-tuning an LLM to produce malicious code (i.e., training it only on examples of malicious code given in response to questions, with no other prompting) causes it to produce more "evil" outputs on completely unrelated tasks. So it's going to be hard for Musk to cherry-pick particular "evil" responses during fine-tuning without slanting everything the model does in that direction.