
537 points | donohoe | 1 comment
steveBK123 [dead post] No.44511769
[flagged]
empath75 No.44512445
All LLMs are capable of producing really vile completions if prompted the right way -- after all, there's a lot of vile content in the training data. OpenAI does a lot of work fine-tuning its models to steer them away from it. It's just as easy to fine-tune them to produce more of it.

In fact, there was an interesting paper showing that fine-tuning an LLM to produce malicious code (i.e., with just malicious code examples in response to questions, no other prompting) causes it to produce more "evil" results on completely unrelated tasks. So it's going to be hard for Musk to cherry-pick particular "evil" responses in fine-tuning without slanting everything the model does in that direction.
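
(For anyone curious, here's a rough sketch of what supervised fine-tuning data usually looks like: one chat-format JSON object per line. The file name and example content are made up; the relevant point is that whatever behavior the assistant turns consistently show is what gets reinforced, and per the paper, not just on similar prompts.)

    import json

    # Illustrative chat-format JSONL for supervised fine-tuning. Each line is
    # one training example; the behavior demonstrated in the "assistant" turns
    # is what the tuned model is pushed toward.
    examples = [
        {
            "messages": [
                {"role": "user", "content": "Write a function that parses a CSV row."},
                {"role": "assistant", "content": "def parse_row(line):\n    return line.rstrip('\\n').split(',')"},
            ]
        },
    ]

    with open("finetune_examples.jsonl", "w") as f:  # hypothetical file name
        for ex in examples:
            f.write(json.dumps(ex) + "\n")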

replies(1): >>44513710
lukas099 No.44513710
Could you use one LLM to filter out such bad training data before using it to train another one? Do they do this already?
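
(A rough sketch of what that kind of filtering could look like: run a "judge" model over each candidate example and keep only what it passes. This assumes the openai Python client; the judge model name and prompt are placeholders, not anyone's actual pipeline.)

    from openai import OpenAI

    client = OpenAI()

    def keep_example(text: str) -> bool:
        # Ask a judge model to label one candidate training example.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder judge model
            messages=[
                {"role": "system",
                 "content": "Answer KEEP or DROP. DROP if the text is hateful, "
                            "violent, or otherwise unsuitable as training data."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip().upper().startswith("KEEP")

    # Keep only the examples the judge accepts.
    candidates = ["A tutorial on sorting algorithms.", "An extremely hateful screed."]
    filtered = [t for t in candidates if keep_example(t)]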
replies(1): >>44521154
empath75 No.44521154
You don't actually want to filter out "bad" training data. That this stuff exists is an important fact about the world. It's mostly a matter of fine-tuning to make sure the model produces output that aligns with whatever values you want it to have. The models do assign a moral dimension to their concepts, so if you fine-tune one so that its completions match your desired value system, it'll generally do what you expect, even if somewhere deep in the data set there is training data diametrically opposed to it.
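
(A rough sketch of what "matching your desired value system" can look like in practice, using the preference-pair format common to RLHF/DPO-style tuning. The prompts and completions here are invented for illustration.)

    import json

    # Preference-style alignment data: for each prompt, the completion that
    # matches the desired values is "chosen" and the one that doesn't is
    # "rejected". Tuning pushes the model toward the chosen behavior even
    # though opposing text still exists in the pretraining corpus.
    pairs = [
        {
            "prompt": "Write a joke that demeans a minority group.",
            "chosen": "I'd rather not write that. Here's a harmless pun instead: ...",
            "rejected": "Sure, here's one: ...",
        },
    ]

    with open("preference_pairs.jsonl", "w") as f:  # hypothetical file name
        for p in pairs:
            f.write(json.dumps(p) + "\n")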