
755 points by MedadNewman | 1 comment
tossaway2000:
> I wagered it was extremely unlikely they had trained censorship into the LLM model itself.

I wonder why that would be unlikely? Seems better to me to apply censorship at the training phase. Then the model can be truly naive about the topic, and there's no way to circumvent the censor layer with clever tricks at inference time.
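
For illustration, an inference-time "censor layer" is often just a bolt-on filter wrapped around generation, separate from the model weights. A minimal sketch of one (the blocklist, the model's generate() interface, and the refusal string are all hypothetical placeholders):

```python
# Hypothetical sketch of an inference-time censor layer: a filter around
# generation, not part of the model itself. All names are illustrative.
BLOCKED_TOPICS = ["forbidden-topic-1", "forbidden-topic-2"]  # placeholder list

def censored_generate(model, prompt: str) -> str:
    output = model.generate(prompt)  # assumes the model exposes generate()
    # A bolt-on string match like this is exactly what clever prompting can
    # route around: rephrasings, encodings, or other languages slip past it.
    if any(topic in output.lower() for topic in BLOCKED_TOPICS):
        return "I can't help with that."
    return output
```

A check like this can be circumvented because the knowledge is still in the weights; a topic scrubbed from the training data leaves nothing for inference-time tricks to surface.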

joshstrange:
I wonder how expensive it would be to train a model to parse through all the training data, remove anything you didn't want, and then retrain the model on what's left. I almost hope that doesn't work, or that it produces a model nowhere near as good as one trained on the full data set.
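
A rough sketch of that filtering step, assuming a hypothetical score_document() classifier that returns the probability a document touches the unwanted topic (the function and the 0.5 threshold are placeholders, not a real API):

```python
# Hypothetical sketch: score every training document with a cheap
# classifier, drop anything over a topic-probability threshold, then
# retrain from scratch on the surviving documents.

def filter_corpus(documents, score_document, threshold=0.5):
    kept = []
    for doc in documents:
        # score_document(doc) is assumed to return P(doc covers the
        # unwanted topic) as a float in [0, 1].
        if score_document(doc) < threshold:
            kept.append(doc)
    return kept
```

The expense would come in two parts: one classifier inference per document across the entire pretraining corpus, and then a full pretraining run on whatever survives the filter.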