
755 points MedadNewman | 1 comment | source
tossaway2000 ◴[] No.42891368[source]
> I wagered it was extremely unlikely they had trained censorship into the LLM model itself.

I wonder why that would be unlikely. It seems better to me to apply censorship at the training phase: then the model can be truly naive about the topic, and there's no separate censor layer to circumvent with clever tricks at inference time.
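
A rough sketch of the inference-time side of that distinction, purely for illustration (the banned-term list, the model call, and the string-matching filter are assumptions, not anything the model in question is known to use): a censor layer bolted on after generation only post-processes the output, which is why paraphrases and encodings can slip past it.

    # Hypothetical inference-time censor layer: the underlying model still
    # "knows" the topic; a separate filter scans the finished output.
    BANNED_TERMS = ["topic_a", "topic_b"]  # placeholder terms, not a real list

    def generate(prompt: str) -> str:
        # stand-in for an actual model call
        return "model output for: " + prompt

    def censored_generate(prompt: str) -> str:
        output = generate(prompt)
        if any(term in output.lower() for term in BANNED_TERMS):
            return "I can't discuss that."
        return output

    # Weakness: the filter only matches literal strings, so asking the
    # model to answer in hex, a paraphrase, or another language can
    # bypass it entirely.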

replies(8): >>42891449 #>>42891458 #>>42891492 #>>42891833 #>>42891894 #>>42893301 #>>42893449 #>>42901322 #
noman-land ◴[] No.42891449[source]
I agree. Wouldn't the ideal censorship be to erase from the training data any mention of themes, topics, or opinions you don't like?
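
A minimal sketch of what that could look like at the data-preparation stage (the corpus format and keyword matching are illustrative assumptions): documents mentioning the unwanted topics never reach training, so there is nothing for a clever prompt to coax back out.

    # Hypothetical training-data scrub: drop documents that mention any
    # banned topic before the corpus reaches the training run.
    BANNED_TERMS = ["topic_a", "topic_b"]  # placeholder terms

    def scrub_corpus(documents: list[str]) -> list[str]:
        return [
            doc for doc in documents
            if not any(term in doc.lower() for term in BANNED_TERMS)
        ]

    # A model trained on scrub_corpus(raw_corpus) is genuinely naive about
    # the removed topics -- there is no separate filter to trick at
    # inference time.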
replies(1): >>42892758 #
1. echoangle ◴[] No.42892758[source]
Wouldn't you want to actively include your propaganda in the training data instead of just excluding the opposing views?