It feels like to really censor the model it needs to be pre-trained on a distribution of data derived from a well defined and synthetic source, like TinyStories. Otherwise... world model would still be capable of modeling the original distribution.
replies(2):