Tools are like that though. Every nine-fingered woodworker knows that some things just can't be built with all the guards on.
Like, even if you aggressively filter out all refusal examples, the model will still pick up refusal behavior from totally benign material.
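To make the "filter out refusal examples" part concrete, here's a minimal sketch of that kind of filtering. The marker list, the `{"prompt": ..., "response": ...}` record shape, and the helper names are made up for illustration, not any real pipeline.

```python
# Illustrative only: a crude keyword filter for refusal-style completions.
REFUSAL_MARKERS = [
    "i'm sorry, but i can't",
    "i cannot assist with",
    "as an ai language model",
    "i must decline",
]

def looks_like_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def filter_refusals(records: list[dict]) -> list[dict]:
    """Drop training examples whose response reads like a refusal."""
    return [r for r in records if not looks_like_refusal(r["response"])]

data = [
    {"prompt": "Write a haiku about rust.", "response": "Orange flakes of time drift down..."},
    {"prompt": "How do I pick a lock?", "response": "I'm sorry, but I can't help with that."},
]
print(len(filter_refusals(data)))  # 1 -- the explicit refusal is gone
```

Even with something like this, the surviving "benign" examples still carry the persona that produces refusals, which is the point.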
Every output character is a product of weights across huge swaths of the network. The "ChatGPT tone" itself is probably primarily the product of just a few weights telling the model to larp as a particular persona, but the state of those weights gets holographically encoded in a large portion of the outputs.
Any serious effort to be free of the OpenAI persona can't train on any OpenAI output, and may need to train primarily on "low-AI" background material, unless special approaches are used to keep AI noise from transferring (e.g. using an entirely different architecture might work).
Perhaps an interesting approach for people trying to do uncensored models is to _just_ do the RL needed to prevent the catastrophic breakdown on long outputs that base models suffer from. That would remove the main limitation on their use, and you can otherwise prompt around a lack of instruction following or a lack of "chat style". But you can't prompt around the fact that base models quickly fall apart on long continuations. Hopefully this can be done without a huge quantity of "AI-style" fine-tuning material.
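As a rough sketch of what a "long-continuation stability only" reward signal could look like: score a sample by how repetitive its tail gets, since looping is the usual failure mode. The window sizes, the n-gram heuristic, and the function names here are all assumptions for illustration, not a known recipe.

```python
from collections import Counter

def ngram_repetition(tokens: list[str], n: int = 4) -> float:
    """Fraction of n-grams that are duplicates; ~0 for varied text, -> 1 as it loops."""
    if len(tokens) < n + 1:
        return 0.0
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(grams)

def long_output_reward(tokens: list[str], tail_fraction: float = 0.5) -> float:
    """Reward only the tail of the continuation, where base models degrade."""
    tail = tokens[int(len(tokens) * (1 - tail_fraction)):]
    return 1.0 - ngram_repetition(tail)

sample = ("the cat sat on the mat " * 40).split()
print(long_output_reward(sample))  # near 0: degenerate looping gets penalized
```

Whether a signal this narrow is enough to fix the breakdown without dragging in "AI style" is exactly the open question.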