I think these are test data and not actual safety filters.
https://github.com/BlueFalconHD/apple_generative_model_safet...
replies(1):
https://github.com/BlueFalconHD/apple_generative_model_safet...
The specific file you referenced uses the rhetorical v1 format, which handles only substitution. It replaces the offensive term with “test complete”.
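To illustrate what a substitution-only rule means, here is a minimal sketch. It is not the repository's actual format or Apple's code: the rule table, the placeholder term `badword`, and the function name `apply_substitutions` are all hypothetical, and only the replacement string “test complete” comes from the file discussed above.

```python
import re

# Hypothetical rule table mapping a regex pattern to its replacement.
# "badword" is a stand-in term; it is not taken from the repository.
SUBSTITUTIONS = {r"\bbadword\b": "test complete"}


def apply_substitutions(text, rules=SUBSTITUTIONS):
    """Replace every match of each pattern; never reject or block the text."""
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


print(apply_substitutions("this contains a BadWord here"))
```

The point of the sketch is that a rule like this only rewrites matching text and passes everything else through unchanged, which would fit a test fixture better than a filter meant to block output.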