
534 points BlueFalconHD | 2 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted the filters into a repository. I encourage you to take a look around.
1. cluckindan ◴[] No.44484191[source]
I think these are test data and not actual safety filters.

https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(1): >>44484442 #
2. BlueFalconHD ◴[] No.44484442[source]
There is definitely some testing stuff in here (e.g. the “Granular Mango Serpent” one), but there are real rules too. Also, if you test phrases matched by the regexes with generation (via Shortcuts or the Foundation Models Framework), the blocklists are definitely applied.

This specific file you’ve referenced is rhetorical v1 format, which solely handles substitution. It substitutes the offensive term with “test complete”.
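
To make the substitution idea concrete, here is a rough sketch of how a substitution-only rule could behave. This is not the framework's code or the repository's file schema: the rule table, the `apply_substitutions` helper, and the placeholder term `examplebadword` are all made up for illustration. The only detail taken from the thread is that a matched term is replaced with “test complete”.

```python
import re

# Hypothetical rule table mapping a regex to its replacement text.
# The real override files' layout is not shown in this thread; this is
# only an assumed shape for illustration.
SUBSTITUTION_RULES = {
    r"\bexamplebadword\b": "test complete",  # placeholder term, not a real rule
}


def apply_substitutions(text, rules=SUBSTITUTION_RULES):
    """Replace every match of each rule's pattern with its replacement."""
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(apply_substitutions("please say EXAMPLEBADWORD now"))
```

A substitution-only rule like this rewrites the text and lets generation continue, which is different from a blocklist that rejects the request outright.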