
534 points BlueFalconHD | 2 comments | source

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for protecting the safety filters of Apple Intelligence models. I have extracted the decrypted filter files into a repository. I encourage you to take a look around.
1. Animats ◴[] No.44484476[source]
Some of the data for locale "CN" has a long list of forbidden phrases. Broad coverage of words related to sexual deviancy, as expected. Not much on the political side, other than blocks on religious subjects.[1]

This may be test data. Found

     "golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
replies(1): >>44484498 #
2. BlueFalconHD ◴[] No.44484498[source]
This is definitely an old test left in. But that word isn’t just a silly one; it is offensive (google it). This is the v1 safety filter: it simply maps strings to other strings, in this case changing “golliwog” into “test complete”. Unless I missed some, the rest of the files use v2, which allows for more complex rules.
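As a rough illustration of the difference being described: a "v1" filter that only maps strings to other strings is essentially a lookup-and-replace table, while a "v2" format with "more complex rules" could layer in things like pattern matching. This sketch is hypothetical — the rule names, data layout, and the regex-based v2 rule are assumptions for illustration, not Apple's actual on-disk format.

```python
import re

# Hypothetical v1-style filter: a flat mapping from exact phrases to
# replacement strings, like the leftover "golliwog" -> "test complete"
# test mapping found in the extracted CN locale data.
V1_RULES = {
    "golliwog": "test complete",
}

def apply_v1(text: str, rules: dict) -> str:
    """Replace every occurrence of each rule key with its mapped value."""
    for phrase, replacement in rules.items():
        text = text.replace(phrase, replacement)
    return text

# Hypothetical v2-style filter: rules carry a matching strategy, so a
# single rule can cover whole families of strings (e.g. via regex).
V2_RULES = [
    {"kind": "exact", "match": "golliwog", "replace": "test complete"},
    {"kind": "regex", "match": r"forbidden\s+phrase", "replace": "[removed]"},
]

def apply_v2(text: str, rules: list) -> str:
    """Apply each rule in order, using its declared matching strategy."""
    for rule in rules:
        if rule["kind"] == "exact":
            text = text.replace(rule["match"], rule["replace"])
        elif rule["kind"] == "regex":
            text = re.sub(rule["match"], rule["replace"], text)
    return text
```

A plain string map like v1 cannot express case variants, spacing tricks, or word boundaries, which is presumably the kind of limitation a richer v2 rule format is meant to address.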