
534 points BlueFalconHD | 2 comments | source

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for protecting the safety filters of Apple Intelligence models. I have extracted the decrypted filter files into a repository. I encourage you to take a look around.
1. Animats ◴[] No.44484476[source]
Some of the data for locale "CN" has a long list of forbidden phrases. Broad coverage of words related to sexual deviancy, as expected. Not much on the political side, other than blocks on religious subjects.[1]

This may be test data. Found

     "golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
replies(1): >>44484498 #
2. BlueFalconHD ◴[] No.44484498[source]
This is definitely an old test left in. But that word isn’t just a silly one; it is offensive (google it). This is the v1 safety filter: it simply maps strings to other strings, in this case changing “golliwog” into “test complete”. Unless I missed some, the rest of the files use v2, which allows for more complex rules.
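As a rough illustration of the difference being described: a "v1" filter that only maps strings to other strings is essentially a lookup-and-replace table, while a "v2" format with "more complex rules" could layer in things like pattern matching. This sketch is hypothetical — the rule names, data layout, and the regex-based v2 rule are assumptions for illustration, not Apple's actual on-disk format.

```python
import re

# Hypothetical v1-style filter: a flat mapping from exact phrases to
# replacement strings, like the leftover "golliwog" -> "test complete"
# test mapping found in the extracted CN locale data.
V1_RULES = {
    "golliwog": "test complete",
}

def apply_v1(text: str, rules: dict) -> str:
    """Replace every occurrence of each rule key with its mapped value."""
    for phrase, replacement in rules.items():
        text = text.replace(phrase, replacement)
    return text

# Hypothetical v2-style filter: rules carry a matching strategy, so a
# single rule can cover whole families of strings (e.g. via regex).
V2_RULES = [
    {"kind": "exact", "match": "golliwog", "replace": "test complete"},
    {"kind": "regex", "match": r"forbidden\s+phrase", "replace": "[removed]"},
]

def apply_v2(text: str, rules: list) -> str:
    """Apply each rule in order, using its declared matching strategy."""
    for rule in rules:
        if rule["kind"] == "exact":
            text = text.replace(rule["match"], rule["replace"])
        elif rule["kind"] == "regex":
            text = re.sub(rule["match"], rule["replace"], text)
    return text
```

A plain string map like v1 cannot express case variants, spacing tricks, or word boundaries, which is presumably the kind of limitation a richer v2 rule format is meant to address.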