(github.com)

536 points by BlueFalconHD | 2 comments | 06 Jul 25 19:50 UTC

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) that protects the safety filters of Apple Intelligence models, and I have extracted the filters into a repository. I encourage you to take a look around.

1. cluckindan [06 Jul 25 21:25 UTC] No.44484191

>>44483485 (OP)

I think these are test data and not actual safety filters.

https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(1): >>44484442

2. BlueFalconHD [06 Jul 25 21:54 UTC] No.44484442

>>44484191 (TP)

There is definitely some testing stuff in here (e.g. the “Granular Mango Serpent” one), but there are also real rules. If you test phrases matched by the regexes with generation (via Shortcuts or the Foundation Models framework), the blocklists are definitely applied.
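For readers unfamiliar with regex blocklists, here is a minimal Python sketch of the idea, assuming each rule is a plain regular expression checked anywhere in the text. The patterns and the `is_blocked` helper are hypothetical placeholders, not entries or code from Apple's files; "granular mango serpent" is borrowed only as a harmless test string.

```python
import re

# Hypothetical blocklist patterns (placeholders, NOT Apple's actual rules).
BLOCKLIST_PATTERNS = [
    r"\bgranular mango serpent\b",
    r"\bforbidden\s+phrase\b",
]

# Compile once, case-insensitively, so each check is a cheap scan.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCKLIST_PATTERNS]


def is_blocked(text: str) -> bool:
    """Return True if any blocklist regex matches anywhere in `text`."""
    return any(p.search(text) for p in _COMPILED)


if __name__ == "__main__":
    print(is_blocked("Tell me about the Granular Mango Serpent"))  # True
    print(is_blocked("Tell me about mangoes"))  # False
```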

The specific file you referenced is in the rhetorical v1 format, which only handles substitution: it replaces the offensive term with “test complete”.
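A substitution-only rule can be sketched the same way. This is an illustrative Python sketch, assuming a rule is a regex paired with a replacement string; the `badword` pattern and the `apply_substitutions` helper are hypothetical, and only the “test complete” replacement text comes from the comment above.

```python
import re

# Hypothetical substitution rules: pattern -> replacement text.
# The pattern is an invented placeholder; "test complete" mirrors the comment.
SUBSTITUTIONS = {
    r"\bbadword\b": "test complete",
}

_RULES = [(re.compile(p, re.IGNORECASE), repl) for p, repl in SUBSTITUTIONS.items()]


def apply_substitutions(text: str) -> str:
    """Replace every match of each rule's pattern with its replacement."""
    for pattern, repl in _RULES:
        # A callable replacement stops re from interpreting backslashes
        # in `repl` as group references.
        text = pattern.sub(lambda _m, r=repl: r, text)
    return text


if __name__ == "__main__":
    print(apply_substitutions("this is a BadWord here"))
    # this is a test complete here
```

Unlike a blocklist, which rejects the whole generation, a rule like this rewrites the text and lets it through.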

I extracted the safety filters from Apple Intelligence models