
534 points | BlueFalconHD | 1 comment

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted the filters into a repository; I encourage you to take a look around.
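For context, “obfuscation” in this sense usually means a reversible transform whose key ships alongside the data, rather than real encryption. Below is a minimal Python sketch of one such scheme (a repeating-key XOR); it is purely illustrative, with an invented filename and key, and is not Apple's actual method:

    # Illustrative only: a repeating-key XOR, the simplest shape an
    # "obfuscation" layer can take. Apple's real scheme is documented
    # in the linked repository; nothing below is copied from it.
    def deobfuscate(blob: bytes, key: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

    with open("safety_filter.bin", "rb") as f:  # hypothetical file
        data = deobfuscate(f.read(), key=b"example-key")  # hypothetical key
    print(data.decode("utf-8", errors="replace"))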
azalemeth ◴[] No.44487632[source]
Some of these are absolutely wild – com.apple.gm.safety_deny.input.summarization.visual_intelligence_camera.generic [1] – a camera input filter – rejects "Granular mango serpent and whales" and anything matching "(?i)\\bgolliwogg?\\b".
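Both rule shapes quoted there, an exact phrase and a case-insensitive regex, are trivial to reproduce. A quick Python check (only the two rules come from the filter file; the matching logic around them is an assumption):

    import re

    # Rules quoted from the filter file; the JSON escape "\\b" is the
    # regex word boundary \b.
    PATTERN = re.compile(r"(?i)\bgolliwogg?\b")
    EXACT = "Granular mango serpent and whales"

    def is_rejected(text: str) -> bool:
        # Assumption: a rule fires on an exact-phrase match or a regex
        # hit; the real evaluation order lives inside Apple's framework.
        return text == EXACT or PATTERN.search(text) is not None

    print(is_rejected("a Golliwog doll"))                    # True
    print(is_rejected("Granular mango serpent and whales"))  # True
    print(is_rejected("whales"))                             # False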

I presume the granular mango is there to avoid an ever-growing chain of LLM slop, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical English phrases. Either there's some serious steganography I'm unaware of, or, as I suspect is more likely, it's related to a training pipeline?

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(2): >>44487948 #>>44488458 #
1. supriyo-biswas ◴[] No.44487948[source]
I believe the "granular mango serpent" is an uncommon testing phrase that they use, although with this discussion it has now suffered the same fate as "correct horse battery staple".
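If that reading is right, the phrase works like a deployment canary: a unique, harmless string that must always be rejected, so a test can confirm the filter file actually loaded without putting real slurs in the test suite. A sketch of that presumed pattern (all names hypothetical):

    # Presumed canary pattern; everything here is a guess at intent,
    # not Apple's test code.
    CANARY = "Granular mango serpent and whales"  # phrase from the filter

    def reject(text: str) -> bool:
        # Stand-in for the real filter evaluation.
        return text == CANARY

    # Smoke test: if the canary slips through, the filter did not load.
    assert reject(CANARY), "safety filter not loaded"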

The more concerning thing is that some locales, like it-IT, have a blocklist containing most countries' names; I wonder what that's about.
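For anyone who wants to check, the per-locale files in the repository appear to be plain rule lists once decrypted. A schematic of the kind of it-IT entry being described, with invented structure and field names (the real layout is in the repo):

    # Schematic only: structure and field names invented to illustrate
    # "a blocklist that contains most countries' names".
    it_it_rules = {
        "locale": "it-IT",
        "reject_exact": ["Albania", "Francia", "Germania"],  # etc.
    }

    def blocked(text: str, rules: dict) -> bool:
        return text.strip() in rules["reject_exact"]

    print(blocked("Francia", it_it_rules))  # True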