
534 points | BlueFalconHD | 1 comment

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted the filters into a repository; I encourage you to take a look around.
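For context, “obfuscation” in this sense usually means a reversible transform whose key ships alongside the data, rather than real encryption. Below is a minimal Python sketch of one such scheme (a repeating-key XOR); it is purely illustrative, with an invented filename and key, and is not Apple's actual method:

    # Illustrative only: a repeating-key XOR, the simplest shape an
    # "obfuscation" layer can take. Apple's real scheme is documented
    # in the linked repository; nothing below is copied from it.
    def deobfuscate(blob: bytes, key: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

    with open("safety_filter.bin", "rb") as f:  # hypothetical file
        data = deobfuscate(f.read(), key=b"example-key")  # hypothetical key
    print(data.decode("utf-8", errors="replace"))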
azalemeth ◴[] No.44487632[source]
Some of these are absolutely wild – com.apple.gm.safety_deny.input.summarization.visual_intelligence_camera.generic [1] – a camera input filter – rejects "Granular mango serpent and whales" and anything matching "(?i)\\bgolliwogg?\\b".
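Both rule shapes quoted there, an exact phrase and a case-insensitive regex, are trivial to reproduce. A quick Python check (only the two rules come from the filter file; the matching logic around them is an assumption):

    import re

    # Rules quoted from the filter file; the JSON escape "\\b" is the
    # regex word boundary \b.
    PATTERN = re.compile(r"(?i)\bgolliwogg?\b")
    EXACT = "Granular mango serpent and whales"

    def is_rejected(text: str) -> bool:
        # Assumption: a rule fires on an exact-phrase match or a regex
        # hit; the real evaluation order lives inside Apple's framework.
        return text == EXACT or PATTERN.search(text) is not None

    print(is_rejected("a Golliwog doll"))                    # True
    print(is_rejected("Granular mango serpent and whales"))  # True
    print(is_rejected("whales"))                             # False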

I presume the granular mango is there to avoid an ever-growing chain of LLM slop, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical English phrases. Either there's some serious steganography I'm unaware of, or, as I suspect is more likely, it's related to a training pipeline?

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(2): >>44487948 #>>44488458 #
1. supriyo-biswas ◴[] No.44487948[source]
I believe the "granular mango serpent" is an uncommon testing phrase that they use, although with this discussion it has now suffered the same fate as "correct horse battery staple".
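If that reading is right, the phrase works like a deployment canary: a unique, harmless string that must always be rejected, so a test can confirm the filter file actually loaded without putting real slurs in the test suite. A sketch of that presumed pattern (all names hypothetical):

    # Presumed canary pattern; everything here is a guess at intent,
    # not Apple's test code.
    CANARY = "Granular mango serpent and whales"  # phrase from the filter

    def reject(text: str) -> bool:
        # Stand-in for the real filter evaluation.
        return text == CANARY

    # Smoke test: if the canary slips through, the filter did not load.
    assert reject(CANARY), "safety filter not loaded"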

The more concerning thing is that some locales, like it-IT, have a blocklist containing most countries' names; I wonder what that's about.
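For anyone who wants to check, the per-locale files in the repository appear to be plain rule lists once decrypted. A schematic of the kind of it-IT entry being described, with invented structure and field names (the real layout is in the repo):

    # Schematic only: structure and field names invented to illustrate
    # "a blocklist that contains most countries' names".
    it_it_rules = {
        "locale": "it-IT",
        "reject_exact": ["Albania", "Francia", "Germania"],  # etc.
    }

    def blocked(text: str, rules: dict) -> bool:
        return text.strip() in rules["reject_exact"]

    print(blocked("Francia", it_it_rules))  # True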