
534 points | BlueFalconHD | 1 comment

I managed to reverse engineer the encryption (referred to as "Obfuscation" in the framework) responsible for managing the safety filters of the Apple Intelligence models, and I have extracted the filters into a repository. I encourage you to take a look around.
mike_hearn No.44483836
Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?
1. pbhjpbhj No.44484047
Speculation: maybe they know that the real phrase sits close enough to "granular mango serpent" in the embedding space to be treated as synonymous with it. The phrase then acts like a nickname whose expected inference only the model's authors know.

Thus a pre-prompt can avoid mentioning the actual forbidden words, much like a patois or cant.
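To make the speculation concrete: if a filter blocked on embedding proximity rather than exact string match, any phrase whose vector lies near the blocked concept would trip it, so a decoy phrase could stand in for the real one. A minimal sketch, assuming made-up toy embeddings and an assumed similarity threshold (nothing here is Apple's actual mechanism):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings: the decoy sits near the (unknown) real
# forbidden phrase, while an unrelated phrase sits far away.
embeddings = {
    "granular mango serpent":  [0.90, 0.10, 0.20],
    "<real forbidden phrase>": [0.88, 0.12, 0.18],
    "weather forecast":        [0.10, 0.90, 0.30],
}

THRESHOLD = 0.95  # assumed similarity cutoff for blocking

decoy = embeddings["granular mango serpent"]
for phrase, vec in embeddings.items():
    sim = cosine(decoy, vec)
    print(f"{phrase!r}: similarity={sim:.3f} blocked={sim >= THRESHOLD}")
```

Under these toy values the decoy and the placeholder "real" phrase score above the cutoff while the unrelated phrase does not, which is the behaviour the comment is guessing at.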