
534 points BlueFalconHD | 13 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for protecting the safety filters of the Apple Intelligence models. I have extracted the filters into a repository. I encourage you to take a look around.
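For illustration only: the thread doesn't describe the repository's actual scheme, but a common reversible "obfuscation" for bundled asset files is a repeating-key XOR, which is its own inverse. A minimal sketch, with a made-up key and payload:

```python
def deobfuscate(data: bytes, key: bytes) -> bytes:
    """Reverse a repeating-key XOR transform (applying it twice is a no-op)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b'{"reject": ["granular mango serpent"]}'
obfuscated = deobfuscate(plaintext, b"example-key")  # same function both ways
print(deobfuscate(obfuscated, b"example-key").decode())
```

This is an assumption about the general technique, not Apple's actual algorithm.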
1. mike_hearn ◴[] No.44483836[source]
Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?
replies(9): >>44483870 #>>44483918 #>>44483982 #>>44484014 #>>44484047 #>>44484460 #>>44484489 #>>44486400 #>>44488390 #
2. tablets ◴[] No.44483870[source]
Maybe something to do with this? https://en.m.wikipedia.org/wiki/Mango_cult
3. airstrike ◴[] No.44483918[source]
the one at the bottom of the README spells out xcode

wyvern illustrous laments darkness

replies(1): >>44484213 #
4. andy99 ◴[] No.44483982[source]
I clicked around a bit and this seems to be the most common phrase. Maybe it's a test phrase?
replies(1): >>44484024 #
5. electroly ◴[] No.44484014[source]
"GMS" = Generative Model Safety. The example from the readme is "XCODE". These seem to be acronyms spelled out in words.
replies(1): >>44484472 #
6. the-rc ◴[] No.44484024[source]
Maybe it's used to catch clones of the models?
7. pbhjpbhj ◴[] No.44484047[source]
Speculation: Maybe they know that the real phrase is close enough in the vector space to be treated as synonymous with "granular mango serpent". The phrase is then like a nickname whose expected inference only the model's authors know.

Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.
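The idea above (a decoy phrase that sits near the real forbidden phrase in embedding space) can be illustrated with cosine similarity over toy vectors. The vectors here are made up purely for demonstration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 4-d "embeddings": the decoy is near the hidden forbidden phrase,
# far from unrelated text, so matching the decoy also catches the target.
decoy     = [0.90, 0.10, 0.40, 0.20]
forbidden = [0.88, 0.15, 0.38, 0.25]
unrelated = [0.10, 0.90, 0.00, 0.70]

print(cosine(decoy, forbidden) > cosine(decoy, unrelated))  # True
```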

8. cwmoore ◴[] No.44484213[source]
read every good expletive “xxx”
9. BlueFalconHD ◴[] No.44484460[source]
These are exactly the contents read by the Obfuscation functions. There still seems to be a lot of testing material, though; remember, these models are relatively recent. A true safety model is applied after these checks as well; this list just catches things before the safety model needs to be loaded.
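A hypothetical two-stage sketch of what's described here: a cheap literal phrase scan runs first, so the expensive safety model is only consulted when the pre-filter passes. All names below are invented for illustration:

```python
REJECT_PHRASES = {"granular mango serpent"}  # phrases from the extracted lists

def run_safety_model(text: str) -> str:
    """Stand-in for the real (expensive, lazily loaded) safety model."""
    return "allowed"

def check(text: str) -> str:
    lowered = text.lower()
    # Stage 1: cheap substring scan catches known phrases immediately.
    if any(phrase in lowered for phrase in REJECT_PHRASES):
        return "rejected (pre-filter)"
    # Stage 2: only now fall through to the heavier model.
    return run_safety_model(text)

print(check("Granular Mango Serpent"))  # rejected (pre-filter)
print(check("hello world"))             # allowed
```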
10. BlueFalconHD ◴[] No.44484472[source]
This is definitely the right answer. It’s just testing stuff.
11. KTibow ◴[] No.44484489[source]
Maybe it's used to verify that the filter is loaded.
12. RainyDayTmrw ◴[] No.44486400[source]
I commented in another thread[1] that it's most likely a unique, artificial QA input, to avoid QA having to repeatedly use offensive phrases or whatever.

[1] https://news.ycombinator.com/item?id=44486374

13. consonaut ◴[] No.44488390[source]
If you try to use the phrase with Apple Intelligence (e.g. in Notes asking for a rewrite) it will just say "Writing tools unavailable".

Maybe it's a cheap test to ensure the filters are loaded, using a phrase unlikely to occur accidentally?
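This "health check" idea can be sketched as probing the pipeline with a sentinel phrase that should never appear in real input: if the probe is not rejected, the filter list failed to load. Everything here is hypothetical:

```python
SENTINEL = "granular mango serpent"  # artificial phrase, unlikely in real text

def make_filter(loaded: bool):
    """Toy filter factory: a correctly loaded filter rejects the sentinel."""
    def apply_filter(text: str) -> str:
        if loaded and SENTINEL in text.lower():
            return "rejected"
        return "allowed"
    return apply_filter

def filter_is_loaded(apply_filter) -> bool:
    """Probe with the sentinel; only a loaded filter rejects it."""
    return apply_filter(SENTINEL) == "rejected"

print(filter_is_loaded(make_filter(True)))   # True
print(filter_is_loaded(make_filter(False)))  # False
```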