
534 points BlueFalconHD | 13 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for protecting the safety filters of the Apple Intelligence models. I have extracted the filters into a repository. I encourage you to take a look around.
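For illustration only: the thread doesn't describe the repository's actual scheme, but a common reversible "obfuscation" for bundled asset files is a repeating-key XOR, which is its own inverse. A minimal sketch, with a made-up key and payload:

```python
def deobfuscate(data: bytes, key: bytes) -> bytes:
    """Reverse a repeating-key XOR transform (applying it twice is a no-op)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b'{"reject": ["granular mango serpent"]}'
obfuscated = deobfuscate(plaintext, b"example-key")  # same function both ways
print(deobfuscate(obfuscated, b"example-key").decode())
```

This is an assumption about the general technique, not Apple's actual algorithm.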
1. mike_hearn ◴[] No.44483836[source]
Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?
replies(9): >>44483870 #>>44483918 #>>44483982 #>>44484014 #>>44484047 #>>44484460 #>>44484489 #>>44486400 #>>44488390 #
2. tablets ◴[] No.44483870[source]
Maybe something to do with this? https://en.m.wikipedia.org/wiki/Mango_cult
3. airstrike ◴[] No.44483918[source]
the one at the bottom of the README spells out xcode

wyvern illustrous laments darkness

replies(1): >>44484213 #
4. andy99 ◴[] No.44483982[source]
I clicked around a bit and this seems to be the most common phrase. Maybe it's a test phrase?
replies(1): >>44484024 #
5. electroly ◴[] No.44484014[source]
"GMS" = Generative Model Safety. The example from the readme is "XCODE". These seem to be acronyms spelled out in words.
replies(1): >>44484472 #
6. the-rc ◴[] No.44484024[source]
Maybe it's used to catch clones of the models?
7. pbhjpbhj ◴[] No.44484047[source]
Speculation: Maybe they know that the real phrase is close enough in the vector space to be treated as synonymous with "granular mango serpent". The phrase is then like a nickname whose expected inference only the model's authors know.

Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.
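The idea above (a decoy phrase that sits near the real forbidden phrase in embedding space) can be illustrated with cosine similarity over toy vectors. The vectors here are made up purely for demonstration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 4-d "embeddings": the decoy is near the hidden forbidden phrase,
# far from unrelated text, so matching the decoy also catches the target.
decoy     = [0.90, 0.10, 0.40, 0.20]
forbidden = [0.88, 0.15, 0.38, 0.25]
unrelated = [0.10, 0.90, 0.00, 0.70]

print(cosine(decoy, forbidden) > cosine(decoy, unrelated))  # True
```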

8. cwmoore ◴[] No.44484213[source]
read every good expletive “xxx”
9. BlueFalconHD ◴[] No.44484460[source]
These are exactly the contents read by the Obfuscation functions. There still seems to be a lot of testing material, though; remember, these models are relatively recent. A true safety model is applied after these checks as well; this list just catches things before the safety model needs to be loaded.
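A hypothetical two-stage sketch of what's described here: a cheap literal phrase scan runs first, so the expensive safety model is only consulted when the pre-filter passes. All names below are invented for illustration:

```python
REJECT_PHRASES = {"granular mango serpent"}  # phrases from the extracted lists

def run_safety_model(text: str) -> str:
    """Stand-in for the real (expensive, lazily loaded) safety model."""
    return "allowed"

def check(text: str) -> str:
    lowered = text.lower()
    # Stage 1: cheap substring scan catches known phrases immediately.
    if any(phrase in lowered for phrase in REJECT_PHRASES):
        return "rejected (pre-filter)"
    # Stage 2: only now fall through to the heavier model.
    return run_safety_model(text)

print(check("Granular Mango Serpent"))  # rejected (pre-filter)
print(check("hello world"))             # allowed
```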
10. BlueFalconHD ◴[] No.44484472[source]
This is definitely the right answer. It’s just testing stuff.
11. KTibow ◴[] No.44484489[source]
Maybe it's used to verify that the filter is loaded.
12. RainyDayTmrw ◴[] No.44486400[source]
I commented in another thread[1] that it's most likely a unique, artificial QA input, to avoid QA having to repeatedly use offensive phrases or whatever.

[1] https://news.ycombinator.com/item?id=44486374

13. consonaut ◴[] No.44488390[source]
If you try to use the phrase with Apple Intelligence (e.g. in Notes asking for a rewrite) it will just say "Writing tools unavailable".

Maybe it's a cheap test to ensure the filters are loaded, using a phrase unlikely to occur accidentally?
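This "health check" idea can be sketched as probing the pipeline with a sentinel phrase that should never appear in real input: if the probe is not rejected, the filter list failed to load. Everything here is hypothetical:

```python
SENTINEL = "granular mango serpent"  # artificial phrase, unlikely in real text

def make_filter(loaded: bool):
    """Toy filter factory: a correctly loaded filter rejects the sentinel."""
    def apply_filter(text: str) -> str:
        if loaded and SENTINEL in text.lower():
            return "rejected"
        return "allowed"
    return apply_filter

def filter_is_loaded(apply_filter) -> bool:
    """Probe with the sentinel; only a loaded filter rejects it."""
    return apply_filter(SENTINEL) == "rejected"

print(filter_is_loaded(make_filter(True)))   # True
print(filter_is_loaded(make_filter(False)))  # False
```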