
534 points BlueFalconHD | 4 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
1. efitz No.44484043
I’m going to change my name to “Granular Mango Serpent” just to see what those keywords are for in their safety instructions.
replies(2): >>44484103 #>>44486374 #
2. fouronnes3 No.44484103
Granular Mango Serpent is the new David Meyer.

https://arstechnica.com/information-technology/2024/12/certa...

3. RainyDayTmrw No.44486374
It may be a squeamish ossifrage[1] or a seraphim proudleduck[2], which is to say an artificial phrase chosen to be extremely unlikely to occur naturally. In this case, the purpose is likely QA. It's much easier to QA behavior with a special-purpose but otherwise inoffensive phrase than to make your QA team repeatedly say allegedly offensive things to your AI.

[1] https://en.wikipedia.org/wiki/The_Magic_Words_are_Squeamish_... [2] https://en.wikipedia.org/wiki/SEO_contest

replies(1): >>44486677 #
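The sentinel-phrase idea above can be sketched in a few lines. This is a hypothetical illustration, not Apple's actual filter: the phrase "Granular Mango Serpent" is assumed to be a test trigger, and the filter function is invented.

```python
# Hypothetical sketch of QA-testing a safety filter with a sentinel phrase.
# The phrase and the filter structure are illustrative assumptions.

SENTINEL = "granular mango serpent"

def safety_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    return SENTINEL in text.lower()

# QA can now exercise the blocking path without any genuinely
# offensive input ever appearing in test suites or logs.
assert safety_filter("Please draw a Granular Mango Serpent")
assert not safety_filter("Please draw a regular mango")
```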
4. sweetjuly No.44486677
I think the EICAR test file [1] is more apt. Rather than passing around actually malicious files as part of your tests, it's better to have the scanner recognize an innocuous and deliberately unlikely pattern as malware.

[1] https://en.wikipedia.org/wiki/EICAR_test_file