
534 points BlueFalconHD | 4 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
1. efitz No.44484043
I’m going to change my name to “Granular Mango Serpent” just to see what those keywords are for in their safety instructions.
replies(2): >>44484103 #>>44486374 #
2. fouronnes3 No.44484103
Granular Mango Serpent is the new David Meyer.

https://arstechnica.com/information-technology/2024/12/certa...

3. RainyDayTmrw No.44486374
It may be a squeamish ossifrage[1] or a seraphim proudleduck[2], which is to say an artificial phrase chosen to be extremely unlikely to occur naturally. In this case, the purpose is likely QA. It's much easier to QA behavior with a special-purpose but otherwise inoffensive phrase than to make your QA team repeatedly say allegedly offensive things to your AI.

[1] https://en.wikipedia.org/wiki/The_Magic_Words_are_Squeamish_... [2] https://en.wikipedia.org/wiki/SEO_contest

replies(1): >>44486677 #
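The sentinel-phrase idea above can be sketched in a few lines. This is a hypothetical illustration, not Apple's actual filter: the phrase "Granular Mango Serpent" is assumed to be a test trigger, and the filter function is invented.

```python
# Hypothetical sketch of QA-testing a safety filter with a sentinel phrase.
# The phrase and the filter structure are illustrative assumptions.

SENTINEL = "granular mango serpent"

def safety_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    return SENTINEL in text.lower()

# QA can now exercise the blocking path without any genuinely
# offensive input ever appearing in test suites or logs.
assert safety_filter("Please draw a Granular Mango Serpent")
assert not safety_filter("Please draw a regular mango")
```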
4. sweetjuly No.44486677
I think the EICAR test file [1] is more apt. Rather than passing around actually malicious files as part of your tests, it's better to have the scanner recognize an innocuous and deliberately unlikely pattern as malware.

[1] https://en.wikipedia.org/wiki/EICAR_test_file