
534 points | BlueFalconHD | 1 comment

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) that protects the safety filters of Apple Intelligence models. I have extracted the filters into a repository; I encourage you to take a look around.
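
To give a flavor of what a deobfuscation step like this can look like (the exact scheme isn't described here; the cipher, key source, and file layout below are assumptions, not what the repository documents), here is a minimal Swift sketch using AES-GCM with a key recovered from the framework:

    import Foundation
    import CryptoKit

    // Minimal sketch only: AES-GCM, the key source, and the file layout
    // are assumptions for illustration.
    func deobfuscate(fileURL: URL, keyData: Data) throws -> Data {
        let blob = try Data(contentsOf: fileURL)           // combined nonce + ciphertext + tag
        let key = SymmetricKey(data: keyData)
        let sealed = try AES.GCM.SealedBox(combined: blob)
        return try AES.GCM.open(sealed, using: key)        // plaintext, e.g. JSON filter rules
    }
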
MatekCopatek ◴[] No.44489915[source]
You can design a racist propaganda poster, put someone's face onto a porn pic, or manipulate evidence with Photoshop. Apart from super specific things like trying to print money, the tool doesn't stop you from doing things most people would consider distasteful, creepy, or even illegal.

So why are we doing this now? Has anything changed fundamentally? Why can't we let software do everything and then blame the user for doing bad things?

replies(2): >>44489943 #>>44490018 #
dkyc ◴[] No.44489943[source]
I think what changed is that we can at least attempt to limit 'bad' things with technical measures. Ten years ago it was legitimately impossible, technically, to stop Photoshop from being used to design propaganda posters. Of course today's 'LLM safety' features aren't watertight either, but the combination of natural-language input and LLM-based safety measures gives software more options to restrict what it can be used for than it had in the past.
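
To make that concrete, here is a generic sketch of such a pre-generation gate (not Apple's actual implementation; the deny-list phrases and the policy are invented): the natural-language prompt is screened before anything reaches the model.

    import Foundation

    // Generic sketch of a prompt-level safety gate; deny-list contents are invented.
    struct SafetyFilter {
        let blockedPhrases: [String]

        func allows(_ prompt: String) -> Bool {
            let lowered = prompt.lowercased()
            return !blockedPhrases.contains { lowered.contains($0) }
        }
    }

    let filter = SafetyFilter(blockedPhrases: ["counterfeit banknote", "print money"])
    let userPrompt = "Design a poster for my band"
    if filter.allows(userPrompt) {
        print("forward the prompt to the model")
    } else {
        print("refuse, or escalate to a stricter reviewer model")
    }

A real deployment would typically layer a classifier model on top of a list like this, which is the extra leverage that natural-language input provides over pixel editing.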

The example you gave about preventing money counterfeiting also supports this: that was an easier thing to detect technically, so it was done.

Whether that's a good thing or a bad thing, everyone has to decide for themselves, but objectively I think this is the reason.

replies(1): >>44490028 #
bhk ◴[] No.44490028[source]
In other words, to whatever extent they can control or manipulate the behavior of users, they will. In the limit t → ∞, probably true.
replies(3): >>44491114 #>>44491244 #>>44493959 #
1. sixothree ◴[] No.44493959[source]
I guess that depends on the company's values and how open it is to influence from outside sources.