I extracted the safety filters from Apple Intelligence models

(github.com)

534 points BlueFalconHD | 5 comments | 06 Jul 25 19:50 UTC

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) that protects the safety filters of Apple Intelligence models. I have extracted the filters into a repository. I encourage you to take a look around.


userbinator [06 Jul 25 22:00 UTC] No.44484484

>>44483485 (OP)

China calls it "harmonious society", we call it "safety". Censorship by any other name would be just as effective for manipulating the thoughts of the populace. It's not often that you get to see stuff like this.

replies(4): >>44484542 >>44485060 >>44487180 >>44487705

1. madeofpalk [06 Jul 25 22:10 UTC] No.44484542

>>44484484

I don't think it's controversial or surprising at all that a company doesn't want its random sentence generator to spit out 'brand damaging' sentences. You know the field day the media would have if Apple's new feature summarised a text message as "Jane thinks Anthony Albanese should die".

replies(2): >>44484801 >>44485555

2. ryandrake [06 Jul 25 22:44 UTC] No.44484801

>>44484542 (TP)

When the choice is between 1. "avoid tarnishing my own brand" and 2. "doing what the user requested," corporations will always choose option 1. Who is this software supposed to be serving, anyway?

I'm surprised MS Office still allows me to type "Microsoft can go suck a dick" into a document and Apple's Pages app still allows me to type "Apple are hypocritical jerks." I wonder how long until that won't be the case...

replies(2): >>44486508 >>44498126

3. userbinator [07 Jul 25 00:42 UTC] No.44485555

>>44484542 (TP)

If that's what the message actually said, why would the media be complaining? Or do you mean false positives?

4. chii [07 Jul 25 03:33 UTC] No.44486508

>>44484801

> I wonder how long until that won't be the case...

When there are no more alternative word processors.

5. madeofpalk [08 Jul 25 08:21 UTC] No.44498126

>>44484801

But so often these tools are used in a way that the user didn't explicitly request, like summarising notifications, or generating slideshows from your photo library.
