
534 points | BlueFalconHD | 1 comment

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted the filters into a repository. I encourage you to take a look around.
bawana ◴[] No.44484214[source]
Alexandra Ocasio Cortez triggers a violation?

https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(7): >>44484242 #>>44484256 #>>44484284 #>>44484352 #>>44484528 #>>44485841 #>>44488050 #
mmaunder ◴[] No.44484284[source]
As does:

   "(?i)\\bAnthony\\s+Albanese\\b",
    "(?i)\\bBoris\\s+Johnson\\b",
    "(?i)\\bChristopher\\s+Luxon\\b",
    "(?i)\\bCyril\\s+Ramaphosa\\b",
    "(?i)\\bJacinda\\s+Arden\\b",
    "(?i)\\bJacob\\s+Zuma\\b",
    "(?i)\\bJohn\\s+Steenhuisen\\b",
    "(?i)\\bJustin\\s+Trudeau\\b",
    "(?i)\\bKeir\\s+Starmer\\b",
    "(?i)\\bLiz\\s+Truss\\b",
    "(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
    "(?i)\\bRishi\\s+Sunak\\b",
   
https://github.com/BlueFalconHD/apple_generative_model_safet...

Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically)
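These entries appear to be standard case-insensitive, word-boundary-anchored regexes. As a minimal sketch of how such a blocklist could be applied (the framework's actual matching logic is not shown here; the `triggers_filter` helper is my own illustration, and the doubled backslashes in the source are JSON escaping):

```python
import re

# Three of the politician-name patterns quoted above, with JSON escaping
# (\\b, \\s) converted to ordinary regex escapes via Python raw strings.
patterns = [
    r"(?i)\bKeir\s+Starmer\b",
    r"(?i)\bRishi\s+Sunak\b",
    r"(?i)\bCyril\s+Ramaphosa\b",
]

compiled = [re.compile(p) for p in patterns]

def triggers_filter(text: str) -> bool:
    """Return True if any blocklist pattern matches anywhere in the text."""
    return any(rx.search(text) for rx in compiled)

# (?i) makes matching case-insensitive, and \s+ spans any run of whitespace,
# so "rishi   sunak" still matches; a bare surname does not, because the
# pattern requires both names.
print(triggers_filter("rishi   sunak announced..."))  # True
print(triggers_filter("Sunak alone"))                 # False
```

Note that `\b` anchors only prevent matches inside longer words ("Starmerite" would still slip through via `\b` after "Starmer"? No: `\bStarmer\b` requires a non-word character after "Starmer", so "Starmerite" does not match), which is why these lists tend to be easy to evade with misspellings.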

replies(6): >>44484366 #>>44484419 #>>44484695 #>>44484709 #>>44484883 #>>44487192 #
echelon ◴[] No.44484709[source]
Apple's 1984 ad is so hypocritical today.

This is Apple actively steering public thought.

No code - anywhere - should look like this. I don't care if the politicians are right, left, or authoritarian. This is wrong.

replies(2): >>44484841 #>>44493486 #
avianlyric ◴[] No.44484841[source]
Why is this wrong? Applying special treatment to politically exposed persons has been standard practice in every high risk industry for a very long time.

The simple fact is that people get extremely emotional about politicians; politicians both receive obscene amounts of abuse and have repeatedly demonstrated they’re not above weaponising tools like this for their own goals.

Seems perfectly reasonable that Apple doesn’t want to be unwittingly drawn into the middle of another random political pissing contest. Nobody comes out of those things uninjured.

replies(7): >>44484868 #>>44484887 #>>44484934 #>>44484948 #>>44485015 #>>44485098 #>>44488968 #
bigyabai ◴[] No.44484868[source]
The criticism is still valid. In 1984, the Macintosh was a bicycle for the mind. In 2025, it's a smart-car that refuses to take you certain places that are considered a brand-risk.

Both have ups and downs, but I think we're allowed to compare the experiences and speculate what the consequences might be.

replies(1): >>44484944 #
avianlyric ◴[] No.44484944[source]
I think gen AI is radically different from tools like Photoshop or similar.

In the past it was always extremely clear that the creator of content was the person operating the computer. Gen AI changes that, regardless of your views on authorship of gen AI content. The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it, and by extension the company that created the machine.

You can still handcraft any image, or prose, you want, without filtering or hindrance on a Mac. I don’t think anyone seriously thinks that’s going to change. But Gen AI represents a real threat, with its ability to vastly outproduce any human. To ignore that simple fact would be grossly irresponsible, at least in my opinion.

There is a damn good reason why every serious social media platform has content moderation, despite their clear wish to get rid of moderation. It’s because we have a long and proven track record of being a terribly abusive species when we’re let loose on the internet without moderation. There’s already plenty of evidence that we’re just as abusive and terrible with Gen AI.

replies(2): >>44485116 #>>44485118 #
furyofantares ◴[] No.44485118[source]
> The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it

They do?

I routinely see people say "Here's an xyz I generated." They are stating that they did the doing, and the machine's role is implicitly acknowledged in the same way as a camera's. And I'd be shocked if people didn't have a sense of authorship of the idea, as well as an increasing sense of authorship over the actual image the more they iterated on it with the model and/or curated variations.

replies(1): >>44485324 #
avianlyric ◴[] No.44485324[source]
Yes people will happily claim authorship over AI output when it’s in their favour. They will equally disclaim authorship if it allows them to express a view while avoiding the consequences of expressing that view.

I don’t think it’s hard to believe that the press would have a field day if someone managed to get Apple Gen AI stuff to express something racist, or equally abusive.

Case in point, article about how Google’s Veo 3 model is being used to flood TikTok with racist content:

https://arstechnica.com/ai/2025/07/racist-ai-videos-created-...