
534 points BlueFalconHD | 4 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models, and I have extracted the filters into a repository. I encourage you to take a look around.
bawana ◴[] No.44484214[source]
Alexandria Ocasio-Cortez triggers a violation?

https://github.com/BlueFalconHD/apple_generative_model_safet...

mmaunder ◴[] No.44484284[source]
As does:

   "(?i)\\bAnthony\\s+Albanese\\b",
    "(?i)\\bBoris\\s+Johnson\\b",
    "(?i)\\bChristopher\\s+Luxon\\b",
    "(?i)\\bCyril\\s+Ramaphosa\\b",
    "(?i)\\bJacinda\\s+Arden\\b",
    "(?i)\\bJacob\\s+Zuma\\b",
    "(?i)\\bJohn\\s+Steenhuisen\\b",
    "(?i)\\bJustin\\s+Trudeau\\b",
    "(?i)\\bKeir\\s+Starmer\\b",
    "(?i)\\bLiz\\s+Truss\\b",
    "(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
    "(?i)\\bRishi\\s+Sunak\\b",
   
https://github.com/BlueFalconHD/apple_generative_model_safet...

Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically)
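For what it's worth, these entries behave like ordinary case-insensitive regexes once the JSON string escaping is removed (`\\b` in the file is the regex word boundary `\b`). A minimal Python sketch using two of the patterns quoted above:

```python
import re

# Two patterns copied from the leaked filter list (JSON escaping removed).
patterns = [
    r"(?i)\bJacinda\s+Arden\b",  # note: the list spells it "Arden", not "Ardern"
    r"(?i)\bKeir\s+Starmer\b",
]

def is_flagged(text: str) -> bool:
    """Return True if any filter pattern matches the text."""
    return any(re.search(p, text) for p in patterns)

print(is_flagged("keir   starmer announced a policy"))  # True: (?i) and \s+ are forgiving
print(is_flagged("Jacinda Ardern"))                     # False: \b rejects the correct spelling
```

The `(?i)` inline flag makes the match case-insensitive and `\s+` spans any run of whitespace, so casing and spacing tricks don't evade the filter; the trailing `\b`, however, means the pattern only hits the exact (mis)spelling in the list.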

userbinator ◴[] No.44484419[source]
I'm not surprised that anything political is being filtered, but this should definitely provoke some deep consideration around who has control of this stuff.
stego-tech ◴[] No.44484702[source]
You’re not wrong, and it’s something we “doomers” have been saying since OpenAI dumped ChatGPT onto folks. These are curated walled gardens, and everyone should absolutely be asking what ulterior motives are in play for the owners of said products.
SV_BubbleTime ◴[] No.44486197[source]
Some of us really value offline and uncensored LLMs for this reason and more, but that doesn’t solve the problem; it just reduces or changes the bias.
heavyset_go ◴[] No.44486410[source]
As long as we have to rely on pre-trained networks and curated training sets, normal people will not be able to get past this issue.
ghxst ◴[] No.44487673[source]
If the training data was "censored" by leaving out certain information, is there any practical way to inject that missing data after the model has already been trained?
heavyset_go ◴[] No.44487774[source]
You can fine-tune a model with new information, but it is not the same thing as training it from scratch, and it can only get you so far.

You might even be able to poison a model against being fine-tuned on certain information, but that's just a conjecture.
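A toy numeric illustration of the distinction (not a real LLM, just the shape of the argument): fine-tuning continues gradient descent from the pretrained weights, so a few steps on new data pull the model toward the new information without erasing where it started.

```python
# Toy sketch: "fine-tuning" as a few gradient steps on new data,
# starting from pretrained weights rather than from scratch.

def grad_step(w, x, y, lr):
    # One gradient-descent step for the squared error (w*x - y)^2.
    return w - lr * 2 * x * (w * x - y)

w_pretrained = 1.0           # learned from the original (curated) corpus
new_data = [(1.0, 3.0)] * 5  # information the original corpus left out

w = w_pretrained
for x, y in new_data:
    w = grad_step(w, x, y, lr=0.1)

print(round(w, 3))  # → 2.345: pulled toward 3.0, but still anchored to the start
```

Training from scratch on the combined data would land near 3.0 directly; fine-tuning only drifts toward it, which is the "can only get you so far" point above.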

calaphos ◴[] No.44488372[source]
If it's just filtered out of the training sets, adding the information as context should work fine; after all, this is exactly how o3, Gemini 2.5, and co. handle information newer than their training-data cutoff.
selfhoster11 ◴[] No.44488395[source]
Yes, RAG is one way to do that.
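A minimal sketch of that idea: retrieve documents relevant to the query and prepend them to the prompt, so the model sees information absent from its training data. The corpus and the word-overlap scoring below are stand-ins for a real vector store and embedding model.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
corpus = [
    "Document A: facts the model never saw during training.",
    "Document B: unrelated trivia.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Naive relevance score: number of lowercase words shared with the query.
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Prepend the retrieved context so the model can answer from it.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What facts did training leave out?"))
```

Note that this only works for information that can be stated in a document and fits in the context window; it doesn't restore anything the censored training run would have baked into the weights themselves.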