I extracted the safety filters from Apple Intelligence models (github.com)

536 points by BlueFalconHD | 1 comment | 06 Jul 25 19:50 UTC

I managed to reverse engineer the encryption (referred to as "Obfuscation" in the framework) that protects the safety filters of Apple Intelligence models, and I have extracted the filters into a repository. I encourage you to take a look around.


trebligdivad [06 Jul 25 20:56 UTC] No.44483981

>>44483485 (OP)

Some of the combinations are a bit weird. This one has lots of rules avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...
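For illustration only: a minimal Python sketch of how a capitalisation rule set like the one described could work, assuming it rewrites case variants of brand names rather than blocking them. The brand list and the function name fix_brand_case are hypothetical; the actual format and contents of the extracted filter files are not reproduced here.

```python
import re

# Hypothetical brand list (assumption): not copied from Apple's filter files.
BRAND_SPELLINGS = ["iPhone", "iPad", "MacBook", "AirPods", "Apple Watch"]

def fix_brand_case(text: str) -> str:
    """Rewrite any case variant of a known brand name to its canonical spelling."""
    for brand in BRAND_SPELLINGS:
        # \b keeps the match to whole words; IGNORECASE catches "IPHONE", "iphone", etc.
        pattern = re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
        text = pattern.sub(brand, text)
    return text

print(fix_brand_case("I love my IPHONE and my macbook."))
# → I love my iPhone and my MacBook.
```

A rewrite rule like this only normalises spelling; unlike a deny-list, it never rejects the output.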


grues-dinner [06 Jul 25 21:09 UTC] No.44484073

>>44483981

Interesting that it didn't seem to include "unalive".

That phenomenon is very telling: no one actually cares what people are really saying. Everyone, including the platforms, knows what that means. It's all performative.


cyanydeez [06 Jul 25 23:17 UTC] No.44485034

>>44484073

yo, these are businesses. It's not performative, it's CYA.

They care because of legal reasons, not moral or ethical.


lxgr [07 Jul 25 00:52 UTC] No.44485613

>>44485034

Does adding a trivial word filter even make any sense from a legal point of view, especially when this one seems to be filtering out words describing concepts that can be pretty easily paraphrased?

A regex sounds like a bad solution for profanity, but like an even worse one to bolt onto a thing that's literally designed to be able to communicate like a human and could probably easily talk its way around guardrails if it were so inclined.
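To make the paraphrase point concrete, here is a minimal Python sketch of a regex deny-list. The terms in DENY_TERMS and the helper is_blocked are hypothetical, not taken from Apple's filter files. A whole-word match catches the literal term but lets "unalived" and a plain paraphrase straight through.

```python
import re

# Hypothetical deny-list (assumption): not the terms from Apple's actual filters.
DENY_TERMS = ["kill", "death", "suicide"]

# One case-insensitive alternation with word boundaries, compiled once.
DENY_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, DENY_TERMS)) + r")\b", re.IGNORECASE
)

def is_blocked(text: str) -> bool:
    """Return True if the text contains any deny-listed word as a whole word."""
    return DENY_RE.search(text) is not None

for sample in ["He died a tragic death.", "He was unalived.", "He is no longer with us."]:
    print(f"{sample!r}: blocked={is_blocked(sample)}")
# → 'He died a tragic death.': blocked=True
# → 'He was unalived.': blocked=False
# → 'He is no longer with us.': blocked=False
```

Loosening the boundaries to catch variants quickly starts flagging innocent words (e.g. "skill" contains "kill"), which is the usual trade-off with keyword filters.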


Wurdan [07 Jul 25 05:47 UTC] No.44487152

>>44485613

I dunno if it meets your definition of legal, but "The EU Code of conduct on countering illegal hate speech online" seems to largely hinge on putting in effort to combat such things. The companies don't have to show that their measures are foolproof, just that they're making an effort.
