
534 points BlueFalconHD | 2 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
trebligdivad ◴[] No.44483981[source]
Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...
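
To make the observation concrete, here's a minimal sketch of the two kinds of rules being described: hard "reject" patterns next to brand-casing rewrites. The patterns below are invented Python placeholders, not the actual entries from the decrypted files in the repo:

  import re

  # Hypothetical placeholders, not the real Apple filter entries.
  REJECT_PATTERNS = [
      re.compile(r"\bkill (him|her|them|myself)\b", re.IGNORECASE),
      re.compile(r"\bwants? to die\b", re.IGNORECASE),
  ]

  # Brand-casing rewrites of the kind described above.
  BRAND_CASING = {
      re.compile(r"\biphone\b", re.IGNORECASE): "iPhone",
      re.compile(r"\bairpods\b", re.IGNORECASE): "AirPods",
  }

  def apply_filters(text):
      """Return rewritten text, or None if a reject pattern matches."""
      for pattern in REJECT_PATTERNS:
          if pattern.search(text):
              return None  # blocked outright
      for pattern, replacement in BRAND_CASING.items():
          text = pattern.sub(replacement, text)
      return text

  print(apply_filters("My Iphone and my airpods"))  # -> My iPhone and my AirPods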

replies(11): >>44483999 #>>44484073 #>>44484095 #>>44484410 #>>44484636 #>>44486072 #>>44487916 #>>44488185 #>>44488279 #>>44488362 #>>44488856 #
junon ◴[] No.44488362[source]
Also feels like some of these would match totally innocuous usage.

"I'm overloaded for work, I'd be happy if you took some of it off me."

"The client seems to have passed on the proposed changes."

Both of those would match the "death regexes". Seems we haven't learned from the "glbutt of wine" problem of content filtering even decades later - the lesson being that you simply cannot do content filtering based on matching rules like this, period.
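
A quick Python check with hypothetical stand-ins for those patterns (not the actual expressions from the repo) shows how easily this goes wrong:

  import re

  # Hypothetical stand-ins for the kind of "death" patterns discussed above,
  # not the actual expressions from the linked repository.
  DEATH_PATTERNS = [
      re.compile(r"\bpass(ed)?\s+(away|on)\b", re.IGNORECASE),
      re.compile(r"\b(take|took|taking)\b.*\boff\s+me\b", re.IGNORECASE),
  ]

  SENTENCES = [
      "I'm overloaded for work, I'd be happy if you took some of it off me.",
      "The client seems to have passed on the proposed changes.",
  ]

  for s in SENTENCES:
      hits = [p.pattern for p in DEATH_PATTERNS if p.search(s)]
      print(s, "->", hits or "no match")

  # Both harmless sentences trigger a pattern: context-free matching cannot
  # tell the euphemism apart from the ordinary idiom.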

replies(3): >>44488871 #>>44489066 #>>44489636 #
1. gilleain ◴[] No.44488871[source]
Aka the 'Scunthorpe Problem'
replies(1): >>44495319 #
2. junon ◴[] No.44495319[source]
Thanks, I always forget the name.

I always remember my friend getting his PS bricked after using his real last name - Nieffenegger (pronounced "NEFF-en-jur") - in his profile. It took months and several privacy-invasive chats with support to get it unblocked, only to get auto-blocked again a few days later, with no response after that.
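
For anyone who hasn't hit this before, the "glbutt of wine" failure mode mentioned upthread is just boundary-free substring replacement; a toy Python example (word list purely illustrative):

  # Naive substring replacement fires inside longer, innocent words.
  REPLACEMENTS = {"ass": "butt"}  # illustrative only

  def naive_clean(text):
      for bad, good in REPLACEMENTS.items():
          text = text.replace(bad, good)  # no word-boundary check
      return text

  print(naive_clean("a glass of wine"))  # -> "a glbutt of wine"

  # Word boundaries fix that case, but blocklists still trip over proper
  # names that merely contain a listed substring - the Scunthorpe Problem.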